Master Thesis Defense

Inflection of Sign Language Manuals and Non-manuals of the body via Rigid Geometrical Transformations

  • Press "space" to navigate forward (the arrow keys also work).
  • Press "t" to play the 3D animations. These are real-time 3D animations, not videos.
  • Use the mouse to change the camera angle in a 3D animation.
  • Use the scroll wheel to zoom in and out in a 3D animation.
  • Press "ctrl" with "+" or "-" to change the page zoom and fit the presentation to your screen.
  • Best viewed in Chrome or other Chromium-based browsers.

Shailesh Mishra [1][2]

23-11-2023

Primary Advisor: Dr. Fabrizio Nunnari [2]

Examiner: Dr. Patrick Gebhard [2]

Supervisor: Prof. Dr. Antonio Krüger [1][2]

[1] Saarland University, [2] DFKI, German Research Center for Artificial Intelligence, Cognitive Assistants, Affective Computing Group

What is Sign Language?

"Inflection"

The person go home

The person is going home

Inflection in Sign Language

(press "t" to play 3D animation)

A sign is being used in

citation (dictionary) form.

Now the same sign is used in

inflected form

Contextualized gloss: NICHT

Why are the current systems built like this?

(press "t" to play 3D animation)

(Heute bis auf Weiteres kein Zugverkehr)

Non Inflected Sentence

Original Sentence

Motion capture data is hard to modify!

It is impossible to capture every combination of inflections

What is the State of the Art?

Manual/Procedural Animation Generation

Simax [1]

Procedural Animation

JASigning [2]

Current avatar animation system

End-to-End Deep Learning Based Animation Generation

Procedural Animation

End-to-end animation generation [1]

Procedural Animation

Progressive Transformers [2]

Motion Capture Based Approach

Procedural Animation

Animation generated using Motion Capture and Motion Elements [1] [2]

Vector Based Model

Procedural Animation

Animation using Vector Based Model [1][2]

Prosody Based Methods

Timing Inflection

Our contribution

An incremental improvement on the motion-capture-based system

Introduction to Inflection Parameters

Parameters of the functions that transform the animation data of a sign to contextualize it

Classification of the Inflections

Simplified Ideas

Input Data

Gloss: NICHT

Input data (Source)

Dictionary

Target Data (Target)

Sentence

Trajectory Matching

Goal: Minimize the error

Error: L2-norm

Error: 0

Computation of Average Orientation Delta

Computation of Average Translation Offset

Mathematical Formulation

Computation of Shoulder, Torso and Head Parameters

Shoulders

\[ \begin{aligned} \vec{\Delta}_{Sh} &= \frac{1}{N}\sum^{N} |\vec{X_{t}} - \vec{X_{s}}| \end{aligned} \]
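The average translation offset above can be illustrated with a minimal sketch. It assumes that the source (dictionary) and target (sentence) trajectories have already been resampled to the same N frames, and it reads the bars in the formula as a component-wise absolute value, since the result is a vector. The function name and the tuple-based data layout are hypothetical and are not taken from the thesis code.

```python
def average_translation_offset(source, target):
    """Mean of the component-wise |X_t - X_s| over N frames.

    `source` and `target` are lists of (x, y, z) positions, one per frame.
    Assumption: both lists have the same number of frames.
    """
    if not source or len(source) != len(target):
        raise ValueError("source and target need the same non-zero frame count")
    n = len(source)
    sums = [0.0, 0.0, 0.0]
    for x_s, x_t in zip(source, target):
        for k in range(3):
            # Accumulate the absolute per-axis difference for this frame.
            sums[k] += abs(x_t[k] - x_s[k])
    return tuple(s / n for s in sums)


# Example: two frames of a shoulder joint in dictionary and sentence form.
source_frames = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target_frames = [(0.0, 2.0, 0.0), (1.0, 4.0, -1.0)]
offset = average_translation_offset(source_frames, target_frames)
```

Because of the absolute value, this quantity measures how far the joint moved on each axis but not in which direction; a signed offset would simply drop the `abs` call.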

Head

\[ \begin{aligned} \vec{\theta}_{He} &= \vec{R_{s}}^{-1}\vec{R_{t}}\\ \end{aligned} \]

Torso

\[ \begin{aligned} \vec{\Delta}_{T} &= \frac{1}{N}\sum^{N} |\vec{X_{t}} - \vec{X_{s}}| \end{aligned} \]

\[ \begin{aligned} \vec{\theta}_{T} &= \vec{R_{s}}^{-1}\vec{R_{t}}\\ \end{aligned} \]

Orientation

\[ \begin{aligned} \vec{\theta}_{Ha} &= \vec{R_{s}}^{-1}\vec{R_{t}}\\ \end{aligned} \]
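The orientation delta used for the head, torso and hand has the same form in all three cases: the rotation that takes the source orientation to the target orientation. The sketch below shows it for a single frame pair, assuming orientations are stored as 3x3 rotation matrices (so the inverse is the transpose). The helper names are hypothetical, and averaging the delta over several frames is not shown.

```python
import math


def rot_z(degrees):
    """Rotation matrix for a rotation about the z axis."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def orientation_delta(r_source, r_target):
    """Return R_s^{-1} R_t; for rotation matrices the inverse is the transpose."""
    return matmul(transpose(r_source), r_target)


# Source rotated by 30 degrees, target by 90 degrees about the same axis.
delta = orientation_delta(rot_z(30.0), rot_z(90.0))
# Applying the delta after the source orientation reproduces the target.
reconstructed = matmul(rot_z(30.0), delta)
```

With this ordering the delta is expressed in the local frame of the source, which is why it is applied on the right of the source rotation.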

Trajectory

\[ \begin{aligned} \text{Translation}, \vec{T} & = (t_{x}, t_{y}, t_{z})^{T}\\ \text{Rotation}, \vec{R} & = (\theta_{x}, \theta_{y},\theta_{z})^{T}\\ \text{Scale}, \vec{S} & = (s_{x}, s_{y}, s_{z})^{T} \end{aligned} \]

\[ \begin{aligned} \text{Error}, E = \arg\min_{\vec{T}, \vec{R}, \vec{S}} \left\| \vec{X}_{t} - \vec{X}_{s}^{'} \right\|_{2} \end{aligned} \]
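One way to read the trajectory objective is as a nine-parameter fit. The sketch below assumes the transformed source is X_s' = R(S ⊙ X_s) + T, with R built from Euler angles in Z-Y-X order, and it uses a simple compass (pattern) search as a stand-in optimizer. The composition order, the Euler convention, the optimizer and all function names are assumptions made for illustration; the slides do not specify them.

```python
import math


def euler_matrix(rx, ry, rz):
    """Rotation matrix Rz * Ry * Rx for angles in radians (assumed convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform(points, params):
    """Apply scale, then rotation, then translation to every point."""
    tx, ty, tz, rx, ry, rz, sx, sy, sz = params
    rot = euler_matrix(rx, ry, rz)
    moved = []
    for x, y, z in points:
        v = (sx * x, sy * y, sz * z)
        moved.append(tuple(
            sum(rot[i][k] * v[k] for k in range(3)) + t
            for i, t in enumerate((tx, ty, tz))
        ))
    return moved


def l2_error(source, target, params):
    """L2 norm of the residual between the target and the transformed source."""
    moved = transform(source, params)
    return math.sqrt(sum((a - b) ** 2
                         for p, q in zip(moved, target)
                         for a, b in zip(p, q)))


def fit_trajectory(source, target, sweeps=200, step=1.0):
    """Compass search over (T, R, S), starting from the identity transform."""
    params = [0.0] * 6 + [1.0] * 3
    best = l2_error(source, target, params)
    for _ in range(sweeps):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                err = l2_error(source, target, trial)
                if err < best:
                    params, best, improved = trial, err, True
        if not improved:
            step *= 0.5  # Refine the search once no axis move helps.
    return params, best
```

The search only ever accepts moves that lower the error, so the returned error is never worse than that of the identity transform. A gradient-based or closed-form solver would normally be preferred for real data.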

Architecture: Online Sentence Generator

(Architecture diagram; image labels: mms, parser, block, json, config, targets, controller, glue)

Animation Result

(press "t" to play 3D animation)

(Heute bis auf Weiteres kein Zugverkehr)

Non Inflected Sentence

Inflected Sentence

Original Sentence

Quantitative Results

Metric One: Mean Per Joint Position Error (MPJPE)

$$ \begin{aligned} E_{\text{MPJPE}} = \frac{1}{N} \sum_{i=1}^{N} \| p_{\text{pred}}(i) - p_{\text{gt}}(i) \|_2 \end{aligned} $$
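As a small worked example of the MPJPE formula, the sketch below averages the Euclidean distance between predicted and ground-truth joint positions. It assumes both inputs list the same N joints in the same order; the function name is chosen here for illustration.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between matching predicted and ground-truth joints."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt need the same non-zero number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Two joints: the first is off by a 3-4-5 triangle, the second matches exactly.
error = mpjpe([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
              [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)])
```

Here the per-joint distances are 5 and 0, so the mean is 2.5 in whatever unit the positions use.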

Metric Two: Mean Per Joint Angular Error (MPJAE)

$$ \begin{aligned} E_{\text{MPJAE}} &= \frac{1}{3N} \sum_{i=1}^{3N} \left| \left( m_{\text{pred}}(i) - m_{\text{gt}}(i) \right) \bmod \pm 180 \right| \end{aligned} $$
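The "mod ±180" term in MPJAE wraps each angular difference into the range [-180, 180) before taking the absolute value, so that 170° and -170° count as 20° apart rather than 340°. A minimal sketch, assuming angles are given in degrees as one flat list of 3N Euler components per pose (the function names are hypothetical):

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def mpjae(pred, gt):
    """Mean absolute wrapped difference over all 3N angle components."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt need the same non-zero number of angles")
    return sum(abs(wrap_degrees(p - g)) for p, g in zip(pred, gt)) / len(pred)


# The first two components differ by 20 degrees across the +/-180 boundary.
error = mpjae([170.0, -170.0, 10.0], [-170.0, 170.0, 20.0])
```

The wrapped differences are 20, 20 and 10 degrees, giving a mean of 50/3 degrees.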

Metric Three: Normalized Power Spectrum Similarity (NPSS)

$$ \begin{aligned} \text{NPSS} &= \frac{\sum_{i} \sum_{j} p_{i,j} \cdot \textrm{emd}_{i,j}}{\sum_{i} \sum_{j} p_{i,j}} \end{aligned} $$
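The NPSS formula can be sketched as follows, under the commonly used reading that i indexes sequences, j indexes features, emd is the earth mover's distance between the normalized power spectra of ground truth and prediction, and p is the total ground-truth power of that feature. These definitions, the plain DFT, and the nested-list data layout are assumptions for illustration; the slides do not spell them out.

```python
import cmath


def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform of a 1D signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))) ** 2
        for k in range(n)
    ]


def normalize(spectrum):
    total = sum(spectrum)
    return [v / total for v in spectrum] if total > 0 else list(spectrum)


def emd_1d(a, b):
    """1D earth mover's distance: L1 distance between cumulative sums."""
    dist, cum_a, cum_b = 0.0, 0.0, 0.0
    for va, vb in zip(a, b):
        cum_a += va
        cum_b += vb
        dist += abs(cum_a - cum_b)
    return dist


def npss(gt, pred):
    """gt[i][j] and pred[i][j] are time series for sequence i, feature j."""
    weighted, total_power = 0.0, 0.0
    for gt_seq, pred_seq in zip(gt, pred):
        for gt_feat, pred_feat in zip(gt_seq, pred_seq):
            ps_gt = power_spectrum(gt_feat)
            ps_pred = power_spectrum(pred_feat)
            power = sum(ps_gt)  # weight p_{i,j}: total ground-truth power
            weighted += power * emd_1d(normalize(ps_gt), normalize(ps_pred))
            total_power += power
    if total_power == 0:
        raise ValueError("ground truth has zero power")
    return weighted / total_power
```

Identical inputs give a score of zero, and the score grows as the prediction's energy shifts to frequencies that the ground truth does not contain.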

Qualitative Result

Preliminary User Study

Experts ($N = 6$) who are fluent in German Sign Language

Naturalness

Grammatical Correctness

Understandability

User Study: Naturalness

User Study: Understandability

User Study: Grammatical Correctness

Key Observations

  1. No facial data
  2. Timing of playback
  3. Extreme torso movements

Limitations

Timing Issue

(press "t" to play 3D animation)

(Heute bis auf Weiteres kein Zugverkehr)

Glosses with Resampled Duration

Glosses with Original Duration

Complicated Trajectories

Self Collisions

Future Work

Trajectory Transitions

Recognizing inflections

Accommodation of holds and simultaneous playback

Thank You