Master Thesis Defense

Inflection of Sign Language Manuals and Non-manuals of the body via Rigid Geometrical Transformations

Press "space" to navigate forward (the arrow keys also work).
Press "t" to play the 3D animations. These are real-time 3D animations, not videos.
Use the mouse to change the camera angle in a 3D animation.
Use the scroll wheel to zoom in and out of a 3D animation.
Press "ctrl" with "+" or "-" to change the page zoom and fit the presentation to your screen.
Best viewed in Chrome or other Chromium-based browsers.

Shailesh Mishra [1][2]

23-11-2023

Primary Advisor: Dr. Fabrizio Nunnari [2]

Examiner: Dr. Patrick Gebhard [2]

Supervisor: Prof. Dr. Antonio Krüger [1][2]

[1] Saarland University, [2] DFKI, German Research Center for Artificial Intelligence, Cognitive Assistants, Affective Computing Group

What is Sign Language?

"Inflection"

The person go home

The person is going home

Inflection in Sign Language

(press "t" to play 3D animation)

A sign is being used in citation (dictionary) form.

Now the same sign is used in inflected form.

Contextualized gloss: NICHT

Why are the current systems built like this?

(press "t" to play 3D animation)

(Heute bis auf Weiteres kein Zugverkehr)

Non-Inflected Sentence

Original Sentence

Motion capture data is hard to modify!

It is impossible to capture all combinations of inflections

What is the State of the Art?

Manual/Procedural Animation Generation

SiMAX [1]

JASigning[2]

Current avatar animation system

[1] SiMAX, The Sign Language Avatar System
[2] Elliott, R., Bueno, J., Kennaway, R., and Glauert, J. Towards the Integration of Synthetic SL Animation with Avatars into Corpus Annotation Tools: 4th Workshop on the Representation and Processing of Sign Languages.

End-to-End Deep Learning Based Animation Generation

End-to-end animation generation [1]

Progressive Transformers [2]

[1] sign.mt
[2] Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language production." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16. Springer International Publishing, 2020.

Motion Capture Based Approach

Animation generated using Motion Capture and Motion Elements [1] [2]

[1] Gibet, S., Courty, N., Duarte, K., and Naour, T. L. The signcom system for data-driven animation of interactive virtual signers: Methodology and evaluation. Ksii Transactions on Internet and Information Systems (2011).
[2] Gibet, S., Lefebvre-Albaret, F., Hamon, L., Brun, R., and Turki, A. Interactive editing in french sign language dedicated to virtual signers: requirements and challenges. Universal Access in the Information Society (2015).

Vector Based Model

Animation using Vector Based Model [1][2]

[1] Lu, P., & Huenerfauth, M. (2012, June). Learning a vector-based model of American Sign Language inflecting verbs from motion-capture data. In Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies (pp. 66-74).
[2] Huenerfauth, M., Lu, P., & Kacorri, H. (2015, September). Synthesizing and evaluating animations of American sign language verbs modeled from motion-capture data. In Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies (pp. 22-28).

Prosody Based Methods

[1] Adamo-Villani, N., & Wilbur, R.B. (2015). ASL-Pro: American Sign Language Animation with Prosodic Elements. Interacción.

Timing Inflection

[1] De Martino, J. M., Silva, I. R., Bolognini, C. Z., Costa, P. D. P., Kumada, K. M. O., Coradine, L. C., ... & De Conti, D. F. (2017). Signing avatars: making education more inclusive. Universal Access in the Information Society, 16, 793-808.

Our contribution

An increment on the motion-capture-based system

Introduction to Inflection Parameters

Parameters of functions that transform the animation data of a sign to contextualize it

Classification of the Inflections

Simplified Ideas

Input Data

Gloss: NICHT

Input data (Source)

Dictionary

Target Data (Target)

Sentence

Trajectory Matching

Goal: Minimize the error

Error: L2-norm

Error: 0

Computation of Average Orientation Delta

Computation of Average Translation Offset

Mathematical Formulation

Computation of Shoulder, Torso and Head Parameters

Shoulders

\[ \begin{aligned} \vec{\Delta}_{Sh} &= \frac{1}{N}\sum^{N} |\vec{X_{t}} - \vec{X_{s}}| \end{aligned} \]
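The shoulder offset above can be illustrated with a minimal sketch. It assumes that the source (dictionary) and target (sentence) trajectories are already aligned frame by frame and stored as lists of (x, y, z) tuples; the function name `mean_abs_offset` and the sample values are hypothetical and not taken from the thesis.

```python
def mean_abs_offset(source, target):
    """Average per-axis absolute offset between two equally long 3D trajectories."""
    if len(source) != len(target) or not source:
        raise ValueError("trajectories must be non-empty and equally long")
    n = len(source)
    sums = [0.0, 0.0, 0.0]
    for s, t in zip(source, target):
        for axis in range(3):
            # |X_t - X_s| taken component-wise, as in the formula
            sums[axis] += abs(t[axis] - s[axis])
    return tuple(total / n for total in sums)


# Hypothetical shoulder positions over N = 2 frames
source = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
target = [(0.1, 1.2, 0.0), (0.3, 1.0, 0.0)]
delta_sh = mean_abs_offset(source, target)
```

The same helper would apply to the torso translation offset, since both use the same averaged absolute difference.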

Head

\[ \begin{aligned} \vec{\theta}_{He} &= \vec{R_{s}}^{-1}\vec{R_{t}}\\ \end{aligned} \]

Torso

\[ \begin{aligned} \vec{\Delta}_{T} &= \frac{1}{N}\sum^{N} |\vec{X_{t}} - \vec{X_{s}}| \end{aligned} \]

\[ \begin{aligned} \vec{\theta}_{T} &= \vec{R_{s}}^{-1}\vec{R_{t}}\\ \end{aligned} \]

Orientation

\[ \begin{aligned} \vec{\theta}_{Ha} &= \vec{R_{s}}^{-1}\vec{R_{t}}\\ \end{aligned} \]
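The orientation delta used for head, torso and hand can be sketched for a single frame as follows. This is only an illustration under assumptions: rotations are represented as 3x3 rotation matrices (the slides do not state the representation), the inverse of a rotation matrix is taken as its transpose, and all helper names are hypothetical.

```python
import math


def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the Z axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_delta(r_source, r_target):
    """R_s^{-1} R_t; for a rotation matrix the inverse equals the transpose."""
    return matmul(transpose(r_source), r_target)


def rotation_angle_deg(r):
    """Angle of a rotation matrix, recovered from its trace."""
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


# Hypothetical example: source rotated 10 degrees, target 40 degrees about Z
delta = rotation_delta(rot_z(10.0), rot_z(40.0))
angle = rotation_angle_deg(delta)
```

Applying `delta` after the source rotation reproduces the target rotation, which is what makes it usable as an inflection parameter. Averaging such deltas over frames is not shown here.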

Trajectory

\[ \begin{aligned} \text{Translation}, \vec{T} & = (t_{x}, t_{y}, t_{z})^{T}\\ \text{Rotation}, \vec{R} & = (\theta_{x}, \theta_{y},\theta_{z})^{T}\\ \text{Scale}, \vec{S} & = (s_{x}, s_{y}, s_{z})^{T} \end{aligned} \]

\[ \begin{aligned} \text{Error}, E = \min_{\vec{T}, \vec{R}, \vec{S}} \left\| \vec{X}_{t} - \vec{X}_{s}^{'} \right\|_{2} \end{aligned} \]
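A minimal sketch of the trajectory-matching objective follows. It only evaluates the L2 error for given parameters and leaves the search over translation, rotation and scale to any generic optimizer. The transformation order (scale, then rotation about X, Y, Z, then translation), the use of Euler angles in degrees, and all function names are assumptions made for illustration.

```python
import math


def rotate_xyz(p, angles_deg):
    """Rotate a point about the X, then Y, then Z axis (angles in degrees)."""
    x, y, z = p
    ax, ay, az = (math.radians(a) for a in angles_deg)
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    return (x, y, z)


def transform(points, t, r, s):
    """Apply scale s, rotation r and translation t to every trajectory point."""
    out = []
    for p in points:
        scaled = tuple(p[i] * s[i] for i in range(3))
        x, y, z = rotate_xyz(scaled, r)
        out.append((x + t[0], y + t[1], z + t[2]))
    return out


def l2_error(target, source, t, r, s):
    """L2 norm between the target and the transformed source trajectory."""
    moved = transform(source, t, r, s)
    return math.sqrt(sum((a - b) ** 2
                         for pt, pm in zip(target, moved)
                         for a, b in zip(pt, pm)))


# Hypothetical source trajectory and a target built from known parameters
source = [(0.0, 0.0, 0.0), (0.1, 0.2, 0.0), (0.2, 0.3, 0.1)]
true_t, true_r, true_s = (0.05, 0.0, 0.1), (0.0, 0.0, 30.0), (1.2, 1.0, 1.0)
target = transform(source, true_t, true_r, true_s)

error_identity = l2_error(target, source, (0, 0, 0), (0, 0, 0), (1, 1, 1))
error_true = l2_error(target, source, true_t, true_r, true_s)
```

With the parameters that generated the target, the error drops to zero, which corresponds to the "Error: 0" case on the Trajectory Matching slide.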

Architecture: Online Sentence Generator

Animation Result

(press "t" to play 3D animation)

(Heute bis auf Weiteres kein Zugverkehr)

Non-Inflected Sentence

Inflected Sentence

Original Sentence

Quantitative Results

Metric One: Mean Per Joint Position Error (MPJPE)

$$ \begin{aligned} E_{\text{MPJPE}} = \frac{1}{N} \sum_{i=1}^{N} \| p_{\text{pred}}(i) - p_{\text{gt}}(i) \|_2 \end{aligned} $$
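The MPJPE formula can be sketched directly. The example assumes joint positions are given as (x, y, z) tuples in the same coordinate frame and unit; the function name `mpjpe` and the sample values are illustrative.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joint positions."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for p, g in zip(pred, gt):
        total += math.dist(p, g)  # L2 norm of the position difference
    return total / len(pred)


# Hypothetical joints: the first is off by 5 units, the second matches exactly
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)]
error = mpjpe(pred, gt)
```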

Metric Two: Mean Per Joint Angular Error (MPJAE)

$$ \begin{aligned} E_{\text{MPJAE}} &= \frac{1}{3N} \sum_{i=1}^{3N} \left| \left( m_{\text{pred}}(i) - m_{\text{gt}}(i) \right) \bmod \pm 180 \right| \end{aligned} $$
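A short sketch of MPJAE, assuming the joint angles are flattened into one list of 3N values in degrees. The "mod ±180" step is read here as wrapping each difference into the interval [-180, 180), so that 350° and 10° count as 20° apart; that reading and the function names are assumptions.

```python
def wrap_degrees(angle):
    """Wrap an angle difference into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def mpjae(pred_angles, gt_angles):
    """Mean absolute wrapped difference over all 3N joint angles (degrees)."""
    if len(pred_angles) != len(gt_angles) or not pred_angles:
        raise ValueError("inputs must be non-empty and equally long")
    total = sum(abs(wrap_degrees(p - g)) for p, g in zip(pred_angles, gt_angles))
    return total / len(pred_angles)


# Hypothetical angles for one joint (3 values)
pred = [350.0, 10.0, 90.0]
gt = [10.0, 350.0, 80.0]
error = mpjae(pred, gt)
```

Without the wrapping step, the first two angles would each contribute 340° instead of 20°, which would badly overstate the error.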

Metric Three: Normalized Power Spectrum Similarity (NPSS)

$$ \begin{aligned} NPSS &= \frac{\sum_{i} \sum_{j} p_{i, j}*\textrm{emd}_{i,j}}{\sum_{i} \sum_{j} p_{i, j}}\\ \end{aligned} $$
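NPSS can be sketched as follows, under the usual reading of the formula: for sequence i and feature j, the power spectra of ground truth and prediction are normalized to sum to one, emd is the L1 distance between their cumulative sums, and p is the total ground-truth power used as a weight. The naive DFT, the data layout (a list of sequences, each a list of frames, each a list of feature values) and all names are assumptions for illustration, not the evaluation code of the thesis.

```python
import cmath
import math


def power_spectrum(signal):
    """Squared magnitude of a naive discrete Fourier transform."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def npss(gt_sequences, pred_sequences):
    """Power-weighted average EMD between normalized power spectra."""
    weighted_emd = 0.0
    total_power = 0.0
    for gt_seq, pred_seq in zip(gt_sequences, pred_sequences):
        for j in range(len(gt_seq[0])):
            gt_power = power_spectrum([frame[j] for frame in gt_seq])
            pred_power = power_spectrum([frame[j] for frame in pred_seq])
            gt_total, pred_total = sum(gt_power), sum(pred_power)
            if gt_total == 0.0 or pred_total == 0.0:
                continue  # a spectrum with no power cannot be normalized
            # 1D EMD: L1 distance between the two cumulative distributions
            emd = 0.0
            gt_cdf = pred_cdf = 0.0
            for g, p in zip(gt_power, pred_power):
                gt_cdf += g / gt_total
                pred_cdf += p / pred_total
                emd += abs(gt_cdf - pred_cdf)
            weighted_emd += gt_total * emd
            total_power += gt_total
    return weighted_emd / total_power if total_power else 0.0


# Hypothetical data: one sequence, 8 frames, one feature
gt = [[[math.sin(2 * math.pi * t / 8)] for t in range(8)]]
pred_fast = [[[math.sin(2 * math.pi * 2 * t / 8)] for t in range(8)]]

score_same = npss(gt, gt)
score_fast = npss(gt, pred_fast)
```

A prediction that moves at twice the frequency of the ground truth gets a clearly non-zero score, while an identical prediction scores zero. A lower NPSS means a closer match.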

Qualitative Result

Preliminary User Study

Experts ($N = 6$) who are fluent in German Sign Language

Naturalness

Grammatical Correctness

Understandability

User Study: Naturalness

User Study: Understandability

User Study: Grammatical Correctness

Key Observation

No facial data
Timing of playback
Extreme torso movements

Limitations

Timing Issue

(press "t" to play 3D animation)

(Heute bis auf Weiteres kein Zugverkehr)

Glosses with Resampled Duration

Glosses with Original Duration

Complicated Trajectories

Self Collisions

Future Work

Trajectory Transitions

Recognizing inflections

Accommodation of hold and simultaneous playback

Thank You