Master Thesis Defense
23-11-2023
Primary Advisor: Dr. Fabrizio Nunnari [2]
Examiner: Dr. Patrick Gebhard [2]
Supervisor: Prof. Dr. Antonio Krüger [1][2]
[1] Saarland University, [2] DFKI, German Research Center for Artificial Intelligence, Cognitive Assistants, Affective Computing Group
A sign is used in its citation (dictionary) form.
Now the same sign is used in its inflected form.
Contextualized gloss: NICHT
(Heute bis auf Weiteres kein Zugverkehr; "No train service today until further notice")
Non-Inflected Sentence
Original Sentence
Motion capture data is hard to modify!
It is impossible to capture all combinations of inflections.
SiMAX [1]
JASigning [2]
Current avatar animation systems
[1] SiMAX, The Sign Language Avatar System
[2] Elliott, R., Bueno, J., Kennaway, R., and Glauert, J. Towards the Integration of Synthetic SL Animation with Avatars into Corpus Annotation Tools. 4th Workshop on the Representation and Processing of Sign Languages.
End-to-end animation generation [1]
Progressive Transformers [2]
[1] sign.mt
[2] Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language production." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16. Springer International Publishing, 2020.
Animation generated using Motion Capture and Motion Elements [1] [2]
[1] Gibet, S., Courty, N., Duarte, K., and Naour, T. L. The SignCom system for data-driven animation of interactive virtual signers: Methodology and evaluation. KSII Transactions on Internet and Information Systems (2011).
[2] Gibet, S., Lefebvre-Albaret, F., Hamon, L., Brun, R., and Turki, A. Interactive editing in French Sign Language dedicated to virtual signers: requirements and challenges. Universal Access in the Information Society (2015).
Animation using a Vector-Based Model [1][2]
[1] Lu, P., & Huenerfauth, M. (2012, June). Learning a vector-based model of American Sign Language inflecting verbs from motion-capture data. In Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies (pp. 66-74).
[2] Huenerfauth, M., Lu, P., & Kacorri, H. (2015, September). Synthesizing and evaluating animations of American sign language verbs modeled from motion-capture data. In Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies (pp. 22-28).
[1] Adamo-Villani, N., & Wilbur, R.B. (2015). ASL-Pro: American Sign Language Animation with Prosodic Elements. Interacción.
[1] De Martino, J. M., Silva, I. R., Bolognini, C. Z., Costa, P. D. P., Kumada, K. M. O., Coradine, L. C., ... & De Conti, D. F. (2017). Signing avatars: making education more inclusive. Universal access in the information society, 16, 793-808.
Parameters of functions that transform the animation data of a sign to contextualize it
[1] Nunnari, F., Mishra, S., & Gebhard, P. (2023, June). Augmenting Glosses with Geometrical Inflection Parameters for the Animation of Sign Language Avatars. In 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) (pp. 1-5). IEEE.
Gloss: NICHT
Input data (Source)
Dictionary
Target Data (Target)
Sentence
Goal: Minimize the error
Error: L2-norm
Error: 0
\[ \begin{aligned} \vec{\Delta}_{Sh} &= \frac{1}{N}\sum^{N} |\vec{X_{t}} - \vec{X_{s}}| \end{aligned} \]
\[ \begin{aligned} \vec{\theta}_{He} &= \vec{R_{s}}^{-1}\vec{R_{t}}\\ \end{aligned} \]
\[ \begin{aligned} \vec{\Delta}_{T} &= \frac{1}{N}\sum^{N} |\vec{X_{t}} - \vec{X_{s}}| \end{aligned} \]
\[ \begin{aligned} \vec{\theta}_{T} &= \vec{R_{s}}^{-1}\vec{R_{t}}\\ \end{aligned} \]
\[ \begin{aligned} \vec{\theta}_{Ha} &= \vec{R_{s}}^{-1}\vec{R_{t}}\\ \end{aligned} \]
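The two kinds of parameter above (a mean positional offset and a relative rotation) can be sketched in a few lines of Python. This is only an illustration under assumptions not stated on the slides: positions are given as per-frame (x, y, z) tuples, rotations as 3x3 rotation matrices, and the function names `mean_abs_offset` and `relative_rotation` are hypothetical.

```python
def mean_abs_offset(source, target):
    """Per-axis mean of |X_t - X_s| over N frames of (x, y, z) positions."""
    n = len(source)
    return tuple(
        sum(abs(t[k] - s[k]) for s, t in zip(source, target)) / n
        for k in range(3)
    )


def relative_rotation(r_source, r_target):
    """Compute R_s^{-1} R_t for 3x3 rotation matrices.

    The inverse of a rotation matrix is its transpose, so entry (i, j)
    of the result is sum_k R_s[k][i] * R_t[k][j].
    """
    return [
        [sum(r_source[k][i] * r_target[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
```

If source and target share the same orientation, `relative_rotation` returns the identity matrix, which corresponds to "no rotational inflection".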
\[ \begin{aligned} \text{Translation}, \vec{T} & = (t_{x}, t_{y}, t_{z})^{T}\\ \text{Rotation}, \vec{R} & = (\theta_{x}, \theta_{y},\theta_{z})^{T}\\ \text{Scale}, \vec{S} & = (s_{x}, s_{y}, s_{z})^{T} \end{aligned} \]
\[ \begin{aligned} \text{Error}, E = \min_{\vec{T}, \vec{R}, \vec{S}} \left\| \vec{X}_{t} - \vec{X}_{s}^{'} \right\|_{2} \end{aligned} \]
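As a rough sketch of the objective, the code below applies a candidate translation, rotation, and scale to the source points and measures the L2 error against the target. Several details are assumptions, since the slides do not state them: Euler angles are in radians and composed as Rz·Ry·Rx, scaling is applied before rotation and translation, and all function names are hypothetical. The search over the parameters themselves is left to any general-purpose optimizer.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation matrix Rz * Ry * Rx from Euler angles in radians (assumed order)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform(points, t, r, s):
    """Scale, then rotate, then translate each (x, y, z) source point."""
    rot = rotation_matrix(*r)
    out = []
    for p in points:
        scaled = [p[k] * s[k] for k in range(3)]
        rotated = [sum(rot[i][k] * scaled[k] for k in range(3)) for i in range(3)]
        out.append(tuple(rotated[i] + t[i] for i in range(3)))
    return out


def l2_error(target, transformed):
    """L2 norm of the difference between target and transformed source points."""
    return math.sqrt(
        sum((a - b) ** 2 for p, q in zip(target, transformed) for a, b in zip(p, q))
    )
```

An error of 0 means the transformed dictionary sign coincides exactly with the sign as it appears in the sentence.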
(Heute bis auf Weiteres kein Zugverkehr; "No train service today until further notice")
Non-Inflected Sentence
Inflected Sentence
Original Sentence
Metric One: Mean Per Joint Position Error (MPJPE)
$$ \begin{aligned} E_{\text{MPJPE}} = \frac{1}{N} \sum_{i=1}^{N} \| p_{\text{pred}}(i) - p_{\text{gt}}(i) \|_2 \end{aligned} $$
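A minimal sketch of this metric, assuming predicted and ground-truth joints are given as equal-length lists of (x, y, z) tuples; the function name `mpjpe` is chosen here for illustration.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joint positions."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)
```

For example, two joints with position errors of 5 and 0 units give an MPJPE of 2.5 units.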
Metric Two: Mean Per Joint Angular Error (MPJAE)
$$ \begin{aligned} E_{\text{MPJAE}} &= \frac{1}{3N} \sum_{i=1}^{3N} \left| \left( m_{\text{pred}}(i) - m_{\text{gt}}(i) \right) \bmod \pm 180 \right| \end{aligned} $$
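A minimal sketch of this metric, assuming the 3N Euler angles are given in degrees as flat lists and that "mod ±180" means wrapping each angular difference into the range [-180, 180). The names `wrap_degrees` and `mpjae` are hypothetical.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def mpjae(pred, gt):
    """Mean absolute wrapped difference over all 3N Euler angle components."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(abs(wrap_degrees(p - g)) for p, g in zip(pred, gt)) / len(pred)
```

The wrapping matters near the ±180 boundary: a predicted angle of 170 and a ground-truth angle of -170 differ by 20 degrees, not 340.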
Metric Three: Normalized Power Spectrum Similarity (NPSS)
$$ \begin{aligned} NPSS &= \frac{\sum_{i} \sum_{j} p_{i, j}*\textrm{emd}_{i,j}}{\sum_{i} \sum_{j} p_{i, j}}\\ \end{aligned} $$
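The aggregation in this formula is a power-weighted average of earth mover's distances. The sketch below covers only that step plus a simple 1-D EMD. It assumes the power weights p and the normalized power spectra have already been computed, that i indexes sequences and j indexes features, and that spectra share unit-spaced frequency bins. The function names are hypothetical.

```python
from itertools import accumulate


def emd_1d(a, b):
    """Earth mover's distance between two normalized 1-D distributions.

    For distributions on the same unit-spaced bins, this equals the sum of
    absolute differences between their cumulative sums.
    """
    return sum(abs(x - y) for x, y in zip(accumulate(a), accumulate(b)))


def npss(power, emd):
    """Power-weighted average of emd[i][j], weighted by power[i][j]."""
    num = sum(p * e for prow, erow in zip(power, emd) for p, e in zip(prow, erow))
    den = sum(p for prow in power for p in prow)
    return num / den
```

Lower values indicate that the predicted and ground-truth motions have more similar frequency content.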
Experts ($N = 6$) who are fluent in German Sign Language
Naturalness
Grammatical Correctness
Understandability
(Heute bis auf Weiteres kein Zugverkehr; "No train service today until further notice")
Glosses with Resampled Duration
Glosses with Original Duration