Location: Amphithéâtre Hexagone, Campus Universitaire de Luminy, 13009 Marseille


Generation of Facial Nonverbal Behavior for Socially Interactive Agents: A Convolutional Generative Adversarial Approach

PhD Thesis Defense. Thesis supervision: Magalie Ochs – Professor, Aix-Marseille University (Supervisor); Nicolas Sabouret – Professor, Université Paris-Saclay, LISN (Co-supervisor); Brian Ravenet – Associate Professor (HDR), Université Paris-Saclay, LISN (Co-advisor)

Speaker: Alice Delbosc

Jury

  • Stefan Kopp – Professor, Bielefeld University (Reviewer)
  • Jonas Beskow – Professor, KTH Royal Institute of Technology (Reviewer)
  • Catherine Pelachaud – Professor, Sorbonne University (Examiner)
  • Rachel McDonnell – Professor, Trinity College Dublin (Examiner)
  • Zerrin Yumak – Associate Professor, Utrecht University (Examiner)
  • Stéphane Ayache – Professor, Aix-Marseille University (Guest member)
  • Sébastien Biaudet – Chief Technology Officer, DAVI, The Humanizers (Guest member)

Abstract

To communicate, humans naturally combine gestures, gaze, head movements, and facial expressions during face-to-face interaction. Socially Interactive Agents (SIAs) aim to reproduce these multimodal behaviors to facilitate human-machine communication. Among nonverbal cues, facial behaviors are particularly critical: they contribute to intelligibility, naturalness, affective expression, and impression formation, but can also trigger uncanniness when inappropriate or poorly synchronized. This thesis focuses on the automatic generation of believable facial nonverbal behaviors for SIAs, encompassing head movements, gaze direction, and facial expressions. 

Several challenges must be addressed to achieve this goal, beginning with the joint generation of facial modalities in a manner consistent with their natural coordination in human communication. The first contribution of this thesis is FaceGen, an encoder-decoder model based on convolutional generative adversarial networks, designed to jointly synthesize head motion, gaze, and FACS-based facial expressions during speaking phases. FaceGen generates facial nonverbal signals directly from the speech signal, which are then used to animate a virtual agent. The model is trained on the TRUENESS corpus, which features professional actors enacting dyadic interactions involving ordinary sexism and racism, and validated through both objective and subjective evaluations. Results show that our modeling choices significantly enhance the perceived believability of the agent and its coordination with speech.
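To make this kind of pipeline concrete, the following is a minimal sketch of a convolutional adversarial setup in PyTorch: a 1D-convolutional encoder-decoder maps speech feature frames to per-frame behavior vectors, and a discriminator scores (speech, behavior) pairs for realism and synchronization. All names, layer counts, and dimensions (80-band mel-spectrogram input; a 22-dimensional behavior vector covering head rotation, gaze, and AU intensities) are illustrative assumptions, not the FaceGen implementation.

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        # Conv1d encoder-decoder: speech feature frames -> behavior frames.
        def __init__(self, speech_dim=80, behav_dim=22, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(speech_dim, hidden, kernel_size=5, padding=2),
                nn.LeakyReLU(0.2),
                nn.Conv1d(hidden, hidden, kernel_size=5, padding=4, dilation=2),
                nn.LeakyReLU(0.2),
                nn.Conv1d(hidden, hidden, kernel_size=5, padding=8, dilation=4),
                nn.LeakyReLU(0.2),
                nn.Conv1d(hidden, behav_dim, kernel_size=5, padding=2),
            )

        def forward(self, speech):                 # speech: (B, T, speech_dim)
            x = speech.transpose(1, 2)             # Conv1d expects (B, C, T)
            return self.net(x).transpose(1, 2)     # -> (B, T, behav_dim)

    class Discriminator(nn.Module):
        # Scores (speech, behavior) pairs: real and synchronized vs. generated.
        def __init__(self, speech_dim=80, behav_dim=22, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(speech_dim + behav_dim, hidden, kernel_size=5, stride=2, padding=2),
                nn.LeakyReLU(0.2),
                nn.Conv1d(hidden, hidden, kernel_size=5, stride=2, padding=2),
                nn.LeakyReLU(0.2),
                nn.Conv1d(hidden, 1, kernel_size=5, padding=2),  # per-window score
            )

        def forward(self, speech, behavior):
            x = torch.cat([speech, behavior], dim=-1).transpose(1, 2)
            return self.net(x)

Feeding the discriminator the speech channel alongside the behavior, rather than the behavior alone, is what lets adversarial training reward speech-behavior coordination and not merely plausible motion.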

A second challenge lies in modeling how affective and interactional factors shape facial behavior, while enabling explicit control over the affective attitude expressed by the agent. The thesis addresses this through FaceAttGen, an extension of FaceGen formulated as a conditional generative model that produces affective facial nonverbal behaviors during both speaking and listening phases. FaceAttGen is original in its ability to be conditioned on social attitudes while generating facial behaviors that remain affectively appropriate to the unfolding interaction context. Using a semi-supervised learning strategy, the model learns to reproduce two contrasting social attitudes: hot anger and conciliation. Objective evaluations validate the architectural extensions introduced in this model, and subjective studies confirm its ability to shape the affective variability of the generated behaviors.
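Explicit attitude control of this sort is commonly implemented by conditioning the generator on a discrete label. The sketch below is a hedged illustration rather than the FaceAttGen design: the attitude label is embedded and broadcast along the time axis before being concatenated with the speech features.

    import torch
    import torch.nn as nn

    class ConditionalGenerator(nn.Module):
        # Speech-to-behavior generator conditioned on a discrete attitude label.
        def __init__(self, speech_dim=80, behav_dim=22, n_attitudes=2,
                     att_dim=16, hidden=256):
            super().__init__()
            self.att_embed = nn.Embedding(n_attitudes, att_dim)
            self.net = nn.Sequential(
                nn.Conv1d(speech_dim + att_dim, hidden, kernel_size=5, padding=2),
                nn.LeakyReLU(0.2),
                nn.Conv1d(hidden, hidden, kernel_size=5, padding=4, dilation=2),
                nn.LeakyReLU(0.2),
                nn.Conv1d(hidden, behav_dim, kernel_size=5, padding=2),
            )

        def forward(self, speech, attitude):           # speech: (B, T, D); attitude: (B,)
            B, T, _ = speech.shape
            cond = self.att_embed(attitude)            # (B, att_dim)
            cond = cond.unsqueeze(1).expand(B, T, -1)  # broadcast over time
            x = torch.cat([speech, cond], dim=-1).transpose(1, 2)
            return self.net(x).transpose(1, 2)         # (B, T, behav_dim)

A call such as generator(speech, torch.zeros(batch, dtype=torch.long)) would then request one attitude rendition of the same utterance, and switching the label to 1 the other (the label-to-attitude mapping here is an assumption).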

A further contribution of the thesis is an objective evaluation framework that aligns more closely with human perceptual judgments of believability and appropriateness than the objective metrics commonly used in the field. To this end, we propose an evaluation methodology that combines multiple metrics into a composite score. Results from a perceptual study, compared against the objective measures, show that this composite framework correlates more strongly with human judgments than existing metrics and supports the optimization of model architectures and hyperparameters.

Finally, the ethical dimension of generated behaviors is examined, with a particular focus on gender bias. After demonstrating that such biases persist both in real data and in the outputs of generative models, the thesis introduces FairGenderGen, a model that generates facial nonverbal behaviors from speech while attenuating gender bias through gradient-reversal domain adaptation.
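Gradient-reversal domain adaptation follows the gradient reversal layer of Ganin and Lempitsky: a gender classifier is trained on top of the generator's internal features, but the gradient flowing back from it is flipped, so the feature extractor is pushed to discard gender information while the classifier tries to recover it. The sketch below shows the layer itself; the training wiring in the comments is an assumed integration, not the FairGenderGen code.

    import torch
    from torch.autograd import Function

    class GradReverse(Function):
        # Identity on the forward pass; flips (and scales) the gradient going back.
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None  # no gradient w.r.t. lam

    def grad_reverse(x, lam=1.0):
        return GradReverse.apply(x, lam)

    # Assumed training wiring (schematic, not the thesis code):
    #   feats         = encoder(speech)                         # shared features
    #   behavior      = decoder(feats)                          # main generation task
    #   gender_logits = gender_head(grad_reverse(feats, lam))   # adversarial branch
    #   loss = generation_loss + cross_entropy(gender_logits, gender_labels)
    # The head learns to predict gender, while the reversed gradient trains the
    # encoder to strip gender cues from the features it passes to the decoder.

The composite evaluation score mentioned above can likewise be illustrated as a weighted combination of z-normalized metrics, with weights fitted to human ratings; the metric names and fitting strategy here are assumptions for illustration only.

    def composite_score(metrics, stats, weights):
        # metrics: raw metric values for one model, e.g. {"fid": 12.3, "beat_align": 0.41}
        # stats:   per-metric (mean, std) over a pool of models, for z-normalization
        # weights: per-metric weight, e.g. fitted by regressing human ratings
        z = {k: (v - stats[k][0]) / stats[k][1] for k, v in metrics.items()}
        return sum(weights[k] * z[k] for k in z)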

Publications

  • Conference paper

    Alice Delbosc, Magalie Ochs, Nicolas Sabouret, Brian Ravenet, Stéphane Ayache. Towards the generation of synchronized and believable non-verbal facial behaviors of a talking virtual agent. 25th ACM International Conference on Multimodal Interaction (ICMI’23), Oct 2023, Paris, France. ⟨10.1145/3610661.3616547⟩. ⟨hal-04206768⟩

  • Conference paper

    Alice Delbosc, Magalie Ochs, Nicolas Sabouret, Brian Ravenet, Stéphane Ayache. Mitigation of gender bias in automatic facial non-verbal behaviors generation. 26th ACM International Conference on Multimodal Interaction (ICMI’24), Nov 2024, San José, Costa Rica. pp. 284-292, ⟨10.1145/3678957.3685732⟩. ⟨hal-04725479⟩


  • Conference paper

    Alice Delbosc, Magalie Ochs, Nicolas Sabouret, Brian Ravenet, Stéphane Ayache. Conflict management training in the workplace through simulation with socio-affective embodied conversational agent. WACAI 2024 – Workshop sur les “Affects, Compagnons Artificiels et Interactions” (ACAI), Jun 2024, Bordeaux, France. ⟨hal-04842566⟩


  • Conference paper

    Alice Delbosc, Marjorie Armando, Nicolas Sabouret, Brian Ravenet, Stéphane Ayache, et al. Analyzing gender bias in the non-verbal behaviors of generative systems. The First Workshop on Discrimination at the International Conference on Intelligent Virtual Agents (IVA), Sep 2024, Glasgow, United Kingdom. ⟨hal-04725441⟩

