M3

Models, Methods, and Multilingualism (M3)

Coordination: Gilles ADDA

The Models, Methods, and Multilingualism team develops models and methods that support both the discovery of fundamental properties of language and the implementation of efficient systems to process it. We are interested in language in all its dimensions and modalities, with a strong emphasis on the multilingual dimension. The methods and models developed by the team are diverse by nature: computational (neural models, stochastic or symbolic methods), linguistic (language typology, linguistic diversity and universals, affects), or societal (accessibility, nudges, language preservation, processing of under-resourced languages and dialects). One of the team’s common aims is to relate language universals to the characteristics of language diversity and variation within a unified vision of linguistic and statistical (“automatic processing”) modeling of languages.

We can illustrate the team’s activity through a structured set of themes and sub-themes:

Universals in multilingual language modeling

Keywords: Linguistic diversity and universality in modeling; Representation of oral languages; Universal phonetic modeling and representation; Unified multilingual modeling and automatic identification of idiomaticity; Large multilingual and multimodal language models; Generative AI; Universal and cultural models of affects; Syntax of oral languages; Quantitative typology; Comparable corpora; Accessibility; Evaluation and resources; Multilingual generic systems: speech recognition, text generation, speech synthesis.

Methods and models for under-resourced languages

Keywords: Documentation of under-resourced languages; Scientific policies for endangered languages; Ethical and societal impact; Automatic processing of under-resourced languages; Massively multilingual models and interlingual transfer; Portability from a well-resourced to an under-resourced language.

Machine learning for NLP

Keywords: Machine learning and inference algorithms for structured prediction; Weak or unsupervised learning; Continuous learning; Representation learning and meta-learning; Learning in the context of affective interactions.

Corpus linguistics, interlingual and intralingual variation

Keywords: Accents, dialects, and varieties: dialectometry (geoprosody) and linguistic cartography; Speaking styles; Variation of prosodic codes between languages and cultures (symbolic codes and socially coded attitudes); Expressive and multimodal prosody: illocutions, attitudes, social affects; Voice, voice strength, vocal quality, social uses.

Modeling of affective behaviors

Keywords: Automatic learning and detection of affective behaviors from paralinguistic and linguistic cues; Adaptation of large acoustic and linguistic models to emotion detection; Detection of abnormal behaviors and nudges in interaction; Ethical and societal impact of affect modeling and nudges.

Coordination

  • Sciences et Technologies des Langues

    M3

    Adda Gilles

    Head of M3

    Engineer and researcher

Team members

Publications

  • Thesis

    Hui-Syuan Yeh. Prompt-based Relation Extraction for Pharmacovigilance. Computation and Language [cs.CL]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG097⟩. ⟨tel-04968043⟩

    STL

    Available in free access

  • Report

    Sylvain Bouveret, Aurélie Bugeau, Emmanuelle Frenoux, Julien Lefevre, Laurent Lefèvre, et al. Quiz sur les impacts environnementaux du numérique. EcoInfo. 2025, pp.1-5. ⟨hal-04960328v1⟩

    STL

    Available in free access

  • Thesis

    Camille Challant. Représentation formelle avec AZee et contraintes grammaticales pour la langue des signes française. Formal Languages and Automata Theory [cs.FL]. Université Paris-Saclay, 2024. French. ⟨NNT : 2024UPASG086⟩. ⟨tel-04957486⟩

    STL

    Available in free access

  • Journal article

    Zheng Zhang, Brian Denton, Xiaolan Xie. Branch and Price for Chance-Constrained Bin Packing. INFORMS Journal on Computing, 2020, 32 (3), pp.547-564. ⟨10.1287/ijoc.2019.0894⟩. ⟨hal-04941861⟩

    ILES, STL

  • Conference paper

    Simon Devauchelle, David Doukhan, Lucas Ondel Yang, Benjamin Élie, Albert Rilliard. Estimation automatique de caractéristiques acoustiques pour l’étude diachronique du français oral dans les médias. Atelier DAHLIA: DigitAl Humanities and cuLtural herItAge: data and knowledge management and analysis, Claudia Marinica; Fabrice Guillet; Florent Laroche, Jan 2025, Strasbourg, France. ⟨hal-04938377⟩

    STL

    Available in free access

  • Journal article

    Rémi Uro, David Doukhan. Pendant le confinement, le temps de parole des femmes a baissé à la télévision et à la radio. La revue des médias, 2020. ⟨hal-04906221⟩

    STL, TLP

  • Conference paper

    Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. “Women do not have heart attacks!” Gender Biases in Automatically Generated Clinical Cases in French. Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, Apr 2025, Albuquerque, United States. ⟨hal-04938811⟩

    STL

    Available in free access

  • Journal article

    Clément Bernard, Guillaume Postic, Sahar Ghannay, Fariza Tahi. RNA-TorsionBERT: leveraging language models for RNA 3D torsion angles prediction. Bioinformatics, 2025, 41 (1), pp.btaf004. ⟨10.1093/bioinformatics/btaf004⟩. ⟨hal-04911519⟩

    STL

    Available in free access

  • Journal article

    Marion Ficher, Tom Bauer, Anne-Laure Ligozat. A comprehensive review of the end-of-life modeling in LCAs of digital equipment. International Journal of Life Cycle Assessment, 2024, 30 (1), pp.20-42. ⟨10.1007/s11367-024-02367-x⟩. ⟨hal-04924691⟩

    STL

    Available in free access

  • Thesis

    Atilla Kaan Alkan. Natural Language Processing for Analyzing Messages of Astrophysical Observations. Artificial Intelligence [cs.AI]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG114⟩. ⟨tel-04928511⟩

    STL

    Available in free access

  • Preprint, working paper

    Clément Bernard, Guillaume Postic, Sahar Ghannay, Fariza Tahi. Has AlphaFold3 achieved success for RNAs? 2025. ⟨hal-04911522⟩

    STL

    Available in free access

  • Thesis

    Léa-Marie Lam-Yee-Mui. Modélisations pour la reconnaissance de la parole à données contraintes. Signal and Image Processing [eess.SP]. Université Paris-Saclay, 2024. French. ⟨NNT : 2024UPASG075⟩. ⟨tel-04918814⟩

    STL

    Available in free access

  • Journal article

    Clément Bernard, Guillaume Postic, Sahar Ghannay, Fariza Tahi. Has AlphaFold 3 achieved success for RNA? Acta Crystallographica Section D: Structural Biology [1993-..], 2025, 81 (2), pp.49-62. ⟨10.1107/S2059798325000592⟩. ⟨hal-04919467⟩

    STL

  • Thesis

    Rémi Uro. Détection et Caractérisation des Interruptions dans les Interactions Orales pour la Description du Comportement des Femmes et des Hommes dans les Contenus Audiovisuels. Computer Science [cs]. Université Paris-Saclay, 2024. French. ⟨NNT : ⟩. ⟨tel-04916505⟩

    STL

  • Book chapter

    Philippe Boula de Mareüil, Plínio A. Barbosa. Picos melódicos pretônicos em final de enunciado no português brasileiro: um estudo quantitativo. Dermeval da Hora; Ángela Helmer. Interseções Linguísticas: Estudos Diversos, Líquido Editorial, pp.71-85, 2023, ALFAL, 9786599924804. ⟨hal-04893646⟩

    STL

    Available in free access

  • Preprint, working paper

    Douglas Teodoro, Nona Naderi, Anthony Yazdani, Boya Zhang, Alban Bornet. A Scoping Review of Artificial Intelligence Applications in Clinical Trial Risk Assessment. 2025. ⟨hal-04913991⟩

    STL

  • Preprint, working paper

    Omar Adjali, Olivier Ferret, Sahar Ghannay, Hervé Le Borgne. Entity-aware cross-modal pretraining for Knowledge-Based Visual Question Answering. 2024. ⟨cea-04910767⟩

    STL

    Available in free access

  • Thesis

    Paritosh Sharma. Sign Language synthesis by a decreasing granularity system from AZee. Computation and Language [cs.CL]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG092⟩. ⟨tel-04908078⟩

    STL

    Available in free access

  • Conference paper

    Laetitia Biscarrat, David Doukhan, Cyril Grouin. De Loft Story aux Marseillais à Dubaï : apport des méthodes d’analyse automatique pour la description des évolutions du dispositif télévisuel. Colloque “La téléréalité, entre média, événement et société”, part of 89e Congrès de l’Association canadienne-française pour l’avancement des sciences (ACFAS), Association canadienne-française pour l’avancement des sciences (ACFAS), 2022, Montreal, Canada. ⟨hal-04906923⟩

    STL

  • Conference paper

    Laetitia Biscarrat, David Doukhan, Cyril Grouin. De Loft Story aux Marseillais à Dubaï : 20 ans de télé-réalité, 20 ans de sexisme ? Apport des méthodes d’analyse automatique pour une approche comparative. Première journée d’études de l’Arcom, ARCOM, Nov 2022, Paris, France. ⟨hal-04905959⟩

    STL

    Available in free access

  • Conference paper

    Rémi Uro, Marie Tahon, David Doukhan, Albert Rilliard. Comprendre les phénomènes permettant la gestion des tours de parole dans les contenus de médias audiovisuels. Journée commune AFIA-TLH / AFCP – “Extraction de connaissances interprétables pour l’étude de la communication parlée”, Corinne Fredouille; Maëva Garnier; Olivier Perrotin; Marie Tahon, Dec 2023, Avignon, France. ⟨hal-04906679⟩

    STL, TLP

  • Other scientific publication

    Louis Estève, Kaja Dobrovoljc. A new pipeline for measuring diversity across various linguistic levels. 2025. ⟨hal-04886792⟩

    STL

    Available in free access

  • Conference paper

    Leticia Rebollo Couto, Albert Rilliard. Variação Pragmática e Diminutivização: intensificação e atenuação de atos expressivos e diretivos para a dublagem de animação em português, espanhol e francês. IV Colloque International VariaR 2024, Université Paul-Valéry Montpellier 3, Jun 2024, Montpellier, France. pp.43-44, ⟨10.3726/978-3-0351-0740-1⟩. ⟨hal-04874595⟩

    STL

    Available in free access

  • Thesis

    Sofiya Kobylyanskaya. Towards multimodal assessment of L2 level : speech and eye tracking features in a cross-cultural setting. Computation and Language [cs.CL]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG111⟩. ⟨tel-04900961⟩

    STL

    Available in free access

  • Conference poster

    Leticia Rebollo Couto, Albert Rilliard. Variación pragmática y expresividad negativa: análisis multimodal en datos de doblaje. LingCor2024: Workshop on Spoken Corpus Linguistics, Jul 2024, Vienna, Austria. ⟨hal-04874470⟩

    STL

    Available in free access

  • Conference paper

    Clémentine Bleuze, Fanny Ducel, Karën Fort, Maxime Amblard. « Vers la création d’une super-intelligence » : un corpus pour étudier les revendications des articles de TAL. Journées de lancement LIFT 2, Nov 2024, Orléans, France. ⟨hal-04880335⟩

    STL

    Available in free access

  • Conference paper

    Ayoub Hammal, Benno Uthayasooriyar, Caio Corro. Few-Shot Domain Adaptation for Named-Entity Recognition via Joint Constrained k-Means and Subspace Selection. COLING 2025 – 31st International Conference on Computational Linguistics, Jan 2025, Abu Dhabi, United Arab Emirates. pp.1-15. ⟨hal-04877776⟩

    STL

    Available in free access

  • Conference paper

    Simon Devauchelle, Albert Rilliard, David Doukhan, Lucas Ondel Yang. Describing voice in French media archives: age and gender effects on pitch and articulation characteristics. XX Convegno Nazionale AISV, LFSAG (Laboratorio di Fonetica Sperimentale “Arturo Genre”) Dipartimento di Lingue e Letterature Straniere e Culture Moderne Università degli Studi di Torino, Feb 2024, Turin, Italy. ⟨hal-04874662⟩

    STL

  • Conference paper

    Donna Erickson, João Antônio De Moraes, Albert Rilliard. Dimensões das atitudes prosódicas entre culturas. V Seminário Internacional de Fonologia, Universidade Federal do Rio de Janeiro, Nov 2024, Rio de Janeiro, Brazil. ⟨hal-04874627⟩

    STL

  • Conference paper

    Khanh-An C Quan, Camille Guinaudeau, Shin’Ichi Satoh. Evaluating VQA Models’ Consistency in the Scientific Domain. Multimedia Modelling 2025, Jan 2025, Nara, Japan. ⟨hal-04860239⟩

    STL

    Available in free access