M3

Models, Methods, and Multilingualism (M3)

Coordination: Albert Rilliard

The M3 team works on models and methods for the fundamental description and automatic processing of natural language in its multilingual dimension. This involves developing models and methods that can process, extract, and describe the relevant characteristics of linguistic systems from datasets. We consider languages in all their uses, with a special focus on under-resourced languages and comparative (multilingual) approaches to language description. The team examines the interactions between models and systems: How does a model help our understanding of languages? How do sets of languages differ, and at what levels? How can generative models be used to produce controlled instances of given phenomena? Implementing explainable and adapted models enables interaction between disciplines and promotes interdisciplinary work (computer science, language sciences, sociology, psychology). The team's research is organized along three main axes:

A- Models and Methods

This axis focuses on learning paradigms: developing data models and algorithms (models with many or few parameters, generative or not), with a view to applying them to automatic language processing and to languages as structured objects. These models are typically applied to the objects studied in the other two axes. Particular attention is paid to accessibility: reflecting on the specific methods needed to develop inclusive technologies for society. Effective, frugal models, adapted to the representation of specific data, are particularly sought in order to promote explainable and responsible approaches to the data studied. These models make it possible to create hybrid solutions that try to control generative AI. They also provide approaches that can be used to build more ethical systems.

B- Typology, Variation, and Universals in Languages

This axis focuses on describing the characteristics of linguistic systems based on corpora. This involves automatically applying typological schemes and language comparisons according to these characteristics. Considering variation is a major point, with work on diatopic, diastratic, diaphasic, or diachronic changes within languages, or linked to language contact of under-resourced or well-described languages. Syntactic, phonological, phonetic, articulatory, and prosodic systems are considered. The representation of the differences (in terms of distances or projected onto an atlas) between the systems studied is another highlight.

C- Contextualized Behaviors for Interaction

This axis models and describes performances during situated communicative interactions, at para- and extra-linguistic levels: pragmatic functions (speech acts, attitudinal nuances), emotions (affective interaction, social emotions), nudges (gentle manipulation), vocal effort (Lombard speech, voice strength), etc. It aims to propose models of the behavioral changes linked to these phenomena, in order to detect them, measure their variation or dynamics, and categorize them. The analysis of the acoustic-linguistic parameters of the voice (parameters derived from models, glottal source, articulatory choices, etc.) makes it possible to link performances and functions.

News

The team comprises 9 permanent members (CNRS researchers and faculty members at Université Paris-Saclay), 6 PhD students, and 9 engineers or fixed-term researchers. We maintain ties with industry (CIFRE-funded PhD theses, research projects) and regularly organize scientific events (the TALN conference, scientific workshops, etc.).

Team members

Publications on HAL

  • Encyclopedia or dictionary entry

    Albert Rilliard. Fala, emoções e atitudes. Speech Sciences Entries, 2024, https://gepf.falar.org/entries/66. ⟨hal-05474723⟩

    STL

    Available in free access

  • Journal article

    Natalia Grabar, Cyril Grouin. Year 2021: COVID-19, Information Extraction and BERTization among the Hottest Topics in Medical Natural Language Processing. IMIA Yearbook of Medical Informatics, 2022, 31 (01), pp.254-260. ⟨10.1055/s-0042-1742547⟩. ⟨hal-03931852⟩

    ILES, STL

    Available in free access

  • Conference paper

    Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, Sophie Rosset. Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and NER. The Fifteenth biennial Language Resources and Evaluation Conference (LREC 2026), May 2026, Palma de Mallorca, Spain. ⟨hal-05546569⟩

    STL

    Available in free access

  • Conference paper

    Clémentine Bleuze, Fanny Ducel, Maxime Amblard, Karën Fort. COCOA: Creation and Exploratory Investigation of a Corpus of Claims from NLP Articles. LREC 2026 – International Conference on Language Resources and Evaluation, ELRA Language Resources Association, May 2026, Palma de Mallorca, Spain. ⟨hal-05547842⟩

    STL

    Available in free access

  • Preprint, working paper

    Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. Assessing the Difficulty of Inference Types in Natural Language Inference for Clinical Trials. 2026. ⟨hal-05533706⟩

    STL

    Available in free access

  • Journal article

    Juan Manuel Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset, Khaled Zaouk, et al. Diart: A Python Library for Real-Time Speaker Diarization. Journal of Open Source Software, 2024, 9 (99), pp.5266. ⟨10.21105/joss.05266⟩. ⟨hal-05530961⟩

    STL

    Available in free access

  • Conference paper

    Clémentine Bleuze, Karën Fort, Vincent P. Martin, Aurélie Névéol. Grands modèles de langue pour la détection de pathologies psychiatriques : promesses, réalité, et enjeux. Journée d’étude “LLM@hopital”, ATALA, Mar 2026, Paris, France. ⟨hal-05532823⟩

    STL

    Available in free access

  • Conference paper

    Iskandar Boucharenc. Hierarchical Prefixes for Long Document Representations. ECIR, Apr 2025, Lucca, Italy. pp.171-177, ⟨10.1007/978-3-031-88720-8_28⟩. ⟨hal-05530637⟩

    STL

  • Conference paper

    Fanny Ducel, Aurélie Névéol, Vidit Khazanchi, Loïc Leclere, Arthur Pedrini, et al. Code-switching as a Bias Indicator in LLMs: “The consequences are not the same para nosotros”. LREC 2026 – 15th biennial Language Resources and Evaluation Conference, May 2026, Palma de Mallorca, Spain. ⟨hal-05529786⟩

    STL

    Available in free access

  • Conference paper

    Oralie Cattan, Christophe Servan, Sophie Rosset. On the Usability of Transformers-based models for a French Question-Answering task. Joint Conference of the Information Retrieval Communities in Europe (CIRCLE) 2022, Jul 2022, Samatan, France. ⟨hal-03701740⟩

    ILES, STL

    Available in free access

  • Conference paper

    Léa Pacini, Jérôme Dupire, Isabelle Barbet, Olivier Pons, Camille Guinaudeau, et al. Textbook’s accessibility for children with dyspraxia and visual disability. 17th International Conference of the Association for the Advancement of Assistive Technology in Europe, AAATE 2023, Association for the Advancement of Assistive Technology in Europe, Aug 2023, Paris, France. ⟨hal-04410340⟩

    STL

    Available in free access

  • Conference paper

    Fanny Ducel. How to define, understand and evaluate stereotypical biases in language models? Séminaire du groupe de travail Intelligence Artificielle Sûre, Intelligible et Vérifiable (IASIV), Mar 2025, Palaiseau, France. ⟨hal-05467784⟩

    STL

    Available in free access

  • Thesis

    Gustave Cortal. Natural language processing for subjectivity analysis in personal narratives. Computation and Language [cs.CL]. Université Paris-Saclay, 2026. English. ⟨NNT : 2026UPASG003⟩. ⟨tel-05501345⟩

    STL

    Available in free access

  • Conference poster

    Julie Halbout, Annelies Braffort, Michèle Gouiffès. Annotation automatique d’un corpus de Langue des Signes Française. Rencontres Jeunes Chercheurs en Parole (RJCP), Nov 2025, Paris, France. ⟨hal-05495878⟩

    STL

  • Conference poster

    Annelies Braffort, Michael Filhol, Michèle Gouiffès, Julie Halbout, Julie Lascar. Sign Language Processing with Linguistic Structure. BMVA Symposium on AI for Sign Language Translation, Production, and Linguistics, Dec 2025, London, United Kingdom. ⟨hal-05495664⟩

    STL

  • Conference paper

    Jules Françoise, Julie Lascar, Cyril Verrecchia, Sidonie Minodier, Michèle Gouiffès, et al. LaboSignes : vers une IA participative pour la reconnaissance automatique de la Langue des Signes Française. Journée d’études AFIA-ATALA : Technologies linguistiques pour les langues peu dotées, Dec 2025, Paris, France. ⟨hal-05495906⟩

  • Conference paper

    Idrissa Mahamoudou Dicko, Nona Naderi. Biomedical hallucination detection of LLMs using Med-HALT and HaloScope frameworks. 10th Junior Conference on Data Sciences and Engineering Conference (JDSE 2025), Sep 2025, Paris, France. ⟨hal-05483690⟩

    STL

    Available in free access

  • Book chapter

    Philippe Boula de Mareüil, Albert Rilliard, Frédéric Vernier. Valorisation de la diversité linguistique à travers un atlas sonore. Myriam Caressa; Christophe Doubovetzky. Langue(s) et droit(s). Enjeux et paradoxes en France, L’Harmattan, pp.177-188, 2025, Logiques Juridiques, 978-2-336-55319-1. ⟨hal-05464189⟩

    AVIZ, STL

  • Book chapter

    Natalia Grabar, Thierry Hamon, Emmanuelle Canut. Le langage simplifié pour le public FLE : des critères linguistiques à interroger. Éducation, formation et communication. L’accompagnement des publics en exil. Problèmes de langue et modalités de communication, Forthcoming, 2865310019. ⟨hal-05465059⟩

    STL

  • Journal article

    Anjani Dhrangadhariya, Roger Hilfiker, Karl Martin Sattelmayer, Nona Naderi, Katia Giacomino, et al. RoBuster: A Corpus Annotated with Risk of Bias Text Spans in Randomized Controlled Trials in Physiotherapy and Rehabilitation (forthcoming/in press). JMIR Formative Research, In press, ⟨10.2196/55127⟩. ⟨hal-05462769⟩

    STL

    Available in free access

  • Conference paper

    Fanny Ducel, Karën Fort, Aurélie Névéol. La linguistique appliquée pour une IA plus éthique. NéALA 2025 – Colloque sur Naturel et Artificiel en Linguistique Appliquée : une époque de paradoxes, Jul 2025, Nancy, France. ⟨hal-05457534⟩

    STL

    Available in free access

  • Other scientific publication

    Luciana Benotti, Fanny Ducel, Karën Fort, Guido Ivetta, Zhijing Jin, et al. Navigating Ethical Challenges in NLP: Hands-on strategies for students and researchers. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 5: Tutorial Abstracts), 2025, ⟨10.18653/v1/2025.acl-tutorials.5⟩. ⟨hal-05457524⟩

    STL

    Available in free access

  • Book chapter

    Simon Devauchelle, Albert Rilliard, David Doukhan, Lucas Ondel Yang. Variation of Perceived Voice Pitch Across Time Periods, Gender, and Age in French Media Archives. Valentina De Iacovo; Bianca Maria De Paolis; Daniela Mereu. The voice in the media and new technologies, 12 (004), Officinaventuno, pp.47-71, 2024, Studi Associazione Italiana Scienze della Voce, 978-88-97657-73-6. ⟨10.17469/O2112AISV000004⟩. ⟨hal-05450567⟩

    STL

    Available in free access

  • Thesis

    Armand Stricker. Towards More Natural Dialogues: Integrating Chitchat Capabilities into Task-oriented Dialogue Agents. Document and Text Processing. Université Paris-Saclay, 2025. English. ⟨NNT : 2025UPASG065⟩. ⟨tel-05453281⟩

    STL

    Available in free access

  • Conference paper

    Mathieu Laï-King, Patrick Paroubek. Pre-training data selection for biomedical domain adaptation using journal impact metrics. 23rd Workshop on Biomedical Natural Language Processing, Aug 2024, Bangkok, Thailand. pp.363-369, ⟨10.18653/v1/2024.bionlp-1.27⟩. ⟨hal-05447036⟩

    STL

    Available in free access

  • Report

    Adrien Berthelot, Tiago da Silva Barros, Laurent Lefèvre, Anne-Laure Ligozat, Emeline Pegon. Multi-criteria and multi-stage environmental study of Pl@ntnet service for the year 2024. Inria Lyon. 2026. ⟨hal-05448455v2⟩

    STL

    Available in free access

  • Conference paper

    François Buet, Camille Guinaudeau, Cyril Grouin, Sahar Ghannay, Shin’ichi Satoh. XAI for Gender Representation in Media Analysis. 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025), IEEE Signal Processing Society, Apr 2025, Hyderabad, India. pp.1-5, ⟨10.1109/ICASSP49660.2025.10888945⟩. ⟨hal-05442625⟩

    STL

    Available in free access

  • Conference paper

    Phrashant Khatri, Hansjörg Mixdorff, Preeti Rao, Albert Rilliard. Recognition of Audio-Visual Attitudes. 36. Konferenz Elektronische Sprachsignalverarbeitung (ESSV), Department of Speech Science and Phonetics of the Institute of Music, Media and Speech Sciences at the Martin Luther University Halle-Wittenberg in Halle/Saale; Central German Association for Speech Science and Speech Education, Mar 2025, Halle / Saale, Germany. pp.19-26. ⟨hal-05426157⟩

    STL

    Available in free access

  • Conference poster

    Luc Pommeret, Sophie Rosset, Christophe Servan, Sahar Ghannay. AtomicEval: Evaluation Framework for Atomic Proposition Autonomy with French Propositioner. 10th Junior Conference on Data Sciences and Engineering, Sep 2025, Gif-sur-Yvette, France. ⟨hal-05414939⟩

    STL

    Available in free access