M3

Models, Methods, and Multilingualism (M3)

Coordination: Albert RILLIARD

The M3 team works on models and methods for the fundamental description and automatic processing of natural languages, with particular attention to their multilingual dimension. This involves developing models and methods that can process, extract, and describe the relevant characteristics of linguistic systems from datasets. We consider languages in all their uses, with a special focus on under-resourced languages and on comparative (multilingual) approaches to language description. The team examines the interactions between models and linguistic systems: How does a model help our understanding of languages? How do sets of languages differ, and at what levels? How can generative models be used to produce controlled instances of given phenomena? Implementing explainable and adapted models enables exchange between disciplines and promotes interdisciplinary work (computer science, language sciences, sociology, psychology). The research carried out in the M3 team is organized along three main axes:

A- Models and Methods

This axis focuses on learning paradigms: developing data models and algorithms (with many or few parameters, generative or not) for application to automatic language processing and to languages as structured objects. These models are typically applied to the objects studied in the other two axes. Particular attention is paid to accessibility, that is, to the specific methods needed to develop inclusive technologies for society. Efficient, frugal models adapted to the representation of specific data are particularly sought after, in order to promote explainable and responsible approaches to the data studied. Such models make it possible to build hybrid solutions that aim to keep generative AI under control, and they offer approaches for building more ethical systems.

B- Typology, Variation, and Universals in Languages

This axis focuses on the corpus-based description of the characteristics of linguistic systems. This involves automatically applying typological schemes and comparing languages according to these characteristics. Variation is a central concern, with work on diatopic, diastratic, diaphasic, and diachronic variation within languages, as well as variation linked to language contact, in both under-resourced and well-described languages. Syntactic, phonological, phonetic, articulatory, and prosodic systems are considered. Another focus is the representation of differences between the systems studied, expressed as distances or projected onto an atlas.

C- Contextualized Behaviors for Interaction

This axis models and describes performance in situated communicative interactions at the para- and extra-linguistic levels: pragmatic functions (speech acts, attitudinal nuances), emotions (affective interaction, social emotions), nudges (gentle manipulation), vocal effort (Lombard speech, voice strength), etc. It aims to propose models of the behavioral changes linked to these phenomena so that they can be detected, measured in their variation or dynamics, and categorized. Analyzing the acoustic-linguistic parameters of the voice (parameters derived from models, glottal source, articulatory choices, etc.) makes it possible to link performance to function.


Publications

  • Journal article

    Philippe Boula de Mareüil, Marc Evrard, Alexandre François, Antonio Romano. Computer modelling of innovations relative to Latin in contemporary Romance dialects. Isogloss. Open Journal of Romance Linguistics, 2025, 11 (3), pp.1-31. ⟨10.5565/rev/isogloss.423⟩. ⟨hal-05144863⟩

    STL

    Available in free access

  • Journal article

    Anne Baillot, Anne-Laure Ligozat. Introduction. Sobriété numérique. Humanités numériques, 2025, 11, ⟨10.4000/1498x⟩. ⟨hal-05143071⟩

    STL

  • Conference paper

    Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, Sophie Rosset. Leveraging Information Retrieval to Enhance Spoken Language Understanding Prompts in Few-Shot Learning. Interspeech 2025, Aug 2025, Rotterdam, Netherlands. ⟨hal-05095796⟩

    STL

    Available in free access

  • Journal article

    Agata Savary. NLP-based Study of Universals of Linguistic Idiosyncrasy. Dagstuhl Reports, 2023, 13 (5), pp.64-67. ⟨hal-04323075⟩

    ILES, STL

  • Thesis

    Mathieu Laï-King. Qualité des articles de recherche et modèles de langue neuronaux : applications au domaine biomédical. Artificial Intelligence [cs.AI]. Université Paris-Saclay, 2025. French. ⟨NNT: 2025UPASG031⟩. ⟨tel-05079724⟩

    STL

    Available in free access

  • Preprint, working paper

    Clément Morand, Anne-Laure Ligozat, Aurélie Névéol. Characterizing Goals and Impacts of Digitalization: The Case of Promises in French Healthcare Policies. 2025. ⟨hal-05066176⟩

    STL

    Available in free access

  • Conference paper

    Luc Mottin, Julien Gobeill, Jeevanthi Liyana Pathirana, Nona Naderi, Anaïs Mottaz, et al. Manuscript Classification to Support the Analysis of Biases in Publication Opportunities. The 35th Medical Informatics Europe Conference, May 2025, Glasgow, United Kingdom. ⟨10.3233/SHTI250475⟩. ⟨hal-05070636⟩

    STL

  • Report

    Karin Dassas, Cyrille Bonamy, Bruno Bzeznik, Romaric David, Emmanuelle Frenoux, et al. Estimer l’impact carbone des activités numériques de l’Observatoire de Paris. EcoInfo. 2025, pp.1-47. ⟨hal-05068666⟩

    STL

    Available in free access

  • Journal article

    Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. Clinical text generation: Are we there yet?. Annual Review of Biomedical Data Science, 2025, 8, ⟨10.1146/annurev-biodatasci-103123-095202⟩. ⟨hal-05055957⟩

    STL

    Available in free access

  • Journal article

    Arezoo Saedi, Afsaneh Fatemi, Mohammad Ali Nematbakhsh, Sophie Rosset, Anne Vilnat. Entity search based on consumer preferences leveraging user reviews. Expert Systems with Applications, 2025, 275, pp.126990. ⟨10.1016/j.eswa.2025.126990⟩. ⟨hal-05047109⟩

    STL

    Available in free access

  • Conference paper

    Foucauld Estignard, Sahar Ghannay, Julien Girard-Satabin, Nicolas Hiebel, Aurélie Névéol. Evaluating the Confidentiality of Synthetic Clinical Texts Generated by Language Models. 23rd International Conference on Artificial Intelligence in Medicine (AIME), Jun 2025, Pavia, Italy. ⟨hal-05046326v2⟩

    STL

    Available in free access

  • Conference paper

    Lisa Raithel, Philippe Thomas, Bhuvanesh Verma, Roland Roller, Hui-Syuan Yeh, et al. Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot Relation Extraction for Pharmacovigilance in French, German, and Japanese. The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, Association for Computational Linguistics, Aug 2024, Bangkok, Thailand. pp.170-182. ⟨hal-04781015⟩

    STL

    Available in free access

  • Preprint, working paper

    Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. Am I eligible? Natural Language Inference for Clinical Trial Patient Recruitment: the Patient’s Point of View. 2025. ⟨hal-04992084⟩

    STL

    Available in free access

  • Book chapter

    Mathieu Constant, Marie Candito, Yannick Parmentier, Carlos Ramisch, Agata Savary. Construction, exploitation et exploration de ressources linguistiques pour le traitement automatique des expressions polylexicales en français : le projet PARSEME-FR. Lidia Becker; Julia Kuhn; Christina Ossenkop; Claudia Polzin-Haumann; Elton Prifti. Digitale romanistische Sprachwissenschaft: Stand und Perspektiven, Narr Francke Attempto Verlag GmbH + Co. KG, pp.219-250, 2023, Romanistisches Kolloquium, 978-3-8233-8506-6. ⟨hal-04995189⟩

    ILES, STL

  • Thesis

    Rémi Uro. Détection et caractérisation des interruptions dans les interactions orales pour la description du comportement des femmes et des hommes dans les contenus audiovisuels. Computation and Language [cs.CL]. Université Paris-Saclay, 2024. French. ⟨NNT: 2024UPASG055⟩. ⟨tel-04994439⟩

    STL

    Available in free access

  • Conference paper

    Amel Fraisse, Patrick Paroubek, Ramit Goyal, Nassreddine Znaidi. Measuring Multilingualism in Online Public Access Catalogs. The ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Dec 2024, Hong Kong, China. ⟨10.1145/3677389.3702544⟩. ⟨hal-04986773⟩

    ILES, STL

  • Conference paper

    Manon Scholivet, Agata Savary, Louis Estève, Marie Candito, Carlos Ramisch. SELEXINI – a large and diverse automatically parsed corpus of French. Building and Using Comparable Corpora (BUCC), Jan 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04978746⟩

    ILES, STL

    Available in free access

  • Thesis

    Hui-Syuan Yeh. Prompt-based Relation Extraction for Pharmacovigilance. Computation and Language [cs.CL]. Université Paris-Saclay, 2024. English. ⟨NNT: 2024UPASG097⟩. ⟨tel-04968043⟩

    STL

    Available in free access

  • Report

    Sylvain Bouveret, Aurélie Bugeau, Emmanuelle Frenoux, Julien Lefevre, Laurent Lefèvre, et al. Quiz sur les impacts environnementaux du numérique. EcoInfo. 2025, pp.1-5. ⟨hal-04960328v2⟩

    STL

    Available in free access

  • Thesis

    Camille Challant. Représentation formelle avec AZee et contraintes grammaticales pour la langue des signes française. Formal Languages and Automata Theory [cs.FL]. Université Paris-Saclay, 2024. French. ⟨NNT: 2024UPASG086⟩. ⟨tel-04957486⟩

    STL

    Available in free access

  • Journal article

    Zheng Zhang, Brian Denton, Xiaolan Xie. Branch and Price for Chance-Constrained Bin Packing. INFORMS Journal on Computing, 2020, 32 (3), pp.547-564. ⟨10.1287/ijoc.2019.0894⟩. ⟨hal-04941861⟩

    ILES, STL

  • Conference paper

    Simon Devauchelle, David Doukhan, Lucas Ondel Yang, Benjamin Élie, Albert Rilliard. Estimation automatique de caractéristiques acoustiques pour l’étude diachronique du français oral dans les médias. Atelier DAHLIA: DigitAl Humanities and cuLtural herItAge: data and knowledge management and analysis, Claudia Marinica; Fabrice Guillet; Florent Laroche, Jan 2025, Strasbourg, France. ⟨hal-04938377⟩

    STL

    Available in free access

  • Journal article

    Rémi Uro, David Doukhan. Pendant le confinement, le temps de parole des femmes a baissé à la télévision et à la radio. La revue des médias, 2020. ⟨hal-04906221⟩

    STL, TLP

  • Conference paper

    Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. “Women do not have heart attacks!” Gender Biases in Automatically Generated Clinical Cases in French. NAACL 2025 – Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, Apr 2025, Albuquerque, United States. ⟨hal-04938811⟩

    STL

    Available in free access

  • Journal article

    Clément Bernard, Guillaume Postic, Sahar Ghannay, Fariza Tahi. RNA-TorsionBERT: leveraging language models for RNA 3D torsion angles prediction. Bioinformatics, 2025, 41 (1), pp.btaf004. ⟨10.1093/bioinformatics/btaf004⟩. ⟨hal-04911519⟩

    STL

    Available in free access

  • Journal article

    Marion Ficher, Tom Bauer, Anne-Laure Ligozat. A comprehensive review of the end-of-life modeling in LCAs of digital equipment. International Journal of Life Cycle Assessment, 2024, 30 (1), pp.20-42. ⟨10.1007/s11367-024-02367-x⟩. ⟨hal-04924691⟩

    STL

    Available in free access

  • Thesis

    Atilla Kaan Alkan. Natural Language Processing for Analyzing Messages of Astrophysical Observations. Artificial Intelligence [cs.AI]. Université Paris-Saclay, 2024. English. ⟨NNT: 2024UPASG114⟩. ⟨tel-04928511⟩

    STL

    Available in free access

  • Preprint, working paper

    Clément Bernard, Guillaume Postic, Sahar Ghannay, Fariza Tahi. Has AlphaFold3 achieved success for RNAs?. 2025. ⟨hal-04911522⟩

    STL

    Available in free access

  • Thesis

    Léa-Marie Lam-Yee-Mui. Modélisations pour la reconnaissance de la parole à données contraintes. Signal and Image Processing [eess.SP]. Université Paris-Saclay, 2024. French. ⟨NNT: 2024UPASG075⟩. ⟨tel-04918814⟩

    STL

    Available in free access

  • Journal article

    Clément Bernard, Guillaume Postic, Sahar Ghannay, Fariza Tahi. Has AlphaFold 3 achieved success for RNA?. Acta crystallographica Section D: Structural biology [1993-..], 2025, 81 (2), pp.49–62. ⟨10.1107/S2059798325000592⟩. ⟨hal-04919467⟩

    STL