Models, Methods, and Multilingualism (M3)

Coordination: Gilles Adda

The Models, Methods, and Multilingualism team develops models and methods that serve two goals: discovering fundamental properties of language and building efficient systems to process it. We are interested in language in all its dimensions and modalities, with a strong emphasis on the multilingual dimension. The methods and models developed by the team are diverse by nature: computational (neural models, stochastic or symbolic methods), linguistic (language typology, linguistic diversity and universals, affects), or societal (accessibility, nudges, language preservation, processing of under-resourced languages and dialects). One of the team’s shared aims is to relate language universals to the characteristics of language diversity and variation within a unified vision of linguistic and statistical (“automatic processing”) modeling of languages.

We can illustrate the team’s activity through a structured set of themes and sub-themes:

Universals in multilingual language modeling

Keywords: Linguistic diversity and universality in modeling; Representation of oral languages; Universal phonetic modeling and representation; Unified multilingual modeling and automatic identification of idiomaticity; Large multilingual and multimodal language models; Generative AI; Universal and cultural models of affects; Syntax of oral languages; Quantitative typology; Comparable corpora; Accessibility; Evaluation and resources; Multilingual generic systems: speech recognition, text generation, speech synthesis.

Methods and models for under-resourced languages

Keywords: Documentation of under-resourced languages; Scientific policies for endangered languages; Ethical and societal impact; Automatic processing of under-resourced languages; Massively multilingual models and interlingual transfer; Portability from a well-resourced to an under-resourced language.

Machine learning for NLP

Keywords: Machine learning and inference algorithms for structured prediction; Weakly supervised or unsupervised learning; Continual learning; Representation learning and meta-learning; Learning in the context of affective interactions.

Corpus linguistics, interlingual and intralingual variation

Keywords: Accents, dialects, and varieties: dialectometry (geoprosody) and linguistic cartography; Speaking styles; Variation of prosodic codes between languages and cultures (symbolic codes and socially coded attitudes); Expressive and multimodal prosody: illocutions, attitudes, social affects; Voice, voice strength, vocal quality, social uses.

Modeling of affective behaviors

Keywords: Automatic learning and detection of affective behaviors from paralinguistic and linguistic cues; Adaptation of large acoustic and linguistic models to emotion detection; Detection of abnormal behaviors and nudges in interaction; Ethical and societal impact of affect modeling and nudges.

Coordination

  • Sciences et Technologies des Langues

    M3

    Gilles Adda

    Head of M3

    Engineer and researcher

Team members

Publications

  • Conference paper

    Maxime Fily, Guillaume Wisniewski, Séverine Guillaume, Gilles Adda, Alexis Michaud. Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models. Findings of the Association for Computational Linguistics: EACL 2024, Association for Computational Linguistics, Mar 2024, St. Julian’s, Malta. ⟨hal-04561819⟩

    STL

    Open access

  • Conference paper

    Hugo Boulanger, Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. Using Structured Health Information for Controlled Generation of Clinical Cases in French. The 6th Clinical Natural Language Processing Workshop at NAACL 2024 (ClinicalNLP 2024), Jun 2024, Mexico City, Mexico. ⟨hal-04558890⟩

    STL

    Open access

  • Conference paper

    Nicolas Hiebel, Bertrand Remy, Bruno Guillaume, Olivier Ferret, Aurélie Névéol, et al. Hostomytho: A GWAP for Synthetic Clinical Texts Evaluation and Annotation. Games and Natural Language Processing Workshop at LREC-COLING 2024, May 2024, Turin, Italy. ⟨hal-04555052⟩

    STL

    Open access

  • Thesis

    Oralie Cattan. Systèmes de questions-réponses interactifs à grande échelle. Computer Science [cs]. Université Paris-Saclay (2020-..), 2022. In French. ⟨tel-04551072⟩

    STL

  • Journal article

    Luma da Silva Miranda, João Antônio de Moraes, Albert Rilliard. Visual channel facilitates the comprehension of the intonation of Brazilian Portuguese wh-questions and wh-exclamations: evidence from congruent and incongruent stimuli. Language and Cognition, 2024, pp.1-21. ⟨10.1017/langcog.2024.16⟩. ⟨hal-04538371⟩

    STL

    Open access

  • Preprint, working paper

    Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials. 2024. ⟨hal-04536273⟩

    STL

    Open access

  • Conference paper

    Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel. Variation du voisement des occlusives orales en code-switching: analyses par ABX automatique et mesures acoustiques. Journées d’Études sur la Parole – JEP2022, Jun 2022, Noirmoutier, France. ⟨hal-03703081⟩

    STL

    Open access

  • Preprint, working paper

    Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials. 2024. ⟨hal-04536600⟩

    STL

  • Conference paper

    Karën Fort, Laura Alonso Alemany, Luciana Benotti, Julien Bezançon, Claudia Borg, et al. Your Stereotypical Mileage may Vary: Practical Challenges of Evaluating Biases in Multiple Languages and Cultural Contexts. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, May 2024, Turin, Italy. ⟨hal-04537096⟩

    STL

    Open access

  • Conference paper

    Paul Lerner, Cyril Grouin. INCLURE: a Dataset and Toolkit for Inclusive French Translation. The 17th Workshop on Building and Using Comparable Corpora (BUCC @ LREC 2024), 2024, Turin, Italy. ⟨hal-04531938⟩

    STL

    Open access