Models, Methods, and Multilingualism (M3)

Coordination: Gilles Adda

The Models, Methods, and Multilingualism team develops models and methods that serve two goals: discovering fundamental properties of language and implementing efficient systems to process it. The team is interested in language in all its dimensions and modalities, with a strong emphasis on the multilingual dimension. Its methods and models are diverse by nature: computational (neural models, stochastic or symbolic methods), linguistic (language typology, linguistic diversity and universals, affects), or societal (accessibility, nudges, language preservation, processing of under-resourced languages and dialects). A common aim across the team is to relate language universals to the characteristics of linguistic diversity and variation, within a unified vision of linguistic and statistical (“automatic processing”) modeling of languages.

The team’s activity is structured around the following themes:

Universals in multilingual language modeling

Keywords: Linguistic diversity and universality in modeling; Representation of oral languages; Universal phonetic modeling and representation; Unified multilingual modeling and automatic identification of idiomaticity; Large multilingual and multimodal language models; Generative AI; Universal and cultural models of affects; Syntax of oral languages; Quantitative typology; Comparable corpora; Accessibility; Evaluation and resources; Multilingual generic systems: speech recognition, text generation, speech synthesis.

Methods and models for under-resourced languages

Keywords: Documentation of under-resourced languages; Scientific policies for endangered languages; Ethical and societal impact; Automatic processing of under-resourced languages; Massively multilingual models and interlingual transfer; Portability from a well-resourced to an under-resourced language.

Machine learning for NLP

Keywords: Machine learning and inference algorithms for structured prediction; Weak or unsupervised learning; Continuous learning; Representation learning and meta-learning; Learning in the context of affective interactions.

Corpus linguistics, interlingual and intralingual variation

Keywords: Accents, dialects, and varieties: dialectometry (geoprosody) and linguistic cartography; Speaking styles; Variation of prosodic codes between languages and cultures (symbolic codes and socially coded attitudes); Expressive and multimodal prosody: illocutions, attitudes, social affects; Voice, voice strength, vocal quality, social uses.

Modeling of affective behaviors

Keywords: Automatic learning and detection of affective behaviors from paralinguistic and linguistic cues; Adaptation of large acoustic and linguistic models to emotion detection; Detection of abnormal behaviors and nudges in interaction; Ethical and societal impact of affects modeling and nudges.

Coordination

  • Gilles Adda: Head of M3, engineer and researcher (Sciences et Technologies des Langues department)

Team members

Publications

  • Journal article

    Luma da Silva Miranda, João Antônio de Moraes, Albert Rilliard. Visual channel facilitates the comprehension of the intonation of Brazilian Portuguese wh-questions and wh-exclamations: evidence from congruent and incongruent stimuli. Language and Cognition, 2024, pp. 1-21. ⟨10.1017/langcog.2024.16⟩. ⟨hal-04538371⟩

  • Preprint, working paper

    Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials. 2024. ⟨hal-04536273⟩

  • Conference paper

    Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel. Variation du voisement des occlusives orales en code-switching: analyses par ABX automatique et mesures acoustiques. Journées d’Études sur la Parole – JEP2022, Jun 2022, Noirmoutier, France. ⟨hal-03703081⟩

  • Preprint, working paper

    Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials. 2024. ⟨hal-04536600⟩

  • Conference paper

    Karën Fort, Laura Alonso Alemany, Luciana Benotti, Julien Bezançon, Claudia Borg, et al. Your Stereotypical Mileage may Vary: Practical Challenges of Evaluating Biases in Multiple Languages and Cultural Contexts. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, May 2024, Turin, Italy. ⟨hal-04537096⟩

  • Conference paper

    Paul Lerner, Cyril Grouin. INCLURE: a Dataset and Toolkit for Inclusive French Translation. The 17th Workshop on Building and Using Comparable Corpora (BUCC @ LREC 2024), 2024, Turin, Italy. ⟨hal-04531938⟩

  • Proceedings

    Karën Fort, Aurélie Névéol. Ethics and NLP: 10 years after. Journée d’études ATALA “Éthique et TAL (Traitement Automatique des Langues) : 10 ans après”, 2024. ⟨hal-04533870⟩

  • Conference paper

    Paul Lerner, Olivier Ferret, Camille Guinaudeau. Cross-modal Retrieval for Knowledge-based Visual Question Answering. 46th European Conference on Information Retrieval (ECIR 2024), 2024, Glasgow, United Kingdom. ⟨hal-04384431⟩

  • Conference paper

    Tomohiro Nishiyama, Lisa Raithel, Roland Roller, Pierre Zweigenbaum, Eiji Aramaki. Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain. Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo), Mar 2024, St. Julian’s, Malta. pp. 8-17. ⟨hal-04528240⟩

  • Conference paper

    Nadège Alavoine, Gaëlle Laperriere, Christophe Servan, Sahar Ghannay, Sophie Rosset. New Semantic Task for the French Spoken Language Understanding MEDIA Benchmark. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Torino, Italy. ⟨hal-04523286⟩
