Models, Methods, and Multilingualism (M3)

Coordination: Gilles ADDA

The Models, Methods, and Multilingualism team develops models and methods that support both the discovery of fundamental properties of language and the implementation of efficient systems to process it. We are interested in language in all its dimensions and modalities, with a strong emphasis on the multilingual dimension. The methods and models developed by the team are diverse by nature: computational (neural models, stochastic or symbolic methods), linguistic (language typology, linguistic diversity and universals, affects), or societal (accessibility, nudges, language preservation, processing of under-resourced languages and dialects). One of the team’s common aims is to relate language universals to the characteristics of language diversity and variation within a unified vision of linguistic and statistical (“automatic processing”) modeling of languages.

The team’s activity is organized into the following themes and sub-themes:

Universals in multilingual language modeling

Keywords: Linguistic diversity and universality in modeling; Representation of oral languages; Universal phonetic modeling and representation; Unified multilingual modeling and automatic identification of idiomaticity; Large multilingual and multimodal language models; Generative AI; Universal and cultural models of affects; Syntax of oral languages; Quantitative typology; Comparable corpora; Accessibility; Evaluation and resources; Multilingual generic systems: speech recognition, text generation, speech synthesis.

Methods and models for under-resourced languages

Keywords: Documentation of under-resourced languages; Scientific policies for endangered languages; Ethical and societal impact; Automatic processing of under-resourced languages; Massively multilingual models and interlingual transfer; Portability from a well-resourced to an under-resourced language.

Machine learning for NLP

Keywords: Machine learning and inference algorithms for structured prediction; Weakly supervised or unsupervised learning; Continual learning; Representation learning and meta-learning; Learning in the context of affective interactions.

Corpus linguistics, interlingual and intralingual variation

Keywords: Accents, dialects, and varieties: dialectometry (geoprosody) and linguistic cartography; Speaking styles; Variation of prosodic codes between languages and cultures (symbolic codes and socially coded attitudes); Expressive and multimodal prosody: illocutions, attitudes, social affects; Voice, voice strength, vocal quality, social uses.

Modeling of affective behaviors

Keywords: Automatic learning and detection of affective behaviors from paralinguistic and linguistic cues; Adaptation of large acoustic and linguistic models to emotion detection; Detection of abnormal behaviors and nudges in interaction; Ethical and societal impact of affect modeling and nudges.

Coordination

  • Sciences et Technologies des Langues

    M3

    Adda Gilles

    Head of M3

    Engineer and researcher

Team members

Publications

  • Journal article

    Annelies Braffort. L’héritage scientifique de Patrice Dalle : le traitement automatique des langues des signes au service de l’enseignement en LSF. La main de Thôt : théories, enjeux et pratiques de la traduction, In press, 11. ⟨hal-04256752⟩

    STL

    Open access

  • Conference paper

    Clément Morand, Aurélie Névéol, Anne-Laure Ligozat. MLCA: a tool for Machine Learning Life Cycle Assessment. 2024 International Conference on ICT for Sustainability (ICT4S), Jun 2024, Stockholm, Sweden. ⟨hal-04643414⟩

    STL

    Open access

  • Book chapter

    Philippe Boula de Mareüil, Antonio Romano, Marc Evrard, Alexandre François. Cartografia di innovazioni rispetto al latino attraverso un atlante sonoro dell’Europa. Erica Autelli. Il patrimonio linguistico storico della Liguria 2, InSedicesimo, pp.51-62, 2024. ⟨hal-04644943⟩

    STL

    Open access

  • Journal article

    Nassim Naderi, Nona Naderi, Huey Chern Boo, Kuan-Huei Lee, Po-Ju Chen. Editorial: Food tourism: culture, technology, and sustainability. Frontiers in Nutrition, 2024, 11 (1), pp.e42630. ⟨10.3389/fnut.2024.1390676⟩. ⟨hal-04644101⟩

    STL

    Open access

  • Preprint, working paper

    Jenny Copara, Nona Naderi, Gilles Falquet, Douglas Teodoro. A data-driven assessment of biomedical terminology evolution using information theoretical and network analysis approaches. 2024. ⟨hal-04644071⟩

    STL

  • Conference paper

    Constant Bonard, Gustave Cortal. Improving Language Models for Emotion Analysis: Insights from Cognitive Science. The 13th edition of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2024) co-located with the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Aug 2024, Bangkok, Thailand. ⟨hal-04624340v2⟩

    STL

    Open access

  • Conference paper

    Camille Challant, Michael Filhol. Extension d’AZee avec des règles de production concernant les gestes non-manuels pour la langue des signes française. 35èmes Journées d’Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.410-421. ⟨hal-04623032⟩

    STL

    Open access

  • Conference paper

    Clémence Sebe, Sarah Cohen-Boulakia, Olivier Ferret, Aurélie Névéol. Extraction d’entités nommées décrivant des chaînes de traitement bioinformatiques dans des articles scientifiques en anglais. 35èmes Journées d’Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.422-434. ⟨hal-04623033⟩

    BioInfo, STL

    Open access

  • Conference paper

    Rémi Uro, Albert Rilliard, David Doukhan, Marie Tahon, Antoine Laurent. Évaluation perceptive de l’anticipation de la prise de parole lors d’interactions dialogiques en français. 35èmes Journées d’Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.390-400. ⟨hal-04623090⟩

    STL

    Open access

  • Conference paper

    Marco Naguib, Aurélie Névéol, Xavier Tannier. Reconnaissance d’entités cliniques en few-shot en trois langues. 35èmes Journées d’Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.169-197. ⟨hal-04623016⟩

    STL

    Open access