LIPS

Language, Interaction, Speech and Sign (LIPS)

Coordination: Ioana VASILESCU

The scientific challenges of the LIPS team concern oral, spoken and signed languages, with the aim of linguistic description and modeling. We target a variety of applications, such as speech recognition, dialogue systems, automatic detection of affective states, comprehension and generation, speech synthesis and automatic processing of sign languages. The ethical dimension is at the heart of our work, from the design of experimental paradigms to the use of our research results.

The team brings together researchers in natural language processing and linguists around approaches centered on the situated dimension of language. We draw on a variety of data, of different sizes and from different sources, that illustrate linguistic variation in all its dimensions, from minimal units to meaning.

Multimodal processing that combines the written and spoken varieties of languages with other visual information (e.g. oculometry, also known as eye tracking) is also at the heart of our concerns.

The team comprises 13 permanent members (CNRS researchers, teacher-researchers at Université Paris-Saclay), 17 PhD researchers, and 13 contract researchers. We maintain links with industry (theses under CIFRE contracts, research projects) and regularly organize scientific events.

Coordination

  • Sciences et Technologies des Langues

    LIPS

    Vasilescu Ioana

    Research director

    Team coordinator

    Corpus linguistics, spoken language variation, multilingual corpora

Publications

  • Preprint, working paper

    Leticia Rebollo Couto, Albert Rilliard. Variación pragmática, traducción audiovisual y estrategias conversacionales para el doblaje: léxico coloquial y palabras tabús – Anexos. 2024. ⟨hal-04578522⟩

    STL

  • Conference paper

    Rabab Alkhalifa, Hsuvas Borkakoty, Romain Deveaud, Alaa El-Ebshihy, Luis Espinosa-Anke, et al. LongEval: Longitudinal Evaluation of Model Performance at CLEF 2024. Advances in Information Retrieval (ECIR 2024), Mar 2024, Glasgow, United Kingdom. pp.60-66, ⟨10.1007/978-3-031-56072-9_8⟩. ⟨hal-04577466⟩

    STL

  • Journal article

    Boya Zhang, Nona Naderi, Rahul Mishra, Douglas Teodoro. Online Health Search Via Multidimensional Information Quality Assessment Based on Deep Language Models: Algorithm Development and Validation. JMIR AI, 2024, 3, pp.e42630. ⟨10.2196/42630⟩. ⟨hal-04574791⟩

    STL

    Open access

  • Journal article

    Hossein Rouhizadeh, Irina Nikishina, Anthony Yazdani, Alban Bornet, Boya Zhang, et al. A Dataset for Evaluating Contextualized Representation of Biomedical Concepts in Language Models. Scientific Data, 2024, 11 (1), pp.455. ⟨10.1038/s41597-024-03317-w⟩. ⟨hal-04574786⟩

    STL

    Open access

  • Conference paper

    Maxime Fily, Guillaume Wisniewski, Séverine Guillaume, Gilles Adda, Alexis Michaud. Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models. Findings of the Association for Computational Linguistics: EACL 2024, Association for Computational Linguistics, Mar 2024, St. Julian’s, Malta. ⟨hal-04561819⟩

    STL

    Open access

  • Conference paper

    Hugo Boulanger, Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. Using Structured Health Information for Controlled Generation of Clinical Cases in French. The 6th Clinical Natural Language Processing Workshop at NAACL 2024 (ClinicalNLP 2024), Jun 2024, Mexico City, Mexico. ⟨hal-04558890⟩

    STL

    Open access

  • Preprint, working paper

    Marion Ficher, Tom Bauer, Anne-Laure Ligozat. A comprehensive review of the end-of-life modeling in LCAs of digital equipment. 2024. ⟨hal-04555155⟩

    STL

    Open access

  • Conference paper

    Nicolas Hiebel, Bertrand Remy, Bruno Guillaume, Olivier Ferret, Aurélie Névéol, et al. Hostomytho: A GWAP for Synthetic Clinical Texts Evaluation and Annotation. Games and Natural Language Processing Workshop at LREC-COLING 2024, May 2024, Turin, Italy. ⟨hal-04555052⟩

    STL

    Open access

  • Thesis

    Oralie Cattan. Systèmes de questions-réponses interactifs à grande échelle. Computer Science [cs]. Université Paris-Saclay (2020-..), 2022. French. ⟨tel-04551072⟩

    STL

  • Journal article

    Luma da Silva Miranda, João Antônio de Moraes, Albert Rilliard. Visual channel facilitates the comprehension of the intonation of Brazilian Portuguese wh-questions and wh-exclamations: evidence from congruent and incongruent stimuli. Language and Cognition, 2024, pp.1-21. ⟨10.1017/langcog.2024.16⟩. ⟨hal-04538371⟩

    STL

    Open access

  • Preprint, working paper

    Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials. 2024. ⟨hal-04536273⟩

    STL

    Open access

  • Conference paper

    Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel. Variation du voisement des occlusives orales en code-switching: analyses par ABX automatique et mesures acoustiques. Journées d’Études sur la Parole – JEP2022, Jun 2022, Noirmoutier, France. ⟨hal-03703081⟩

    STL

    Open access

  • Preprint, working paper

    Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials. 2024. ⟨hal-04536600⟩

    STL

  • Conference paper

    Karën Fort, Laura Alonso Alemany, Luciana Benotti, Julien Bezançon, Claudia Borg, et al. Your Stereotypical Mileage may Vary: Practical Challenges of Evaluating Biases in Multiple Languages and Cultural Contexts. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, May 2024, Turin, Italy. ⟨hal-04537096⟩

    STL

    Open access

  • Conference paper

    Paul Lerner, Cyril Grouin. INCLURE: a Dataset and Toolkit for Inclusive French Translation. The 17th Workshop on Building and Using Comparable Corpora (BUCC @ LREC 2024), 2024, Turin, Italy. ⟨hal-04531938⟩

    STL

    Open access

  • Proceedings

    Karën Fort, Aurélie Névéol. Ethics and NLP: 10 years after. Journée d’études ATALA “éthique et TAL : 10 ans après”, 2024. ⟨hal-04533870⟩

    STL

    Open access

  • Conference paper

    Paul Lerner, Olivier Ferret, Camille Guinaudeau. Cross-modal Retrieval for Knowledge-based Visual Question Answering. 46th European Conference on Information Retrieval (ECIR 2024), 2024, Glasgow, United Kingdom. ⟨hal-04384431⟩

    STL

    Open access

  • Conference paper

    Tomohiro Nishiyama, Lisa Raithel, Roland Roller, Pierre Zweigenbaum, Eiji Aramaki. Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain. Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo), Mar 2024, St. Julian’s, Malta. pp.8-17. ⟨hal-04528240⟩

    STL

    Open access

  • Conference paper

    Nadège Alavoine, Gaëlle Laperriere, Christophe Servan, Sahar Ghannay, Sophie Rosset. New Semantic Task for the French Spoken Language Understanding MEDIA Benchmark. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Torino, Italy. ⟨hal-04523286⟩

    STL

    Open access

  • Conference paper

    Nesrine Bannour, Christophe Servan, Aurélie Névéol, Xavier Tannier. A Benchmark Evaluation of Clinical Named Entity Recognition in French. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Torino, Italy. ⟨hal-04523267⟩

    STL

    Open access

  • Conference paper

    Christophe Servan, Sahar Ghannay, Sophie Rosset. mALBERT: Is a Compact Multilingual BERT Model Still Worth It? The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, May 2024, Torino, Italy. ⟨hal-04520797⟩

    STL

    Open access

  • Conference paper

    Aaron Boussidan, Fanny Ducel, Aurélie Névéol, Karën Fort. What ChatGPT tells us about ourselves. Journée d’étude Éthique et TAL 2024, Apr 2024, Nancy, France. ⟨hal-04521121⟩

    STL

    Open access

  • Conference paper

    Thierry Hamon, Natalia Grabar. Automatic Prediction of Semantic Labels for French Medical Terms. Medical Informatics Europe conference (MIE2022), May 2022, Nice, France. pp.868-869, ⟨10.3233/SHTI220610⟩. ⟨hal-04519905⟩

    STL

    Open access

  • Conference paper

    Pierre Lepagnol, Thomas Gerald, Sahar Ghannay, Christophe Servan, Sophie Rosset. Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification. LREC-COLING 2024, May 2024, Turin, Italy. ⟨hal-04519930v2⟩

    ILES, STL

    Open access

  • Conference paper

    Mérième Bouhandi, Emmanuel Morin, Thierry Hamon. Graph Neural Networks for Adapting Off-the-shelf General Domain Language Models to Low-Resource Specialised Domains. 2nd Workshop on Deep Learning on Graphs for Natural Language Processing (DLG4NLP 2022), ACL, Jul 2022, Seattle, Washington, United States. pp.36-42, ⟨10.18653/v1/2022.dlg4nlp-1.5⟩. ⟨hal-04517190⟩

    STL

    Open access

  • Conference poster

    Elise Lincker, Léa Pacini, Olivier Pons, Camille Guinaudeau, Jérôme Dupire, et al. MALIN : MAnuels scoLaires INclusifs : Accessibilité numérique des manuels scolaires. Colloque Handiversité 2023 – L’innovation pour le partage, Apr 2023, Gif-sur-Yvette, France. ⟨hal-04410349⟩

    STL

    Open access

  • Journal article

    Angèle Gayet-Ageron, Khaoula Ben Messaoud, Mark Oliver Richards, Cyril Jaksic, Julien Gobeill, et al. Assessment of gender and geographical bias in the editorial decision-making process of biomedical journals: A Case-Control study. medRxiv: the Preprint Server for Health Sciences, 2024, ⟨10.1101/2024.03.15.24304220⟩. ⟨hal-04510221⟩

    STL

    Open access

  • Journal article

    Clement Bernard, Guillaume Postic, Sahar Ghannay, Fariza Tahi. RNAdvisor: a comprehensive benchmarking tool for the measure and prediction of RNA structural model quality. Briefings in Bioinformatics, 2024, 25 (2), pp.bbae064. ⟨10.1093/bib/bbae064⟩. ⟨hal-04508073⟩

    STL

    Open access

  • Journal article

    Anne-Laure Ligozat, Christophe Brun, Benjamin Demirdjian, Guillaume Gouget, Emilie Jardé, et al. Setting Climate Targets: The Case of Higher Education and Research. bioRxiv, 2024, ⟨10.1101/2024.03.11.584380⟩. ⟨hal-04505199⟩

    STL

    Open access

  • Conference paper

    Yanis Ouakrim, Hannah Bull, Michèle Gouiffès, Denis Beautemps, Thomas Hueber, et al. Mediapi-RGB: Enabling Technological Breakthroughs in French Sign Language (LSF) Research through an Extensive Video-Text Corpus. VISAPP 2024 – 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Feb 2024, Rome, Italy. ⟨10.5220/0012372600003660⟩. ⟨hal-04494094⟩

    AMI, STL

    Open access

  • Conference paper

    Aurélie Bugeau, Anne-Laure Ligozat. Analysing ICT in prospective scenarios to help reveal undone computer science. Undone Computer Science conference, Feb 2024, Nantes, France. ⟨hal-04486589⟩

    STL

  • Journal article

    Julien Lefevre, Aurélie Bugeau, Jacques Combaz, Laurent Lefèvre, Anne-Laure Ligozat, et al. Impacts environnementaux de l’IA : quels réels bénéfices ? Collection numérique de l’AMUE, Agence de mutualisation des universités et établissements d’enseignement supérieur, 2023. ⟨hal-04486682⟩

    STL

    Open access

  • Book chapter

    Nicholas Asher, Pierre Zweigenbaum. Artificial Intelligence and Language. Pierre Marquis; Odile Papini; Henri Prade. A Guided Tour of Artificial Intelligence Research, III: Interfaces and Applications of Artificial Intelligence (chapter 4), Springer International Publishing, pp.117-145, 2020, 978-3-030-06169-2. ⟨10.1007/978-3-030-06170-8_4⟩. ⟨hal-04483086⟩

    ILES, STL

  • Proceedings

    Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff. Proceedings of the 13th Workshop on Building and Using Comparable Corpora. LREC 2020, 2020, 979-10-95546-42-9. ⟨hal-04482188⟩

    ILES, STL

  • Conference paper

    Rabab Alkhalifa, Iman Bilal, Hsuvas Borkakoty, Jose Camacho-Collados, Romain Deveaud, et al. Overview of the CLEF-2023 LongEval Lab on Longitudinal Evaluation of Model Performance. CLEF 2023: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Sep 2023, Thessaloniki, Greece. pp.440-458, ⟨10.1007/978-3-031-42448-9_28⟩. ⟨hal-04475726⟩

    ILES, STL

  • Conference paper

    Fatima Hamlaoui, Emmanuel-Moselly Makasso, Markus Müller, Jonas Engelmann, Gilles Adda, et al. BULBasaa: A Bilingual Bàsàá-French Speech Corpus for the Evaluation of Language Documentation Tools. LREC 2018, European Language Resources Association (ELRA), May 2018, Miyazaki, Japan. ⟨hal-04466108⟩

    STL

    Open access

  • Conference paper

    Yuming Zhai, Gabriel Illouz, Anne Vilnat. Detecting Non-literal Translations by Fine-tuning Cross-lingual Pre-trained Language Models. 28th International Conference on Computational Linguistics (COLING), Dec 2020, Barcelona (online), Spain. pp.5944-5956, ⟨10.18653/v1/2020.coling-main.522⟩. ⟨hal-04468022⟩

    ILES, STL

    Open access

  • Journal article

    Surya Roca, Sophie Rosset, José García, Álvaro Alesanco. A Study on the Impacts of Slot Types and Training Data on Joint Natural Language Understanding in a Spanish Medication Management Assistant Scenario. Sensors, 2022, 22 (6), pp.2364. ⟨10.3390/s22062364⟩. ⟨hal-04465686⟩

    STL

    Open access

  • Conference paper

    Laura Spinu, Ioana Vasilescu, Lori Lamel, Jason Lilley. Voicing neutralization in Romanian fricatives across different speech styles. Interspeech, ISCA, Sep 2022, Incheon, South Korea. pp.1342-1346, ⟨10.21437/interspeech.2022-10716⟩. ⟨hal-04465920⟩

    STL, TLP

    Open access

  • Proceedings

    Christophe Servan, Anne Vilnat. Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN) : volume 5 : démonstrations. CORIA – TALN 2023, 2023. ⟨hal-04462998⟩

    STL

    Open access