SEME

SEMantics and Information Extraction (SEME)

Coordination: Cyril GROUIN

The SEME (SEMantics and Information Extraction) team studies how to access the meaning conveyed by language productions, for the purposes of analysis, comprehension, modeling, or production. Our research targets written language regardless of its original medium (text produced in electronic format, speech transcripts, or optical character recognition output), in both open and specialized domains such as medicine. We use both linguistic approaches and statistical or neural learning approaches. We are particularly interested in learning-based approaches and in the environmental costs they generate in natural language processing, both when models are produced and when they are used.

Information extraction
Corpus and modeling
Semantics, multiword expressions

The team comprises 10 permanent members (CNRS researchers, lecturers at Université Paris-Saclay, ENSIIE, and Université Sorbonne Paris-Nord), 14 PhD students, and 3 post-docs or fixed-term contracts. We maintain links with industry (theses under CIFRE contracts, research projects) and regularly organize scientific events (TALN conference, scientific workshops, etc.).

Coordination

Sciences et Technologies des Langues

Direction, SEME

Grouin Cyril

Research Engineer

Deputy Director (DUA)

Members

Sciences et Technologies des Langues, SEME

Aguiar Mathilde
PhD Student

Arakkal Remesh Binesh

Arslan Boğaçhan

Beurtheret Eloi

Bezançon Julien

Boudour Sonia

Cellard Loup

Ducel Fanny
PhD Student

Estève Louis
PhD Student

Estienne Lautaro

Feillet Eva
Associate Professor
Deep learning
Continual learning

Ficher Marion
PhD Student

Gerald Thomas
Researcher

Grouin Cyril
Direction, SEME
Research Engineer
Deputy Director (DUA)

Hammal Ayoub
PhD Student

Hamon Thierry
Teacher-Researcher

Hanczyk Baptiste

Hornero Merino Marina

Illouz Gabriel
Teacher-Researcher

Kaur Sukhjot

Kebdi Lounès

Kermadj Zineddine

Li Xiaomeng

Ligozat Anne-Laure
Professor
0169158152

Longuépée Lubin

Morand Clément
PhD Student

Naderi Nona
Associate Professor

Nakamura Takuya

Névéol Aurélie
Researcher

Ouatmani Amine
0766466886

Paroubek Patrick
Research Engineer
Natural Language Processing Expert
0169158004

Pinganaud Solenn

Rachmat Benedictus Kent
Machine Learning
NLP
AI

Rauhut Marta

Sagorin Clarent

Sauvage Eve
PhD Student

Savary Agata
Professor
0169158003

Sebe Clémence
PhD Student

Thiant Floris

Vallet Sam

Publications

Preprint, working paper

Alexandre Genadot, Nicolas Guilliot, Philippe Boula de Mareüil. Introduction to the book “Cartographier les Langues de Nouvelle-Aquitaine: entre Grammaire et Société”. 2026. ⟨hal-05662837⟩

STL

Year of publication 2026

Conference paper

Agata Savary, Manon Scholivet, Carlos Ramisch, Takuya Nakamura, Eric Bilinski, et al. PARSEME 2.0 Multilingual Corpus of Multiword Expressions. LREC 2026 – 15th biennial Language Resources and Evaluation Conference, ELRA Language Resources Association, May 2026, Palma de Mallorca, Spain. ⟨10.63317/2iy5qf38yhay⟩. ⟨hal-05661505⟩

ILES, STL

Year of publication 2026

Available in free access

Conference paper

Julie Halbout, Annelies Braffort, Michèle Gouiffès, Diandra Fabre, Julie Lascar. Learning to Spot Signs from Named Entities. A study on French Sign Language. LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion, May 2026, Palma de Mallorca, Spain. ⟨hal-05636077⟩

AMI, STL

Year of publication 2026

Available in free access

Conference paper

Damien Lacroux, Aurélie Bugeau, Anne-Laure Ligozat. The indirect rebound effects of AI as undone science: philosophical reflection on two structural causes. Undone Computer Science, Mar 2026, Luxembourg, Luxembourg. ⟨hal-05624399⟩

STL

Year of publication 2026

Available in free access

Conference paper

Benedictus Kent Rachmat, Thomas Gerald, Zheng Zhang, Cyril Grouin. Les données de calibration comptent-elles vraiment pour LoRA?. EvalLLM2026 : Atelier sur l’évaluation des modèles génératifs (LLM), le RAG et challenges, Jul 2026, Nantes, France. ⟨hal-05633638⟩

STL

Year of publication 2026

Available in free access

Conference paper

Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. Assessing the Difficulty of Inference Types in Natural Language Inference for Clinical Trials. The Fifteenth Language Resources and Evaluation Conference (LREC 2026), May 2026, Palma de Mallorca, Spain. pp.5290-5300, ⟨10.63317/359toazp33g8⟩. ⟨hal-05652719⟩

STL

Year of publication 2026

Conference paper

Jenny Copara, Nona Naderi, Gilles Falquet, Douglas Teodoro. MeSH Concept Relevance and Knowledge Evolution: A Data-Driven Perspective. 12th International Conference on Information Management and Big Data. Communications in Computer and Information Science, Oct 2025, Lima, Peru. pp.280-299, ⟨10.1007/978-3-032-20322-9_20⟩. ⟨hal-05625658⟩

STL

Year of publication 2025

Available in free access

Conference paper

Clément Morand, Aina Rasoldier, Paul Gay. Not up to its critical perspective on digitalization: A Descriptive Analysis of How Sustainability is Approached in the ICT4S Conference. ICT4S, Jun 2026, Bern, Switzerland. ⟨hal-05615744⟩

STL

Year of publication 2026

Available in free access

Conference paper

Fanny Ducel, Lucie Digoin-Caparros, Ibrahim Al Kotob, Shayan Ahmed Shariff, Binesh Arakkal Remesh, et al. Les benchmarks sont une source de biais des LLM : MMLU, CommonSenseQA et MGSM au microscope. TALN 2026 – 33e Conférence sur le Traitement Automatique des Langues Naturelles, Jun 2026, Nantes, France. ⟨hal-05618509⟩

STL

Year of publication 2026

Available in free access

Conference paper

Louis Estève, Christophe Servan, Thomas Lavergne, Agata Savary. A Diversity Diet for a Healthier Model: A Case Study of French ModernBERT. 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), Jul 2026, San Diego, United States. ⟨hal-05599374⟩

STL

Year of publication 2026

Available in free access

Thesis

Virgile Barthet. Extraction d’information et classification de textes cliniques pour la prédiction du risque de décès. Intelligence artificielle [cs.AI]. Université Paris-Saclay, 2026. French. ⟨NNT : 2026UPASG019⟩. ⟨tel-05599487⟩

STL

Year of publication 2026

Available in free access

Conference paper

Luc Pommeret, Thomas Gerald, Christophe Servan, Sahar Ghannay, Patrick Paroubek, et al. Étude des propositionneurs multilingues : formalisation, évaluation et interprétabilité. CORIA-TALN, ARIA; ATALA, Jun 2026, Nantes, France. ⟨hal-05597666⟩

STL

Year of publication 2026

Available in free access

Conference paper

Mathilde Deletombe, Manon Scholivet, Louis Estève, Thomas Lavergne, Agata Savary. Diversity patterns run deep: Impact of diversity intake on multiword expression identification. 22nd Workshop on Multiword Expressions (MWE 2026), Mar 2026, Rabat, Morocco. pp.110-116, ⟨10.18653/v1/2026.mwe-1.13⟩. ⟨hal-05588681⟩

STL

Year of publication 2026

Available in free access

Conference paper

Manon Scholivet, Agata Savary, Carlos Ramisch, Eric Bilinski, Takuya Nakamura, et al. Edition 2.0 of the PARSEME shared task on multilingual identification and paraphrasing of multiword expressions. Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026), Mar 2026, Rabat, Morocco. pp.254-275, ⟨10.18653/v1/2026.mwe-1.33⟩. ⟨hal-05588684⟩

ILES, STL

Year of publication 2026

Available in free access

Preprint, working paper

Eva Feillet, Ryan Whetten, David Picard, Alexandre Allauzen. Polynomial Mixing for Efficient Self-Supervised Speech Encoders. 2026. ⟨hal-05589762⟩

STL

Year of publication 2026

Available in free access

Book chapter

Yoshua Bengio, Holger Schwenk, Jean-Sébastien Senécal, Emmanuel Morin, Jean-Luc Gauvain. Neural Probabilistic Language Models. Innovations in Machine Learning: Theory and Applications, 194, pp.137-186, 2005, ⟨10.1007/3-540-33486-6_6⟩. ⟨hal-01434258⟩

STL, TLP

Year of publication 2005

Conference paper

Jean-Luc Gauvain, Abdel Messaoudi, Holger Schwenk. Language Recognition Using Phone Lattices. International Conference on Speech and Language Processing, Oct 2004, Jeju, South Korea. pp.1283–1286. ⟨hal-01434492⟩

STL, TLP

Year of publication 2004

Conference paper

Luc Pommeret, Thomas Gerald, Sophie Rosset, Patrick Paroubek, Christophe Servan, et al. Les propositions atomiques : un pont entre approches neuronales et symboliques. Journée interprétabilité, GDR TAL, Mar 2026, Jussieu, Paris, France. ⟨hal-05575718⟩

STL

Year of publication 2026

Conference paper

Luc Pommeret, Thomas Gerald, Patrick Paroubek, Sahar Ghannay, Christophe Servan, et al. LLM-based Atomic Propositions Help Weak Extractors: Evaluation of a Propositioner for Triplet Extraction. KG-LLM@LREC – Knowledge Graphs and Large Language Models, ELRA, May 2026, Palma de Mallorca, Spain. ⟨hal-05572941⟩

STL

Year of publication 2026

Available in free access

Conference paper

Luc Pommeret, Thibault Wagret, Jules Deret. THIVLVC: Retrieval Augmented Dependency Parsing for Latin. EvaLatin (LT4HALA@LREC), ELRA, May 2026, Palma de Mallorca, Spain. ⟨hal-05572961v2⟩

STL

Year of publication 2026

Available in free access

Journal article

Jean-Luc Gauvain, Gilles Adda, Lori Lamel, Fabrice Lefèvre, Holger Schwenk. Transcription de la parole conversationnelle. Revue TAL : traitement automatique des langues, 2005, 45 (3). ⟨hal-01434260⟩

STL, TLP

Year of publication 2005

Conference paper

Jean-Luc Gauvain, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Veronique Gendner, et al. Where are we in transcribing French broadcast news?. Eurospeech, Sep 2005, Lisbon, Portugal. pp.1665-1668, ⟨10.21437/Interspeech.2005-544⟩. ⟨hal-01434245⟩

STL, TLP

Year of publication 2005

Conference paper

Lori Lamel, Jean-Luc Gauvain, Gilles Adda, Claude Barras, Eric Bilinski, et al. The LIMSI 2006 Tc-Star Transcription Systems. Tc-Star Speech to Speech Translation Workshop, Jun 2006, Barcelona, Spain. pp.123-128. ⟨hal-01434203⟩

STL, TLP

Year of publication 2006

Conference paper

Hélène Bonneau-Maynard, Alexandre Allauzen, Daniel Déchelotte, Holger Schwenk. Combining Morphosyntactic Enriched Representation with n-best Reranking in Statistical Translation. HLT/NAACL workshop on Syntax and Structure in Statistical Translation, Apr 2007, Rochester, United States. pp.65-71. ⟨hal-01434104⟩

STL, TLP

Year of publication 2007

Conference paper

Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte M. Alves, André F T Martins, Ayoub Hammal, et al. EuroBERT: Scaling Multilingual Encoders for European Languages. COLM 2025 – Second Conference on Language Modeling, Oct 2025, Montreal, Canada. pp.1-28. ⟨hal-05226285⟩

STL

Year of publication 2025

Available in free access

Thesis

Pierre Lepagnol. Petits modèles génératifs en contexte industriel : Adaptation par prompting avec peu de données. Intelligence artificielle [cs.AI]. Université Paris-Saclay, 2026. French. ⟨NNT : 2026UPASG011⟩. ⟨tel-05572429⟩

STL

Year of publication 2026

Available in free access

Report

Karin Dassas, Cyrille Bonamy, Bruno Bzeznik, Emmanuelle Frenoux, Gaël Guennebaud, et al. Estimer l’impact carbone des activités numériques d’une unité de recherche. CNRS (EcoInfo). 2026. ⟨hal-05568070⟩

STL

Year of publication 2026

Available in free access

Conference paper

Ayoub Hammal, Pierre Zweigenbaum, Caio Corro. KAD: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral. EACL 2026 – 19th Conference of the European Chapter of the Association for Computational Linguistics, Mar 2026, Rabat, Morocco. pp.3854-3872, ⟨10.18653/v1/2026.eacl-long.179⟩. ⟨hal-05571208⟩

STL

Year of publication 2026

Available in free access

Conference paper

Jules Françoise, Julie Lascar, Cyril Verrecchia, Sidonie Minodier, Michèle Gouiffès, et al. LaboSignes: an Interactive French Sign Language Recognition Interface. ACM CHI’26, Apr 2026, Barcelona, Spain. ⟨10.1145/3772363.3799328⟩. ⟨hal-05564455⟩

AMI, ASARD, STL

Year of publication 2026

Available in free access

Preprint, working paper

Clément Morand, Jacques Combaz, Aurélie Névéol, Anne-Laure Ligozat. When rebound effect is not a side effect: analyzing sociotechnical contexts of digital technologies. 2026. ⟨hal-05566029⟩

STL

Year of publication 2026

Available in free access

Conference paper

Julie Lascar, Jules Françoise, Michèle Gouiffès, Annelies Braffort, Diandra Fabre. PoET: Lightweight Pose Encoder Transformer for Online Sign Language Recognition. 21st International Conference on Computer Vision Theory and Applications, Mar 2026, Marbella, Spain. pp.19-28, ⟨10.5220/0014237500004084⟩. ⟨hal-05564749⟩

AMI, STL

Year of publication 2026

Available in free access

Conference paper

Baptiste Pras, Nona Naderi. Fine-Grained Mention-Level Analysis of Biomedical Entity Linking Models. Medical Informatics Europe 2026, EFMI, May 2026, Genoa, Italy. pp.999-1003, ⟨10.3233/SHTI260329⟩. ⟨hal-05544092⟩

STL

Year of publication 2026

Available in free access

Encyclopedia or dictionary entry

Albert Rilliard. Fala, emoções e atitudes. Speech Sciences Entries, 2024, https://gepf.falar.org/entries/66. ⟨hal-05474723⟩

STL

Year of publication 2024

Available in free access

Journal article

Natalia Grabar, Cyril Grouin. Year 2021: COVID-19, Information Extraction and BERTization among the Hottest Topics in Medical Natural Language Processing. IMIA Yearbook of Medical Informatics, 2022, 31 (01), pp.254-260. ⟨10.1055/s-0042-1742547⟩. ⟨hal-03931852⟩

ILES, STL

Year of publication 2022

Available in free access

Conference paper

Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, Sophie Rosset. Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and NER. The Fifteenth biennial Language Resources and Evaluation Conference (LREC 2026), May 2026, Palma de Mallorca, Spain. ⟨10.63317/3osjjdr778fh⟩. ⟨hal-05546569⟩

STL

Year of publication 2026

Available in free access

Conference paper

Clémentine Bleuze, Fanny Ducel, Maxime Amblard, Karën Fort. COCOA: Creation and Exploratory Investigation of a Corpus of Claims from NLP Articles. LREC 2026 – International Conference on Language Resources and Evaluation, ELRA Language Resources Association, May 2026, Palma de Mallorca, Spain. ⟨10.63317/38hiuxwcq4bc⟩. ⟨hal-05547842⟩

STL

Year of publication 2026

Available in free access

Preprint, working paper

Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. Assessing the Difficulty of Inference Types in Natural Language Inference for Clinical Trials. 2026. ⟨hal-05533706v2⟩

STL

Year of publication 2026

Available in free access

Journal article

Juan Manuel Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset, Khaled Zaouk, et al. Diart: A Python Library for Real-Time Speaker Diarization. Journal of Open Source Software, 2024, 9 (99), pp.5266. ⟨10.21105/joss.05266⟩. ⟨hal-05530961⟩

STL

Year of publication 2024

Available in free access

Conference paper

Clémentine Bleuze, Karën Fort, Vincent P. Martin, Aurélie Névéol. Grands modèles de langue pour la détection de pathologies psychiatriques : promesses, réalité, et enjeux. Journée d’étude “LLM@hopital”, ATALA, Mar 2026, Paris, France. ⟨hal-05532823⟩

STL

Year of publication 2026

Available in free access

Conference paper

Iskandar Boucharenc. Hierarchical Prefixes for Long Document Representations. ECIR – European Conference on Information Retrieval, Apr 2025, Lucca, Italy. pp.171-177, ⟨10.1007/978-3-031-88720-8_28⟩. ⟨hal-05530637⟩

STL

Year of publication 2025

Conference paper

Fanny Ducel, Aurélie Névéol, Vidit Khazanchi, Loïc Leclere, Arthur Pedrini, et al. Code-switching as a Bias Indicator in LLMs: “The consequences are not the same para nosotros”. LREC 2026 – 15th biennial Language Resources and Evaluation Conference, May 2026, Palma de Mallorca, Spain. ⟨10.63317/2mq6kqjk9bng⟩. ⟨hal-05529786⟩

STL

Year of publication 2026

Available in free access

Conference paper

Oralie Cattan, Christophe Servan, Sophie Rosset. On the Usability of Transformers-based models for a French Question-Answering task. Joint Conference of the Information Retrieval Communities in Europe (CIRCLE) 2022, Jul 2022, Samatan, France. ⟨hal-03701740⟩

ILES, STL

Year of publication 2022

Available in free access

Conference paper

Léa Pacini, Jérôme Dupire, Isabelle Barbet, Olivier Pons, Camille Guinaudeau, et al. Textbook’s accessibility for children with dyspraxia and visual disability. 17th International Conference of the Association for the Advancement of Assistive Technology in Europe, AAATE 2023, Association for the Advancement of Assistive Technology in Europe, Aug 2023, Paris, France. ⟨hal-04410340⟩

STL

Year of publication 2023

Available in free access

Conference paper

Fanny Ducel. How to define, understand and evaluate stereotypical biases in language models?. IASIV – Séminaire du groupe de travail Intelligence Artificielle Sûre, Intelligible et Vérifiable, Mar 2025, Palaiseau, France. ⟨hal-05467784⟩

STL

Year of publication 2025

Available in free access

Thesis

Gustave Cortal. Natural language processing for subjectivity analysis in personal narratives. Computation and Language [cs.CL]. Université Paris-Saclay, 2026. English. ⟨NNT : 2026UPASG003⟩. ⟨tel-05501345⟩

STL

Year of publication 2026

Available in free access

Conference poster

Julie Halbout, Annelies Braffort, Michèle Gouiffès. Annotation automatique d’un corpus de Langue des Signes Française. RJCP – Rencontres Jeunes Chercheurs en Parole, Nov 2025, Paris, France. ⟨hal-05495878⟩

STL

Year of publication 2025

Conference poster

Annelies Braffort, Michael Filhol, Michèle Gouiffès, Julie Halbout, Julie Lascar. Sign Language Processing with Linguistic Structure. BMVA Symposium on AI for Sign Language Translation, Production, and Linguistics, Dec 2025, London, United Kingdom. ⟨hal-05495664⟩

STL

Year of publication 2025

Conference paper

Jules Françoise, Julie Lascar, Cyril Verrecchia, Sidonie Minodier, Michèle Gouiffès, et al. LaboSignes : vers une IA participative pour la reconnaissance automatique de la Langue des Signes Française. Journée d’études AFIA-ATALA : Technologies linguistiques pour les langues peu dotées, Dec 2025, Paris, France. ⟨hal-05495906⟩

AMI, STL

Year of publication 2025

Journal article

S. Rosset, D. Tribout, L. Lamel. Multi-level Information and Automatic dialog Act Detection in Human-Human Spoken Dialogs. Speech Communication, 2008, 50 (1), pp.1-13. ⟨10.1016/j.specom.2007.05.007⟩. ⟨hal-00499189⟩

STL, TLP

Year of publication 2008

Available in free access

Conference paper

Idrissa Mahamoudou Dicko, Nona Naderi. Biomedical hallucination detection of LLMs using Med-HALT and HaloScope frameworks. JDSE 2025 – 10th Junior Conference on Data Sciences and Engineering, Sep 2025, Gif-sur-Yvette, France. ⟨hal-05483690⟩

STL

Year of publication 2025

Available in free access

