The Department of Language Sciences and Technologies studies fundamental questions relating to linguistic systems by exploiting large corpora collected, annotated and enriched in an unsupervised or semi-supervised way by statistical learning models adapted to the linguistic material.
These models make it possible to study how languages function, their variations (phonetic-phonological, morphological-lexical, syntactic and semantic), both synchronic and diachronic, diaphasic and diatopic, and to raise questions about their acquisition as mother tongues or second languages. Finally, the department is developing major applications in language processing: speech recognition, automatic translation, information retrieval, conversational agents, etc. … which are increasingly important for society (safeguarding endangered languages, providing tools for people with disabilities, helping to process information and medical knowledge) and for ethics.
This approach to language and languages covers a broad spectrum, from the most fundamental to the most applied research, in a wide variety of media (newspapers, social media, video, telephone, . . .) and all modalities (written, spoken and signed).
This research is highly multidisciplinary, bringing together diverse communities from the fields of computer science, engineering and the humanities.
Lautaro Estienne, Gabriel Ben Zenou, Nona Naderi, Jackie Cheung, Pablo Piantanida. Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog. In proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Nov 2025, Suzhou, China. ⟨10.48550/arXiv.2507.14063⟩. ⟨hal-05347472⟩
Marco Naguib, Xavier Tannier, Aurélie Névéol. Few-shot clinical entity recognition in English, French and Spanish: masked language models outperform generative model prompting. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Nov 2024, Miami, United States. pp.6829-6852, ⟨10.18653/v1/2024.findings-emnlp.400⟩. ⟨hal-05331970⟩
Julie Halbout, Diandra Fabre. Corpus bilingue sous-titrage et Langue des Signes Française : la problématique de l’alignement automatique des données. 20e Conférence en Recherche d’Information et Applications (CORIA) 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN) 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL) Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), 2025, Marseille, France. pp.91-103. ⟨hal-05330660⟩
Christophe Servan, Cyril Grouin, Aurélie Névéol, Pierre Zweigenbaum. Comment évaluer un grand modèle de langue dans le domaine médical en français ?. 20e Conférence en Recherche d’Information et Applications (CORIA) 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN) 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL) Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), 2025, Marseille, France. pp.51-67. ⟨hal-05329783⟩
Omar Adjali, Olivier Ferret, Sahar Ghannay, Hervé Le Borgne. Génération augmentée de récupération multi-niveau pour répondre à des questions visuelles. 20e Conférence en Recherche d’Information et Applications (CORIA) 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN) 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL) Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), 2025, Marseille, France. pp.128-130. ⟨hal-05330645⟩
Eve Sauvage. SynKGP: Knowledge Graph Population with Syntactic-LLM Hybridation for Question-Answering. ECIR, Apr 2025, Lucca, Italy. pp.212-219, ⟨10.1007/978-3-031-88720-8_34⟩. ⟨hal-05344073⟩
Anne-Laure Ligozat. Côté obscur de l’IA : quels bénéfices réels de l’IA pour faire face aux crises environnementales ?. GreenDays 2023, Mar 2023, Lyon, France. ⟨hal-05317071⟩
Armand Stricker, Patrick Paroubek. Chitchat as Interference: Adding User Backstories to Task-Oriented Dialogues. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA; ICCL, May 2024, Torino, Italy. pp.3203–3214. ⟨hal-05242362⟩
Fanny Ducel, Jeffrey André, Aurélie Névéol, Karën Fort. Introducing MascuLead: the First Gender Bias Leaderboard. EALM 2025 – Ethic and Alignment of (Large) Language Models, Jun 2025, Marseille, France. pp.12-19. ⟨hal-05282981⟩
Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. « Les femmes ne font pas de crise cardiaque ! » Étude des biais de genre dans les cas cliniques synthétiques en français. 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2025), Jul 2025, Marseille, France. pp.1. ⟨hal-05282965⟩
Clémentine Bleuze, Fanny Ducel, Maxime Amblard, Karën Fort. « De nos jours, ce sont les résultats qui comptent » : création et étude diachronique d’un corpus de revendications issues d’articles de TALTraitement Automatique des langues. TALN 2025 – 32ème Conférence sur le Traitement Automatique des Langues Naturelles, Jul 2025, Marseille, France. ⟨hal-05282966⟩
Yajing Feng. Continuous Recognition of Client Emotions from Speech and Text in Real-World Call Center Conversations : a Context-Aware Dataset and Empirical Study. Artificial Intelligence [cs.AI]. Université Paris-Saclay, 2025. English. ⟨NNT : 2025UPASG042⟩. ⟨tel-05241382⟩
Alexander Goldberg, Ihsan Ullah, Thanh Gia Hieu Khuong, Benedictus Kent Rachmat, Zhen Xu, et al.. Usefulness of LLMs as an Author Checklist Assistant for Scientific Papers: NeurIPS’24 Experiment. 2025. ⟨hal-05230379⟩
Floris Thiant, Olivia Penas, Yann Leroy, Anne-Laure Ligozat. System analysis of digital service system perimeter and its interdependencies in Life Cycle Assessment. 2025 IEEE International Symposium on Systems Engineering (ISSE 2025), Oct 2025, Palaiseau, France. ⟨hal-05240543⟩
Thomas Gerald, Louis Tamames, Sofiane Ettayeb, Ha-Quang Le, Patrick Paroubek, et al.. CQuAE: A new Contextualized QUestion Answering corpus on Education domain. Data and Knowledge Engineering, 2024, 151, pp.102305. ⟨10.1016/j.datak.2024.102305⟩. ⟨hal-05242257⟩
Tommaso Raso, Saulo Mendes Santos, Albert Rilliard, João A. Moraes. Defining and Identifying Discourse Markers in Spontaneous Speech. Miguel Oliveira, Jr. Prosodic Interfaces – Interdisciplinary Perspectives on Sound Patterns and Human Interaction, De Gruyter, pp.65-102, 2025, 978-3-11-105990-7. ⟨10.1515/9783111060309-003⟩. ⟨hal-05230528⟩
Clémence Sebe, Sarah Cohen-Boulakia, Olivier Ferret, Aurélie Névéol. Extracting Information in a Low-resource Setting: Case Study on Bioinformatics Workflows. Symposium on Intelligent Data Analysis (IDA 2025), May 2025, Konstanz, Germany. pp.274-287, ⟨10.1007/978-3-031-91398-3_21⟩. ⟨hal-05244222⟩
Philippe Boula de Mareüil, Paolo Roseano. A speaking atlas of the languages of the Iberian Peninsula: focus on rhythm and varieties in contact. Dialectologia, 2025, 35, pp.27-54. ⟨10.1344/dialectologia.35.2⟩. ⟨hal-05263043⟩
Gaël Guennebaud, Anne-Laure Ligozat, Anne-Cécile Orgerie, Matthieu Simonin. Evaluating and Reporting the Carbon Footprint of Shared Computing Platforms: Choices and Limits. ISPDC 2025 – 24th IEEE International Symposium on Parallel and Distributed Computing, Jul 2025, Rennes, France. pp.1-7. ⟨hal-05195576⟩
Haohua Dong, Ana Manzano Rodríguez, Camille Guinaudeau, Shin’Ichi Satoh. Fairness Without Labels: Pseudo-Balancing for Bias Mitigation in Face Gender Classification. Second workshop on Fairness and ethics towards transparent AI: facing the chalLEnge through model Debiasing (FAILED) at the 2025 International Conference on Computer Vision, Oct 2025, Honolulu, HI, United States. ⟨hal-05210445⟩
Nicolas Hiebel. Création éthique de données textuelles artificielles : application au domaine biomédical. Traitement du texte et du document. Université Paris-Saclay, 2025. Français. ⟨NNT : 2025UPASG033⟩. ⟨tel-05185326⟩
Philippe Boula de Mareüil, Alexis Pierrard, Albert Rilliard. Acoustic study of /r/ front fricatives in Bolivian Highland Spanish. Estudios de Fonética Experimental , 2025, 34, pp.41 – 56. ⟨10.1344/efe-2025-34-41-56⟩. ⟨hal-05157171⟩
Ana Manzano Rodríguez, Camille Guinaudeau, Shin Ichi Satoh. Uncovering Gender Biases in Gender Identification Models for Japanese Data Analysis. Workshop on Demographic Diversity in Computer Vision @ CVPR 2025, Jun 2025, Nashville (Tennessee), United States. ⟨hal-05154054⟩
Philippe Boula de Mareüil, Marc Evrard, Alexandre François, Antonio Romano. Computer modelling of innovations relative to Latin in contemporary Romance dialects. Isogloss. Open Journal of Romance Linguistics, 2025, 11 (3), pp.1 – 31. ⟨10.5565/rev/isogloss.423⟩. ⟨hal-05144863⟩
Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, Sophie Rosset. Leveraging Information Retrieval to Enhance Spoken Language Understanding Prompts in Few-Shot Learning. Interspeech 2025, Aug 2025, Rotterdam, Netherlands. ⟨10.21437/Interspeech.2025-175⟩. ⟨hal-05095796⟩
Mathieu Laï-King. Qualité des articles de recherche et modèles de langue neuronaux : applications au domaine biomédical. Intelligence artificielle [cs.AI]. Université Paris-Saclay, 2025. Français. ⟨NNT : 2025UPASG031⟩. ⟨tel-05079724⟩
Clément Morand, Anne-Laure Ligozat, Aurélie Névéol. Characterizing Goals and Impacts of Digitalization: The Case of Promises in French Healthcare Policies. 2025. ⟨hal-05066176⟩
Luc Mottin, Julien Gobeill, Jeevanthi Liyana Pathirana, Nona Naderi, Anaïs Mottaz, et al.. Manuscript Classification to Support the Analysis of Biases in Publication Opportunities. The 35th Medical Informatics Europe Conference, May 2025, Glagow, United Kingdom. ⟨10.3233/SHTI250475⟩. ⟨hal-05070636⟩
Karin Dassas, Cyrille Bonamy, Bruno Bzeznik, Romaric David, Emmanuelle Frenoux, et al.. Estimer l’impact carbone des activités numériques de l’Observatoire de Paris. EcoInfo. 2025, pp.1-47. ⟨hal-05068666⟩
Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. Clinical text generation: Are we there yet?. Annual Review of Biomedical Data Science, 2025, 8, pp.173-198. ⟨10.1146/annurev-biodatasci-103123-095202⟩. ⟨hal-05055957⟩
Arezoo Saedi, Afsaneh Fatemi, Mohammad Ali Nematbakhsh, Sophie Rosset, Anne Vilnat. Entity search based on consumer preferences leveraging user reviews. Expert Systems with Applications, 2025, 275, pp.126990. ⟨10.1016/j.eswa.2025.126990⟩. ⟨hal-05047109⟩
Foucauld Estignard, Sahar Ghannay, Julien Girard-Satabin, Nicolas Hiebel, Aurélie Névéol. Evaluating the Confidentiality of Synthetic Clinical Texts Generated by Language Models. 23rd International Conference on Artificial Intelligence in Medicine (AIME), Jun 2025, Pavie, Italy. ⟨hal-05046326v2⟩
Lisa Raithel, Philippe Thomas, Bhuvanesh Verma, Roland Roller, Hui-Syuan Yeh, et al.. Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot Relation Extraction for Pharmacovigilance in French, German, and Japanese. The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, Association for Computational Linguistics, Aug 2024, Bangkok, Thailand. pp.170-182. ⟨hal-04781015⟩
Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. Am I eligible? Natural Language Inference for Clinical Trial Patient Recruitment: the Patient’s Point of View. 2025. ⟨hal-04992084⟩
Mathieu Constant, Marie Candito, Yannick Parmentier, Carlos Ramisch, Agata Savary. Construction, exploitation et exploration de ressources linguistiques pour le traitement automatique des expressions polylexicales en français : le projet PARSEME-FR. Lidia Becker; Julia Kuhn; Christina Ossenkop; Claudia Polzin-Haumann; Elton Prifti. Digitale romanistische Sprachwissenschaft: Stand und Perspektiven, Narr Francke Attempto Verlag GmbH + Co. KG, pp.219-250, 2023, Romanistisches Kolloquium, 978-3-8233-8506-6. ⟨hal-04995189⟩
Rémi Uro. Détection et caractérisation des interruptions dans les interactions orales pour la description du comportement des femmes et des hommes dans les contenus audiovisuels. Informatique et langage [cs.CL]. Université Paris-Saclay, 2024. Français. ⟨NNT : 2024UPASG055⟩. ⟨tel-04994439⟩
Amel Fraisse, Patrick Paroubek, Ramit Goyal, Nassreddine Znaidi. Measuring Multilingualism in Online Public Access Catalogs. The ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Dec 2024, Hong Kong, China. ⟨10.1145/3677389.3702544⟩. ⟨hal-04986773⟩
Manon Scholivet, Agata Savary, Louis Estève, Marie Candito, Carlos Ramisch. SELEXINI – a large and diverse automatically parsed corpus of French. Building and Using Comparable Corpora (BUCC), Jan 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04978746⟩
Camille Challant. Représentation formelle avec AZee et contraintes grammaticales pour la langue des signes française. Théorie et langage formel [cs.FL]. Université Paris-Saclay, 2024. Français. ⟨NNT : 2024UPASG086⟩. ⟨tel-04957486⟩
Zheng Zhang, Brian Denton, Xiaolan Xie. Branch and Price for Chance-Constrained Bin Packing. INFORMS Journal on Computing, 2020, 32 (3), pp.547-564. ⟨10.1287/ijoc.2019.0894⟩. ⟨hal-04941861⟩
Simon Devauchelle, David Doukhan, Lucas Ondel Yang, Benjamin Élie, Albert Rilliard. Estimation automatique de caractéristiques acoustiques pour l’étude diachronique du français oral dans les médias. Atelier DAHLIA: DigitAl Humanities and cuLtural herItAge: data and knowledge management and analysis, Claudia Marinica; Fabrice Guillet; Florent Laroche, Jan 2025, Strasbourg, France. ⟨hal-04938377⟩
Rémi Uro, David Doukhan. Pendant le confinement, le temps de parole des femmes a baissé à la télévision et à la radio. La revue des médias, 2020. ⟨hal-04906221⟩
Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. “Women do not have heart attacks!” Gender Biases in Automatically Generated Clinical Cases in French. NAACL 2025 – Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, Apr 2025, Albuquerque, United States. pp.7145-7159, ⟨10.18653/v1/2025.findings-naacl.398⟩. ⟨hal-04938811⟩
Marion Ficher, Tom Bauer, Anne-Laure Ligozat. A comprehensive review of the end-of-life modeling in LCAs of digital equipment. International Journal of Life Cycle Assessment, 2024, 30 (1), pp.20-42. ⟨10.1007/s11367-024-02367-x⟩. ⟨hal-04924691⟩
Léa-Marie Lam-Yee-Mui. Modélisations pour la reconnaissance de la parole à données contraintes. Traitement du signal et de l’image [eess.SP]. Université Paris-Saclay, 2024. Français. ⟨NNT : 2024UPASG075⟩. ⟨tel-04918814⟩
Philippe Boula de Mareüil, Plínio A. Barbosa. Picos melódicos pretônicos em final de enunciado no português brasileiro: um estudo quantitativo. Dermeval da Hora; Ángela Helmer. Interseções Linguísticas: Estudos Diversos, Líquido Editorial, pp.71-85, 2023, ALFAL, 9786599924804. ⟨hal-04893646⟩
Douglas Teodoro, Nona Naderi, Anthony Yazdani, Boya Zhang, Alban Bornet. A Scoping Review of Artificial Intelligence Applications in Clinical Trial Risk Assessment. npj Digital Medicine, 2025, 8 (1), pp.486. ⟨10.1038/s41746-025-01886-7⟩. ⟨hal-04913991⟩
Paritosh Sharma. Sign Language synthesis by a decreasing granularity system from AZee. Computation and Language [cs.CL]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG092⟩. ⟨tel-04908078⟩
Laetitia Biscarrat, David Doukhan, Cyril Grouin. De Loft Story aux Marseillais à Dubaï : apport des méthodes d’analyse automatique pour la description des évolutions du dispositif télévisuel. Colloque ”La téléréalité, entre média, événement et société”, part of 89e Congrès de l’Association canadienne-française pour l’avancement des sciences (ACFAS), Association canadienne-française pour l’avancement des sciences (ACFAS), 2022, Montreal, Canada. ⟨hal-04906923⟩
Laetitia Biscarrat, David Doukhan, Cyril Grouin. De Loft Story aux Marseillais à Dubaï : 20 ans de télé-réalité, 20 ans de sexisme ? Apport des méthodes d’analyse automatique pour une approche comparative. Première journée d’études de l’Arcom, ARCOM, Nov 2022, Paris, France. ⟨hal-04905959⟩
Rémi Uro, Marie Tahon, David Doukhan, Albert Rilliard. Comprendre les phénomènes permettant la gestion des tours de parole dans les contenus de médias audiovisuels. Journée commune AFIA-TLH / AFCP – “Extraction de connaissances interprétables pour l’étude de la communication parlée”, Corinne Fredouille; Maëva Garnier; Olivier Perrotin; Marie Tahon, Dec 2023, Avignon, France. ⟨hal-04906679⟩
Leticia Rebollo Couto, Albert Rilliard. Variação Pragmática e Diminutivização: intensificação e atenuação de atos expressivos e diretivos para a dublagem de animação em português, espanhol e francês. IV Colloque International VariaR 2024, Université Paul-Valéry Montpellier 3, Jun 2024, Montpellier, France. pp.43-44, ⟨10.3726/978-3-0351-0740-1⟩. ⟨hal-04874595⟩
Sofiya Kobylyanskaya. Towards multimodal assessment of L2 level : speech and eye tracking features in a cross-cultural setting. Computation and Language [cs.CL]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG111⟩. ⟨tel-04900961⟩
Leticia Rebollo Couto, Albert Rilliard. Variación pragmática y expresividad negativa: análisis multimodal en datos de doblaje. LingCor2024: Workshop on Spoken Corpus Linguistics, Jul 2024, Vienna, Austria. . ⟨hal-04874470⟩
Clémentine Bleuze, Fanny Ducel, Karën Fort, Maxime Amblard. « Vers la création d’une super-intelligence » : un corpus pour étudier les revendications des articles de TALTraitement Automatique des langues. Journées de lancement LIFT 2, Nov 2024, Orléans, France. ⟨hal-04880335⟩
Ayoub Hammal, Benno Uthayasooriyar, Caio Corro. Few-Shot Domain Adaptation for Named-Entity Recognition via Joint Constrained k-Means and Subspace Selection. COLING 2025 – 31st International Conference on Computational Linguistics, Jan 2025, Abu Dhabi, United Arab Emirates. pp.9902-9916, ⟨10.48550/arxiv.2412.00426⟩. ⟨hal-04877776⟩
Simon Devauchelle, Albert Rilliard, David Doukhan, Lucas Ondel Yang. Describing voice in French media archives: age and gender effects on pitch and articulation characteristics. XX Convegno Nazionale AISV, LFSAG (Laboratorio di Fonetica Sperimentale “Arturo Genre”) Dipartimento di Lingue e Letterature Straniere e Culture Moderne Università degli Studi di Torino, Feb 2024, Turin, Italy. ⟨hal-04874662⟩
Donna Erickson, João Antônio De Moraes, Albert Rilliard. Dimensões das atitudes prosódicas entre culturas. V Seminário Internacional de Fonologia, Universidade Federal do Rio de Janeiro, Nov 2024, Rio de Janeiro, Brazil. ⟨hal-04874627⟩
Delphine Bernhard, Myriam Bras, Anne-Laure Ligozat, Aleksandra Miletic, Jean Sibille, et al.. L’avenir numérique des langues minoritaires : bilan du projet RESTAURE pour l’alsacien, l’occitan et le picard. Colloque « Langues minoritaires » : quels acteurs pour quel avenir ?, Groupe d’Etudes sur le Plurilinguisme européen (EA1339 LiLPa), Nov 2019, Strasbourg, France. ⟨hal-04864670⟩
Cyril Grouin, Natalia Grabar. Year 2023 in Biomedical Natural Language Processing: A Tribute to Large Language Models and Generative AI. IMIA Yearbook of Medical Informatics, 2024, pp.241-248. ⟨10.1055/s-0044-1800751⟩. ⟨hal-04865083⟩
Natalia Grabar, Thierry Hamon. Study of the propaganda techniques occurring in Russian newspaper titles in 2022. METAPOL, université de Liège, Nov 2024, Liège, Belgium. ⟨hal-04865074⟩
Angèle Gayet-Ageron, Khaoula Ben Messaoud, Mark Richards, Cyril Jaksic, Julien Gobeill, et al.. Gender and geographical bias in the editorial decision-making process of biomedical journals: a case-control study. BMJ Evidence-Based Medicine, 2024, pp.bmjebm-2024-113083. ⟨10.1136/bmjebm-2024-113083⟩. ⟨hal-04865134⟩
Omar Adjali, Olivier Ferret, Sahar Ghannay, Hervé Le Borgne. Multi-Level Information Retrieval Augmented Generation for Knowledge-based Visual Question Answering. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Nov 2024, Miami, United States. pp.16499-16513, ⟨10.18653/v1/2024.emnlp-main.922⟩. ⟨hal-04852275⟩
Aurélie Bugeau, Anne-Laure Ligozat. L’informatique en temps de crises environnementales : comment adapter la recherche et l’enseignement ?. 1024 : Bulletin de la Société Informatique de France, 2025, 25, pp.81-95. ⟨10.48556/SIF.1024.25.81⟩. ⟨hal-04850517v2⟩
Donna Erickson, Albert Rilliard, Ela Thurgood, João Antônio de Moraes, Takaaki Shochi. Acoustic and perceptual profiles of american english social affective expressions. Journal of Speech Sciences, 2024, 13, pp.e024004. ⟨10.20396/joss.v13i00.20015⟩. ⟨hal-04850040⟩
Lucie Gianola. Traitement automatique des langues et linguistique de corpus pour la reconnaissance d’entités en analyse criminelle. Revue internationale de criminologie et de police technique et scientifique, 2021, LXXIV (3), pp.363-382. ⟨hal-04833123⟩
Mathilde Aguiar, Ying Lai, Pierre Zweigenbaum, Nona Naderi. Constituting a dataset for applying Natural Language Inference to Chinese Clinical Trials: possible approaches and challenges. Junior Conference on Data Sciences and Engineering, Sep 2024, Gif-sur-Yvette, France. ⟨hal-04837721⟩
Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, et al.. What Can Natural Language Processing Do for Peer Review?. 2024. ⟨hal-04797652⟩
Fanny Ducel, Aurélie Névéol, Karën Fort. “You’ll be a nurse, my son!” Automatically Assessing Gender Biases in Autoregressive Language Models in French and Italian. Language Resources and Evaluation, 2024, pp.1495-1523. ⟨10.1007/s10579-024-09780-6⟩. ⟨hal-04803403⟩
Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, et al.. A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Turin, Italy. pp.395-414. ⟨hal-04779777⟩
Dongfang Xu, Guillermo Lopez-Garcia, Lisa Raithel, Roland Roller, Philippe Thomas, et al.. Overview of the 9th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at ACL 2024 – Large Language Models and Generalizability for Social Media NLP. The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, Association for Computational Linguistics, Aug 2024, Bangkok, Thailand. pp.183-195. ⟨hal-04781745⟩
Pierre Zweigenbaum, Serge Sharoff, Reinhard Rapp. The 17th Workshop on Building and Using Comparable Corpora (BUCC) @LREC-COLING-2024. Workshop Proceedings. 17th Workshop on Building and Using Comparable Corpora (BUCC), 2024, 978-2-493814-31-9. ⟨hal-04779272⟩
Atilla Kaan Alkan, Felix Grezes, Cyril Grouin, Fabian Schüssler, Pierre Zweigenbaum. Enriching a Time-Domain Astrophysics Corpus with Named Entity, Coreference, and Astrophysical Relationship Annotations. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Apr 2024, Turin, Italy. pp.6177-6188. ⟨hal-04780619⟩
Virgile Barthet, Marie José Aroulanda, Laura Monceaux-Cachard, Christine Jacquin, Cyril Grouin, et al.. Équilibrer qualité et quantité : comparaison de stratégies d’annotation pour la reconnaissance d’entités nommées en cardiologie. Journée Santé et IA 2024, AFIA; L3I; La Rochelle Université, Jul 2024, La Rochelle, France. ⟨hal-04780743⟩
Clément Morand, Olivier Ridoux. CRI : A Competent Reader Imitator for detecting binomial names in an historical corpus. Lingvisticae investigationes : International Journal of Linguistics and Language, 2024, 47 (1), pp.30-67. ⟨10.1075/li.00107.mor⟩. ⟨hal-04764787⟩
Fanny Ducel, Aurélie Névéol, Karën Fort. Desiderata for Actionable Bias Research. New Perspectives on Bias and Discrimination in Language Technology, Nov 2024, Amsterdam, Netherlands. ⟨hal-04755691⟩
Jamil Zaghir, Marco Naguib, Mina Bjelogrlic, Aurélie Névéol, Xavier Tannier, et al.. Prompt Engineering Paradigms for Medical Applications: Scoping Review. Journal of Medical Internet Research, 2024, 26, pp.e60501. ⟨10.2196/60501⟩. ⟨hal-04752782⟩
Mariana Neves, Cristian Grozea, Philippe Thomas, Roland Roller, Rachel Bawden, et al.. Findings of the WMT 2024 Biomedical Translation Shared Task: TestDéfinition courte Lorem ipsum Sets on Abstract Level. WMT24 – Ninth Conference on Machine Translation, Nov 2024, Miami, Florida, United States. pp.124-138, ⟨10.18653/v1/2024.wmt-1.6⟩. ⟨hal-04750560⟩
Théo Deschamps-Berger. Social Emotion Recognition with multimodal deep learning architecture in emergency call centers. Computation and Language [cs.CL]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG036⟩. ⟨tel-04750508⟩
Najet Hadj Mohamed, Cherifa Ben Khelil, Agata Savary, Iskander Keskes, Jean Yves Antoine, et al.. PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines. Language Resources and Evaluation, 2024, pp.1331-1361. ⟨10.1007/s10579-024-09763-7⟩. ⟨hal-04738059⟩