SEME

SEMantics and Information Extraction (SEME)

Coordination: Cyril GROUIN

The SEME (Semantics and Information Extraction) team studies how to access the meaning conveyed by language productions, for the purposes of analysis, comprehension, modeling, or production. We work on the written modality, whatever the original medium (text produced in electronic form, speech transcripts, or optical character recognition output), and on productions in open or specialized domains such as medicine. We use linguistic approaches as well as statistical and neural learning approaches. We are particularly interested in the latter and in the environmental costs they generate in natural language processing, both during their production and during their use.

  • Information extraction
  • Corpus and modeling
  • Semantics, multiword expressions

The team comprises 10 permanent members (CNRS researchers, lecturers at Université Paris-Saclay, ENSIIE, and Université Sorbonne Paris-Nord), 14 PhD students, and 3 post-docs or fixed-term contracts. We maintain links with industry (theses under CIFRE contracts, research projects) and regularly organize scientific events (TALN conference, scientific workshops, etc.).

Coordination

  • Sciences et Technologies des Langues

    SEME

    Grouin Cyril

    Research Engineer

    Head of SEME

Members

Publications

  • Other scientific publication

    Louis Estève, Kaja Dobrovoljc. A new pipeline for measuring diversity across various linguistic levels. 2025. ⟨hal-04886792⟩

    STL

    Available in free access

  • Conference paper

    Leticia Rebollo Couto, Albert Rilliard. Variação Pragmática e Diminutivização: intensificação e atenuação de atos expressivos e diretivos para a dublagem de animação em português, espanhol e francês. IV Colloque International VariaR 2024, Université Paul-Valéry Montpellier 3, Jun 2024, Montpellier, France. pp.43-44, ⟨10.3726/978-3-0351-0740-1⟩. ⟨hal-04874595⟩

    STL

    Available in free access

  • Thesis

    Sofiya Kobylyanskaya. Towards multimodal assessment of L2 level : speech and eye tracking features in a cross-cultural setting. Computation and Language [cs.CL]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG111⟩. ⟨tel-04900961⟩

    STL

    Available in free access

  • Conference poster

    Leticia Rebollo Couto, Albert Rilliard. Variación pragmática y expresividad negativa: análisis multimodal en datos de doblaje. LingCor2024: Workshop on Spoken Corpus Linguistics, Jul 2024, Vienna, Austria. ⟨hal-04874470⟩

    STL

    Available in free access

  • Conference paper

    Clémentine Bleuze, Fanny Ducel, Karën Fort, Maxime Amblard. « Vers la création d’une super-intelligence » : un corpus pour étudier les revendications des articles de TAL. Journées de lancement LIFT 2, Nov 2024, Orléans, France. ⟨hal-04880335⟩

    STL

    Available in free access

  • Conference paper

    Ayoub Hammal, Benno Uthayasooriyar, Caio Corro. Few-Shot Domain Adaptation for Named-Entity Recognition via Joint Constrained k-Means and Subspace Selection. Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), Jan 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04877776⟩

    STL

    Available in free access

  • Conference paper

    Simon Devauchelle, Albert Rilliard, David Doukhan, Lucas Ondel Yang. Describing voice in French media archives: age and gender effects on pitch and articulation characteristics. XX Convegno Nazionale AISV, LFSAG (Laboratorio di Fonetica Sperimentale “Arturo Genre”) Dipartimento di Lingue e Letterature Straniere e Culture Moderne Università degli Studi di Torino, Feb 2024, Turin, Italy. ⟨hal-04874662⟩

    STL

  • Conference paper

    Donna Erickson, João Antônio De Moraes, Albert Rilliard. Dimensões das atitudes prosódicas entre culturas. V Seminário Internacional de Fonologia, Universidade Federal do Rio de Janeiro, Nov 2024, Rio de Janeiro, Brazil. ⟨hal-04874627⟩

    STL

  • Conference paper

    Khanh-An C Quan, Camille Guinaudeau, Shin’Ichi Satoh. Evaluating VQA Models’ Consistency in the Scientific Domain. Multimedia Modelling 2025, Jan 2025, Nara, Japan. ⟨hal-04860239⟩

    STL

    Available in free access

  • Conference paper

    Saumya Yadav, Elise Lincker, Caroline Huron, Stéphanie Martin, Camille Guinaudeau, et al. Towards Inclusive Education: Multimodal Classification of Textbook Images for Accessibility. Multimedia Modelling 2025, Jan 2025, Nara, Japan. ⟨hal-04860245⟩

    STL

    Available in free access

  • Conference paper

    Delphine Bernhard, Myriam Bras, Anne-Laure Ligozat, Aleksandra Miletic, Jean Sibille, et al. L’avenir numérique des langues minoritaires : bilan du projet RESTAURE pour l’alsacien, l’occitan et le picard. Colloque « Langues minoritaires » : quels acteurs pour quel avenir ?, Groupe d’Etudes sur le Plurilinguisme européen (EA1339 LiLPa), Nov 2019, Strasbourg, France. ⟨hal-04864670⟩

    STL

  • Journal article

    Cyril Grouin, Natalia Grabar. Year 2023 in Biomedical Natural Language Processing: A Tribute to Large Language Models and Generative AI. IMIA Yearbook of Medical Informatics, 2024. ⟨hal-04865083⟩

    STL

  • Conference paper

    Natalia Grabar, Thierry Hamon. Study of the propaganda techniques occurring in Russian newspaper titles in 2022. METAPOL, université de Liège, Nov 2024, Liège, Belgium. ⟨hal-04865074⟩

    STL

  • Journal article

    Angèle Gayet-Ageron, Khaoula Ben Messaoud, Mark Richards, Cyril Jaksic, Julien Gobeill, et al. Gender and geographical bias in the editorial decision-making process of biomedical journals: a case-control study. BMJ Evidence-Based Medicine, 2024, pp.bmjebm-2024-113083. ⟨10.1136/bmjebm-2024-113083⟩. ⟨hal-04865134⟩

    STL

  • Conference paper

    Omar Adjali, Olivier Ferret, Sahar Ghannay, Hervé Le Borgne. Multi-Level Information Retrieval Augmented Generation for Knowledge-based Visual Question Answering. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024, Miami, United States. pp.16499-16513, ⟨10.18653/v1/2024.emnlp-main.922⟩. ⟨hal-04852275⟩

    STL

    Available in free access

  • Preprint, working paper

    Aurélie Bugeau, Anne-Laure Ligozat. L’informatique en temps de crises environnementales : comment adapter la recherche et l’enseignement ? 2024. ⟨hal-04850517⟩

    STL

    Available in free access

  • Journal article

    Donna Erickson, Albert Rilliard, Ela Thurgood, João Antônio de Moraes, Takaaki Shochi. Acoustic and perceptual profiles of American English social affective expressions. Journal of Speech Sciences, 2024, 13, pp.e024004. ⟨10.20396/joss.v13i00.20015⟩. ⟨hal-04850040⟩

    STL

    Available in free access

  • Preprint, working paper

    Clément Morand, Anne-Laure Ligozat, Aurélie Névéol. How Green Can AI Be? A Study of Trends in Machine Learning Environmental Impacts. 2024. ⟨hal-04839926v3⟩

    STL

    Available in free access

  • Journal article

    Lucie Gianola. Traitement automatique des langues et linguistique de corpus pour la reconnaissance d’entités en analyse criminelle. Revue internationale de criminologie et de police technique et scientifique, 2021, LXXIV (3), pp.363-382. ⟨hal-04833123⟩

    ILES, STL

    Available in free access

  • Conference poster

    Mathilde Aguiar, Ying Lai, Pierre Zweigenbaum, Nona Naderi. Constituting a dataset for applying Natural Language Inference to Chinese Clinical Trials: possible approaches and challenges. Junior Conference on Data Sciences and Engineering, Sep 2024, Gif-sur-Yvette, France. ⟨hal-04837721⟩

    STL

    Available in free access

  • Conference paper

    Hansjörg Mixdorff, Albert Rilliard, Navneet Nayan. Perceptual Evaluation of Attitudinal Expressions. 5th International Symposium on Applied Phonetics (ISAPh 2024), Pärtel Lippus, Sep 2024, Tartu, Estonia. pp.60-64, ⟨10.21437/ISAPh.2024-12⟩. ⟨hal-04823812⟩

    STL

    Available in free access

  • Preprint, working paper

    Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, et al. What Can Natural Language Processing Do for Peer Review? 2024. ⟨hal-04797652⟩

    STL

    Available in free access

  • Journal article

    Fanny Ducel, Aurélie Névéol, Karën Fort. “You’ll be a nurse, my son!” Automatically Assessing Gender Biases in Autoregressive Language Models in French and Italian. Language Resources and Evaluation, 2024, ⟨10.1007/s10579-024-09780-6⟩. ⟨hal-04803403⟩

    STL

    Available in free access

  • Conference paper

    Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, et al. A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Turin, Italy. pp.395-414. ⟨hal-04779777⟩

    STL

    Available in free access

  • Conference paper

    Dongfang Xu, Guillermo Lopez-Garcia, Lisa Raithel, Roland Roller, Philippe Thomas, et al. Overview of the 9th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at ACL 2024 – Large Language Models and Generalizability for Social Media NLP. The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, Association for Computational Linguistics, Aug 2024, Bangkok, Thailand. pp.183-195. ⟨hal-04781745⟩

    STL

    Available in free access

  • Proceedings

    Pierre Zweigenbaum, Serge Sharoff, Reinhard Rapp. The 17th Workshop on Building and Using Comparable Corpora (BUCC) @LREC-COLING-2024. Workshop Proceedings. 17th Workshop on Building and Using Comparable Corpora (BUCC), 2024, 978-2-493814-31-9. ⟨hal-04779272⟩

    STL

    Available in free access

  • Conference paper

    Atilla Kaan Alkan, Felix Grezes, Cyril Grouin, Fabian Schüssler, Pierre Zweigenbaum. Enriching a Time-Domain Astrophysics Corpus with Named Entity, Coreference, and Astrophysical Relationship Annotations. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Apr 2024, Turin, Italy. pp.6177-6188. ⟨hal-04780619⟩

    STL

    Available in free access

  • Conference paper

    Virgile Barthet, Marie José Aroulanda, Laura Monceaux-Cachard, Christine Jacquin, Cyril Grouin, et al. Équilibrer qualité et quantité : comparaison de stratégies d’annotation pour la reconnaissance d’entités nommées en cardiologie. Journée Santé et IA 2024, AFIA; L3I; La Rochelle Université, Jul 2024, La Rochelle, France. ⟨hal-04780743⟩

    STL

    Available in free access

  • Journal article

    Clément Morand, Olivier Ridoux. CRI : A Competent Reader Imitator for detecting binomial names in an historical corpus. Lingvisticae investigationes : International Journal of Linguistics and Language, 2024, 47 (1), pp.30-67. ⟨10.1075/li.00107.mor⟩. ⟨hal-04764787⟩

    STL

    Available in free access

  • Student thesis

    Clément Morand. Evaluation of the environmental impacts of Natural Language Processing methods. Computer Science [cs]. 2023. ⟨dumas-04758937⟩

    STL

    Available in free access

  • Conference paper

    Fanny Ducel, Aurélie Névéol, Karën Fort. Desiderata for Actionable Bias Research. New Perspectives on Bias and Discrimination in Language Technology, Nov 2024, Amsterdam, Netherlands. ⟨hal-04755691⟩

    STL

    Available in free access

  • Journal article

    Jamil Zaghir, Marco Naguib, Mina Bjelogrlic, Aurélie Névéol, Xavier Tannier, et al. Prompt Engineering Paradigms for Medical Applications: Scoping Review. Journal of Medical Internet Research, 2024, 26, pp.e60501. ⟨10.2196/60501⟩. ⟨hal-04752782⟩

    STL

  • Conference paper

    Mariana Neves, Cristian Grozea, Philippe Thomas, Roland Roller, Rachel Bawden, et al. Findings of the WMT 2024 Biomedical Translation Shared Task: Test Sets on Abstract Level. WMT24 – Ninth Conference on Machine Translation, Nov 2024, Miami, Florida, United States. pp.124-138. ⟨hal-04750560⟩

    STL

    Available in free access

  • Journal article

    Najet Hadj Mohamed, Cherifa Ben Khelil, Agata Savary, Iskander Keskes, Jean Yves Antoine, et al. PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines. Language Resources and Evaluation, 2024, ⟨10.1007/s10579-024-09763-7⟩. ⟨hal-04738059⟩

    STL

    Available in free access

  • Report

    David Benaben, Françoise Berthoud, Gaël Guennebaud, Anne-Laure Ligozat, S. Valcke. Estimation de l’empreinte carbone d’une heure de calcul sur un cœur CPU ou sur un GPU. Labos 1point5. 2024. ⟨hal-04738556⟩

    STL

    Available in free access

  • Conference paper

    Théo Gigant, Camille Guinaudeau, Marc Decombas, Frédéric Dufaux. Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Nov 2024, Miami (FL), United States. ⟨hal-04720645⟩

    STL

    Available in free access

  • Conference paper

    Emmanuella Martinod, Michael Filhol. Formal Representation of Interrogation in French Sign Language. Proceedings of the 11th Workshop on representation and processing of Sign Languages, May 2024, Turin, Italy. ⟨hal-04712681⟩

    STL

    Available in free access

  • Conference paper

    Michael Filhol, Thomas von Ascheberg. A software editor for the AZVD graphical Sign Language representation system. Workshop on the representation and processing Sign Language, May 2024, Turin, Italy. ⟨hal-04712674⟩

    STL

    Available in free access

  • Conference paper

    Emmanuella Martinod, Michael Filhol. Examining interrogative marking in French Sign Language with the AZee approach. Clause-type marking in the visual modality, workshop at the Annual Conference of the German Linguistics Society, German Linguistics Society, Feb 2024, Bochum, Germany. ⟨hal-04709019⟩

    STL

    Available in free access

  • Conference paper

    Paritosh Sharma, Camille Challant, Michael Filhol. Facial Expressions for Sign Language Synthesis using FACSHuman and AZee. 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, May 2024, Turin, Italy. ⟨hal-04709105⟩

    STL

    Available in free access

  • Conference paper

    Paritosh Sharma, Michael Filhol. Sign Language Synthesis using Pose Priors. MOCO ’24: 9th International Conference on Movement and Computing, May 2024, Utrecht, Netherlands. pp.1-4, ⟨10.1145/3658852.3659080⟩. ⟨hal-04709203⟩

    STL

    Available in free access

  • Journal article

    Pierre La Rocca, Gaël Guennebaud, Aurélie Bugeau, Anne-Laure Ligozat. Estimating The Carbon Footprint Of Digital Agriculture Deployment: A Parametric Bottom-Up Modelling Approach. Journal of Industrial Ecology, In press, 28 (6), pp.1801-1815. ⟨10.1111/jiec.13568⟩. ⟨hal-04708774⟩

    STL

    Available in free access

  • Journal article

    Fanny Ducel, Aurélie Névéol, Karën Fort. La recherche sur les biais dans les modèles de langue est biaisée : état de l’art en abyme. Revue TAL : traitement automatique des langues, 2024, 64 (3), pp.119-143. ⟨hal-04710191⟩

    STL

    Available in free access

  • Conference paper

    Carlos Cuevas Villarmin, Sarah Cohen-Boulakia, Nona Naderi. Reproducibility in Named Entity Recognition: A Case Study Analysis. 2024 IEEE 20th International Conference on e-Science (e-Science), Sep 2024, Osaka, Japan. pp.1-10, ⟨10.1109/e-Science62913.2024.10678721⟩. ⟨hal-04706673⟩

    BioInfo, STL

  • Conference paper

    Rémi Uro, Marie Tahon, David Doukhan, Antoine Laurent, Albert Rilliard. Detecting the terminality of speech-turn boundary for spoken interactions in French TV and Radio content. Interspeech 2024, Itshak Lapidot; Sharon Gannot, Sep 2024, Kos, Greece. pp.3560-3564, ⟨10.21437/interspeech.2024-1163⟩. ⟨hal-04694968⟩

    STL

    Available in free access

  • Conference paper

    Donna Erickson, Albert Rilliard, Malin Svensson Lundmark, Adelaide Silva, Leticia Rebollo Couto, et al. Collecting Mandible Movement in Brazilian Portuguese. Interspeech 2024, Itshak Lapidot; Sharon Gannot, Sep 2024, Kos, Greece. pp.3145-3149, ⟨10.21437/interspeech.2024-1216⟩. ⟨hal-04694958⟩

    STL

    Available in free access

  • Conference paper

    Benjamin Elie, David Doukhan, Rémi Uro, Lucas Ondel Yang, Albert Rilliard, et al. Articulatory Configurations across Genders and Periods in French Radio and TV archives. Interspeech 2024, Itshak Lapidot; Sharon Gannot, Sep 2024, Kos, Greece. pp.3085-3089, ⟨10.21437/interspeech.2024-1177⟩. ⟨hal-04694868⟩

    STL

    Available in free access

  • Conference paper

    Rémi Uro, Marie Tahon, Jane Wottawa, David Doukhan, Albert Rilliard, et al. Annotation of Transition-Relevance Places and Interruptions for the Description of Turn-Taking in Conversations in French Media Content. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Sep 2024, Torino, Italy. pp.1225–1232. ⟨hal-04694997⟩

    STL

    Available in free access

  • Conference paper

    Luc Mottin, Nona Naderi, Anaïs Mottaz, Pierre-André Michel, Gerieke Been, et al. Comparing Sequence-Based and Literature-Based Pathogenicity Scoring Methods for Human Variants. 34th Medical Informatics Europe Conference, Aug 2024, Athens, Greece. ⟨10.3233/SHTI240747⟩. ⟨hal-04682928⟩

    STL

    Available in free access

  • Conference paper

    Annelies Braffort, Patrice Dalle. Sign language processing: models, representations, tools for video analysis, for signing avatars and for communication. 2nd International Society for Gesture Studies (ISGS 2005) conference: “Interacting bodies”, 2005, Lyon, France. ⟨hal-04678548⟩

    STL
