ILES

Information Langue Écrite et Signée (ILES)

The ILES group is dedicated to the processing of written language data (analysis, comprehension and production, together with the acquisition of the knowledge these tasks require) and of signed languages (modelling and automatic processing of sign languages).

Our ambition is to maintain varied and complementary skills so as to combine symbolic and statistical approaches, and to cover both fundamental and more applied research.

The research activities of the ILES group are organised around four themes.

The theme Corpus and Representations concerns the study of linguistic events as they appear in the graphical and signed representation systems that humans use to communicate. In our research, we explore corpora, i.e. collections of documents assembled according to a working hypothesis, from diverse origins: speech transcripts, books, articles, newspapers, reports, web pages, blogs, microblogs, sign language videos, etc.

This research theme is devoted to the analysis of language productions that share the same meaning but differ in form, a problem at the heart of semantics. The question extends to multilingualism, a recurrent issue in system development. This theme cuts across each of the three other themes of the ILES group, and also connects with the translation activity of the TLP group.

This theme has two main axes.

The first focuses on the recognition of specific information in texts, with two main fields of study:

  • Information extraction: recognition and typing of information to build knowledge bases or analyse texts
  • Precise information retrieval: finding information in texts or knowledge bases in response to questions in natural language

The second axis concerns modelling the processes that allow users to query a machine in natural language, whether for information retrieval in a particular domain (e.g. on a commercial site or in scientific texts), in an open domain (search in a knowledge base or in encyclopaedic texts), or as a personal assistant.

Through numerous collaborations, we produce linguistic resources and address the analysis, representation and processing of French Sign Language (FSL) in an interdisciplinary manner, drawing on several fields of computer science (NLP, signal processing, computer vision, computer graphics) as well as on language, motion and perception sciences.

Recent publications

  • Conference paper

    Yagmur Ozturk, Najet Hadj Mohamed, Adam Lion-Bouton, Agata Savary. Enhancing the PARSEME Turkish Corpus of Verbal Multiword Expressions. 18th Workshop on Multiword Expressions (MWE 2022) @LREC2022, Jun 2022, Marseille, France. ⟨hal-03925083⟩

    ILES

    Publication year: 2022

    Available in open access

  • Conference paper

    Mariana Neves, Antonio Jimeno Yepes, Amy Siu, Roland Roller, Philippe Thomas, et al. Findings of the WMT 2022 Biomedical Translation Shared Task: Monolingual Clinical Case Reports. WMT22 – Seventh Conference on Machine Translation, Dec 2022, Abu Dhabi, United Arab Emirates. pp.694-723. ⟨hal-03932275⟩

    ILES

    Publication year: 2022

    Available in open access

  • Journal article

    Natalia Grabar, Cyril Grouin. Year 2021: COVID-19, Information Extraction and BERTization among the most Hot Topics in Medical Natural Language Processing. IMIA Yearbook of Medical Informatics, 2022. ⟨hal-03931852⟩

    ILES

    Publication year: 2022

  • Preprint, working paper

    Laurent Lefèvre, Anne-Laure Ligozat, Denis Trystram, Sylvain Bouveret, Aurélie Bugeau, et al. Environmental assessment of projects involving AI methods. 2023. ⟨hal-03922093⟩

    ILES

    Publication year: 2023

    Available in open access

  • Proceedings

    Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff. Proceedings of the LREC 2022 15th Workshop on Building and Using Comparable Corpora (BUCC 2022). 2022, 15th Workshop on Building and Using Comparable Corpora (BUCC 2022), 979-10-95546-94-8. ⟨hal-03876674⟩

    ILES

    Publication year: 2022

    Available in open access

  • Thesis

    Amine Benamara. Conception d'une patiente virtuelle Alzheimer interactive et expressive : modélisation de biais d'évaluation pour la génération de comportements non-verbaux. Human-Computer Interaction [cs.HC]. Université Paris-Saclay, 2022. In French. ⟨NNT : 2022UPASG057⟩. ⟨tel-03894634⟩

    ILES

    Publication year: 2022

    Available in open access

  • Thesis

    Marion Kaczmarek. Spécification d'un logiciel de traduction assistée par ordinateur à destination des langues signées. Computation and Language [cs.CL]. Université Paris-Saclay, 2022. In French. ⟨NNT : 2022UPASG065⟩. ⟨tel-03889967⟩

    ILES

    Publication year: 2022

    Available in open access

  • Report

    Laurent Lefèvre, Anne-Laure Ligozat, Denis Trystram, Sylvain Bouveret, Aurélie Bugeau, et al. Proposition de document de cadrage Évaluation environnementale de projets impliquant des méthodes d'IA. EcoInfo. 2022, pp.1-8. ⟨hal-03853135⟩

    ILES

    Publication year: 2022

    Available in open access

  • Conference paper

    Victoria Arranz, Khalid Choukri, Montse Cuadros, Aitor García-Pablos, Lucie Gianola, et al. MAPA Project: Ready-to-Go Open-Source Datasets and Deep Learning Technology to Remove Identifying Information from Text Documents. Joint Workshop on Legal and Ethical Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Language Resources (LEGAL – MDLR 2022), Jun 2022, Marseille, France. pp.64-72. ⟨hal-03873042⟩

    ILES

    Publication year: 2022

    Available in open access

  • Conference paper

    Lisa Raithel, Philippe Thomas, Roland Roller, Oliver Sapina, Sebastian Möller, et al. Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective. 13th Conference on Language Resources and Evaluation, Jun 2022, Marseille, France. ⟨hal-03866409⟩

    ILES

    Publication year: 2022

    Available in open access