ILES’s research topics

ILES stands for “Information, Langue Écrite et Signée” (Information, written and signed language). ILES investigates natural language processing on written language data (analysis, understanding or production, and related knowledge acquisition) and sign language modeling and processing.

Our ambition is to maintain varied and complementary skills to combine both knowledge-based and machine-learning approaches, as well as fundamental and applied dimensions.

ILES’s research is organised along the following four themes:

The theme Corpora and Representations studies the linguistic events which occur in the graphic or signed representation systems used by humans to communicate. Our approach is corpus-based. We study documents from various origins: books, newspapers, speech transcriptions, technical reports, scientific articles, web pages, blogs, microblogs, sign language videos, etc.

By working on language productions with similar meanings but different forms, this research theme provides handles on semantics, the core of human language. At the same time, cross-language portability is a recurring issue in system development. This theme interacts transversally with each of the other three research themes of the ILES group, as well as with the Machine Translation activity of the TLP group.

This theme has two main goals.

The first goal focuses on the recognition of precise information in texts, with two main fields of study:

  • Information extraction: recognition and typing of targeted information in texts (such as entity and relation extraction) in order to build knowledge bases or analyze texts.
  • Focused information retrieval: locating target information in documents or knowledge bases in order to answer a query or natural language question.

The second goal focuses on modeling how natural language is used to query machines, in the context of personal assistants or information retrieval, either in specialized domains (e.g. on a commercial site or in scientific texts) or in the open domain (search in a knowledge base or encyclopedia).

Sign Languages (SL) are natural visual-gestural languages whose linguistic system exploits these specific channels: a lot of information is expressed simultaneously and organised in space, and iconicity plays a central role. To date, SL do not have a standard writing or graphical system for transcription. They are still poorly described and under-resourced. Computer modelling of SL requires the design of appropriate representations. We produce linguistic resources and address issues of analysis, representation and processing of French SL (LSF) in an interdisciplinary manner, with perspectives from several fields of computer science (NLP, signal processing, computer vision, computer graphics), as well as from language, motion and perception sciences.