
LaHDAK Research Themes

The objective of the LaHDAK team is to address the challenges raised by massive data and knowledge.
We design solutions to deal with complex, structured, semantically heterogeneous, uncertain, incomplete, and evolving data from any source, including social web data, linked open data, and scientific data. Semantic relevance, efficiency, scalability, and robustness are the main issues we study.

The LaHDAK team brings together two complementary areas of expertise: data management and artificial intelligence. Its research activities are organized into several axes, each structured into several themes. We have selected three main axes, which we describe hereafter.

(i) Massive and heterogeneous data: storage and integration. The LaHDAK team works on information extraction and integration from structured and unstructured data sources (such as text and images). The team is also interested in data indexing and query optimization in the context of polystore-type data integration systems, where the systems managing the data sources are themselves heterogeneous (e.g., relational databases, graph databases, NoSQL stores).
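
As a minimal illustration of the cross-store querying problem that polystores address, the sketch below joins records held in a relational store (SQLite) with documents held in a plain dictionary standing in for a NoSQL store. All table names, fields, and data are hypothetical and do not reflect the team's actual systems.

```python
import sqlite3

# Relational side: an in-memory SQLite table of authors.
rel = sqlite3.connect(":memory:")
rel.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
rel.executemany("INSERT INTO authors VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])

# Document side: publications keyed by author id
# (a dict used here as a stand-in for a document/NoSQL store).
doc_store = {
    1: [{"title": "Polystore indexing", "year": 2021}],
    2: [{"title": "Query optimization over heterogeneous sources", "year": 2022}],
}

def cross_store_join():
    """Join authors (relational store) with their publications (document store)."""
    for author_id, name in rel.execute("SELECT id, name FROM authors"):
        for pub in doc_store.get(author_id, []):
            yield {"author": name, "title": pub["title"], "year": pub["year"]}

for row in cross_store_join():
    print(row)
```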

(ii) Knowledge graphs: refinement and reasoning. We are interested in the refinement of knowledge graphs, which consists of both enriching them and detecting inconsistent knowledge. On the one hand, the LaHDAK team works on extracting semantic modules from large ontologies using symbolic AI reasoning techniques; on the other hand, it develops methods for detecting and invalidating identity links in linked open data. The team has also developed several rule discovery algorithms for data linking, causal relationship explanation, and decision support (e.g., fraud detection).
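
The sketch below illustrates one simple form of identity link invalidation: an owl:sameAs link is flagged when its two endpoints disagree on a property treated as functional. The entities, properties, and conflict test are invented for illustration and are not the team's actual method.

```python
# Toy entity descriptions (property -> value); identifiers are hypothetical.
descriptions = {
    "dbpedia:Paris":       {"population": 2_148_000, "country": "France"},
    "geonames:2988507":    {"population": 2_148_000, "country": "France"},
    "dbpedia:Paris_Texas": {"population": 25_000,    "country": "USA"},
}

same_as_links = [
    ("dbpedia:Paris", "geonames:2988507"),
    ("dbpedia:Paris", "dbpedia:Paris_Texas"),   # erroneous link to be detected
]

# Properties assumed to be functional: two identical entities must agree on them.
functional_properties = ["population", "country"]

def invalid_links(links, data, functional):
    """Yield sameAs links whose endpoints conflict on a functional property."""
    for a, b in links:
        for prop in functional:
            va, vb = data.get(a, {}).get(prop), data.get(b, {}).get(prop)
            if va is not None and vb is not None and va != vb:
                yield (a, b, prop, va, vb)

for a, b, prop, va, vb in invalid_links(same_as_links, descriptions, functional_properties):
    print(f"Suspicious sameAs({a}, {b}): {prop} = {va} vs {vb}")
```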

(iii) Graph mining and optimization. We are interested in learning graph integration models with reinforcement learning for influence maximization in social networks and for recommendation. Within this axis, the LaHDAK team also works on querying and indexing large graphs of uncertain data.
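
For readers unfamiliar with influence maximization, the sketch below shows the classical greedy seed selection under the independent cascade model, with spread estimated by Monte Carlo simulation. The toy graph and propagation probability are invented for illustration and do not represent the team's reinforcement learning approach.

```python
import random

# Toy directed social graph and a uniform activation probability.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}
P = 0.3

def simulate_cascade(seeds, trials=200):
    """Estimate the expected spread of a seed set under independent cascade."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            node = frontier.pop()
            for nb in graph.get(node, []):
                if nb not in active and random.random() < P:
                    active.add(nb)
                    frontier.append(nb)
        total += len(active)
    return total / trials

def greedy_seed_selection(k):
    """Greedily pick k seeds, each time choosing the node with the best spread gain."""
    seeds = set()
    for _ in range(k):
        best = max((n for n in graph if n not in seeds),
                   key=lambda n: simulate_cascade(seeds | {n}))
        seeds.add(best)
    return seeds

print(greedy_seed_selection(2))
```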