From
Time -
Location LISN Site Plaine
Data Science, Thesis
Speaker : Armita KHAJEH NASSIRI
Link Zoom : https://cnrs.zoom.us/j/95042504893?pwd=alRYazdYTTFCQTB1aUJwUGRIOWpqdz09#success
Knowledge graphs (KGs) are heterogeneous graph structures representing facts in a machine-readable format. They find applications in tasks such as question answering, disambiguation, and entity linking. However, KGs are inherently incomplete, and refining them is crucial to improve their effectiveness in downstream tasks. It’s possible to complete the KGs by predicting missing links within a knowledge graph or integrating external sources and KGs. By extracting rules from the KG, we can leverage them to complete the graph while providing explainability. Various approaches have been proposed to mine rules efficiently. Yet, the literature lacks effective methods for effectively incorporating numerical predicates in rules. To address this gap, we propose REGNUM, which mines numerical rules with interval constraints. REGNUM builds upon the rules generated by an existing rule mining system and enriches them by incorporating numerical predicates guided by quality measures. Additionally, the interconnected nature of web data offers significant potential for completing and refining KGs, for instance, by data linking, which is the task of finding sameAs links between entities of different KGs. We introduce RE-miner, an approach that mines referring expressions (REs) for a class in a knowledge graph and uses them for data linking. REs are rules that are only applied to one entity. They support knowledge discovery and serve as an explainable way to link data. We employ pruning strategies to explore the search space efficiently, and we define characteristics to generate REs that are more relevant for data linking. Furthermore, we aim to explore the advantages and opportunities of fine-tuning language models to bridge the gap between KGs and textual data. We propose GilBERT, which leverages fine-tuning techniques on language models like BERT using a triplet loss. GilBERT demonstrates promising results for refinement tasks of relation prediction and triple classification tasks. By considering these challenges and proposing novel approaches, this thesis contributes to KG refinement, particularly emphasizing explainability and knowledge discovery. The outcomes of this research open doors to more research questions and pave the way for advancing towards more accurate and comprehensive KGs.