FAIRClinical – FAIR-ification of Supplementary Data to Support Clinical Research

Awarded:

End date:

Budget: €524,702

ANR

Université du Luxembourg

Université de Leicester

HES-SO

Nona Naderi

This project aims to improve the FAIRness (findability, accessibility, interoperability and reusability) of all supplementary data files and, in particular, to significantly improve the reuse of unstructured clinical case report forms (CRFs).

Supplementary data are commonly attached to scientific publications, either directly in biomedical literature archives such as PMC or via generalist deposition platforms such as Zenodo. Their file types and formats are highly heterogeneous (e.g. PDF, XLS, CSV, GIF). CRFs collect patient data in clinical research studies and trials; they represent an information-rich subset of the clinical research literature and of unstructured clinical study supplementary data.

We propose to enrich the content of all supplementary data, and thereby its interoperability, findability and reusability, by delivering more normalised content. The envisaged normalisation will be performed along four dimensions that are common in dataset management catalogues:

1. administrative metadata (e.g. author names, affiliations, licensing models);
2. descriptive metadata (e.g. diseases, genes or gene products, population sizes, experimental settings);
3. structural metadata (e.g. textual content, images);
4. cross-references to other data deposition catalogues (e.g. URLs, PIDs).

For the descriptive metadata layer, whose content can vary significantly across specialised life and health science areas, we propose to explore the semantic enrichment of clinical information. We will provide broad FAIR-ification of supplementary data files, covering all PMC content, combined with a specific effort to structure CRFs into an Electronic Health Record (EHR)-like dataset.
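The four normalisation dimensions can be pictured as layers of a single metadata record attached to each supplementary file. The Python sketch below is purely illustrative: the class names, field names and the helper `empty_dimensions` are hypothetical and do not describe the project's actual schema or software. It only shows how the four layers might be kept separate and how one could check which layers are still unpopulated for a given file.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Optional


# Hypothetical layer 1: administrative metadata.
@dataclass
class AdministrativeMetadata:
    authors: list[str] = field(default_factory=list)
    affiliations: list[str] = field(default_factory=list)
    license: Optional[str] = None


# Hypothetical layer 2: descriptive metadata (domain-dependent).
@dataclass
class DescriptiveMetadata:
    diseases: list[str] = field(default_factory=list)
    genes_or_gene_products: list[str] = field(default_factory=list)
    population_size: Optional[int] = None
    experimental_setting: Optional[str] = None


# Hypothetical layer 3: structural metadata (what the file contains).
@dataclass
class StructuralMetadata:
    text_blocks: list[str] = field(default_factory=list)
    image_files: list[str] = field(default_factory=list)


# Hypothetical layer 4: a cross-reference to another catalogue.
@dataclass
class CrossReference:
    kind: str   # e.g. "URL" or "PID"
    value: str


@dataclass
class SupplementaryFileRecord:
    file_name: str
    file_format: str  # e.g. "PDF", "XLS", "CSV", "GIF"
    administrative: AdministrativeMetadata = field(default_factory=AdministrativeMetadata)
    descriptive: DescriptiveMetadata = field(default_factory=DescriptiveMetadata)
    structural: StructuralMetadata = field(default_factory=StructuralMetadata)
    cross_references: list[CrossReference] = field(default_factory=list)


def _has_value(value: object) -> bool:
    """Treat None, empty strings and empty lists as 'not yet filled in'."""
    return value not in (None, "", [])


def empty_dimensions(record: SupplementaryFileRecord) -> list[str]:
    """Return the names of the metadata layers that contain no values yet."""
    data = asdict(record)  # nested dataclasses become plain dicts
    empty = [
        name
        for name in ("administrative", "descriptive", "structural")
        if not any(_has_value(v) for v in data[name].values())
    ]
    if not record.cross_references:
        empty.append("cross_references")
    return empty


if __name__ == "__main__":
    record = SupplementaryFileRecord(
        file_name="table_s1.xlsx",  # hypothetical file
        file_format="XLS",
        descriptive=DescriptiveMetadata(diseases=["type 2 diabetes"], population_size=120),
    )
    print(empty_dimensions(record))
```

Running the example prints `['administrative', 'structural', 'cross_references']`, since only the descriptive layer has been filled in. Keeping the layers as separate objects mirrors the idea that the descriptive layer is the one expected to vary by domain, so it could be extended or swapped without touching the other three.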