The Data Science department brings together four teams (A&O, Bioinfo, LaHDAK, Rocs) with recognized and complementary expertise covering the modeling, collection, management, analysis and construction of data and knowledge. Bringing them together makes it possible to explore synergies between expertise in data, learning and optimization, particularly in connection with bioinformatics, IoT and data graphs.
Digital traces of human activity are now available in every field. These data are often massive, heterogeneous, dynamic and of variable quality (the 4 Vs of Big Data: Volume, Variety, Velocity, Veracity). Exploiting them leads to a fourth scientific paradigm: the design and validation of hypotheses, theoretical models and algorithms, guided by the data and in interaction with domain experts. The Data Science department aims to address the challenges of the 4 Vs robustly, both by scaling up in the face of data volume and velocity and by resisting the biases that arise from data diversity and uneven quality. These goals raise new computational issues in storage, communication, analysis and processing optimization, data querying and enrichment, knowledge discovery, and model learning.
With nearly 40 researchers and teacher-researchers, the Data Science department covers a broad spectrum of fundamental and applied topics: databases, data mining, semantic web, knowledge representation, algorithms, combinatorics, stochastic and distributed optimization, statistical learning and neural networks, communication networks, and simulation. It also has extensive experience in interdisciplinary research and in dialogue with experts from application domains (particularly biology, medicine, human and social sciences, and experimental physics), which gives it privileged access to data of interest and supports the evaluation of models and algorithms.
Coordination
Algorithms, learning and computation, Data Science
Carl Abou Saada Nujaim, Victoria Meifeng Myot, Michel Beaudouin-Lafon, Wendy E Mackay. “Figma is a big black box:” The mismatch between the interface and its underlying structure. CHI 2026 ACM – Conference on Human Factors in Computing Systems, Apr 2026, Barcelona, Spain. ⟨10.1145/3772363.3798460⟩. ⟨hal-05559726⟩
Cyprien Tilmant, Alexis Trécourt, Constance Béchu, Brigitte Séroussi, Françoise Berthoud, et al.. What is the carbon footprint of a 100% digital pathology scenario in France? Pathology, 2026, ⟨10.1016/j.pathol.2025.12.008⟩. ⟨hal-05549595⟩
E. Millour, G. Labrosse, E. Tric. Sensitivity of binary liquid thermal convection to confinement. Physics of Fluids, 2003, 15 (10), pp.2791-2802. ⟨10.1063/1.1600439⟩. ⟨hal-00407037⟩
A. Mambole, G. Labrosse, E. Tric, L. Fleitout. Linear stability of a double diffusive layer of an infinite Prandtl number fluid with temperature-dependent viscosity. Studia Geophysica et Geodaetica, 2004, 48 (3), pp.519-537. ⟨10.1023/B:SGEG.0000037470.80659.e5⟩. ⟨hal-00407332⟩
E. Millour, G. Labrosse, E. Tric. Axisymmetric convective states of pure and binary liquids enclosed in a vertical cylinder and boundary conditions’ influence thereupon. Physics of Fluids, 2005, 17 (4), pp.44102-44121. ⟨10.1063/1.1863257⟩. ⟨hal-00407455⟩
Natalia Grabar, Cyril Grouin. A Year of Papers Using Biomedical Texts: Findings from the Section on Natural Language Processing of the IMIA Yearbook. IMIA Yearbook of Medical Informatics, 2019, 28 (01), pp.218-222. ⟨10.1055/s-0039-1677937⟩. ⟨hal-05549686⟩
Dalya Moultaka, Tifanie Bouchara, Alma Guilbert. Dissecting pseudoneglect in real and virtual environments: effects of tool and stimulus, not distance or environment. Laterality, 2026, pp.1-31. ⟨10.1080/1357650X.2026.2622714⟩. ⟨hal-05555439⟩
Natalia Grabar, Cyril Grouin. Year 2021: COVID-19, Information Extraction and BERTization among the Hottest Topics in Medical Natural Language Processing. IMIA Yearbook of Medical Informatics, 2022, 31 (01), pp.254-260. ⟨10.1055/s-0042-1742547⟩. ⟨hal-03931852⟩
Jean Cury, Théophile Sanchez, Erik Madison Bray, Jazeps Medina-Tretmanis, María Ávila-Arcos, et al.. Inferring effective population sizes of bacterial populations while accounting for unknown recombination and selection: a deep learning approach. ECML PKDD 2022 – Machine Learning for Microbial Genomics workshop, Sep 2022, Grenoble, France. ⟨hal-05554282⟩
George Marchment, Sarah Cohen-Boulakia, Frédéric Lemoine. Computational Reproducibility With Scientific Workflows: Analysing viral genomes with Nextflow. ACM REP’25 ACM Conference on Reproducibility and Replicability, Jul 2025, Vancouver, Canada. ⟨hal-05525260⟩
Anne-Flore Cabouat, Samuel Huron, Tobias Isenberg, Petra Isenberg. Readability as a multi-measure construct in data visualization. CHI 2026 STAR Workshop – Science and Technology for Augmenting Reading, Mar 2026, Barcelona, Spain. ⟨hal-05548028⟩
Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, Sophie Rosset. Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and NER. The Fifteenth biennial Language Resources and Evaluation Conference (LREC 2026), May 2026, Palma de Mallorca, Spain. ⟨hal-05546569⟩
Julien Bezançon, Gaël Lejeune, Marceau Hernandez. How I Met Your Snowclone: Unsupervised Discovery of Snowclone Patterns in Large Datasets. LREC, May 2026, Palma de Mallorca, Balearic Islands, Spain. ⟨hal-05538898⟩
Amine Benamara, Céline Clavel, Brian Ravenet, Nicolas Sabouret, Julien Saunier. Exploring the role of embodiment on intimacy perception in a multiparty collaborative task. ACM International Conference on Intelligent Virtual Agents (IVA ’24), ACE Workshop Proceedings, Sep 2024, Glasgow, United Kingdom. ⟨hal-04842778⟩
Julien Rauch, Damien Rontani, Stéphane Vialle. Towards a Quantum Generative Graph-Based Clustering for Molecule Discovery. Quest-IS, Dec 2025, Palaiseau, France. pp.243-251, ⟨10.1007/978-3-032-13855-2_22⟩. ⟨hal-05549507⟩
Clémentine Bleuze, Fanny Ducel, Maxime Amblard, Karën Fort. COCOA: Creation and Exploratory Investigation of a Corpus of Claims from NLP Articles. LREC 2026 – International Conference on Language Resources and Evaluation, ELRA Language Resources Association, May 2026, Palma de Mallorca, Spain. ⟨hal-05547842⟩
Philippe Boula de Mareüil, Béatrice Akissi Boutin. Évaluation et identification perceptives d’accents ouest-africains en français. Journal of French Language Studies, 2011, 21 (3), pp.361-379. ⟨10.1017/S0959269510000621⟩. ⟨hal-01411677⟩
Victor Spitzer, Francois Sanson. Managing Perturbations in Decision-Focused Learning with Cost Regularization. 27ème édition du congrès annuel de la Société Française de Recherche Opérationnelle et d’Aide à la Décision (ROADEF 2026), Université de Tours, Feb 2026, Tours, France. ⟨hal-05534820⟩
Yucheng Lu, Tobias Rau, Benjamin Lee, Andreas Köhn, Michael Sedlmair, et al.. Design Considerations for Visualization Transitions of 3D Spatial Data in Hybrid AR‐Desktop Environments. Computer Graphics Forum, In press, 45, ⟨10.1111/cgf.70305⟩. ⟨hal-05542621⟩
Mathilde Sassier–Roublin, Amine Benamara, Céline Clavel, Julien Saunier, Alexandre Pauchet. Measuring Group Cohesion: Development and Validation of the Group Cohesion Questionnaire (GCQ). 2026. ⟨hal-05532877⟩
Arthur Fages, Caroline Appert, Olivier Chapuis. DoubleMe: Local Blending in Multi-Display Environments with Augmented Reality to Facilitate Co-Located Collaboration. CHI 2026 – ACM Conference on Human Factors in Computing Systems, Apr 2026, Barcelona, Spain. ⟨10.1145/3772318.3791000⟩. ⟨hal-05542927⟩
Behnoosh Mohammadzadeh, Jules Françoise, Michèle Gouiffès, Baptiste Caramiaux. Sensemaking in User-Driven Algorithm Auditing: A Case Study on Gender Bias in an Image Captioning Model. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, Apr 2026, Barcelona, Spain. ⟨10.1145/3772318.3790784⟩. ⟨hal-05534067⟩
Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. Assessing the Difficulty of Inference Types in Natural Language Inference for Clinical Trials. 2026. ⟨hal-05533706⟩
Hugo Boulier, David Coudert, Frédéric Havet, François Pirot. Colouring the interference digraph of a set of requests in a bidirected tree. 2026. ⟨hal-05536580⟩
E. Delouche, G. Labrosse, E. Tric. The Oscillatory 2D Convective States of a Binary Fluid Confined in Small Cavity. Journal de Physique III, 1996, 6 (11), pp.1527-1534. ⟨10.1051/jp3:1996200⟩. ⟨jpa-00249543⟩
Emmanuel Leriche, G. Labrosse. Vector Potential – Vorticity correlation and corner behaviors from the Stokes eigenmodes in square and cube. CFM 2007 – 18ème Congrès Français de Mécanique, Aug 2007, Grenoble, France. ⟨hal-03362023⟩
Cédric Gendrot, Carolin Schmid, Martine Adda-Decker. F0 Declination in French: Broadcast News versus spontaneous speech. Nijmegen workshop in Production and Comprehension of Conversational Speech, Dec 2011, Nijmegen, Netherlands. pp.15-17. ⟨halshs-00691494⟩
Can Selcuk, Ivan Delbende, Maurice Rossi. Helical vortex systems: linear analysis and nonlinear dynamics. CFM 2015 – 22ème Congrès Français de Mécanique, Aug 2015, Lyon, France. ⟨hal-03446113⟩
Juan Manuel Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset, Khaled Zaouk, et al.. Diart: A Python Library for Real-Time Speaker Diarization. Journal of Open Source Software, 2024, 9 (99), pp.5266. ⟨10.21105/joss.05266⟩. ⟨hal-05530961⟩
Cyriaque Rousselot, Olivier Allais, Philippe Caillou, Julia Mink, Florian Yger. Pesticide Aerial Concentration: Estimations Toward Population Health Impact Assessment. Engineering for Health Annual Forum 2025, Nov 2025, Palaiseau (Ecole Polytechnique), France. ⟨hal-05447129⟩
Clémentine Bleuze, Karën Fort, Vincent P. Martin, Aurélie Névéol. Grands modèles de langue pour la détection de pathologies psychiatriques : promesses, réalité, et enjeux. Journée d’étude “LLM@hopital”, ATALA, Mar 2026, Paris, France. ⟨hal-05532823⟩
Enrico Marchesini, Eva Boguslawski, Alessandro Leite, Christopher Amato, Matthieu Dussartre, et al.. MARL2GRID-TR: A multi-agent RL benchmark in power grid operations. ICLR 2026 – The Fourteenth International Conference on Learning Representations, Apr 2026, Rio de Janeiro, Brazil. ⟨hal-05532479⟩
Fanny Ducel, Aurélie Névéol, Vidit Khazanchi, Loïc Leclere, Arthur Pedrini, et al.. Code-switching as a Bias Indicator in LLMs: “The consequences are not the same para nosotros”. LREC 2026 – 15th biennial Language Resources and Evaluation Conference, May 2026, Palma de Mallorca, Spain. ⟨hal-05529786⟩
Nicolas Hernandez. Un Indice de Structuration de Texte Combinant Finesse et Disponibilité au Niveau Global et Local. SEMAINE DU DOCUMENT NUMERIQUE, Journée d’ATALA, Jun 2004, La Rochelle, France. ⟨sic_00001225⟩
Sharon Peperkamp, Sonya Kaiser, Lori Lamel, Martine Adda-Decker. Cross-linguistic transfer of phonological assimilation in early and late bilinguals. Proceedings of the 46th Annual Meeting of the Cognitive Science Society, 2024, Rotterdam, Netherlands. ⟨hal-04688580⟩
Pierre-Philippe Cortet, Éric Falcon, Mathieu Gibert, Jean-Baptiste Gorce, Marc Lefranc, et al.. Recueil des contributions de la 28e Rencontre du Non-Linéaire 2025. 28e Rencontre du Non-Linéaire 2025, Mar 2025, Paris, France. 2025, 978-2-9576145-4-7. ⟨hal-05523688⟩
Philippe Rambaud. Analyse vidéographique de la motricité spontanée du nouveau-né et de l’enfant. Computer Vision and Pattern Recognition [cs.CV]. Université Paris-Saclay, 2026. French. ⟨NNT : 2026UPASG007⟩. ⟨tel-05525738⟩
Athanaël Jousselin, Victor Spitzer, Olivier Péton, Evgeny Gurevsky. Planification optimisée des livraisons d’hydrogène par conteneur de stockage mobile. 27ème Congrès Annuel de la Société Française de Recherche Opérationnelle et d’Aide à la Décision (ROADEF 2026), Feb 2026, Tours, France. ⟨hal-05525251⟩
Younes Djemmal, You Zuo, Kim Gerdes, Kirian Guiller. Citation-Driven Multi-View Training for Patent Embeddings: QaECTER and Sophia-Bench. 2026. ⟨hal-05524063⟩
A. Sergent, Patrice Joubert, P. Le Quéré. Large Eddy Simulation of Rayleigh-Bénard convection in an infinite fluid layer. XXI ICTAM, Aug 2004, Warsaw, Poland. ⟨hal-00312583⟩
Estelle Chaix, Bertrand Dubreucq, Dialekti Valsamou, Abdelhak Fatihi, Robert Bossy, et al.. Information extraction challenge Gene Regulation Network in Arabidopsis thaliana (GRNA). Paris-Saclay Center for Data Science: Open Software Initiative (OSI), Oct 2015, Orsay, France. pp.14 slides, ⟨10.13140/RG.2.1.4286.6326⟩. ⟨hal-01603453⟩
Ioana Chitoran, Ioana Vasilescu, Lori Lamel, Bianca Vieru. Connected speech in Romanian: Exploring sound change through an ASR system. D. Recasens and F. Sánchez Miret (Eds.). Production and perception mechanisms of sound change, Lincom Europa, pp.129-143, 2018. ⟨hal-03127939⟩
Jonathan Colin. Enabling Diversity of Content Sources in Online Social Networks. Social and Information Networks [cs.SI]. Université Paris-Saclay, 2025. English. ⟨NNT : 2025UPASG100⟩. ⟨tel-05523131⟩
Charles Perin, Tica Lin, Lijie Yao, Yalong Yang, Maxime Cordeil, et al.. First-Person Visualizations for Outdoor Physical Activities: Challenges and Opportunities. 2024. ⟨hal-05519489⟩
Nicanor Carrasco-Vargas, Benjamin Hellouin de Menibus, Rémi Pallen. Parametrized complexity of relations between multidimensional subshifts. 2026. ⟨hal-05499852⟩
Oralie Cattan, Christophe Servan, Sophie Rosset. On the Usability of Transformers-based models for a French Question-Answering task. Joint Conference of the Information Retrieval Communities in Europe (CIRCLE) 2022, Jul 2022, Samatan, France. ⟨hal-03701740⟩
Marie Schmit, Ulysse Le Clanche, George Marchment, Sarah Cohen-Boulakia, Olivier Dameron, et al.. ShareFAIR-KG. 2025, ⟨swh:1:rev:a96c28345857df13aafbc85461df2ab0cf259ef8;origin=https://gitlab.liris.cnrs.fr/sharefair/knowledge_base_workflow_annotations/ShareFAIR-KG.git;visit=swh:1:snp:e53f57ef45e971e1b9b76477d1f39f4fbe3de77f⟩. ⟨hal-05517690⟩
Gen Wu, Nicolas Grenier, Caroline Nore. Two-fluid Compressible Flows with Multiresolution Adaptive Mesh Refinement. 12th International Conference on Multiphase Flow, May 2025, Toulouse, France. ⟨hal-05499499⟩
Nicolas Grenier, Marie-Christine Duluc. Strong interaction between a gas bubble and a free surface in a confined domain at subatmospheric pressure. 12th International Conference on Multiphase Flow, May 2025, Toulouse, France. ⟨hal-05498135⟩
Marie Schmit, Ulysse Le Clanche, George Marchment, Sarah Cohen-Boulakia, Olivier Dameron, et al.. A Standards-Based Knowledge Graph that Bridges Scientific Workflows, Run-Time Provenance, and Tool Registries. SWAT4HCLS 2026, Mar 2026, Amsterdam, Netherlands. ⟨hal-05517640⟩
Anton Batliner, Stefan Steidl, Björn Schuller, Dino Seppi, Thurid Vogt, et al.. Whodunnit – Searching for the Most Important Feature Types Signalling Emotion-Related User States in Speech. Computer Speech and Language, 2010, 25 (1), pp.4. ⟨10.1016/j.csl.2009.12.003⟩. ⟨hal-00661911⟩
S. Al Moubayed, M. Baklouti, M. Chetouani, T. Dutoit, A. Mahdhaoui, et al.. Generating Robot/Agent backchannels during a storytelling experiment. 2009 IEEE International Conference on Robotics and Automation (ICRA), May 2009, Kobe, Japan. pp.3749-3754, ⟨10.1109/ROBOT.2009.5152572⟩. ⟨hal-02423468⟩
Stéphanie Pellerin, William Daussin. Simulation numérique d’une écoulement au-dessus d’une rampe à l’aide de la méthode des frontières immergées. CFM 2013 – 21ème Congrès Français de Mécanique, Aug 2013, Bordeaux, France. ⟨hal-03439770⟩
Yoren Gaffary, Victoria Eyharabide, Jean-Claude Martin, Mehdi Ammi. The Impact of Combining Kinesthetic and Facial Expression Displays on Emotion Recognition by Users. International Journal of Human-Computer Interaction, 2014, 30 (11), pp.904-920. ⟨10.1080/10447318.2014.941276⟩. ⟨hal-03278306⟩
Ali Oker, Matthieu Courgeon, Elise Prigent, Victoria Eyharabide, Nadine Bazin, et al.. A Virtual Reality Study of Help Recognition and Metacognition with an Affective Agent. International Journal of Synthetic Emotions, 2015, 6 (1), pp.60-73. ⟨10.4018/IJSE.2015010104⟩. ⟨hal-03278297⟩
Gabriela Gomes Fernandes, Xavier Sanchez, Jean-Claude Martin, Brian Ravenet. Application motivationnelle pour la course à pied : démarche de conception participative et analyse des besoins des participants. 2024. ⟨hal-04450126⟩
Brigitte Grau, Anne-Laure Ligozat, Isabelle Robba, Anne Vilnat, Laura Monceaux. FRASQUES: A Question-Answering System in the EQueR Evaluation Campaign. LREC 2006, 2006, Genoa, Italy. ⟨hal-00456700⟩
Jean-Camille Chassaing, C. T. Nitschke, Angela Vincenti, Paola Cinnella, D. Lucor. Advances in Parametric and Model-Form Uncertainty Quantification in Canonical Aeroelastic Systems. Aerospace Lab, 2018, 14, ⟨10.12762/2018.AL14-07⟩. ⟨hal-03892600⟩
Kevin Bouaou, Thomas Dietenbeck, Gilles Soulat, Sophia Houriez–Gombaud-Saintonge, Ioannis Bargiotas, et al.. 5.6 Aortic Pressure Behind Flow Disorganization in Aneurismal Aorta: a Magnetic Resonance Imaging Study. Artery Research, 2019, Budapest, Hungary. pp.S42, ⟨10.2991/artres.k.191224.035⟩. ⟨hal-03943714⟩
Jean-Marie Burkhardt, Francine Behar-Cohen, Ouriel Grynszpan, Evelyne Klinger, Régis Lobjois, et al.. Effets sanitaires liés à une exposition aux technologies de réalité virtuelle et/ou augmentée. Saisine n° 2017-SA-0076, Anses. 2021, 293 p. ⟨anses-03788395⟩
Léa Pacini, Jérôme Dupire, Isabelle Barbet, Olivier Pons, Camille Guinaudeau, et al.. Textbook’s accessibility for children with dyspraxia and visual disability. 17th International Conference of the Association for the Advancement of Assistive Technology in Europe, AAATE 2023, Association for the Advancement of Assistive Technology in Europe, Aug 2023, Paris, France. ⟨hal-04410340⟩
S. Ghannay, Y. Estève, Nathalie Camelin, Camille Dutrey, Fabian Santiago Vargas, et al.. Utilisation des représentations continues des mots et des paramètres prosodiques pour la détection des erreurs dans les transcriptions automatiques de la parole. Journée d’Études sur la Parole (2016), Jul 2016, Paris, France. pp.723-731. ⟨halshs-01428013⟩
M. Crialesi-Esposito, G. Boffetta, L. Brandt, Sergio Chibbaro, S. Musacchio. How small droplets form in turbulent multiphase flows. Physical Review Fluids, 2024, 9 (7), pp.L072301. ⟨10.1103/PhysRevFluids.9.L072301⟩. ⟨hal-05356010⟩
Chloé Clavel, Ioana Vasilescu, Gael Richard, Laurence Devillers. Voiced and Unvoiced Content of fear-type emotions in the SAFE Corpus. Speech Prosody, May 2006, Dresden, Germany. ⟨10.21437/SpeechProsody.2006-153⟩. ⟨hal-03153935⟩
Chloé Clavel, Ioana Vasilescu, Laurence Devillers, Gael Richard, Thibaut Ehrette, et al.. The SAFE Corpus: illustrating extreme emotions in dynamic situations. LREC Workshop on Corpora for Research on Emotion and Affect, May 2006, Genoa, Italy. ⟨hal-03153931⟩
Yann Fraigneau, Patrick Le Quéré. Simulations numériques 3D d’écoulements de convection naturelle à haut nombre de. CFM 2007 – 18ème Congrès Français de Mécanique, Aug 2007, Grenoble, France. ⟨hal-03361618⟩
Li Lorang, Florian Abéguilé, Yann Fraigneau, Christian Tenaud. Identification de systèmes dynamiques dans un canal plan turbulent à l’aide de réseaux de neurones. CFM 2007 – 18ème Congrès Français de Mécanique, Aug 2007, Grenoble, France. ⟨hal-03361700⟩
Florian Abéguilé, Yann Fraigneau, Li Lorang, Christian Tenaud. Génération de conditions aux limites amont pour les simulations de type LES des écoulements de paroi. CFM 2007 – 18ème Congrès Français de Mécanique, Aug 2007, Grenoble, France. ⟨hal-03361612⟩
J. Basley, N. Delprat, F. Lusseyran, L.R. Pastur, Peter J. Schmid. Dynamic/Koopman modes of an experimental incompressible cavity flow. 8th European Fluid Mechanics Conference (EFMC8), Sep 2010, Bad Reichenhall, Germany. ⟨hal-01052841⟩
Laurence Devillers, Annie Blandin-Obernesser, Elodie Gentina, Fabrice Le Guel, Michel Robert, et al.. Portrait(s) de France(s) : Numérique, quels enjeux pour la société ? The Conversation France, 2022. ⟨hal-04069014⟩