The Data Science department brings together four teams (A&O, Bioinfo, LaHDAK, Rocs) with recognized and complementary expertise covering the modeling, collection, management, analysis and construction of data and knowledge. This makes it possible to explore synergies between expertise in data, learning and optimization, particularly in bioinformatics, the IoT and data graphs.
Digital traces of human activity are now available in virtually every field. These data are often massive, heterogeneous, dynamic and of variable quality (the 4 Vs of Big Data: Volume, Variety, Velocity, Veracity). Exploiting them has given rise to a fourth scientific paradigm: the design and validation of hypotheses, theoretical models and algorithms, guided by the data and developed in interaction with domain experts. The Data Science department aims to address the challenges of the 4 Vs robustly. This means scaling up to large data volumes and high velocity, and withstanding biases that arise from data diversity and quality. These goals raise new computational issues in storage, communication, analysis and processing optimization, data querying and enrichment, knowledge discovery, and model learning.
With nearly 40 researchers and teacher-researchers, the Data Science department covers a broad spectrum of fundamental and applied topics: databases, data mining, the semantic web, knowledge representation, algorithms, combinatorics, stochastic and distributed optimization, statistical learning and neural networks, communication networks, and simulation. It also has extensive experience in interdisciplinary research and in dialogue with experts in application domains (particularly biology, medicine, the humanities and social sciences, and experimental physics). This gives it privileged access to data of interest and to opportunities for evaluating models and algorithms.
Djamel Mesbah, Nour El Madhoun, Khaldoun Al Agha, Hani Chalouati. Exploring NLP Techniques for Code Smell Detection: A Comparative Study. The 39th International Conference on Advanced Information Networking and Applications (AINA-2025), Apr 2025, Barcelona, Spain. ⟨hal-05014865⟩
Vanessa Peña-Araya, Consuelo Martínez Fontaine, Xiang Wei, Guillaume Delpech, Anastasia Bezerianos. Uncertainty in Science is Malleable. Advocating for User-Agency in Defining Uncertainty in Visualizations: a Case Study in Geology. CHI 2025 – 43rd SIGCHI conference on Human Factors in computing systems, ACM, Apr 2025, Yokohama, Japan. ⟨10.1145/3706598.3713972⟩. ⟨hal-05004600⟩
Ying Wang, Anne Sergent, Didier Saury, Denis Lemonnier, Patrice Joubert. Gas radiation effect on a turbulent thermal plume in a confined cavity using direct numerical simulation. International Journal of Thermal Sciences, 2025, 213, pp.109820. ⟨10.1016/j.ijthermalsci.2025.109820⟩. ⟨hal-04997739⟩
Lisa Raithel, Philippe Thomas, Bhuvanesh Verma, Roland Roller, Hui-Syuan Yeh, et al. Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot Relation Extraction for Pharmacovigilance in French, German, and Japanese. The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, Association for Computational Linguistics, Aug 2024, Bangkok, Thailand. pp.170-182. ⟨hal-04781015⟩
Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi. Am I eligible? Natural Language Inference for Clinical Trial Patient Recruitment: the Patient’s Point of View. 2025. ⟨hal-04992084⟩
Marc Baboulin, Oguz Kaya, Theo Mary, Matthieu Robeyns. Numerical stability of tree tensor network operations, and a stable rounding algorithm. 2025. ⟨hal-04996127⟩
Mathieu Constant, Marie Candito, Yannick Parmentier, Carlos Ramisch, Agata Savary. Construction, exploitation et exploration de ressources linguistiques pour le traitement automatique des expressions polylexicales en français : le projet PARSEME-FR. Lidia Becker; Julia Kuhn; Christina Ossenkop; Claudia Polzin-Haumann; Elton Prifti. Digitale romanistische Sprachwissenschaft: Stand und Perspektiven, Narr Francke Attempto Verlag GmbH + Co. KG, pp.219-250, 2023, Romanistisches Kolloquium, 978-3-8233-8506-6. ⟨hal-04995189⟩
Inoussa Ouedraogo, Huyen Nguyen, Patrick Bourdot. Where to Draw the Line: Physical Space Partitioning and View Privacy in AR-based Co-located Collaboration for Immersive Analytics. ACM 2024 Symposium on Spatial User Interaction (SUI ’24), Oct 2024, Trier, Germany. pp.20, ⟨10.1145/3677386.3682085⟩. ⟨hal-04701730⟩
Rémi Uro. Détection et caractérisation des interruptions dans les interactions orales pour la description du comportement des femmes et des hommes dans les contenus audiovisuels. Informatique et langage [cs.CL]. Université Paris-Saclay, 2024. French. ⟨NNT : 2024UPASG055⟩. ⟨tel-04994439⟩
Cyriaque Rousselot, Olivier Allais, Philippe Caillou, Julia Mink, Florian Yger. Assessing the Impact of Pesticide Exposure on Population Health Using Large-Scale Data Integration and Modelling. E4H Annual Forum, Nov 2024, Palaiseau (Ecole Polytechnique), France. ⟨hal-04991195⟩
Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, et al. PETRA: Parallel End-to-end Training with Reversible Architectures. 2024. ⟨hal-04594647v1⟩
Cyriaque Rousselot, Olivier Allais, Philippe Caillou, Julia Mink, Florian Yger. Assessing Impact of Pesticide Exposure on Child Health using Large-Scale Data Integration and Modelling. Workshop Qualité de l’Air, Agriculture et Santé Humaine, Nov 2024, Saint Rémy lès Chevreuses, France. ⟨hal-04991175⟩
Artur Gesla. Numerical investigation of the dynamics of an axisymmetric rotor-stator flow. Fluid mechanics [physics.class-ph]. Sorbonne Université, 2024. English. ⟨NNT : 2024SORUS470⟩. ⟨tel-04990045⟩
Vincent Cavez, Catherine Letondal, Caroline Appert, Emmanuel Pietriga. EuterPen: Unleashing Creative Expression in Music Score Writing. CHI 2025 – 43rd SIGCHI conference on Human Factors in computing systems, Apr 2025, Yokohama, Japan. ⟨hal-04989074⟩
Ivan Delbende, Maurice Rossi, C Selçuk. Saute-mouton de vortex hélicoïdaux : une approche de type système dynamique. Congrès Français de Mécanique, Aug 2019, Brest, France. ⟨hal-03064492⟩
Victor Spitzer, Céline Gicquel, Evgeny Gurevsky, François Sanson. An approximate dynamic programming approach for multi-stage stochastic lot-sizing under a Decision-Hazard-Decision information structure. 2025. ⟨hal-04987947v2⟩
Vincenzo Maria Schimmenti, Giuseppe Petrillo, Alberto Rosso, François P. Landes. Assessing the Predictive Power of GPS‐Based Ground Deformation Data for Aftershock Forecasting. Seismological Research Letters, 2024, 95 (6), pp.3243-3249. ⟨10.1785/0220240008⟩. ⟨hal-04986097⟩
Amel Fraisse, Patrick Paroubek, Ramit Goyal, Nassreddine Znaidi. Measuring Multilingualism in Online Public Access Catalogs. The ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Dec 2024, Hong Kong, China. ⟨hal-04986773⟩
Manon Scholivet, Agata Savary, Louis Estève, Marie Candito, Carlos Ramisch. SELEXINI – a large and diverse automatically parsed corpus of French. Building and Using Comparable Corpora (BUCC), Jan 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04978746⟩
Michele de Bonis, Huyen Nguyen, Patrick Bourdot. Context-Based Annotation Visualisation in Virtual Reality: A Use Case in Archaeological Data Exploration. EuroXR 21st International Conference 2024, Nov 2024, Athens, Greece. pp.95-119, ⟨10.1007/978-3-031-78593-1_7⟩. ⟨hal-04978190⟩
Katerina Batziakoudi, Florent Cabric, Stéphanie Rey, Jean-Daniel Fekete. Lost in Magnitudes: Exploring Visualization Designs for Large Value Ranges. CHI 2025 – ACM Conference on Human Factors in Computing Systems, Apr 2025, Yokohama, Japan. ⟨10.1145/3706598.3713487⟩. ⟨hal-04981028⟩
Yue Zhao, Yunhai Wang, Xu Luo, Yanyan Wang, Jean-Daniel Fekete. Libra: An Interaction Model for Data Visualization. CHI 2025 – ACM Conference on Human Factors in Computing Systems, Apr 2025, Yokohama, Japan. ⟨10.1145/3706598.3713769⟩. ⟨hal-04979983⟩
Daeun Jeong, Sungbok Shin, Jongwook Jeong. Conversation Progress Guide : UI System for Enhancing Self-Efficacy in Conversational AI. ACM Conference on Human Factors in Computing Systems (CHI 2025), Apr 2025, Yokohama, Japan. ⟨10.1145/3706598.3714222⟩. ⟨hal-04977135v2⟩
William Turner, Jean-Baptiste Meyer, Paul de Guchteneire, Asmaa Azizi. Diaspora Wissensnetzwerke. Uwe Hunger; Kathrin Kissau. Internet und Migration : Theoretische Zugänge und empirische Befunde, VS Verlag, pp.90-130, 2009, ⟨10.1007/978-3-531-91902-7_6⟩. ⟨hal-04972745⟩
Michel Beaudouin-Lafon. L’usage du son dans les systèmes interactifs. Journées d’informatique musicale 1994, 1994, Bordeaux, France. 6 p. ⟨hal-03110430⟩
Renaud Blanch, Michel Beaudouin-Lafon, Stéphane Conversy, Yannick Jestin, Thomas Baudel, et al. INDIGO: une architecture pour la conception d’applications graphiques interactives distribuées. IHM’05 17ème Conférence Francophone sur l’Interaction Homme-Machine, ACM, Sep 2005, Toulouse, France. pp.139-146 / ISBN: 978-1-4503-3844-8, ⟨10.1145/1148550.1148568⟩. ⟨hal-01285967⟩
Jean Caelen, Michel Beaudouin-Lafon, Francesco Cara, Jean-Michel Hoc, Isabelle Bazet, et al. Interfaces Homme-Machine. Jean-Gabriel Ganascia. Communication et connaissance : Supports et médiations à l’âge de l’information, CNRS Éditions, pp.79-96, 2006, Sciences et techniques de l’ingénieur, 978-2-271-12824-9. ⟨10.4000/books.editionscnrs.30758⟩. ⟨hal-04493868⟩
Xiao Xiao, Sarah Fdili Alaoui. Tuning In to Intangibility : Reflections from My First 3 Years of Theremin Learning. DIS ’24: Designing Interactive Systems Conference, Jul 2024, IT University of Copenhagen, Copenhagen, Denmark. pp.2649-2659, ⟨10.1145/3643834.3661584⟩. ⟨hal-04930549⟩
Maria Sayu Yamamoto. Addressing the Large Variability of EEG Data with Riemannian Geometry : Toward Designing Reliable Brain-Computer Interfaces. Machine Learning [cs.LG]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG098⟩. ⟨tel-04967163⟩
Victor Spitzer, Céline Gicquel, Evgeny Gurevsky, François Sanson. Structural learning of electricity prices for production. 26th Congress of the French Society of Operations Research and Decision Aid (ROADEF 2025), Feb 2025, Champs-sur-Marne, France. ⟨hal-04967831⟩
Micheline Elias, Marie-Aude Aufaure, Anastasia Bezerianos. Storytelling in Visual Analytics Tools for Business Intelligence. 14th International Conference on Human-Computer Interaction (INTERACT), Sep 2013, Cape Town, South Africa. pp.280-297, ⟨10.1007/978-3-642-40477-1_18⟩. ⟨hal-00817732v2⟩
Lyes Bordja, Laurette Tuckerman, Laurent Martin Witkowski, Mérida Cruz Navarro, Rachid Bessaih. Oscillations dans la convection de Rayleigh-Bénard en cavité cylindrique. CFM 2009 – 19ème Congrès Français de Mécanique, Aug 2009, Marseille, France. ⟨hal-03390736⟩
Camille Challant. Représentation formelle avec AZee et contraintes grammaticales pour la langue des signes française. Théorie et langage formel [cs.FL]. Université Paris-Saclay, 2024. French. ⟨NNT : 2024UPASG086⟩. ⟨tel-04957486⟩
Luc Lebon, Chi-Tuong Pham, Paul Boniface, Laurent Limat. Bénard-von Kármán vortex street in a confined geometry: wavelength selection by Kelvin-Helmholtz instabilities. 2025. ⟨hal-04957697⟩
Benjamin Hellouin de Menibus, Pacôme Perrotin. Subshifts Defined by Nondeterministic and Alternating Plane-walking Automata. 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025), Mar 2025, Jena, Germany. pp.56, ⟨10.4230/LIPIcs.STACS.2025.56⟩. ⟨hal-04951292⟩
Eva Boguslawski, Alessandro Leite, Matthieu Dussartre, Benjamin Donnot, Marc Schoenauer. Emulation of Zonal Controllers for the Power System Transport Problem. RJCIA – PFIA 2024 – 23èmes Rencontres des Jeunes Chercheurs en Intelligence Artificielle, Jul 2024, La Rochelle, France. ⟨hal-04942704⟩
Mathieu Lerouge, Céline Gicquel, Vincent Mousseau, Wassila Ouerdane. Modeling and generating user‐centered contrastive explanations for the workforce scheduling and routing problem. International Transactions in Operational Research, 2024, ⟨10.1111/itor.13594⟩. ⟨hal-04933808⟩
Zheng Zhang, Brian Denton, Xiaolan Xie. Branch and Price for Chance-Constrained Bin Packing. INFORMS Journal on Computing, 2020, 32 (3), pp.547-564. ⟨10.1287/ijoc.2019.0894⟩. ⟨hal-04941861⟩
Stacy Hsueh, Marianela Ciolfi Felice, Sarah Fdili Alaoui, Wendy E Mackay. What Counts as ‘Creative’ Work? Articulating Four Epistemic Positions in Creativity-Oriented HCI Research. CHI 2024 – Conference on Human Factors in Computing Systems, May 2024, Honolulu, United States. pp.1-15, ⟨10.1145/3613904.3642854⟩. ⟨hal-04930572⟩
Simon Devauchelle, David Doukhan, Lucas Ondel Yang, Benjamin Élie, Albert Rilliard. Estimation automatique de caractéristiques acoustiques pour l’étude diachronique du français oral dans les médias. Atelier DAHLIA: DigitAl Humanities and cuLtural herItAge: data and knowledge management and analysis, Claudia Marinica; Fabrice Guillet; Florent Laroche, Jan 2025, Strasbourg, France. ⟨hal-04938377⟩
Dominique Laurent, Nicolas Spyratos. Consistent query answering in multi-relation databases. Information and Computation, 2025, 303, pp.105279. ⟨10.1016/j.ic.2025.105279⟩. ⟨hal-04938667⟩
Victor Bréhault, Emmanuel Dubois, Arnaud Prouzeau, Marcos Serrano. A Systematic Literature Review to Characterize Asymmetric Interaction in Collaborative Systems. CHI 2025 – Conference on Human Factors in Computing Systems, ACM, Apr 2025, Yokohama, Japan. ⟨hal-04940558⟩
Rémi Uro, David Doukhan. Pendant le confinement, le temps de parole des femmes a baissé à la télévision et à la radio. La revue des médias, 2020. ⟨hal-04906221⟩
Laetitia Biscarrat, Marlène Coulomb-Gully, Cécile Méadel, David Doukhan, Lucie Alexis, et al. L’invisibilité des femmes dans les médias : un combat dépassé ? Journée de restitution et d’échanges, Laetitia Biscarrat, Dec 2021, Nice, France. ⟨10.58079/17q5⟩. ⟨hal-04935784⟩
Marie-Christine Volk, Didier Lucor, Anne Sergent, Michael Mommert, Christian Bauer, et al. A PINN Methodology for Temperature Field Inference in the PIV Measurement Plane: Case of Rayleigh-Bénard Convection. Joint event Euromech Colloquium on Data-Driven Fluid Dynamics/2nd ERCOFTAC Workshop on Machine Learning for Fluid Dynamics, Apr 2025, London, United Kingdom. ⟨hal-04924426⟩
Jai Kumar, Anne Sergent, Francesca Chillà, Julien Salort, Didier Lucor. Bridging Experimental shadowgraphs and DNS in Turbulent Convection Using physically-informed U-Net. Joint event Euromech Colloquium on Data-Driven Fluid Dynamics/2nd ERCOFTAC Workshop on Machine Learning for Fluid Dynamics, Apr 2025, London, United Kingdom. ⟨hal-04924440⟩
Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. “Women do not have heart attacks!” Gender Biases in Automatically Generated Clinical Cases in French. Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, Apr 2025, Albuquerque, United States. ⟨hal-04938811⟩
Léa Paymal, Sarah Homewood. Good Days, Bad Days: Understanding the Trajectories of Technology Use During Chronic Fatigue Syndrome. CHI 2024 – CHI Conference on Human Factors in Computing Systems, ACM, May 2024, Honolulu (HI), United States. pp.1-10, ⟨10.1145/3613904.3642553⟩. ⟨hal-04933033⟩
Nathan Carbonneau, Julien Salort, Yann Fraigneau, Anne Sergent. Influence of wind on heat transfer in turbulent convection with roughness. 2025. ⟨hal-04928422⟩
Benoît Choffin, Fabrice Popineau, Yolaine Bourda, Jill-Jênn Vie. DAS3H: Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills. JDSE 2019 – Paris-Saclay Junior Conference on Data Science and Engineering, Sep 2019, Gif-sur-Yvette, France. ⟨hal-03427048⟩
Maxime Mahout, Ross Carlson, Sabine Peres. Answer Set Programming for Computing Constraints-Based Elementary Flux Modes: Application to Escherichia coli Core Metabolism. Processes, 2020, 8 (12), pp.1649. ⟨10.3390/pr8121649⟩. ⟨hal-03209519⟩
Sarah Fdili Alaoui. Les Intelligences Artificielles à l’interface du corps dansant. Bulletin de l’Association Française pour l’Intelligence Artificielle, 2024, 125, pp.40-44. ⟨hal-04930835⟩
Apolline Mellot. Machine learning and domain adaptation for enhancing the measure of brain health with MEG and EEG signals. Artificial Intelligence [cs.AI]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG068⟩. ⟨tel-04906458⟩
Seokho Ahn, Hyungjin Kim, Sungbok Shin, Young-Duk Seo. Real-Time Calibration Model for Low-Cost Sensor in Fine-Grained Time Series. AAAI 2025 – The 39th Annual AAAI Conference on Artificial Intelligence, AAAI – Association for the Advancement of Artificial Intelligence, Feb 2025, Philadelphia (PA), United States. ⟨hal-04906168⟩
Nathan Carbonneau, Julien Salort, Anne Sergent. Spatial modulation of the small coherent structures above a rough plate in turbulent Rayleigh-Bénard convection. 1st European Fluid Dynamics Conference (EFDC1), Sep 2024, Aachen, Germany. ⟨hal-04924300⟩
Marion Ficher, Tom Bauer, Anne-Laure Ligozat. A comprehensive review of the end-of-life modeling in LCAs of digital equipment. International Journal of Life Cycle Assessment, 2024, 30 (1), pp.20-42. ⟨10.1007/s11367-024-02367-x⟩. ⟨hal-04924691⟩
Laetitia Biscarrat, Lucie Alexis, Evi Basile-Commaille, Laure Beaulieu, David Doukhan, et al. Geschlechterordnung und Radiowellen. Frauen in den französischen Morgennachrichten. Rundfunk und Geschichte, 2024, 50 (1-2), pp.68-81. ⟨hal-04924717⟩
Alexandre Battut, Kevin Ratovo, Michel Beaudouin-Lafon. OneTrace : Improving Event Recall and Coordination With Cross-Application Interaction Histories. International Journal of Human-Computer Interaction, 2024, pp.1-18. ⟨10.1080/10447318.2024.2332848⟩. ⟨hal-04923157⟩
Wendy E Mackay, Alexandre Battut, Germàn Leiva, Michel Beaudouin-Lafon. VideoClipper: Rapid Prototyping with the “Editing-in-the-Camera” Method. CHI 2024 – CHI Conference on Human Factors in Computing Systems, May 2024, Honolulu, United States. pp.1-14, ⟨10.1145/3613904.3642458⟩. ⟨hal-04923092⟩
Anne Sergent, Soufiane Mrini, Elian Bernard, Didier Lucor. Lagrangian Measurements and Physics-Informed Neural Network for Rayleigh-Bénard Flow Reconstruction. 26th International Congress of Theoretical and Applied Mechanics (ICTAM2024), Aug 2024, Daegu, South Korea. ⟨hal-04924332⟩
Eva Boguslawski, Alessandro Leite, Benjamin Donnot, Matthieu Dussartre, Marc Schoenauer. Emulation of Zonal Controllers for the Power System Transport Problem. ML4SPS 2024 – Machine Learning for Sustainable Power Systems ECML 2024 Workshop, European Conference of Machine Learning, Sep 2024, Vilnius, Lithuania. ⟨hal-04924271⟩
Françoise Detienne, Chloé Le Bail, Michael J Baker. Mobilisation des valeurs dans le processus de conception. Stéphane Safin. Les activités cognitives de conception en architecture, ISTE Group, pp.59-80, 2025, 9781789482041. ⟨10.51926/ISTE.9204.ch2⟩. ⟨hal-04925972⟩
Roger Ballester Claret, Nicolò Fabbiane, Christian Fagiano, Cédric Julien, Didier Lucor. Reliability Based Aeroelastic Design Optimization of Composite Wings via Surrogate Modeling. AIAA SCITECH 2025 Forum, Jan 2025, Orlando, United States. pp.AIAA 2025-0964, ⟨10.2514/6.2025-0964⟩. ⟨hal-04923527⟩
Ying Wang, Anne Sergent, Didier Saury, Denis Lemonnier, Patrice Joubert. Gas radiation effect on a turbulent thermal plume in a confined cavity using direct numerical simulation. 2025. ⟨hal-04924224⟩
Nicolas Atienza, Christophe Labreuche, Johanne Cohen, Michèle Sebag. Provably Safeguarding a Classifier from OOD and Adversarial Samples: an Extreme Value Theory Approach. ICLR 2025 – The Thirteenth International Conference on Learning Representations, Apr 2025, Singapore. ⟨hal-04922382⟩
Mehdi Chakhchoukh. Visualization to Support Multi-Criteria Decision-making in Agronomy. Human-Computer Interaction [cs.HC]. Université Paris-Saclay, 2024. English. ⟨NNT : 2024UPASG085⟩. ⟨tel-04926672⟩
Roland Cahen, Matthieu Savary, Roman Weil, Bianca Pica, Charlie Gouin, et al. Carnet de recherches documentaires de l’Atelier de Projets IMPACT 2025, ENSCi les Ateliers. 2024. ⟨hal-04868538v2⟩
Giovanni Catania, Aurélien Decelle, Beatriz Seoane. The Copycat Perceptron: Smashing Barriers Through Collective Learning. Physical Review E, 2024, 109 (6), pp.065313. ⟨10.1103/PhysRevE.109.065313⟩. ⟨hal-04918895⟩
Léa-Marie Lam-Yee-Mui. Modélisations pour la reconnaissance de la parole à données contraintes. Traitement du signal et de l’image [eess.SP]. Université Paris-Saclay, 2024. French. ⟨NNT : 2024UPASG075⟩. ⟨tel-04918814⟩