Algorithms, Learning and Computation, Data Science, AI, Thesis
Speaker: Bruno ARISTIMUNHA PINTO
Electroencephalography (EEG) decoding faces a fundamental challenge: the deep learning methods that transformed the field require massive datasets, yet EEG studies typically collect only hundreds to thousands of trials per subject. This data scarcity, combined with high inter-subject variability and low signal-to-noise ratios, limits the application of modern machine learning approaches. This thesis investigates representation learning strategies to address EEG data constraints, proposing methods to reuse, create, and constrain representations while establishing infrastructure for reproducible evaluation.
In the first part, we focused on reusing representations through transfer learning across cognitive tasks. We investigated whether features learned on one EEG paradigm could improve decoding on another, using linear probing to measure transferability across 11 paradigms from two datasets comprising 135 subjects. We found that transfer is highly asymmetric: some tasks serve as effective sources for multiple targets, while other targets do not benefit from pre-training on any source. Linear probing yielded accuracy gains, and hierarchical clustering of the transfer matrices revealed that tasks recruiting similar cognitive processes cluster together.
In the second part, we focused on creating representations through generative modeling to augment limited datasets. We trained latent diffusion models with a novel spectral loss on sleep EEG data from two datasets to generate realistic 30-second epochs. The spectral loss enforces frequency-domain fidelity by minimizing the differences between the short-time Fourier transforms of real and synthetic signals, preserving physiologically relevant oscillations in the delta, theta, and alpha bands. The generated signals achieved a low Fréchet Inception Distance and showed correct power spectral density profiles, demonstrating that diffusion models can synthesize EEG suitable for privacy-preserving data sharing.
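To make the idea of a spectral loss concrete, here is a minimal NumPy sketch that compares the magnitude short-time Fourier transforms of a real and a synthetic signal. It is only an illustration under stated assumptions: the abstract does not give the exact formulation, so the mean absolute difference, the Hann window, the `n_fft` and `hop` values, the 100 Hz sampling rate, and the function names `stft_magnitude` and `spectral_loss` are all hypothetical choices, not the thesis's implementation.

```python
import numpy as np


def stft_magnitude(x, n_fft=256, hop=128):
    """Magnitude STFT of a 1-D signal using a Hann window (no padding)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Index matrix of shape (n_frames, n_fft): one row per windowed frame.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    # One-sided FFT of each frame gives n_fft // 2 + 1 frequency bins.
    return np.abs(np.fft.rfft(frames, axis=-1))


def spectral_loss(real, synthetic, n_fft=256, hop=128):
    """Mean absolute difference between the two magnitude spectrograms.

    Assumption: an L1 distance on magnitudes; other distances are possible.
    """
    diff = stft_magnitude(real, n_fft, hop) - stft_magnitude(synthetic, n_fft, hop)
    return float(np.mean(np.abs(diff)))


# Illustrative 30-second epoch at an assumed 100 Hz sampling rate.
sfreq = 100
t = np.arange(30 * sfreq) / sfreq
real = np.sin(2 * np.pi * 2.0 * t)         # 2 Hz oscillation (delta band)
wrong_band = np.sin(2 * np.pi * 10.0 * t)  # 10 Hz oscillation (alpha band)

loss_identical = spectral_loss(real, real)
loss_mismatch = spectral_loss(real, wrong_band)
```

In this toy setting the loss is zero for identical signals and positive when the synthetic signal places its energy in a different frequency band, which is the behavior a frequency-domain fidelity term is meant to penalize.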
In the third part, we addressed the challenge of parameter efficiency by constraining representations with geometric priors. We proposed Phase-SPDNet, which combines Riemannian geometry on the symmetric positive-definite manifold with phase-space reconstruction based on Takens’ theorem. This approach enriches spatial covariance matrices with temporal dynamics through delay embedding, enabling effective decoding with minimal electrodes. Evaluated on six motor imagery datasets using only three electrodes, Phase-SPDNet achieved state-of-the-art performance.
In the fourth part, we addressed the reproducibility crisis in EEG decoding by developing standardized benchmarking infrastructure. We organized a large-scale challenge on more than 3,000 subjects with six cognitive tasks, revealing that cross-subject generalization remains substantially harder than cross-task generalization. Beyond the competition, we developed three open-source tools: MOABB for standardized benchmarking, Braindecode as a PyTorch model zoo, and EEGDash for dataset cataloging. These tools have collectively exceeded one million installations.
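The delay-embedding step described for the third part can be illustrated with a short NumPy sketch: each channel is stacked with time-shifted copies of itself, and the covariance of the stacked signal then captures temporal as well as spatial structure. This is not the Phase-SPDNet implementation; the embedding dimension `dim`, the delay `tau`, the synthetic data, and the function names are assumptions made for illustration, and the Riemannian network layers are omitted.

```python
import numpy as np


def delay_embed(x, dim=3, tau=5):
    """Stack `dim` time-delayed copies of each channel.

    x has shape (n_channels, n_times); the result has shape
    (n_channels * dim, n_times - (dim - 1) * tau).
    """
    x = np.asarray(x, dtype=float)
    n_out = x.shape[1] - (dim - 1) * tau
    # Copy k is the signal shifted by k * tau samples, cropped to n_out.
    copies = [x[:, k * tau:k * tau + n_out] for k in range(dim)]
    return np.concatenate(copies, axis=0)


def augmented_covariance(x, dim=3, tau=5):
    """Sample covariance of the delay-embedded signal (rows are variables)."""
    return np.cov(delay_embed(x, dim, tau))


# Synthetic trial: 3 electrodes, 500 samples of white noise (illustrative only).
rng = np.random.default_rng(0)
trial = rng.standard_normal((3, 500))
cov = augmented_covariance(trial)
```

With three electrodes and `dim=3`, the covariance grows from 3 x 3 to 9 x 9, so a model operating on symmetric positive-definite matrices receives a richer input without any additional sensors.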
Keywords: Deep Learning, Electroencephalography, Decoding