Internship
Position type: Language Sciences and Technologies
The goal of this internship is to explore the most effective PEFT methods specifically designed for speech-related tasks, such as Spoken Language Understanding (SLU). The SLU task aims to extract semantic information from the speech signal, such as weather type, date, location, etc.
Recent approaches rely on end-to-end architectures built on an SSL speech encoder with a few additional layers [Laperriere et al., 2023, Dinarelli et al., 2022, Laperriere et al., 2024].
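To make this type of architecture concrete, below is a minimal sketch (in PyTorch with the Hugging Face transformers library) of an SSL speech encoder topped with a few task-specific layers for SLU. The encoder checkpoint name, hidden sizes, and number of semantic labels are illustrative assumptions, not specifications of the internship's setup.

```python
# Minimal sketch of an end-to-end SLU model: a pretrained SSL speech encoder
# followed by a few task-specific layers. Checkpoint name, hidden size of the
# head, and label count are assumptions for illustration only.
import torch.nn as nn
from transformers import Wav2Vec2Model

class SLUModel(nn.Module):
    def __init__(self, ssl_name="facebook/wav2vec2-base", num_labels=64):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(ssl_name)  # SSL speech encoder
        hidden = self.encoder.config.hidden_size
        # "A few additional layers": a small projection + classification head
        self.head = nn.Sequential(
            nn.Linear(hidden, 256),
            nn.ReLU(),
            nn.Linear(256, num_labels),
        )

    def forward(self, input_values):
        # input_values: raw waveform batch of shape (batch, samples)
        hidden_states = self.encoder(input_values).last_hidden_state
        return self.head(hidden_states)  # frame-level semantic concept logits
```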
The context and tasks of the internship are summarized below.
Speech technologies have advanced rapidly, powering applications such as virtual assistants and voice-controlled systems; however, adapting these models to new domains remains a significant challenge, particularly in low-resource settings. Self-supervised learning (SSL) improves generalization by learning from unlabeled data, yet fine-tuning large SSL models is costly and risks overfitting with limited labeled data. This has driven interest in Parameter-Efficient Fine-Tuning (PEFT) methods, which adapt models by updating only a small set of task-specific parameters.
Compared to full fine-tuning, PEFT significantly reduces memory usage, computation, and the risk of overfitting, making it a promising solution for efficient and flexible model adaptation. PEFT methods have gained interest for their ability to prevent catastrophic forgetting and reduce computational costs by updating only a small subset of task-specific parameters while keeping most of the SSL model frozen. They fall into four main categories [Han et al., 2024]: Additive PEFT (e.g., adapters), which introduces new modules [Lester et al., 2021]; Selective PEFT (e.g., BitFit), which fine-tunes a chosen subset of existing parameters [Zaken et al., 2022]; Re-parameterized PEFT (e.g., LoRA), which uses low-rank updates [Hu et al., 2021]; and Hybrid PEFT (e.g., MAM Adapter), which combines multiple strategies [He et al.].
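To make the re-parameterized category concrete, the sketch below illustrates the low-rank update idea behind LoRA: the pretrained weight stays frozen and only two small matrices are trained. The rank, scaling, and initialization choices are illustrative assumptions.

```python
# Illustrative sketch of re-parameterized PEFT in the style of LoRA:
# a frozen linear layer is adapted through a trainable low-rank update B @ A.
# Rank, scaling, and initialization below are assumptions, not prescribed values.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # pretrained weights stay frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # start as a no-op
        self.scale = alpha / r

    def forward(self, x):
        # y = frozen base output + scaled low-rank correction; only A and B train
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Wrapping, for example, the attention projection layers of an SSL encoder with such a module leaves the original weights untouched while adding only r x (d_in + d_out) trainable parameters per layer.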
PEFT methods, initially explored for Large Language Models (LLMs), are now gaining attention in speech processing, especially for automatic speech recognition (ASR). Sparse LoRA has been used to adapt Whisper for child speech recognition and for multilingual ASR with reduced memory usage [Radford et al., 2023, Song et al., 2024, Liu et al., 2024]. These approaches have also been applied in federated learning for ASR and improve performance in speech emotion recognition, spoken language understanding, and multilingual text-to-speech [Ali et al., 2025, Lashkarashvili et al., 2024, Kim et al., 2023, Kwon et al., 2025].
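As a rough illustration of how such adaptations are typically set up in practice, the sketch below attaches LoRA adapters to a Whisper checkpoint using the Hugging Face peft library; the checkpoint, target modules, and hyperparameters are assumptions and do not reproduce the configurations of the cited papers.

```python
# Minimal sketch: LoRA adapters on a Whisper model via the Hugging Face `peft`
# library. Checkpoint, target modules, and hyperparameters are assumptions.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update
    lora_alpha=16,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()    # reports only a small trainable fraction
```

In such setups, only the injected adapter weights are updated during training, which is what makes these methods attractive for memory- and data-constrained adaptation.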
Within this framework, the internship aims to assess the performance of various PEFT methods on the SLU task through a systematic exploration of different SSL models.
Keywords: Parameter-Efficient Fine-Tuning, Self-Supervised Learning, Automatic Speech Processing.
[Laperriere et al., 2024] Laperriere, G., Ghannay, S., Jabaian, B., and Esteve, Y. (2024). A dual task learning approach to fine-tune a multilingual semantic speech encoder for spoken language understanding. In Interspeech.
[Laperriere et al., 2023] Laperriere, G., Pelloin, V., Rouvier, M., Stafylakis, T., and Esteve, Y. (2023). On the use of semantically-aligned speech representations for spoken language understanding. In SLT.
[Lashkarashvili et al., 2024] Lashkarashvili, N., Wu, W., Sun, G., and Woodland, P. C. (2024). Parameter efficient finetuning for speech emotion recognition and domain adaptation. In ICASSP.
[Lee et al., 2024] Lee, B., Calapodescu, I., Gaido, M., Negri, M., and Besacier, L. (2024). Speech-MASSIVE: A multilingual speech dataset for SLU and beyond. In Interspeech.
[Lester et al., 2021] Lester, B., Al-Rfou, R., and Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. In EMNLP.
[Liu et al., 2024] Liu, W., Qin, Y., Peng, Z., and Lee, T. (2024). Sparsely shared LoRA on Whisper for child speech recognition. In ICASSP.
[Mdhaffar et al., 2024] Mdhaffar, S., Bougares, F., De Mori, R., Zaiem, S., Ravanelli, M., and Esteve, Y. (2024). TARIC-SLU: A Tunisian benchmark dataset for spoken language understanding. In COLING.
Remuneration: around 680 €/month + access to staff dining facilities and the campus's many sports facilities.