In spoken language understanding (SLU) for dialogue systems, contextual representation remains an open problem despite extensive prior work [Tomashenko et al., 2020].
Focusing on this problem, the main objective of this study is to build a context-aware representation of dialog turns, enriched with multilingual multimodal semantic information.
A recent study [Laperriere et al., 2023] investigates a specific in-domain semantic enrichment of the SSL (self-supervised learning) SAMU-XLSR model: the model is specialized on a small amount of transcribed data from a challenging SLU task in order to improve semantic information extraction on that downstream task. Building on this, we propose to enrich the SAMU-XLSR model [Khurana et al., 2022] with contextual information about dialog turns, in addition to the multilingual multimodal semantic information it has already acquired. We are also interested in extracting semantic information from speech signals using end-to-end approaches.
The performance of the Contextual-SAMU-XLSR model will be evaluated on SLU tasks in different languages and domains. The experiments will be performed on two challenging SLU datasets:
I) A new version of the French MEDIA corpus [Bonneau-Maynard et al., 2005], enriched with intent information in addition to the slots.
II) The TARIC corpus [Masmoudi et al., ] in Tunisian dialect, enriched with semantic annotations (slots and dialog acts).
Both corpora will be publicly available soon. In addition, we propose to use the DailyDialog corpus [Li et al., 2017] to enrich the SAMU-XLSR model with contextual information.
The objectives of the internship are:
- Extend the recent work [Laperriere et al., 2023] to develop an end-to-end SLU system for joint slot and intent detection on the new version of the MEDIA task.
The SLU models will be implemented using the open-source SpeechBrain toolkit [?] dedicated to neural
[Laperriere et al., 2023] Laperriere, G., Nguyen, H., Ghannay, S., Jabaian, B., and Estève, Y. (2023). Specialized semantic enrichment of speech representations. In 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), pages 1–5.
[Li et al., 2017] Li, Y., Su, H., Shen, X., Li, W., Cao, Z., and Niu, S. (2017). DailyDialog: A manually labelled multi-turn dialogue dataset. ArXiv, abs/1710.03957.
[Masmoudi et al., ] Masmoudi, A., Estève, Y., Belguith, L. H., and Habash, N. A corpus and phonetic dictionary for Tunisian Arabic speech recognition.
[Tomashenko et al., 2020] Tomashenko, N., Raymond, C., Caubriere, A., Mori, R. D., and Estève, Y. (2020).