Internship
Contacts: sahar.ghannay@lisn.upsaclay.fr, sophie.rosset@lisn.upsaclay.fr
In the context of the spoken language understanding (SLU) field for dialogue systems, the problem of contextual representation remains an open research topic despite extensive prior work [Tomashenko et al., 2020].
Focusing on this problem, the main objective of this study is to build a context-aware representation of dialogue turns, enriched with multilingual multimodal semantic information.
A recent study [Laperrière et al., 2023] investigates in-domain semantic enrichment of the self-supervised learning (SSL) SAMU-XLSR model by specializing it on a small amount of transcribed data from a challenging SLU task, in order to improve semantic information extraction on this downstream task. Building on this, we propose to enrich the SAMU-XLSR model [Khurana et al., 2022] with contextual information from dialogue turns, in addition to the previously acquired multilingual multimodal semantic information. We are also interested in extracting semantic information from speech signals using end-to-end approaches. The performance of the Contextual-SAMU-XLSR model will be evaluated on SLU tasks in different languages and domains. The experiments will be performed on two challenging SLU datasets:
I) A new version of the French MEDIA corpus [Bonneau-Maynard et al., 2005], enriched with intent information in addition to the slots.
II) The TARIC corpus [Masmoudi et al., ] in Tunisian dialect, enriched with semantic annotations (slots and dialogue acts).
Both corpora will be publicly available soon. In addition, we propose to use the DailyDialog corpus [Li et al., 2017] to enrich the SAMU-XLSR model with contextual information.
The objectives of the internship are:
— Extend the recent work of [Laperrière et al., 2023] to develop an end-to-end SLU system for joint slot and intent detection on the new version of the MEDIA task (one possible joint architecture is sketched below).
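To make the joint slot and intent detection objective concrete, the following is a minimal PyTorch sketch, assuming a pre-trained speech encoder (e.g. a SAMU-XLSR-based model) that returns frame-level features: a frame-level head produces logits for CTC decoding of a slot-annotated transcription, and a pooled head predicts the utterance-level intent. All module names, dimensions, and the mean-pooling choice are illustrative assumptions, not the actual architecture of the internship.

import torch
import torch.nn as nn


class JointSLUModel(nn.Module):
    """Hypothetical joint slot/intent head on top of a pre-trained speech encoder."""

    def __init__(self, encoder: nn.Module, feat_dim: int,
                 n_ctc_tokens: int, n_intents: int):
        super().__init__()
        self.encoder = encoder                               # e.g. a SAMU-XLSR-based encoder (assumption)
        self.slot_head = nn.Linear(feat_dim, n_ctc_tokens)   # characters + slot tags, used as CTC targets
        self.intent_head = nn.Linear(feat_dim, n_intents)    # utterance-level intent classes

    def forward(self, wavs: torch.Tensor):
        feats = self.encoder(wavs)                           # (batch, frames, feat_dim)
        slot_logits = self.slot_head(feats)                  # frame-level logits for CTC decoding
        intent_logits = self.intent_head(feats.mean(dim=1))  # mean-pooled intent logits
        return slot_logits, intent_logits

At training time, the slot branch could be optimized with a CTC loss against the tag-enriched transcription and the intent branch with a cross-entropy loss, the two losses being summed; this is only one possible design among those the internship could explore.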
The SLU models will be implemented using the open-source SpeechBrain toolkit dedicated to neural speech processing.
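For concreteness, here is a minimal sketch of how such a model might be trained with SpeechBrain's Brain class. The module names (encoder, slot_head, intent_head), the batch fields (sig, slot_tags, intent), and the loss combination are assumptions made for illustration; the actual recipe developed during the internship will differ.

import torch
import speechbrain as sb
from speechbrain.nnet.losses import ctc_loss


class SLUBrain(sb.Brain):
    """Hypothetical SpeechBrain training loop for joint slot and intent detection."""

    def compute_forward(self, batch, stage):
        batch = batch.to(self.device)
        wavs, wav_lens = batch.sig                      # assumes a "sig" dynamic item; lengths are relative
        feats = self.modules.encoder(wavs)              # frame-level speech features
        slot_logits = self.modules.slot_head(feats)     # frame-level slot/character logits
        intent_logits = self.modules.intent_head(feats.mean(dim=1))  # utterance-level intent
        return slot_logits, intent_logits, wav_lens

    def compute_objectives(self, predictions, batch, stage):
        slot_logits, intent_logits, wav_lens = predictions
        tags, tag_lens = batch.slot_tags                # padded slot-tag targets (relative lengths)
        intents, _ = batch.intent                       # one intent label per utterance (assumption)
        slot_loss = ctc_loss(
            torch.log_softmax(slot_logits, dim=-1), tags, wav_lens, tag_lens, blank_index=0
        )
        intent_loss = torch.nn.functional.cross_entropy(
            intent_logits, intents.reshape(-1).long()
        )
        return slot_loss + intent_loss

Training would then amount to instantiating SLUBrain with the relevant modules and calling its fit() method on the MEDIA or TARIC data prepared as SpeechBrain datasets; the above is a sketch under the stated assumptions rather than the actual internship recipe.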
[Laperrière et al., 2023] Laperrière, G., Nguyen, H., Ghannay, S., Jabaian, B., and Estève, Y. (2023). Specialized semantic enrichment of speech representations. In 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), pages 1–5.
[Li et al., 2017] Li, Y., Su, H., Shen, X., Li, W., Cao, Z., and Niu, S. (2017). DailyDialog: A manually labelled multi-turn dialogue dataset. ArXiv, abs/1710.03957.
[Masmoudi et al., ] Masmoudi, A., Estève, Y., Belguith, L. H., and Habash, N. A corpus and phonetic dictionary for Tunisian Arabic speech recognition.
[Tomashenko et al., 2020] Tomashenko, N., Raymond, C., Caubrière, A., Mori, R. D., and Estève, Y. (2020). Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems. In ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).