Location: LISN, Site Belvédère
Seminars, STL
Speaker: Tommaso Raso (Federal University of Minas Gerais, Brazil)
I will present the methodology used to compile the spontaneous speech corpora of the C-ORAL family. I will discuss the importance of diaphasic variation for spontaneous speech data and describe the architecture of the corpora. I will then go through the successive phases of corpus compilation and the methodological issues each one raises: recording, transcription, prosodic segmentation, revision, alignment, and statistical validation. Special attention will be given to prosodic segmentation: why it is important and how it can be performed.