Location: LISN Site Belvédère
STL, Thesis
Speaker: Armand STRICKER
The ability to hold natural, human-like conversations has long been regarded as a hallmark of intelligence. Contemporary dialogue systems are typically divided into two categories: task-oriented dialogue (TOD), focused on efficiency and goal completion, and open-domain chitchat, centered on engagement and social connection. Yet human conversations rarely conform to this dichotomy. Everyday dialogue interleaves transactional and social goals, suggesting that dialogue systems should likewise integrate both modes to achieve more natural interactions.
This thesis investigates the integration of chitchat into task-oriented dialogue, advancing the argument that these modes are not competing but complementary. The work is structured in two parts. The first part provides a state-of-the-art overview of dialogue systems, tracing the historical evolution of TOD and chitchat agents, the datasets and evaluation paradigms that shaped them, and the emergence of integrated “inter-mode” approaches. It also introduces relevant natural language processing and machine learning foundations, positioning recent advances in large language models as critical enablers of hybrid dialogue systems.
The second part presents original research contributions that explore how chitchat can enhance task-oriented dialogue in practice. Four studies are reported. First, a comparative study evaluates different strategies for injecting chitchat into TOD, revealing distinct lexical patterns and showing how certain forms of social talk improve diversity and engagement. Second, a unified modeling approach demonstrates that incorporating user emotion detection into an end-to-end TOD system improves both task success and conversational naturalness, underscoring the value of affective signals in practical interactions. Third, task-oriented dialogues are augmented with synthetic user backstories, simulating inter-mode scenarios by embedding elements of a user narrative into task requests. The resulting training data supports both the evaluation and the improvement of model robustness to such common inter-mode inputs. Finally, a few-shot prompting approach shows that large language models can flexibly combine task-oriented and chitchat capabilities without architectural modifications, highlighting the practicality of LLM-based methods for building unified dialogue agents.
Together, these contributions advance the development of inter-mode dialogue systems by addressing key challenges such as dataset limitations, unified modeling of social and task signals, and robustness to blended user input. The thesis demonstrates that chitchat and TOD can be coherently integrated without necessarily sacrificing task performance. Ultimately, it argues that moving beyond the artificial separation of TOD and chitchat is essential for creating dialogue systems that are not only effective and reliable, but also more natural, empathetic, and enjoyable.