Conference communication
From
Time -
Location LISN Site Belvédère
STL, Thesis
Speaker: Pierre Lepagnol
Small Generative Models in Industrial Context: Adaptation by Prompting with Limited Data
Natural language processing enables automated text analysis for classification and information extraction. However, developing such systems faces major challenges, including the scarcity of annotated data and limited computational resources. This industrial thesis, conducted with the company SCIAM, explores small generative models for utterance classification, slot filling, and named entity recognition in an industrial setting with limited annotations.
Our work is structured around four axes. The first concerns model selection without annotated data: a zero-shot evaluation of 72 models on 15 classification datasets shows that size is not the determining factor; architecture (seq2seq vs. decoder-only) and instruction tuning play a more significant role, enabling 1–3B parameter models to compete with those exceeding 7B. The second axis focuses on context optimization with few examples: a dynamic prompting approach based on information retrieval (BM25) improves slot-filling performance by up to 21 F1 points compared with intent-based or random example selection, across four datasets (ATIS, SNIPS, SLURP, MEDIA), with no inference overhead. The third axis analyzes the impact of the structured output format: a study of three formats (Key-Value, JSON, XML) on 13 models and 7 datasets reveals gaps of 2 to 46 F1 points, and we propose an automatic selection method that identifies the optimal format at lower cost. Finally, the fourth axis exposes the risks of benchmark contamination and proposes detection methods based on similarity and extraction, making it possible to assess the reliability of evaluations.
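To make the second axis concrete, here is a minimal sketch of BM25-based demonstration selection for a slot-filling prompt. It is an illustration under stated assumptions, not the implementation used in the thesis: the `BM25` class, the `build_prompt` helper, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the prompt layout, and the toy example pool are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


class BM25:
    """Small self-contained Okapi BM25 scorer over a list of utterances."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        # Document frequency: number of utterances containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        norm = 1 - self.b + self.b * dl / self.avgdl
        total = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f:
                total += self.idf[t] * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def top_k(self, query, k):
        # Indices of the k highest-scoring utterances, best first.
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def build_prompt(query, pool, k=2):
    # pool: list of (utterance, slot annotation) pairs; hypothetical prompt layout.
    index = BM25([utterance for utterance, _ in pool])
    lines = []
    for i in index.top_k(query, k):
        utterance, slots = pool[i]
        lines.append(f"Utterance: {utterance}\nSlots: {slots}")
    lines.append(f"Utterance: {query}\nSlots:")
    return "\n\n".join(lines)


# Toy annotated pool, invented for illustration only.
pool = [
    ("book a flight from paris to rome", "from_city=paris; to_city=rome"),
    ("play some jazz music", "genre=jazz"),
    ("what is the weather in lyon tomorrow", "city=lyon; date=tomorrow"),
    ("find a flight to berlin on monday", "to_city=berlin; date=monday"),
]
query = "show me a flight from lyon to rome"
prompt = build_prompt(query, pool, k=2)
print(prompt)
```

With this toy pool, the two flight-related utterances are retrieved as demonstrations and the unrelated music request is left out. The retrieval runs before the model is called, so the prompt contains the same number of demonstrations as it would under random selection.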
This work, accompanied by public code and reproducible protocols, establishes methodological foundations for efficient, auditable NLP systems suited to industrial constraints.