Developing the Framed Multi30k (FM30k) multimodal-multilingual dataset 1/2

Adriana Pagano, Federal University of Minas Gerais, Brazil

Speaker: Adriana Pagano

Abstract

This talk will present Framed Multi30K (FM30K), a novel frame-based Brazilian Portuguese multimodal-multilingual dataset (Viridiano et al., 2024) that (i) extends the Multi30K dataset (Elliott et al., 2016) with 158,915 original Brazilian Portuguese descriptions and 31,104 Brazilian Portuguese translations of original English descriptions; (ii) adds 4,577,122 frame and frame element labels to the 158,915 English descriptions and to those created for Brazilian Portuguese; and (iii) extends the Flickr30k Entities dataset (Plummer et al., 2015) with 169,560 correlations of frames and frame elements with the existing phrase-to-region correlations. The dataset brings image annotation into FrameNet, thus departing from a three-decade tradition of text-only annotation, and augments Flickr30k by adding FrameNet labels to Flickr30k Entities. The dataset also increases the representation of Brazilian Portuguese in NLP and constitutes a rich resource for exploring multimodal and multilingual perspectivism. The talk will also briefly present upcoming work with FM30K on the annotation of multiword expressions (MWEs) in both the original and the translated captions.

Short Bio

Adriana S. Pagano is Full Professor in Applied Linguistics at Universidade Federal de Minas Gerais, Brazil. She is a research fellow of CNPq (National Council for Scientific and Technological Development, Ministry of Science and Technology, Brazil) and FAPEMIG (Research Foundation of the State of Minas Gerais, Brazil). Her research interests include (i) language modelling from the perspective of systemic-functional linguistics; (ii) quantitative approaches to translation and multilingual textual production; and (iii) development of corpora and other resources for Natural Language Processing. She currently coordinates the project Algorithms for Fair Representation: Debiasing Large Language Models with Culturally-Diverse Datasets, funded by the Worldwide Universities Network (WUN RDF 2024), a joint project between UFMG, the University of Alberta, the University of Exeter, Mahidol University and Makerere University. She also coordinates the project Multimodal and Multilingual Perspectivisation for AI-Responsible Resources and Applications, funded by MCTI/CNPq International Cooperation Projects (2025-2026), and the project Generative AI applied to Maternal and Child Health, funded by UFMG's Centre for Innovation in Artificial Intelligence for Health (CI-IA Saúde). She participates in academic cooperation and research exchange agreements with Università di Torino (Italy), Univerzita Karlova (Czech Republic), Copenhagen Business School (Denmark), Universidad Autónoma de Barcelona (Spain), University of Sydney (Australia), University of Southampton (United Kingdom), University of Ghana (Ghana) and Macau Polytechnic University (Macau).
She is a researcher at Reinventa – Research and Innovation Network for Visual and Textual Analysis of Multimodal Objects (RED-00106-21) and is a member of the teams working on the projects Responsible and ethical applications of Artificial Intelligence in Public Health: AI-empowered Child and Maternal Health (Worldwide Universities Network RDF 2023) and Development and evaluation of an intelligent system for generating guidelines for safe prescription and accessible medication adapted to different cultural contexts (CNPq/Bill & Melinda Gates Foundation). ORCID 0000-0002-3150-3503
