Preprint, Working paper, STL, Computer Science, Computer Vision and Pattern Recognition

Entity-aware cross-modal pretraining for Knowledge-Based Visual Question Answering

Omar Adjali, Olivier Ferret, Sahar Ghannay, Hervé Le Borgne. Entity-aware cross-modal pretraining for Knowledge-Based Visual Question Answering. 2024. ⟨cea-04910767⟩
