2206.10436
Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching
21 June 2022
Nicola Messina, D. Coccomini, Andrea Esuli, Fabrizio Falchi
Papers citing "Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching"
2 / 2 papers shown
Title
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong
ViT
240
577
0
22 Apr 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan, K. Raman, Jiecao Chen, Michael Bendersky, Marc Najork
VLM
197
310
0
02 Mar 2021