Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.08849
Cited By
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
16 March 2021
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models"
39 / 39 papers shown
Title
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
H. Li
Li Yuan
Shuicheng Yan
Jie Chen
45
1
0
31 Dec 2024
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval
Yabing Wang
Le Wang
Qiang-feng Zhou
Zhibin Wang
Hao Li
Gang Hua
Wei Tang
22
7
0
30 Sep 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James Validad Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
77
9
0
14 Jun 2024
m3P: Towards Multimodal Multilingual Translation with Multimodal Prompt
Jian Yang
Hongcheng Guo
Yuwei Yin
Jiaqi Bai
Bing Wang
Jiaheng Liu
Xinnian Liang
Linzheng Cahi
Liqun Yang
Zhoujun Li
33
9
0
26 Mar 2024
CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer
Yabing Wang
Fan Wang
Jianfeng Dong
Hao Luo
VLM
13
8
0
14 Dec 2023
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval
M. Gwilliam
Michael Cogswell
Meng Ye
Karan Sikka
Abhinav Shrivastava
Ajay Divakaran
3DV
10
1
1
30 Nov 2023
FLAP: Fast Language-Audio Pre-training
Ching-Feng Yeh
Po-Yao Huang
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
CLIP
VLM
20
8
0
02 Nov 2023
Grounded and Well-rounded: A Methodological Approach to the Study of Cross-modal and Cross-lingual Grounding
Timothee Mickus
Elaine Zosa
Denis Paperno
23
0
0
18 Oct 2023
Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval
Yabing Wang
Shuhui Wang
Hao Luo
Jianfeng Dong
F. Wang
Meng Han
Xun Wang
Meng Wang
4
8
0
11 Sep 2023
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
Devaansh Gupta
Siddhant Kharbanda
Jiawei Zhou
Wanhua Li
Hanspeter Pfister
D. Wei
VLM
23
9
0
29 Aug 2023
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
Kate Sanders
David Etter
Reno Kriz
Benjamin Van Durme
VGen
26
7
0
06 Jul 2023
Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Paul Pu Liang
Zihao Deng
Martin Q. Ma
James Y. Zou
Louis-Philippe Morency
Ruslan Salakhutdinov
SSL
16
49
0
08 Jun 2023
TG-VQA: Ternary Game of Video Question Answering
Hao Li
Peng Jin
Ze-Long Cheng
Songyang Zhang
Kai-xiang Chen
Zhennan Wang
Chang-rui Liu
Jie Chen
21
10
0
17 May 2023
Grafting Pre-trained Models for Multimodal Headline Generation
Lingfeng Qiao
Chen Wu
Ye Liu
Haoyuan Peng
Di Yin
Bo Ren
30
5
0
14 Nov 2022
Multilingual Multimodality: A Taxonomical Survey of Datasets, Techniques, Challenges and Opportunities
Khyathi Raghavi Chandu
A. Geramifard
21
3
0
30 Oct 2022
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Yue Yang
Wenlin Yao
Hongming Zhang
Xiaoyang Wang
Dong Yu
Jianshu Chen
VLM
39
21
0
21 Oct 2022
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
10
7
0
19 Oct 2022
Low-resource Neural Machine Translation with Cross-modal Alignment
Zhe Yang
Qingkai Fang
Yang Feng
VLM
13
9
0
13 Oct 2022
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Andrew Rouditchenko
Yung-Sung Chuang
Nina Shvetsova
Samuel Thomas
Rogerio Feris
Brian Kingsbury
Leonid Karlinsky
David F. Harwath
Hilde Kuehne
James R. Glass
VLM
21
4
0
07 Oct 2022
Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
Yabing Wang
Jianfeng Dong
Tianxiang Liang
Minsong Zhang
Rui Cai
Xun Wang
13
19
0
26 Aug 2022
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
34
3
0
24 Aug 2022
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding
Bingning Wang
Feiya Lv
Ting Yao
Yiming Yuan
Jin Ma
Yu Luo
Haijin Liang
20
3
0
05 Aug 2022
Vision-and-Language Pretraining
Thong Nguyen
Cong-Duy Nguyen
Xiaobao Wu
See-Kiong Ng
A. Luu
VLM
CLIP
19
2
0
05 Jul 2022
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation
Kshitij Gupta
Devansh Gautam
R. Mamidi
VLM
17
3
0
07 Jun 2022
Generalizing Multimodal Pre-training into Multilingual via Language Acquisition
Liang Zhang
Anwen Hu
Qin Jin
VLM
20
5
0
29 May 2022
CLMLF:A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection
Zhen Li
Bing Xu
Conghui Zhu
T. Zhao
20
70
0
12 Apr 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
E. Ponti
Ivan Vulić
MLLM
VLM
ELM
33
62
0
27 Jan 2022
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training
Yehao Li
Jiahao Fan
Yingwei Pan
Ting Yao
Weiyao Lin
Tao Mei
MLLM
ObjD
14
19
0
11 Jan 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
19
56
0
13 Sep 2021
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
Anil Rahate
Rahee Walambe
S. Ramanna
K. Kotecha
17
133
0
29 Jul 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
8
272
0
09 Jun 2021
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
Linjie Li
Jie Lei
Zhe Gan
Licheng Yu
Yen-Chun Chen
...
Tamara L. Berg
Mohit Bansal
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
19
100
0
08 Jun 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
21
126
0
20 May 2021
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
20
242
0
06 Oct 2020
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
242
489
0
16 Oct 2019
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification
Xilun Chen
Yu Sun
Ben Athiwaratkun
Claire Cardie
Kilian Q. Weinberger
214
315
0
06 Jun 2016
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
214
7,687
0
17 Aug 2015
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
228
31,150
0
16 Jan 2013
1