2003.05078
Cited By
Visual Grounding in Video for Unsupervised Word Translation
Computer Vision and Pattern Recognition (CVPR), 2020
11 March 2020
Gunnar Sigurdsson
Jean-Baptiste Alayrac
Aida Nematzadeh
Lucas Smaira
Mateusz Malinowski
João Carreira
Phil Blunsom
Andrew Zisserman
VGen
Papers citing "Visual Grounding in Video for Unsupervised Word Translation"
29 / 29 papers shown
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
327
0
0
12 Nov 2024
Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning
Sikiru Adewale
Tosin Ige
Bolanle Hafiz Matti
VLM
241
7
0
02 Oct 2023
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
IEEE International Conference on Computer Vision (ICCV), 2023
Devaansh Gupta
Siddhant Kharbanda
Jiawei Zhou
Wanhua Li
Hanspeter Pfister
D. Wei
VLM
293
27
0
29 Aug 2023
Divert More Attention to Vision-Language Object Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
331
17
0
19 Jul 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
445
107
0
31 Mar 2023
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Wenliang Dai
Zihan Liu
Ziwei Ji
Jane Polak Scowcroft
Pascale Fung
MLLM
VLM
356
80
0
14 Oct 2022
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Andrew Rouditchenko
Yung-Sung Chuang
Nina Shvetsova
Samuel Thomas
Rogerio Feris
Brian Kingsbury
Leonid Karlinsky
David Harwath
Hilde Kuehne
James R. Glass
VLM
244
8
0
07 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
484
42
0
05 Oct 2022
MuMUR: Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
526
6
0
24 Aug 2022
CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations
Jialu Li
Hao Tan
Joey Tianyi Zhou
LM&Ro
246
13
0
05 Jul 2022
VALHALLA: Visual Hallucination for Machine Translation
Computer Vision and Pattern Recognition (CVPR), 2022
Yi Li
Yikang Shen
Yoon Kim
Chun-Fu Chen
Rogerio Feris
David D. Cox
Nuno Vasconcelos
MLLM
549
54
0
31 May 2022
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tuan Dinh
Jy-yong Sohn
Shashank Rajput
Timothy Ossowski
Yifei Ming
Junjie Hu
Dimitris Papailiopoulos
Kangwook Lee
280
0
0
23 May 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey
Information Fusion (Inf. Fusion), 2022
Mohammed Hassanin
Saeed Anwar
Ibrahim Radwan
Fahad Shahbaz Khan
Lin Wang
456
274
0
16 Apr 2022
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
Findings of the Association for Computational Linguistics (ACL Findings), 2022
Wenliang Dai
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Pascale Fung
VLM
264
110
0
12 Mar 2022
Revisiting Weakly Supervised Pre-Training of Visual Perception Models
Computer Vision and Pattern Recognition (CVPR), 2022
Mannat Singh
Laura Gustafson
Aaron B. Adcock
Vinicius de Freitas Reis
B. Gedik
Raj Prateek Kosaraju
D. Mahajan
Ross B. Girshick
Piotr Dollár
Laurens van der Maaten
VLM
352
153
0
20 Jan 2022
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
390
24
0
13 Dec 2021
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Samuel Thomas
Hilde Kuehne
...
Yikang Shen
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
618
8
0
08 Nov 2021
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
ACM Multimedia (ACM MM), 2021
Yuqing Song
Shizhe Chen
Qin Jin
Wei Luo
Jun Xie
Fei Huang
232
28
0
25 Aug 2021
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
Linjie Li
Jie Lei
Zhe Gan
Licheng Yu
Yen-Chun Chen
...
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
322
121
0
08 Jun 2021
Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems
Journal of Artificial Intelligence Research (JAIR), 2021
E. Razumovskaia
Goran Glavaš
Olga Majewska
Edoardo Ponti
Anna Korhonen
Ivan Vulić
588
38
0
17 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Computer Vision and Pattern Recognition (CVPR), 2021
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
284
110
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
399
140
0
30 Mar 2021
Source-Free Domain Adaptation for Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2021
Yuang Liu
Wei Zhang
Jun Wang
324
320
0
30 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
381
61
0
16 Mar 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Transactions of the Association for Computational Linguistics (TACL), 2021
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
298
129
0
31 Jan 2021
Globetrotter: Connecting Languages by Connecting Images
Dídac Surís
Dave Epstein
Carl Vondrick
VLM
384
9
0
08 Dec 2020
Using Text to Teach Image Retrieval
Haoyu Dong
Ze Wang
Qiang Qiu
Guillermo Sapiro
3DV
180
5
0
19 Nov 2020
Visual Pivoting for (Unsupervised) Entity Alignment
Fangyu Liu
Muhao Chen
Dan Roth
Nigel Collier
OCL
440
157
0
28 Sep 2020
Video Understanding as Machine Translation
Bruno Korbar
Fabio Petroni
Rohit Girdhar
Lorenzo Torresani
SSL
270
29
0
12 Jun 2020