ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2003.05078
  4. Cited By
Visual Grounding in Video for Unsupervised Word Translation
v1v2 (latest)

Visual Grounding in Video for Unsupervised Word Translation

Computer Vision and Pattern Recognition (CVPR), 2020
11 March 2020
Gunnar Sigurdsson
Jean-Baptiste Alayrac
Aida Nematzadeh
Lucas Smaira
Mateusz Malinowski
João Carreira
Phil Blunsom
Andrew Zisserman
    VGen
ArXiv (abs)PDFHTML

Papers citing "Visual Grounding in Video for Unsupervised Word Translation"

29 / 29 papers shown
Grounded Video Caption Generation
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
327
0
0
12 Nov 2024
Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video
  Captioning
Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning
Sikiru Adewale
Tosin Ige
Bolanle Hafiz Matti
VLM
241
7
0
02 Oct 2023
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for
  Multimodal Machine Translation
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine TranslationIEEE International Conference on Computer Vision (ICCV), 2023
Devaansh Gupta
Siddhant Kharbanda
Jiawei Zhou
Wanhua Li
Hanspeter Pfister
D. Wei
VLM
293
27
0
29 Aug 2023
Divert More Attention to Vision-Language Object Tracking
Divert More Attention to Vision-Language Object TrackingIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
331
17
0
19 Jul 2023
Self-Supervised Multimodal Learning: A Survey
Self-Supervised Multimodal Learning: A SurveyIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
445
107
0
31 Mar 2023
Plausible May Not Be Faithful: Probing Object Hallucination in
  Vision-Language Pre-training
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-trainingConference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Wenliang Dai
Zihan Liu
Ziwei Ji
Jane Polak Scowcroft
Pascale Fung
MLLMVLM
356
80
0
14 Oct 2022
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual
  Text-Video Retrieval
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video RetrievalIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Andrew Rouditchenko
Yung-Sung Chuang
Nina Shvetsova
Samuel Thomas
Rogerio Feris
Brian Kingsbury
Leonid Karlinsky
David Harwath
Hilde Kuehne
James R. Glass
VLM
244
8
0
07 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Vision+X: A Survey on Multimodal Learning in the Light of DataIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
484
42
0
05 Oct 2022
MuMUR : Multilingual Multimodal Universal Retrieval
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
526
6
0
24 Aug 2022
CLEAR: Improving Vision-Language Navigation with Cross-Lingual,
  Environment-Agnostic Representations
CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations
Jialu Li
Hao Tan
Joey Tianyi Zhou
LM&Ro
246
13
0
05 Jul 2022
VALHALLA: Visual Hallucination for Machine Translation
VALHALLA: Visual Hallucination for Machine TranslationComputer Vision and Pattern Recognition (CVPR), 2022
Yi Li
Yikang Shen
Yoon Kim
Chun-Fu Chen
Rogerio Feris
David D. Cox
Nuno Vasconcelos
MLLM
549
54
0
31 May 2022
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual
  Word Alignment
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word AlignmentConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tuan Dinh
Jy-yong Sohn
Shashank Rajput
Timothy Ossowski
Yifei Ming
Junjie Hu
Dimitris Papailiopoulos
Kangwook Lee
280
0
0
23 May 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey
Visual Attention Methods in Deep Learning: An In-Depth SurveyInformation Fusion (Inf. Fusion), 2022
Mohammed Hassanin
Saeed Anwar
Ibrahim Radwan
Fahad Shahbaz Khan
Lin Wang
456
274
0
16 Apr 2022
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge
  Distillation
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge DistillationFindings (Findings), 2022
Wenliang Dai
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Pascale Fung
VLM
264
110
0
12 Mar 2022
Revisiting Weakly Supervised Pre-Training of Visual Perception Models
Revisiting Weakly Supervised Pre-Training of Visual Perception ModelsComputer Vision and Pattern Recognition (CVPR), 2022
Mannat Singh
Laura Gustafson
Aaron B. Adcock
Vinicius de Freitas Reis
B. Gedik
Raj Prateek Kosaraju
D. Mahajan
Ross B. Girshick
Piotr Dollár
Laurens van der Maaten
VLM
352
153
0
20 Jan 2022
SVIP: Sequence VerIfication for Procedures in Videos
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
390
24
0
13 Dec 2021
Cascaded Multilingual Audio-Visual Learning from Videos
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Samuel Thomas
Hilde Kuehne
...
Yikang Shen
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
618
8
0
08 Nov 2021
Product-oriented Machine Translation with Cross-modal Cross-lingual
  Pre-training
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-trainingACM Multimedia (ACM MM), 2021
Yuqing Song
Shizhe Chen
Qin Jin
Wei Luo
Jun Xie
Fei Huang
232
28
0
25 Aug 2021
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding
  Evaluation
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
Linjie Li
Jie Lei
Zhe Gan
Licheng Yu
Yen-Chun Chen
...
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
322
121
0
08 Jun 2021
Crossing the Conversational Chasm: A Primer on Natural Language
  Processing for Multilingual Task-Oriented Dialogue Systems
Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue SystemsJournal of Artificial Intelligence Research (JAIR), 2021
E. Razumovskaia
Goran Glavaš
Olga Majewska
Edoardo Ponti
Anna Korhonen
Ivan Vulić
588
38
0
17 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language
  Pre-training
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-trainingComputer Vision and Pattern Recognition (CVPR), 2021
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLMVLM
284
110
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Broaden Your Views for Self-Supervised Video LearningIEEE International Conference on Computer Vision (ICCV), 2021
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSLAI4TS
399
140
0
30 Mar 2021
Source-Free Domain Adaptation for Semantic Segmentation
Source-Free Domain Adaptation for Semantic SegmentationComputer Vision and Pattern Recognition (CVPR), 2021
Yuang Liu
Wei Zhang
Jun Wang
324
320
0
30 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual
  Transfer of Vision-Language Models
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language ModelsNorth American Chapter of the Association for Computational Linguistics (NAACL), 2021
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLMVLM
381
61
0
16 Mar 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal
  Transformers
Decoupling the Role of Data, Attention, and Losses in Multimodal TransformersTransactions of the Association for Computational Linguistics (TACL), 2021
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
298
129
0
31 Jan 2021
Globetrotter: Connecting Languages by Connecting Images
Globetrotter: Connecting Languages by Connecting Images
Dídac Surís
Dave Epstein
Carl Vondrick
VLM
384
9
0
08 Dec 2020
Using Text to Teach Image Retrieval
Using Text to Teach Image Retrieval
Haoyu Dong
Ze Wang
Qiang Qiu
Guillermo Sapiro
3DV
180
5
0
19 Nov 2020
Visual Pivoting for (Unsupervised) Entity Alignment
Visual Pivoting for (Unsupervised) Entity Alignment
Fangyu Liu
Muhao Chen
Dan Roth
Nigel Collier
OCL
440
157
0
28 Sep 2020
Video Understanding as Machine Translation
Bruno Korbar
Fabio Petroni
Rohit Girdhar
Lorenzo Torresani
SSL
270
29
0
12 Jun 2020
1
Page 1 of 1