Masking Modalities for Cross-modal Video Retrieval

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
1 November 2021
Valentin Gabeur
Arsha Nagrani
Chen Sun
Karteek Alahari
Cordelia Schmid

Papers citing "Masking Modalities for Cross-modal Video Retrieval"

22 / 22 papers shown
Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review
A. Fragomeni
Dima Damen
Michael Wray
191
0
0
29 May 2025
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
394
1
0
02 Apr 2025
Generating Illustrated Instructions
Sachit Menon
Ishan Misra
Rohit Girdhar
DiffM
237
6
0
07 Dec 2023
Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
IEEE International Conference on Computer Vision (ICCV), 2023
Chenxin Xu
R. Tan
Yuhong Tan
Siheng Chen
Xinchao Wang
Yanfeng Wang
3DH
217
31
0
17 Aug 2023
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
IEEE Transactions on Image Processing (IEEE TIP), 2023
Peng Wu
Jing Liu
Xiangteng He
Yuxin Peng
Peng Wang
Yanning Zhang
350
45
0
24 Jul 2023
Modality Influence in Multimodal Machine Learning
Abdelhamid Haouhat
Slimane Bellaouar
A. Nehar
H. Cherroun
195
3
0
10 Jun 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Neural Information Processing Systems (NeurIPS), 2023
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
446
170
0
29 May 2023
Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data
International Journal of Computer Vision (IJCV), 2023
Arthur Josi
Mahdi Alehdaghi
Rafael M. O. Cruz
Mohammadhadi Shateri
197
4
0
29 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jing Liu
VLM
330
148
0
17 Apr 2023
Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval
Jae Myung Kim
A. Sophia Koepke
Cordelia Schmid
Zeynep Akata
220
42
0
06 Apr 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
335
4
0
05 Jan 2023
Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data
Arthur Josi
Mahdi Alehdaghi
Rafael M. O. Cruz
Mohammadhadi Shateri
135
19
0
22 Nov 2022
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Neural Information Processing Systems (NeurIPS), 2022
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David Clifton
Jing Chen
VLM
264
85
0
21 Nov 2022
Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
ACM Multimedia (ACM MM), 2022
Yabing Wang
Jianfeng Dong
Tianxiang Liang
Minsong Zhang
Rui Cai
Xun Wang
254
27
0
26 Aug 2022
RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval
Burak Satar
Erik Cambria
Hanwang Zhang
J. Lim
155
13
0
26 Jun 2022
Symmetric Network with Spatial Relationship Modeling for Natural Language-based Vehicle Retrieval
Chuyang Zhao
Haobo Chen
Wenyuan Zhang
Junru Chen
Sipeng Zhang
Yadong Li
Boxun Li
128
13
0
22 Jun 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
463
809
0
13 Jun 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
390
73
0
17 May 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
European Conference on Computer Vision (ECCV), 2022
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
330
53
0
06 Apr 2022
Learning Audio-Video Modalities from Image Captions
European Conference on Computer Vision (ECCV), 2022
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
166
95
0
01 Apr 2022
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Bin Li
Yixuan Weng
Bin Sun
Shutao Li
587
64
0
13 Mar 2022
Multi-Query Video Retrieval
European Conference on Computer Vision (ECCV), 2022
Zeyu Wang
Yu Wu
Karthik Narasimhan
Olga Russakovsky
245
23
0
10 Jan 2022