2103.15049
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
28 March 2021
Song Liu
Haoqi Fan
Shengsheng Qian
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Papers citing "HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval"
50 of 82 papers shown
Frame-Difference Guided Dynamic Region Perception for CLIP Adaptation in Text-Video Retrieval
Jiaao Yu
Mingjie Han
Tao Gong
Jian Zhang
Man Lan
VGen
VLM
135
0
0
21 Oct 2025
Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review
A. Fragomeni
Dima Damen
Michael Wray
268
0
0
29 May 2025
TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval
Information Fusion (Inf. Fusion), 2025
Xiaolun Jing
Genke Yang
Jian Chu
256
5
0
07 Apr 2025
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
609
1
0
02 Apr 2025
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar
Asim Kadav
VLM
460
2
0
17 Mar 2025
Decomposing Relationship from 1-to-N into N 1-to-1 for Text-Video Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jian Xiao
Zhenzhen Hu
Jia Li
Richang Hong
180
0
0
09 Oct 2024
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Bingqing Zhang
Zhuo Cao
Heming Du
Xin Yu
Xue Li
Jiajun Liu
Sen Wang
VGen
279
7
0
30 Sep 2024
ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding
Yubin Wang
Xinyang Jiang
De Cheng
Dongsheng Li
Cairong Zhao
VLM
271
2
0
13 Aug 2024
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
Yuchen Yang
Yingxuan Duan
VGen
229
0
0
19 Jun 2024
An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval
Xiaolun Jing
Genke Yang
Jian Chu
CLIP
301
3
0
25 May 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
439
37
0
22 May 2024
A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision
Charles Raude
Prajwal K R
Liliane Momeni
Hannah Bull
Samuel Albanie
Andrew Zisserman
Gül Varol
SLR
374
9
0
16 May 2024
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
Haiwen Diao
Ying Zhang
Shang Gao
Xiang Ruan
Huchuan Lu
438
7
0
28 Apr 2024
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Han Fang
Xianghao Zang
Chao Ban
Zerun Feng
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
397
3
0
18 Apr 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe
ViT
354
47
0
20 Mar 2024
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Kaibin Tian
Yanhua Cheng
Yi Liu
Xinglin Hou
Quan Chen
Han Li
183
19
0
01 Jan 2024
Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders
Bumsoo Kim
Jinhyung Kim
Yeonsik Jo
S. Kim
VLM
326
5
0
19 Dec 2023
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Rahul Pratap Singh
Bishmoy Paul
Ali Dabouei
Min Xu
383
1
0
10 Dec 2023
Generating Illustrated Instructions
Sachit Menon
Ishan Misra
Rohit Girdhar
DiffM
332
7
0
07 Dec 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yimu Wang
Xiangru Jian
Bo Xue
258
24
0
17 Oct 2023
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
ACM Multimedia (ACM MM), 2023
Chen Jiang
Hong Liu
Xuzheng Yu
Qing Wang
Yuan Cheng
...
Zhongyi Liu
Qingpei Guo
Wei Chu
Ming-Hsuan Yang
Yuan Qi
458
19
0
20 Sep 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Joey Tianyi Zhou
466
86
0
18 Sep 2023
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
Ashwin Kalyan
Ameet Deshpande
Neeraj Kumar
277
0
0
31 Aug 2023
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
International Conference on Language Resources and Evaluation (LREC), 2023
Ying Zhao
Yu Bowen
Binyuan Hui
Haiyang Yu
Fei Huang
Yongbin Li
Ningyu Zhang
315
34
0
10 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
403
119
0
03 Aug 2023
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
IEEE Transactions on Image Processing (IEEE TIP), 2023
Peng Wu
Jing Liu
Xiangteng He
Yuxin Peng
Peng Wang
Yanning Zhang
473
55
0
24 Jul 2023
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Tao Gui
S. Zheng
Qin Jin
288
2
0
20 Jul 2023
Hierarchical Matching and Reasoning for Multi-Query Image Retrieval
Neural Networks (Neural Netw.), 2023
Zhong Ji
Zhihao Li
Yan Zhang
Haoran Wang
Yanwei Pang
Xuelong Li
335
16
0
26 Jun 2023
Iterative Adversarial Attack on Image-guided Story Ending Generation
IEEE Transactions on Multimedia (IEEE TMM), 2023
Youze Wang
Wenbo Hu
Richang Hong
279
10
0
16 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
ACM Multimedia (ACM MM), 2023
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
297
8
0
13 May 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
313
84
0
25 Mar 2023
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
Computer Vision and Pattern Recognition (CVPR), 2023
Jiahao Zhang
A. Cherian
Yanbin Liu
Yizhak Ben-Shabat
Cristian Rodriguez-Opazo
Stephen Gould
308
12
0
24 Mar 2023
Plug-and-Play Regulators for Image-Text Matching
IEEE Transactions on Image Processing (IEEE TIP), 2023
Haiwen Diao
Yanzhe Zhang
Wen Liu
Xiang Ruan
Huchuan Lu
245
32
0
23 Mar 2023
CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Yiting Cheng
Fangyun Wei
Jianmin Bao
Dong Chen
Wenqian Zhang
SLR
282
45
0
22 Mar 2023
CLIP4MC: An RL-Friendly Vision-Language Model for Minecraft
European Conference on Computer Vision (ECCV), 2023
Haobin Jiang
Hao Luo
Ke Li
Junpeng Yue
Tiejun Huang
Zongqing Lu
VLM
285
9
0
19 Mar 2023
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
Computer Vision and Pattern Recognition (CVPR), 2023
Dezhao Luo
Jiabo Huang
S. Gong
Hailin Jin
Yang Liu
VGen
406
45
0
28 Feb 2023
Deep Learning for Video-Text Retrieval: a Review
International Journal of Multimedia Information Retrieval (IJMIR), 2023
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
254
35
0
24 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yizhen Chen
Jie Wang
Lijian Lin
Chen Ma
Jin Ma
Ying Shan
VLM
298
36
0
30 Jan 2023
USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval
IEEE Transactions on Image Processing (IEEE TIP), 2023
Yan Zhang
Zhong Ji
Dingrong Wang
Yanwei Pang
Xuelong Li
VLM
234
39
0
17 Jan 2023
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Gui
Tuo Chen
Jing Zhang
Qiong Cao
Zhe Sun
Haoran Luo
Dacheng Tao
640
460
0
13 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
435
5
0
05 Jan 2023
Multi-queue Momentum Contrast for Microvideo-Product Retrieval
Web Search and Data Mining (WSDM), 2022
Yali Du
Yin-wei Wei
Wei Ji
Fan Liu
Xin Luo
Liqiang Nie
220
20
0
22 Dec 2022
SimVTP: Simple Video Text Pre-training with Masked Autoencoders
Yue Ma
Tianyu Yang
Yin Shan
Xiu Li
209
30
0
07 Dec 2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu
Biaolong Chen
Yue Liao
Shuwen Xiao
Wenyu Sun
Xiaobo Li
Yousong Zhu
Jinqiao Wang
Si Liu
CLIP
213
14
0
02 Dec 2022
Normalized Contrastive Learning for Text-Video Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yookoon Park
Mahmoud Azab
Bo Xiong
Seungwhan Moon
Florian Metze
Gourab Kundu
Kirmani Ahmed
205
13
0
30 Nov 2022
Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval
Damianos Galanopoulos
Vasileios Mezaris
270
7
0
21 Nov 2022
ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval
Asian Conference on Computer Vision (ACCV), 2022
A. Fragomeni
Michael Wray
Dima Damen
CLIP
ViT
177
4
0
09 Oct 2022
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval
Neural Information Processing Systems (NeurIPS), 2022
Che-Hsien Lin
Ancong Wu
Junwei Liang
Jun Zhang
Wenhang Ge
Wei Zheng
Chunhua Shen
292
42
0
27 Sep 2022
LGDN: Language-Guided Denoising Network for Video-Language Modeling
Neural Information Processing Systems (NeurIPS), 2022
Haoyu Lu
Mingyu Ding
Nanyi Fei
Yuqi Huo
Zhiwu Lu
VLM
402
20
0
23 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Computer Vision and Pattern Recognition (CVPR), 2022
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
768
85
0
04 Sep 2022