2104.08271
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
16 April 2021
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
Papers citing
"TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval"
50 / 77 papers shown
Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning
Rongrong Xie
Yizhou Xu
Guido Sanguinetti
122
0
0
15 Oct 2025
Dual Learning with Dynamic Knowledge Distillation and Soft Alignment for Partially Relevant Video Retrieval
Jianfeng Dong
Lei Huang
Daizong Liu
Xianke Chen
Xun Yang
Changting Lin
Xun Wang
Meng Wang
127
0
0
14 Oct 2025
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
Bangxiang Lan
Ruobing Xie
Ruixiang Zhao
Xingwu Sun
Zhanhui Kang
Gang Yang
Xirong Li
110
0
0
05 Sep 2025
Repeating Words for Video-Language Retrieval with Coarse-to-Fine Objectives
Haoyu Zhao
Jiaxi Gu
Shicong Wang
Xing Zhang
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
164
0
0
20 Aug 2025
Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal Retrieval
Zhengxin Pan
Haishuai Wang
Fangyu Wu
Peng Zhang
Jiajun Bu
162
0
0
04 Aug 2025
Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization
Bingqing Zhang
Zhuo Cao
Heming Du
Y. Li
Xue Li
Jiajun Liu
Sen Wang
259
2
0
21 Jul 2025
Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review
A. Fragomeni
Dima Damen
Michael Wray
243
0
0
29 May 2025
TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval
Information Fusion (Inf. Fusion), 2025
Xiaolun Jing
Genke Yang
Jian Chu
234
3
0
07 Apr 2025
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
500
1
0
02 Apr 2025
Rethinking Knowledge in Distillation: An In-context Sample Retrieval Perspective
Jinjing Zhu
Songze Li
Lin Wang
325
0
0
13 Jan 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
414
4
0
31 Dec 2024
Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence
AAAI Conference on Artificial Intelligence (AAAI), 2024
Wenbo Huang
Jinghui Zhang
Ge Li
Lei Zhang
Shuoyuan Wang
Fang Dong
Jiahui Jin
Takahiro Ogawa
Miki Haseyama
Mamba
547
5
0
10 Dec 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Asian Conference on Computer Vision (ACCV), 2024
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
307
0
0
16 Oct 2024
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Bingqing Zhang
Zhuo Cao
Heming Du
Xin Yu
Xue Li
Jiajun Liu
Sen Wang
VGen
210
6
0
30 Sep 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
248
12
0
31 Jul 2024
SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition
ACM Multimedia (MM), 2024
Wenbo Huang
Jinghui Zhang
Xuwei Qian
Zhen Wu
Meng Wang
Lei Zhang
322
9
0
23 Jul 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
Min Zhang
Tat-Seng Chua
Shuicheng Yan
AI4TS
283
66
0
27 Jun 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
317
19
0
29 May 2024
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Han Fang
Xianghao Zang
Chao Ban
Zerun Feng
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
285
3
0
18 Apr 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang
Guohao Sun
Pichao Wang
Dongfang Liu
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Zhiqiang Tao
VGen
338
64
0
26 Mar 2024
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Kaibin Tian
Yanhua Cheng
Yi Liu
Xinglin Hou
Quan Chen
Han Li
152
14
0
01 Jan 2024
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Rahul Pratap Singh
Bishmoy Paul
Ali Dabouei
Min Xu
347
1
0
10 Dec 2023
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Shuhuai Ren
Sishuo Chen
Shicheng Li
Xu Sun
Lu Hou
ViT
239
41
0
29 Oct 2023
InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution
Xiangru Jian
Yimu Wang
259
6
0
20 Oct 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yimu Wang
Xiangru Jian
Bo Xue
204
22
0
17 Oct 2023
VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning
IEEE Access (IEEE Access), 2023
Yanan Wang
Donghuo Zeng
Shinya Wada
Satoshi Kurihara
201
12
0
27 Sep 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Joey Tianyi Zhou
383
77
0
18 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
232
6
0
16 Sep 2023
Simple Baselines for Interactive Video Retrieval with Questions and Answers
IEEE International Conference on Computer Vision (ICCV), 2023
Kaiqu Liang
Samuel Albanie
201
8
0
21 Aug 2023
JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition
L. Bicsi
B. Alexe
Radu Tudor Ionescu
Marius Leordeanu
257
2
0
09 Aug 2023
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval
Kaibin Tian
Rui Zhao
Hu Hu
Runquan Xie
Fengzong Lian
Zhanhui Kang
Xirong Li
CLIP
98
1
0
02 Aug 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
287
34
0
24 Jul 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Peng Jin
Hao Li
Ze-Long Cheng
Jinfa Huang
Zhennan Wang
Li-ming Yuan
Chang-rui Liu
Jie Chen
282
52
0
20 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
ACM Multimedia (ACM MM), 2023
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
246
5
0
13 May 2023
Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval
Jae Myung Kim
A. Sophia Koepke
Cordelia Schmid
Zeynep Akata
256
44
0
06 Apr 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
278
81
0
25 Mar 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
IEEE International Conference on Computer Vision (ICCV), 2023
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Xiang Ji
Chang-rui Liu
Li-ming Yuan
Jie Chen
DiffM
VGen
348
84
0
17 Mar 2023
Deep Learning for Video-Text Retrieval: a Review
International Journal of Multimedia Information Retrieval (IJMIR), 2023
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
230
31
0
24 Feb 2023
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yimu Wang
Peng Shi
238
9
0
19 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yizhen Chen
Jie Wang
Lijian Lin
Chen Ma
Jin Ma
Ying Shan
VLM
257
34
0
30 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Bo Fang
Wenhao Wu
Chang-rui Liu
Can Ma
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
267
84
0
16 Jan 2023
Normalized Contrastive Learning for Text-Video Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yookoon Park
Mahmoud Azab
Bo Xiong
Seungwhan Moon
Florian Metze
Gourab Kundu
Kirmani Ahmed
177
13
0
30 Nov 2022
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Neural Information Processing Systems (NeurIPS), 2022
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David Clifton
Jing Chen
VLM
307
87
0
21 Nov 2022
RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xing Wu
Chaochen Gao
Zijia Lin
Zhongyuan Wang
Jizhong Han
Songlin Hu
162
10
0
13 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Neural Information Processing Systems (NeurIPS), 2022
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
297
84
0
12 Oct 2022
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks
Findings of the Association for Computational Linguistics: EMNLP (Findings), 2022
Pedro Rodriguez
Mahmoud Azab
Becka Silvert
Renato Sanchez
Linzy Labson
Hardik Shah
Seungwhan Moon
226
2
0
10 Oct 2022
ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval
Asian Conference on Computer Vision (ACCV), 2022
A. Fragomeni
Michael Wray
Dima Damen
CLIP
ViT
158
4
0
09 Oct 2022
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Andrew Rouditchenko
Yung-Sung Chuang
Nina Shvetsova
Samuel Thomas
Rogerio Feris
Brian Kingsbury
Leonid Karlinsky
David Harwath
Hilde Kuehne
James R. Glass
VLM
218
8
0
07 Oct 2022
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval
Neural Information Processing Systems (NeurIPS), 2022
Che-Hsien Lin
Ancong Wu
Junwei Liang
Jun Zhang
Wenhang Ge
Wei Zheng
Chunhua Shen
218
38
0
27 Sep 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Neural Information Processing Systems (NeurIPS), 2022
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
294
178
0
15 Sep 2022