TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
16 April 2021
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
    VGen

Papers citing "TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval"

50 / 77 papers shown
Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning
Rongrong Xie
Yizhou Xu
Guido Sanguinetti
119
0
0
15 Oct 2025
Dual Learning with Dynamic Knowledge Distillation and Soft Alignment for Partially Relevant Video Retrieval
Jianfeng Dong
Lei Huang
Daizong Liu
Xianke Chen
Xun Yang
Changting Lin
Xun Wang
Meng Wang
119
0
0
14 Oct 2025
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
Bangxiang Lan
Ruobing Xie
Ruixiang Zhao
Xingwu Sun
Zhanhui Kang
Gang Yang
Xirong Li
110
0
0
05 Sep 2025
Repeating Words for Video-Language Retrieval with Coarse-to-Fine Objectives
Haoyu Zhao
Jiaxi Gu
Shicong Wang
Xing Zhang
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
152
0
0
20 Aug 2025
Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal Retrieval
Zhengxin Pan
Haishuai Wang
Fangyu Wu
Peng Zhang
Jiajun Bu
161
0
0
04 Aug 2025
Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization
Bingqing Zhang
Zhuo Cao
Heming Du
Y. Li
Xue Li
Jiajun Liu
Sen Wang
248
2
0
21 Jul 2025
Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review
A. Fragomeni
Dima Damen
Michael Wray
233
0
0
29 May 2025
TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval
Information Fusion (Inf. Fusion), 2025
Xiaolun Jing
Genke Yang
Jian Chu
228
3
0
07 Apr 2025
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
486
1
0
02 Apr 2025
Rethinking Knowledge in Distillation: An In-context Sample Retrieval Perspective
Jinjing Zhu
Songze Li
Lin Wang
317
0
0
13 Jan 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
395
4
0
31 Dec 2024
Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence
AAAI Conference on Artificial Intelligence (AAAI), 2024
Wenbo Huang
Jinghui Zhang
Ge Li
Lei Zhang
Shuoyuan Wang
Fang Dong
Jiahui Jin
Takahiro Ogawa
Miki Haseyama
Mamba
528
5
0
10 Dec 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Asian Conference on Computer Vision (ACCV), 2024
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
306
0
0
16 Oct 2024
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Bingqing Zhang
Zhuo Cao
Heming Du
Xin Yu
Xue Li
Jiajun Liu
Sen Wang
VGen
210
5
0
30 Sep 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
246
12
0
31 Jul 2024
SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition
ACM Multimedia (MM), 2024
Wenbo Huang
Jinghui Zhang
Xuwei Qian
Zhen Wu
Meng Wang
Lei Zhang
276
9
0
23 Jul 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
Hao Fei
Tat-Seng Chua
Shuicheng Yan
AI4TS
277
66
0
27 Jun 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
302
19
0
29 May 2024
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Han Fang
Xianghao Zang
Chao Ban
Zerun Feng
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
283
3
0
18 Apr 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang
Guohao Sun
Pichao Wang
Dongfang Liu
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Zhiqiang Tao
VGen
333
61
0
26 Mar 2024
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Kaibin Tian
Yanhua Cheng
Yi Liu
Xinglin Hou
Quan Chen
Han Li
151
13
0
01 Jan 2024
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Rahul Pratap Singh
Bishmoy Paul
Ali Dabouei
Min Xu
303
1
0
10 Dec 2023
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Shuhuai Ren
Sishuo Chen
Shicheng Li
Xu Sun
Lu Hou
ViT
229
40
0
29 Oct 2023
InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution
Xiangru Jian
Yimu Wang
235
6
0
20 Oct 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yimu Wang
Xiangru Jian
Bo Xue
190
22
0
17 Oct 2023
VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning
IEEE Access (IEEE Access), 2023
Yanan Wang
Donghuo Zeng
Shinya Wada
Satoshi Kurihara
189
11
0
27 Sep 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Joey Tianyi Zhou
380
76
0
18 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
229
6
0
16 Sep 2023
Simple Baselines for Interactive Video Retrieval with Questions and Answers
IEEE International Conference on Computer Vision (ICCV), 2023
Kaiqu Liang
Samuel Albanie
200
8
0
21 Aug 2023
JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition
L. Bicsi
B. Alexe
Radu Tudor Ionescu
Marius Leordeanu
257
2
0
09 Aug 2023
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval
Kaibin Tian
Rui Zhao
Hu Hu
Runquan Xie
Fengzong Lian
Zhanhui Kang
Xirong Li
CLIP
84
1
0
02 Aug 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
283
33
0
24 Jul 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Peng Jin
Hao Li
Ze-Long Cheng
Jinfa Huang
Zhennan Wang
Li-ming Yuan
Chang-rui Liu
Jie Chen
279
50
0
20 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
ACM Multimedia (ACM MM), 2023
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
240
5
0
13 May 2023
Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval
Jae Myung Kim
A. Sophia Koepke
Cordelia Schmid
Zeynep Akata
248
43
0
06 Apr 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
271
78
0
25 Mar 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
IEEE International Conference on Computer Vision (ICCV), 2023
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Xiang Ji
Chang-rui Liu
Li-ming Yuan
Jie Chen
DiffM
VGen
340
82
0
17 Mar 2023
Deep Learning for Video-Text Retrieval: a Review
International Journal of Multimedia Information Retrieval (IJMIR), 2023
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
226
28
0
24 Feb 2023
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yimu Wang
Peng Shi
233
9
0
19 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yizhen Chen
Jie Wang
Lijian Lin
Chen Ma
Jin Ma
Ying Shan
VLM
245
34
0
30 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Bo Fang
Wenhao Wu
Chang-rui Liu
Can Ma
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
246
82
0
16 Jan 2023
Normalized Contrastive Learning for Text-Video Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yookoon Park
Mahmoud Azab
Bo Xiong
Seungwhan Moon
Florian Metze
Gourab Kundu
Kirmani Ahmed
155
13
0
30 Nov 2022
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Neural Information Processing Systems (NeurIPS), 2022
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David Clifton
Jing Chen
VLM
300
85
0
21 Nov 2022
RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xing Wu
Chaochen Gao
Zijia Lin
Zhongyuan Wang
Jizhong Han
Songlin Hu
149
10
0
13 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Neural Information Processing Systems (NeurIPS), 2022
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
273
84
0
12 Oct 2022
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks
Findings (Findings), 2022
Pedro Rodriguez
Mahmoud Azab
Becka Silvert
Renato Sanchez
Linzy Labson
Hardik Shah
Seungwhan Moon
214
2
0
10 Oct 2022
ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval
Asian Conference on Computer Vision (ACCV), 2022
A. Fragomeni
Michael Wray
Dima Damen
CLIP
ViT
144
4
0
09 Oct 2022
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Andrew Rouditchenko
Yung-Sung Chuang
Nina Shvetsova
Samuel Thomas
Rogerio Feris
Brian Kingsbury
Leonid Karlinsky
David Harwath
Hilde Kuehne
James R. Glass
VLM
210
8
0
07 Oct 2022
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval
Neural Information Processing Systems (NeurIPS), 2022
Che-Hsien Lin
Ancong Wu
Junwei Liang
Jun Zhang
Wenhang Ge
Wei Zheng
Chunhua Shen
213
37
0
27 Sep 2022
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
Neural Information Processing Systems (NeurIPS), 2022
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
284
178
0
15 Sep 2022