Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

9 September 2021
Xingyi Cheng
Hezheng Lin
Xiangyu Wu
Fan Yang
Dong Shen

Papers citing "Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss"

50 / 103 papers shown
Table Comprehension in Building Codes using Vision Language Models and Domain-Specific Fine-Tuning
Mohammad Aqib
Mohd Hamza
Ying Hei Chui
Qipei Mei
LMTD
289
0
0
23 Nov 2025
Repeating Words for Video-Language Retrieval with Coarse-to-Fine Objectives
Haoyu Zhao
Jiaxi Gu
Shicong Wang
Xing Zhang
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
108
0
0
20 Aug 2025
T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video Retrieval
Yili Li
Gang Xiong
Gaopeng Gou
Xiangyan Qu
Jiamin Zhuang
Zhen Li
Junzheng Shi
120
0
0
28 Jul 2025
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Leqi Shen
Guoqiang Gong
Tianxiang Hao
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Jungong Han
Guiguang Ding
174
4
0
10 Jun 2025
Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review
A. Fragomeni
Dima Damen
Michael Wray
191
0
0
29 May 2025
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Boseung Jeong
Jicheol Park
Sungyeon Kim
Suha Kwak
229
3
0
03 Apr 2025
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
394
1
0
02 Apr 2025
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Arun V. Reddy
Alexander Martin
Eugene Yang
Andrew Yates
Kate Sanders
Kenton W. Murray
Reno Kriz
Celso M. De Melo
Benjamin Van Durme
Rama Chellappa
276
9
0
24 Mar 2025
Stitch-a-Demo: Video Demonstrations from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
202
1
0
18 Mar 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen
Zhengrong Yue
Siran Chen
Xiping Hu
Yang Liu
Ziwei Sun
Longji Xu
VLM
1.1K
17
0
13 Mar 2025
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Zengrong Lin
Zheng Wang
Tianwen Qian
Pan Mu
Sixian Chan
Cong Bai
187
2
0
13 Mar 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Computer Vision and Pattern Recognition (CVPR), 2025
Chan Hur
Jeong-hun Hong
Dong-hun Lee
Dabin Kang
Semin Myeong
Sang-hyo Park
Hyeyoung Park
507
5
0
07 Mar 2025
Language-based Audio Retrieval with Co-Attention Networks
Haoran Sun
Xiping Hu
Qiuyi Chen
Jianjun Chen
Jia Wang
Haiyang Zhang
130
0
0
31 Dec 2024
GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network
Neural Networks (NN), 2024
Xianfeng Song
Yi Zou
Zheng Shi
Zheng Liu
226
0
0
24 Dec 2024
Decomposing Relationship from 1-to-N into N 1-to-1 for Text-Video Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jian Xiao
Zhenzhen Hu
Jia Li
Richang Hong
88
0
0
09 Oct 2024
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Bingqing Zhang
Zhuo Cao
Heming Du
Xin Yu
Xue Li
Jiajun Liu
Sen Wang
VGen
182
5
0
30 Sep 2024
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval
ACM Multimedia (MM), 2024
Yili Li
Jing Yu
Keke Gai
Bang Liu
Gang Xiong
Qi Wu
DiffM, VGen
155
5
0
21 Aug 2024
TDS-CLIP: Temporal Difference Side Network for Efficient Video Action Recognition
Bin Wang
W. Li
Wenqian Wang
Mingliang Gao
Runmin Cong
Wei Emma Zhang
VLM
167
1
0
20 Aug 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
Hao Fei
Tat-Seng Chua
Shuicheng Yan
AI4TS
243
64
0
27 Jun 2024
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval
Wenjun Li
Shudong Wang
Dong Zhao
Shenghui Xu
Zhaoming Pan
Zhimin Zhang
111
1
0
21 Jun 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
274
19
0
29 May 2024
An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval
Xiaolun Jing
Genke Yang
Jian Chu
CLIP
179
2
0
25 May 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
236
2
0
12 May 2024
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
Xuzheng Yu
Chen Jiang
Xingning Dong
Tian Gan
Ming Yang
Qingpei Guo
336
4
0
22 Apr 2024
Anchor-aware Deep Metric Learning for Audio-visual Retrieval
Donghuo Zeng
Yanan Wang
Kazushi Ikeda
Yi Yu
158
3
0
21 Apr 2024
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Han Fang
Xianghao Zang
Chao Ban
Zerun Feng
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
262
3
0
18 Apr 2024
Improving Continuous Sign Language Recognition with Adapted Image Models
Lianyu Hu
Tongkai Shi
Liqing Gao
Zekang Liu
Wei Feng
VLM
204
9
0
12 Apr 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang
Guohao Sun
Pichao Wang
Dongfang Liu
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Zhiqiang Tao
VGen
288
59
0
26 Mar 2024
VidLA: Video-Language Alignment at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM, AI4TS
176
8
0
21 Mar 2024
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval
Xingning Dong
Zipeng Feng
Chunluan Zhou
Xuzheng Yu
Ming Yang
Qingpei Guo
VLM
222
5
0
31 Jan 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
414
7
0
03 Jan 2024
COMMA: Co-Articulated Multi-Modal Learning
AAAI Conference on Artificial Intelligence (AAAI), 2023
Lianyu Hu
Liqing Gao
Zekang Liu
Chi-Man Pun
Wei Feng
VLM
170
7
0
30 Dec 2023
D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer
Junjie Gao
Pengfei Wang
Qiujie Dong
Qiong Zeng
Shiqing Xin
Caiming Zhang
161
0
0
20 Dec 2023
WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Huy Le
Tung Kieu
Anh Nguyen
Ngan Le
VGen
222
6
0
15 Dec 2023
RTQ: Rethinking Video-language Understanding Based on Image-text Model
ACM Multimedia (ACM MM), 2023
Xiao Wang
Yaoyu Li
Tian Gan
Zheng Zhang
Jingjing Lv
Liqiang Nie
228
12
0
01 Dec 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
258
13
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
221
4
0
25 Nov 2023
Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Konstantin Yakovlev
Gregory Polyakov
I. Alimova
Alexander Podolskiy
A. Bout
Sergey I. Nikolenko
Irina Piontkovskaya
CLIP
182
2
0
14 Nov 2023
An Empirical Study of Frame Selection for Text-to-Video Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mengxia Wu
Min Cao
Yang Bai
Ziyin Zeng
Chen Chen
Liqiang Nie
Min Zhang
232
4
0
01 Nov 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP, VLM, VGen
310
3
0
30 Oct 2023
InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution
Xiangru Jian
Yimu Wang
191
6
0
20 Oct 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yimu Wang
Xiangru Jian
Bo Xue
174
22
0
17 Oct 2023
OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer
Junjie Gao
Qiujie Dong
Ruian Wang
Shuangmin Chen
Shiqing Xin
Changhe Tu
Wenping Wang
184
4
0
15 Oct 2023
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
ACM Multimedia (ACM MM), 2023
Chen Jiang
Hong Liu
Xuzheng Yu
Qing Wang
Yuan Cheng
...
Zhongyi Liu
Qingpei Guo
Wei Chu
Ming-Hsuan Yang
Yuan Qi
309
16
0
20 Sep 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Joey Tianyi Zhou
346
76
0
18 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
188
31
0
14 Sep 2023
DePT: Decoupled Prompt Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Ji Zhang
Shihan Wu
Lianli Gao
Hengtao Shen
Jingkuan Song
VLM
224
59
0
14 Sep 2023
Multi-event Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Gengyuan Zhang
Jisen Ren
Jindong Gu
Volker Tresp
167
18
0
22 Aug 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
234
33
0
24 Jul 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Neural Information Processing Systems (NeurIPS), 2023
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
266
35
0
17 Jul 2023