2207.07852
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
16 July 2022
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
Papers citing
"TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval"
22 / 22 papers shown
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
50
1
0
21 Mar 2025
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
Zengrong Lin
Zheng Wang
Tianwen Qian
Pan Mu
Sixian Chan
Cong Bai
42
0
0
13 Mar 2025
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
J. Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
69
8
0
31 Oct 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
76
7
0
02 Sep 2024
Text-Video Retrieval with Global-Local Semantic Consistent Learning
Haonan Zhang
Pengpeng Zeng
Lianli Gao
Jingkuan Song
Yihang Duan
Xinyu Lyu
Hengtao Shen
VLM
CLIP
23
2
0
21 May 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
26
3
0
11 Dec 2023
Latent Wander: an Alternative Interface for Interactive and Serendipitous Discovery of Large AV Archives
Yuchen Yang
Linyida Zhang
14
2
0
09 Oct 2023
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
Chen Jiang
Hong Liu
Xuzheng Yu
Qing Wang
Yuan-Chia Cheng
...
Zhongyi Liu
Qingpei Guo
Wei Chu
Ming Yang
Yuan Qi
16
10
0
20 Sep 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Peng Jin
Hao Li
Ze-Long Cheng
Jinfa Huang
Zhennan Wang
Li-ming Yuan
Chang-rui Liu
Jie Chen
18
31
0
20 May 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Mohit Bansal
31
129
0
11 May 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
25
48
0
25 Mar 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Xiang Ji
Chang-rui Liu
Li-ming Yuan
Jie Chen
DiffM
VGen
16
52
0
17 Mar 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
17
45
0
16 Jan 2023
Cross-Modal Adapter for Text-Video Retrieval
Haojun Jiang
Jianke Zhang
Rui Huang
Chunjiang Ge
Zanlin Ni
Jiwen Lu
Jie Zhou
S. Song
Gao Huang
40
35
0
17 Nov 2022
CLIP-Driven Fine-grained Text-Image Person Re-identification
Shuanglin Yan
Neng Dong
Liyan Zhang
Jinhui Tang
19
86
0
19 Oct 2022
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
34
3
0
24 Aug 2022
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
145
166
0
20 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
309
771
0
18 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
410
594
0
21 Jul 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
922
0
24 Sep 2019