ResearchTrend.AI
Support-set bottlenecks for video-text representation learning

6 October 2020
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi

Papers citing "Support-set bottlenecks for video-text representation learning"

37 / 37 papers shown
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Peng Jin
Hao Li
Ze-Long Cheng
Jinfa Huang
Zhennan Wang
Li-ming Yuan
Chang-rui Liu
Jie Chen
18
31
0
20 May 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
23
48
0
25 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
23
30
0
21 Mar 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Xiang Ji
Chang-rui Liu
Li-ming Yuan
Jie Chen
DiffM
VGen
16
52
0
17 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
16
10
0
12 Mar 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
Yizhen Chen
Jie Wang
Lijian Lin
Zhongang Qi
Jin Ma
Ying Shan
VLM
13
18
0
30 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
17
45
0
16 Jan 2023
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
26
15
0
21 Nov 2022
Grafting Pre-trained Models for Multimodal Headline Generation
Lingfeng Qiao
Chen Wu
Ye Liu
Haoyuan Peng
Di Yin
Bo Ren
30
5
0
14 Nov 2022
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Andrew Rouditchenko
Yung-Sung Chuang
Nina Shvetsova
Samuel Thomas
Rogerio Feris
Brian Kingsbury
Leonid Karlinsky
David F. Harwath
Hilde Kuehne
James R. Glass
VLM
21
4
0
07 Oct 2022
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
24
3
0
24 Aug 2022
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
Madeline Chantry Schiappa
Shruti Vyas
Hamid Palangi
Y. S. Rawat
Vibhav Vineet
VLM
109
17
0
05 Jul 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
22
130
0
18 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
518
0
13 Jun 2022
Support-set based Multi-modal Representation Enhancement for Video Captioning
Xiaoya Chen
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Hengtao Shen
19
4
0
19 May 2022
Relevance-based Margin for Contrastively-trained Video Retrieval Models
Alex Falcon
Swathikiran Sudhakaran
G. Serra
Sergio Escalera
O. Lanz
13
7
0
27 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
6
43
0
26 Apr 2022
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Haoyu Lu
Nanyi Fei
Yuqi Huo
Yizhao Gao
Zhiwu Lu
Jiaxin Wen
CLIP
VLM
17
54
0
15 Apr 2022
Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations
Jie Jiang
Shaobo Min
Weijie Kong
Dihong Gong
Hongfa Wang
Zhifeng Li
Wei Liu
VLM
18
17
0
07 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Mohit Bansal
Gedas Bertasius
26
39
0
06 Apr 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
M. Volkovs
Animesh Garg
Guangwei Yu
17
148
0
28 Mar 2022
Disentangled Representation Learning for Text-Video Retrieval
Qiang Wang
Yanhao Zhang
Yun Zheng
Pan Pan
Xiansheng Hua
45
76
0
14 Mar 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
17
16
0
14 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
385
4,010
0
28 Jan 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
13
164
0
20 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
101
0
16 Jan 2022
Sign Language Video Retrieval with Free-Form Textual Queries
A. Duarte
Samuel Albanie
Xavier Giró-i-Nieto
Gül Varol
SLR
19
29
0
07 Jan 2022
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
22
23
0
02 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViT
VLM
15
79
0
01 Dec 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
34
215
0
24 Nov 2021
Support-Set Based Cross-Supervision for Video Grounding
Xinpeng Ding
N. Wang
Shiwei Zhang
De-Chun Cheng
Xiaomeng Li
Ziyuan Huang
Mingqian Tang
Xinbo Gao
25
42
0
24 Aug 2021
HANet: Hierarchical Alignment Networks for Video-Text Retrieval
Peng Wu
Xiangteng He
Mingqian Tang
Yiliang Lv
Jing Liu
16
50
0
26 Jul 2021
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
Han Fang
Pengfei Xiong
Luhui Xu
Yu Chen
CLIP
VLM
11
291
0
21 Jun 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong-jin Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
C. Miao
Houqiang Li
20
41
0
19 Apr 2021
On Semantic Similarity in Video Retrieval
Michael Wray
Hazel Doughty
Dima Damen
16
66
0
18 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
11
56
0
16 Mar 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
401
594
0
21 Jul 2020