MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval

European Conference on Computer Vision (ECCV), 2022
26 April 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
    VLM

Papers citing "MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval"

26 / 26 papers shown
X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning
Prasanna Reddy Pulakurthi
Jiamian Wang
Majid Rabbani
S. Dianat
Raghuveer Rao
Zhiqiang Tao
VGen LRM
129
0
0
25 Sep 2025
Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)
Zane Durante
Robathan Harries
Edward Vendrow
Zelun Luo
Yuta Kyuragi
Kazuki Kozuka
Fei-Fei Li
Ehsan Adeli
VLM
231
2
0
03 Jun 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
256
2
0
12 May 2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
Haowei Liu
Yaya Shi
Haiyang Xu
Chunfen Yuan
Qinghao Ye
...
Mingshi Yan
Ji Zhang
Fei Huang
Bing Li
Weiming Hu
VLM
242
0
0
01 Mar 2024
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval
Haowei Liu
Yaya Shi
Haiyang Xu
Chunfen Yuan
Qinghao Ye
...
Mingshi Yan
Ji Zhang
Fei Huang
Bing Li
Weiming Hu
170
1
0
26 Feb 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
267
27
0
31 Dec 2023
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM MLLM
225
65
0
11 Dec 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
165
1
0
18 Oct 2023
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
Neural Information Processing Systems (NeurIPS), 2023
Hao Li
Marie-Jeanne Lesot
Lianli Gao
Xiaosu Zhu
Christophe Marsala
EDL
245
28
0
29 Sep 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
266
4
0
28 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
International Conference on Learning Representations (ICLR), 2023
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLM CLIP
185
11
0
15 Jun 2023
Global and Local Semantic Completion Learning for Vision-Language Pre-training
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
216
8
0
12 Jun 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
255
13
0
23 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
ACM Multimedia (ACM MM), 2023
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
228
5
0
13 May 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
354
149
0
17 Apr 2023
Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
Computer Vision and Pattern Recognition (CVPR), 2023
Jialu Li
Joey Tianyi Zhou
222
55
0
11 Apr 2023
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
International Conference on Learning Representations (ICLR), 2023
Yuanhao Xiong
Long Zhao
Boqing Gong
Ming-Hsuan Yang
Florian Schroff
Ting Liu
Cho-Jui Hsieh
Liangzhe Yuan
VLM
213
0
0
28 Mar 2023
Deep Learning for Video-Text Retrieval: a Review
International Journal of Multimedia Information Retrieval (IJMIR), 2023
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
214
28
0
24 Feb 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
Xu Yang
Zhang Li
Haiyang Xu
Hanwang Zhang
Qinghao Ye
Chenliang Li
Ming Yan
Yu Zhang
Fei Huang
Songfang Huang
181
7
0
05 Jan 2023
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu
Biaolong Chen
Yue Liao
Shuwen Xiao
Wenyu Sun
Xiaobo Li
Yousong Zhu
Jinqiao Wang
Si Liu
CLIP
177
13
0
02 Dec 2022
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Yatai Ji
Rong-Cheng Tu
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
241
17
0
24 Nov 2022
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
Computer Vision and Pattern Recognition (CVPR), 2022
Ziyun Zeng
Yuying Ge
Xihui Liu
Bin Chen
Ping Luo
Shutao Xia
Yixiao Ge
AI4TS
169
9
0
30 Sep 2022
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
334
6
0
24 Aug 2022
Clover: Towards A Unified Video-Language Alignment and Fusion Model
Computer Vision and Pattern Recognition (CVPR), 2022
Jingjia Huang
Yinan Li
Jiashi Feng
Xinglong Wu
Xiaoshuai Sun
Rongrong Ji
VLM
249
55
0
16 Jul 2022
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
European Conference on Computer Vision (ECCV), 2022
Xiaotong Li
Yixiao Ge
Kun Yi
Zixuan Hu
Ying Shan
Ling-yu Duan
316
44
0
29 Mar 2022
All in One: Exploring Unified Video-Language Pre-training
Computer Vision and Pattern Recognition (CVPR), 2022
Alex Jinpeng Wang
Yixiao Ge
Rui Yan
Yuying Ge
Xudong Lin
Guanyu Cai
Jianping Wu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
257
235
0
14 Mar 2022