Video-Text Pre-training with Learned Regions

2 December 2021
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang

Papers citing "Video-Text Pre-training with Learned Regions"

21 / 21 papers shown
Symbolic Representation for Any-to-Any Generative Tasks
J. Chen
Xiaoye Zhu
Y. Wang
Tianyang Liu
Xinhui Chen
...
Yifei Ke
J. Liu
Yiwen Yuan
Julian McAuley
Li Li
DiffM
36
0
0
24 Apr 2025
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai
Jie Zhou
Xingjiao Wu
Qin Chen
Qingchun Bai
Ze Zhou
Liang He
MoE
30
0
0
01 Mar 2025
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
27
1
0
12 May 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
37
173
0
24 Jan 2024
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment
Jaewoo Lee
Jaehong Yoon
Wonjae Kim
Yunji Kim
Sung Ju Hwang
CLL
14
1
0
12 Oct 2023
Patch-aware Batch Normalization for Improving Cross-domain Robustness
Lei Qi
Dongjia Zhao
Yinghuan Shi
Xin Geng
OOD
20
0
0
06 Apr 2023
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
Yuanhao Xiong
Long Zhao
Boqing Gong
Ming-Hsuan Yang
Florian Schroff
Ting Liu
Cho-Jui Hsieh
Liangzhe Yuan
VLM
16
0
0
28 Mar 2023
Plug-and-Play Regulators for Image-Text Matching
Haiwen Diao
Y. Zhang
W. Liu
Xiang Ruan
Huchuan Lu
16
11
0
23 Mar 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Xiao Wang
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
24
195
0
20 Feb 2023
SimVTP: Simple Video Text Pre-training with Masked Autoencoders
Yue Ma
Tianyu Yang
Yin Shan
Xiu Li
19
27
0
07 Dec 2022
LocVTP: Video-Text Pre-training for Temporal Localization
Meng Cao
Tianyu Yang
Junwu Weng
Can Zhang
Jue Wang
Yuexian Zou
6
64
0
21 Jul 2022
LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval
Jinbin Bai
Chunhui Liu
Feiyue Ni
Haofan Wang
Mengying Hu
Xiaofeng Guo
Lele Cheng
34
11
0
11 Jul 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
113
60
0
17 May 2022
All in One: Exploring Unified Video-Language Pre-training
Alex Jinpeng Wang
Yixiao Ge
Rui Yan
Yuying Ge
Xudong Lin
Guanyu Cai
Jianping Wu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
11
199
0
14 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
24
27
0
08 Mar 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
79
208
0
18 Feb 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
302
116
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
193
375
0
01 Feb 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Karteek Alahari
Cordelia Schmid
ViT
401
594
0
21 Jul 2020