ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.03490
  4. Cited By
SimVTP: Simple Video Text Pre-training with Masked Autoencoders

SimVTP: Simple Video Text Pre-training with Masked Autoencoders

7 December 2022
Yue Ma
Tianyu Yang
Yin Shan
Xiu Li
ArXivPDFHTML

Papers citing "SimVTP: Simple Video Text Pre-training with Masked Autoencoders"

9 / 9 papers shown
Title
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
Pengcheng Zhao
Zhixian He
Fuwei Zhang
Shujin Lin
Fan Zhou
38
1
0
18 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
56
23
0
31 Dec 2024
M-BEV: Masked BEV Perception for Robust Autonomous Driving
M-BEV: Masked BEV Perception for Robust Autonomous Driving
Siran Chen
Yue Ma
Yu Qiao
Yali Wang
19
8
0
19 Dec 2023
Strategic Preys Make Acute Predators: Enhancing Camouflaged Object
  Detectors by Generating Camouflaged Objects
Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects
Chunming He
Kai Li
Yachao Zhang
Yulun Zhang
Z. Guo
Xiu Li
Martin Danelljan
F. I. F. Richard Yu
AAML
25
44
0
06 Aug 2023
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text
  Understanding
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
143
166
0
20 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip
  Retrieval
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
303
771
0
18 Apr 2021
Multi-modal Transformer for Video Retrieval
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
401
594
0
21 Jul 2020
1