ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.16669
  4. Cited By
Training a Large Video Model on a Single Machine in a Day

Training a Large Video Model on a Single Machine in a Day

28 September 2023
Yue Zhao
Philipp Krahenbuhl
    VLM
ArXivPDFHTML

Papers citing "Training a Large Video Model on a Single Machine in a Day"

25 / 25 papers shown
Title
LLaVAction: evaluating and training multi-modal large language models for action recognition
LLaVAction: evaluating and training multi-modal large language models for action recognition
Shaokai Ye
Haozhe Qi
Alexander Mathis
Mackenzie W. Mathis
60
1
0
24 Mar 2025
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Boshen Xu
Yuting Mei
Xinbi Liu
Sipeng Zheng
Qin Jin
VLM
MDE
57
0
0
19 Mar 2025
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
Haoyu Zhang
Qiaohui Chu
Meng Liu
Yunxiao Wang
Bin Wen
Fan Yang
Tingting Gao
Di Zhang
Yaowei Wang
Liqiang Nie
EgoV
58
0
0
12 Mar 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Baoqi Pei
Y. Huang
Jilan Xu
Guo Chen
Yuping He
...
Yali Wang
Weidi Xie
Yu Qiao
Fei Wu
Limin Wang
39
0
0
02 Mar 2025
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
Y. Huang
Jilan Xu
Baoqi Pei
Yuping He
Guo Chen
...
Kunpeng Li
C. Yuan
Y. Wang
Yu Qiao
L. Wang
75
4
0
31 Dec 2024
Extending Video Masked Autoencoders to 128 frames
Extending Video Masked Autoencoders to 128 frames
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
67
1
0
20 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
26
9
0
07 Nov 2024
Mamba Fusion: Learning Actions Through Questioning
Mamba Fusion: Learning Actions Through Questioning
Zhikang Dong
Apoorva Beedu
Jason Sheinkopf
Irfan Essa
Mamba
40
2
0
17 Sep 2024
EgoVideo: Exploring Egocentric Foundation Model and Downstream
  Adaptation
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation
Baoqi Pei
Guo Chen
Jilan Xu
Yuping He
Yicheng Liu
...
Yifei Huang
Yali Wang
Tong Lu
Limin Wang
Yu Qiao
EgoV
16
10
0
26 Jun 2024
Symmetric Multi-Similarity Loss for EPIC-KITCHENS-100 Multi-Instance
  Retrieval Challenge 2024
Symmetric Multi-Similarity Loss for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2024
Xiaoqi Wang
Yi Wang
Lap-Pui Chau
23
0
0
18 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language
  Space
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
29
1
0
05 Jun 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
19
0
0
08 Apr 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for
  Video Understanding
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
51
68
0
14 Mar 2024
Distilling Vision-Language Models on Millions of Videos
Distilling Vision-Language Models on Millions of Videos
Yue Zhao
Long Zhao
Xingyi Zhou
Jialin Wu
Chun-Te Chu
...
Hartwig Adam
Ting Liu
Boqing Gong
Philipp Krahenbuhl
Liangzhe Yuan
VLM
11
6
0
11 Jan 2024
Real-time Online Video Detection with Temporal Smoothing Transformers
Real-time Online Video Detection with Temporal Smoothing Transformers
Yue Zhao
Philipp Krahenbuhl
ViT
69
56
0
19 Sep 2022
Omnivore: A Single Model for Many Visual Modalities
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
L. V. D. van der Maaten
Armand Joulin
Ishan Misra
209
222
0
20 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
7,337
0
11 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
212
682
0
13 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text
  Understanding
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
239
554
0
28 Sep 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
272
1,939
0
09 Feb 2021
Big Bird: Transformers for Longer Sequences
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
246
1,982
0
28 Jul 2020
AdaFrame: Adaptive Frame Selection for Fast Video Recognition
AdaFrame: Adaptive Frame Selection for Fast Video Recognition
Zuxuan Wu
Caiming Xiong
Chih-Yao Ma
R. Socher
L. Davis
110
194
0
29 Nov 2018
ECO: Efficient Convolutional Network for Online Video Understanding
ECO: Efficient Convolutional Network for Online Video Understanding
Mohammadreza Zolfaghari
Kamaljeet Singh
Thomas Brox
111
495
0
24 Apr 2018
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
  Applications
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
948
20,214
0
17 Apr 2017
1