© 2025 ResearchTrend.AI, All rights reserved.


UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

17 November 2022
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
    ViT

Papers citing "UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer"

50 / 80 papers shown
Towards Application-Specific Evaluation of Vision Models: Case Studies in Ecology and Biology
A. H. H. Chan
Otto Brookes
Urs Waldmann
Hemal Naik
I. Couzin
...
Lukas Boesch
M. Arandjelovic
H. Kühl
T. Burghardt
Fumihiro Kano
42
0
0
05 May 2025
Beyond the Horizon: Decoupling UAVs Multi-View Action Recognition via Partial Order Transfer
Wenxuan Liu
X. Zhong
Zhuo Zhou
S. Yang
Chia-Wen Lin
Alex Chichung Kot
32
0
0
29 Apr 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
25
0
0
16 Apr 2025
Slow-Fast Architecture for Video Multi-Modal Large Language Models
Min Shi
Shihao Wang
Chieh-Yun Chen
Jitesh Jain
Kai Wang
Junjun Xiong
Guilin Liu
Zhiding Yu
Humphrey Shi
31
1
0
02 Apr 2025
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Thinesh Thiyakesan Ponbagavathi
Alina Roitberg
34
0
0
31 Mar 2025
OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition
Shihao Cheng
Jinlu Zhang
Yue Liu
Zhigang Tu
VLM
37
0
0
30 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
42
0
0
24 Mar 2025
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao
Xuxin Cheng
Zhiqi Huang
Lei Li
55
0
0
22 Mar 2025
A Real-Time Human Action Recognition Model for Assisted Living
Yixuan Wang
Paul Stynes
Pramod Pathak
Cristina Muntean
29
0
0
18 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
48
0
0
17 Mar 2025
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection
Xin Jin
Haisheng Su
Kai Liu
Cong Ma
Wei Yu Wu
Fei Hui
Junchi Yan
Mamba
70
0
0
15 Mar 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong
Yean Cheng
Z. Yang
Weihan Wang
Lefan Wang
Xiaotao Gu
Shiyu Huang
Yuxiao Dong
J. Tang
CoGe
VLM
71
4
0
06 Jan 2025
Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification
Xinrui Zhou
Yuhao Huang
Haoran Dou
Shijing Chen
Ao Chang
...
Jie Jessie Ren
Ruobing Huang
Jun Cheng
Wufeng Xue
Dong Ni
MedIm
57
0
0
25 Sep 2024
Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling
Haicheng Liao
Yongkang Li
Chengyue Wang
Songning Lai
Zhenning Li
Zilin Bian
Jaeyoung Lee
Zhiyong Cui
Guohui Zhang
Chengzhong Xu
24
8
0
02 Sep 2024
ToddlerAct: A Toddler Action Recognition Dataset for Gross Motor Development Assessment
Hsiang-Wei Huang
Jiacheng Sun
Cheng-Yen Yang
Zhongyu Jiang
Li-Yu Huang
Jenq-Neng Hwang
Yu-Ching Yeh
22
0
0
31 Aug 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
45
87
0
29 Aug 2024
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation
Xin Liu
Chao Hao
Zitong Yu
Huanjing Yue
Jingyu Yang
18
1
0
05 Aug 2024
UniForensics: Face Forgery Detection via General Facial Representation
Ziyuan Fang
Hanqing Zhao
Tianyi Wei
Wenbo Zhou
Ming Wan
Zhanyi Wang
Weiming Zhang
Neng H. Yu
CVBM
26
1
0
26 Jul 2024
CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions
Haicheng Liao
Haoyu Sun
Huanming Shen
Chengyue Wang
Kahou Tam
Chunlin Tian
Li Li
Chengzhong Xu
Zhenning Li
21
5
0
25 Jul 2024
Motion Capture from Inertial and Vision Sensors
Xiaodong Chen
Wu Liu
Qian Bao
Xinchen Liu
Quanwei Yang
Ruoli Dai
Tao Mei
40
3
0
23 Jul 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
M. Zhang
Tat-Seng Chua
Shuicheng Yan
AI4TS
34
37
0
27 Jun 2024
The SkatingVerse Workshop & Challenge: Methods and Results
Jian Zhao
Lei Jin
Jianshu Li
Zheng Zhu
Yinglei Teng
...
Shin'ichi Satoh
Yandong Guo
Cewu Lu
Junliang Xing
Jane Shengmei Shen
AI4TS
18
0
0
27 May 2024
Bidirectional Progressive Transformer for Interaction Intention Anticipation
Zichen Zhang
Hongcheng Luo
Wei Zhai
Yang Cao
Yu Kang
22
5
0
09 May 2024
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Xiaohong Liu
Xiongkuo Min
Guangtao Zhai
Chunyi Li
Tengchuan Kou
...
Qi Yan
Youran Qu
Xiaohui Zeng
Lele Wang
Renjie Liao
48
29
0
25 Apr 2024
Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap
Bowen Qu
Xiaoyu Liang
Shangkun Sun
Wei-Nan Gao
EGVM
20
6
0
21 Apr 2024
On the Content Bias in Fréchet Video Distance
Jason S. Hoffman
Aniruddha Mahapatra
Gaurav Parmar
Jun-Yan Zhu
Jia-Bin Huang
EGVM
47
15
0
18 Apr 2024
The 8th AI City Challenge
Shuo Wang
D. Anastasiu
Zhenghang Tang
Ming-Ching Chang
Yue Yao
...
Xunlei Wu
S. Pusegaonkar
Yizhou Wang
Sujit Biswas
Rama Chellappa
28
31
0
15 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
26
30
0
01 Apr 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
27
1
0
24 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
27
44
0
22 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
30
174
0
11 Mar 2024
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
28
3
0
29 Feb 2024
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
Pum Jun Kim
Seojun Kim
Jaejun Yoo
EGVM
11
3
0
30 Jan 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
F. Worgotter
Alexander S. Ecker
28
3
0
29 Jan 2024
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Boyuan Jiang
Jun Chen
Jianbiao Mei
Xingxing Zuo
Guang Dai
Jingdong Wang
Yong-Jin Liu
VLM
26
3
0
22 Jan 2024
Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition
Yukun Zuo
Hantao Yao
Liansheng Zhuang
Changsheng Xu
13
2
0
11 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
17
0
0
10 Jan 2024
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
Yizhou Wang
Ruiyi Zhang
Haoliang Wang
Uttaran Bhattacharya
Yun Fu
Gang Wu
MLLM
22
10
0
04 Dec 2023
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition
Chengyou Jia
Minnan Luo
Xiaojun Chang
Zhuohang Dang
Mingfei Han
Mengmeng Wang
Guangwen Dai
Sizhe Dang
Jingdong Wang
VLM
26
4
0
04 Dec 2023
LEAP: LLM-Generation of Egocentric Action Programs
Eadom Dessalene
Michael Maynord
Cornelia Fermuller
Yiannis Aloimonos
16
3
0
29 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
46
398
0
28 Nov 2023
VLM-Eval: A General Evaluation on Video Large Language Models
Shuailin Li
Yuang Zhang
Yucheng Zhao
Qiuyue Wang
Fan Jia
Yingfei Liu
Tiancai Wang
MLLM
ELM
10
2
0
20 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
16
64
0
07 Nov 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP
VLM
VGen
30
2
0
30 Oct 2023
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data
Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu-Gang Jiang
CLIP
VLM
28
21
0
08 Oct 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
16
18
0
14 Sep 2023
Progression-Guided Temporal Action Detection in Videos
Chongkai Lu
Man-Wai Mak
Ruimin Li
Z. Chi
Hong Fu
AI4TS
12
0
0
18 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding
Jiahao Wang
Guo Chen
Yifei Huang
Liming Wang
Tong Lu
OffRL
51
37
0
15 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
22
9
0
10 Aug 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song
Wenhao Chai
Guanhong Wang
Yucheng Zhang
Haoyang Zhou
...
Tianbo Ye
Yanting Zhang
Yang Lu
Jenq-Neng Hwang
Gaoang Wang
VLM
MLLM
17
259
0
31 Jul 2023