Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2201.04676
Cited By
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
12 January 2022
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning"
50 / 141 papers shown
Title
Kernel Dynamic Mode Decomposition For Sparse Reconstruction of Closable Koopman Operators
Nishant Panda
Himanshu Singh
J. Nathan Kutz
14
0
0
11 May 2025
CANet: ChronoAdaptive Network for Enhanced Long-Term Time Series Forecasting under Non-Stationarity
Mert Sonmezer
Seyda Ertekin
AI4TS
24
0
0
24 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
32
0
0
02 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
50
0
0
01 Apr 2025
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Thinesh Thiyakesan Ponbagavathi
Alina Roitberg
31
0
0
31 Mar 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
51
0
0
30 Mar 2025
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
Zichen Liu
Kunlun Xu
Bing-Huang Su
Xu Zou
Yuxin Peng
Jiahuan Zhou
VLM
AI4TS
65
1
0
20 Mar 2025
A Real-Time Human Action Recognition Model for Assisted Living
Yixuan Wang
Paul Stynes
Pramod Pathak
Cristina Muntean
29
0
0
18 Mar 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
54
0
0
17 Mar 2025
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection
Xin Jin
Haisheng Su
Kai Liu
Cong Ma
Wei Yu Wu
Fei Hui
Junchi Yan
Mamba
70
0
0
15 Mar 2025
AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements
Calvin Yeung
Tomohiro Suzuki
Ryota Tanaka
Zhuoer Yin
Keisuke Fujii
3DH
68
1
0
10 Mar 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang
Wonmin Byeon
Jiarui Xu
Jinwei Gu
Ka Chun Cheung
Xiaolong Wang
Kai Han
Jan Kautz
Sifei Liu
51
0
0
21 Jan 2025
Reinforcement Learning from Wild Animal Videos
Elliot Chane-Sane
Constant Roux
O. Stasse
Nicolas Mansard
84
0
0
05 Dec 2024
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zenghui Ding
Xianjun Yang
Yining Sun
99
1
0
21 Nov 2024
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang
Haofeng Huang
Pengle Zhang
Jia wei
Jun-Jie Zhu
Jianfei Chen
VLM
MQ
56
2
0
17 Nov 2024
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
28
0
0
12 Nov 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
77
18
0
03 Oct 2024
Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection
Yuncheng Jiang
Zixun Zhang
Jun Wei
Chun-Mei Feng
Guanbin Li
Xiang Wan
Shuguang Cui
Zhen Li
ViT
MedIm
26
1
0
26 Aug 2024
MPT-PAR:Mix-Parameters Transformer for Panoramic Activity Recognition
Wenqing Gan
Yaoyu Li
Jian Li
Zhangang Lin
ViT
30
0
0
01 Aug 2024
Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference
Qian Liang
Yan Chen
Yang Hu
CLL
36
1
0
19 Jul 2024
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li
Zhenhua Feng
Tianyang Xu
Linze Li
Xiao-Jun Wu
Muhammad Awais
Sara Atito
Josef Kittler
CoGe
42
5
0
08 Jul 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
33
3
0
21 Jun 2024
Infinite-Dimensional Feature Interaction
Chenhui Xu
Fuxun Yu
Maoliang Li
Zihao Zheng
Zirui Xu
Jinjun Xiong
Xiang Chen
25
1
0
22 May 2024
Towards Gradient-based Time-Series Explanations through a SpatioTemporal Attention Network
Min Hun Lee
AI4TS
ViT
FAtt
24
3
0
18 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
25
1
0
09 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen
LM&Ro
76
35
0
06 May 2024
CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention
Damith Chamalke Senadeera
Xiaoyun Yang
Dimitrios Kollias
Gregory G. Slabaugh
19
0
0
27 Apr 2024
Data-independent Module-aware Pruning for Hierarchical Vision Transformers
Yang He
Joey Tianyi Zhou
ViT
27
1
0
21 Apr 2024
TSLANet: Rethinking Transformers for Time Series Representation Learning
Emadeldeen Eldele
Mohamed Ragab
Zhenghua Chen
Min-man Wu
Xiaoli Li
AI4TS
AIFin
28
32
0
12 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
54
56
0
04 Apr 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz
Michele Santacatterina
Ramin Zabih
37
32
0
29 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
69
14
0
26 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
27
1
0
24 Mar 2024
CoReEcho: Continuous Representation Learning for 2D+time Echocardiography Analysis
F. Maani
Numan Saeed
Aleksandr Matsun
Mohammad Yaqub
SyDa
47
3
0
15 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
68
0
14 Mar 2024
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia
Xinliang Wang
Feng Lv
Xin Hao
Yifeng Shi
ViT
18
45
0
12 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
30
174
0
11 Mar 2024
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
Dan Guo
Kun Li
Bin Hu
Yan Zhang
Meng Wang
51
37
0
08 Mar 2024
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDE
VLM
79
70
0
15 Feb 2024
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao
Xin Lin
Qin Ni
Liang He
14
1
0
12 Feb 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
F. Worgotter
Alexander S. Ecker
28
3
0
29 Jan 2024
GTAutoAct: An Automatic Datasets Generation Framework Based on Game Engine Redevelopment for Action Recognition
Xingyu Song
Zhan Li
Shi Chen
K. Demachi
14
1
0
24 Jan 2024
Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts
Kiyoon Kim
Shreyank N. Gowda
Panagiotis Eustratiadis
Antreas Antoniou
Robert B Fisher
26
2
0
21 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
17
0
0
10 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
17
4
0
08 Jan 2024
Explore Human Parsing Modality for Action Recognition
Jinfu Liu
Runwei Ding
Yuhang Wen
Nan Dai
Fanyang Meng
Shen Zhao
Mengyuan Liu
20
6
0
04 Jan 2024
IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
Jiarui Xu
Yossi Gandelsman
Amir Bar
Jianwei Yang
Jianfeng Gao
Trevor Darrell
Xiaolong Wang
VLM
16
3
0
04 Dec 2023
D
2
^2
2
ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei
Qizhong Tan
Guangming Lu
Jiandong Tian
39
3
0
03 Dec 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
22
10
0
30 Nov 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
17
15
0
30 Oct 2023
1
2
3
Next