ResearchTrend.AI
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

12 January 2022
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
    ViT

Papers citing "UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning"

50 / 141 papers shown
ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition
Rachid Reda Dokkar
F. Chaieb
Hassen Drira
Arezki Aberkane
ViT
12
2
0
22 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
27
8
0
02 Oct 2023
SimLVSeg: Simplifying Left Ventricular Segmentation in 2D+Time Echocardiograms with Self- and Weakly-Supervised Learning
F. Maani
Asim Ukaye
Nada Saadi
Numan Saeed
Mohammad Yaqub
36
1
0
30 Sep 2023
Telling Stories for Common Sense Zero-Shot Action Recognition
Shreyank N. Gowda
Carolina Scarton
LM&Ro
14
2
0
29 Sep 2023
ADU-Depth: Attention-based Distillation with Uncertainty Modeling for Depth Estimation
Zizhang Wu
Zhuozheng Li
Zhi-Gang Fan
Yunzhe Wu
Xiaoquan Wang
Rui Tang
Jian Pu
18
1
0
26 Sep 2023
CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation
Xiaoheng Jiang
Kaiyi Guo
Yang Lu
Feng Yan
Hao Liu
Jiale Cao
Mingliang Xu
Dacheng Tao
MedIm
ViT
UQCV
8
1
0
22 Sep 2023
RMT: Retentive Networks Meet Vision Transformers
Qihang Fan
Huaibo Huang
Mingrui Chen
Hongmin Liu
Ran He
ViT
30
65
0
20 Sep 2023
Multi-spectral Entropy Constrained Neural Compression of Solar Imagery
Ali Zafari
Atefeh Khoshkhahtinat
P. Mehta
Nasser M. Nasrabadi
Barbara J. Thompson
M. Kirk
D. D. Silva
15
0
0
19 Sep 2023
Selective Volume Mixup for Video Action Recognition
Yi Tan
Zhaofan Qiu
Y. Hao
Ting Yao
Xiangnan He
Tao Mei
ViT
28
2
0
18 Sep 2023
TFNet: Exploiting Temporal Cues for Fast and Accurate LiDAR Semantic Segmentation
Rong Li
Shijie Li
Xieyuanli Chen
Teli Ma
Juergen Gall
Junwei Liang
3DPC
14
25
0
14 Sep 2023
Unified Contrastive Fusion Transformer for Multimodal Human Action Recognition
Kyoung Ok Yang
Junho Koh
Jun-Won Choi
23
0
0
10 Sep 2023
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
Ashmit Khandelwal
Aditya Agrawal
Aanisha Bhattacharyya
Yaman Kumar Singla
Somesh Singh
...
Ishita Dasgupta
Stefano Petrangeli
R. Shah
Changyou Chen
Balaji Krishnamurthy
11
8
0
01 Sep 2023
IndGIC: Supervised Action Recognition under Low Illumination
Jing-Teng Zeng
22
1
0
29 Aug 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
19
20
0
27 Aug 2023
TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective
Jun Dan
Yang Liu
Haoyu Xie
Jiankang Deng
H. Xie
Xuansong Xie
Baigui Sun
ViT
12
21
0
20 Aug 2023
Revisiting Vision Transformer from the View of Path Ensemble
Shuning Chang
Pichao Wang
Haowen Luo
Fan Wang
Mike Zheng Shou
ViT
16
3
0
12 Aug 2023
PVG: Progressive Vision Graph for Vision Recognition
Jiafu Wu
Jian Li
Jiangning Zhang
Boshen Zhang
M. Chi
Yabiao Wang
Chengjie Wang
ViT
8
12
0
01 Aug 2023
VideoPro: A Visual Analytics Approach for Interactive Video Programming
Jianben He
Xingbo Wang
Kamkwai Wong
Xijie Huang
Changjian Chen
Zixin Chen
Fengjie Wang
Min Zhu
Huamin Qu
15
10
0
01 Aug 2023
Traffic-Domain Video Question Answering with Automatic Captioning
Ehsan Qasemi
Jonathan M Francis
A. Oltramari
24
8
0
18 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
F. Khan
ViT
46
12
0
13 Jul 2023
A survey on deep learning approaches for data integration in autonomous driving system
Xi Zhu
Likang Wang
Caifa Zhou
Xiya Cao
Yue Gong
L. Chen
23
1
0
17 Jun 2023
Optimizing ViViT Training: Time and Memory Reduction for Action Recognition
Shreyank N. Gowda
Anurag Arnab
Jonathan Huang
ViT
8
4
0
07 Jun 2023
Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos
Xinrui Zhou
Yuhao Huang
Wufeng Xue
Xin Yang
Yuxin Zou
Qilong Ying
Yuanji Zhang
Jia Liu
Jie Jessie Ren
Dong Ni
ViT
MedIm
20
4
0
05 Jun 2023
Lightweight Vision Transformer with Bidirectional Interaction
Qihang Fan
Huaibo Huang
Xiaoqiang Zhou
Ran He
ViT
25
27
0
01 Jun 2023
InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition
Zhibing Lai
Tianren Zhang
Qi Liu
Xinyuan Qian
Li-Fang Wei
Songlu Chen
Feng Chen
Xu-Cheng Yin
27
2
0
24 May 2023
Dual Path Transformer with Partition Attention
Zhengkai Jiang
Liang Liu
Jiangning Zhang
Yabiao Wang
Mingang Chen
Chengjie Wang
ViT
28
2
0
24 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
92
76
0
22 May 2023
Preconditioned Visual Language Inference with Weak Supervision
Ehsan Qasemi
Amani Maina-Kilaas
Devadutta Dash
Khalid Alsaggaf
Muhao Chen
17
0
0
22 May 2023
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
Zhaoyang Liu
Yinan He
Wenhai Wang
Weiyun Wang
Yi Wang
...
Yali Wang
Limin Wang
Ping Luo
Jifeng Dai
Yu Qiao
LRM
MLLM
12
78
0
09 May 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
25
35
0
20 Apr 2023
Rethinking Local Perception in Lightweight Vision Transformer
Qi Fan
Huaibo Huang
Jiyang Guan
Ran He
ViT
16
25
0
31 Mar 2023
DDP: Diffusion Model for Dense Visual Prediction
Yuanfeng Ji
Zhe Chen
Enze Xie
Lanqing Hong
Xihui Liu
Zhaoqiang Liu
Tong Lu
Zhenguo Li
Ping Luo
DiffM
VLM
24
85
0
30 Mar 2023
Streaming Video Model
Yucheng Zhao
Chong Luo
Chuanxin Tang
Dongdong Chen
Noel Codella
Zhengjun Zha
25
12
0
30 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
41
322
0
29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
25
155
0
28 Mar 2023
Augmenting and Aligning Snippets for Few-Shot Video Domain Adaptation
Yuecong Xu
Jianfei Yang
Yunjiao Zhou
Zhenghua Chen
Min-man Wu
Xiaoli Li
17
5
0
18 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Jungin Park
Jiyoung Lee
K. Sohn
ViT
9
37
0
17 Mar 2023
BiFormer: Vision Transformer with Bi-Level Routing Attention
Lei Zhu
Xinjiang Wang
Zhanghan Ke
Wayne Zhang
Rynson W. H. Lau
123
438
0
15 Mar 2023
SSGD: A smartphone screen glass dataset for defect detection
Haonan Han
Rui Yang
Shuyan Li
R. Hu
Xiu Li
17
10
0
12 Mar 2023
Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition
Junyan Wang
Zhenhong Sun
Yichen Qian
Dong Gong
Xiuyu Sun
Ming Lin
M. Pagnucco
Yang Song
3DPC
15
11
0
05 Mar 2023
Rethinking Efficient Tuning Methods from a Unified Perspective
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Yiliang Lv
Deli Zhao
Jingren Zhou
9
9
0
01 Mar 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
C. L. P. Chen
Mu Li
ViT
39
143
0
06 Feb 2023
Skip-Attention: Improving Vision Transformers by Paying Less Attention
Shashanka Venkataramanan
Amir Ghodrati
Yuki M. Asano
Fatih Porikli
A. Habibian
ViT
8
25
0
05 Jan 2023
Rethinking Mobile Block for Efficient Attention-based Models
Jiangning Zhang
Xiangtai Li
Jian Li
Liang Liu
Zhucun Xue
Boshen Zhang
Zhe Jiang
Tianxin Huang
Yabiao Wang
Chengjie Wang
MQ
44
78
0
03 Jan 2023
PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation
Xiangtai Li
Shilin Xu
Yibo Yang
Haobo Yuan
Guangliang Cheng
Yu Tong
Zhouchen Lin
Ming-Hsuan Yang
Dacheng Tao
ViT
26
21
0
03 Jan 2023
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
19
86
0
08 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
32
16
0
08 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
38
307
0
06 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
11
30
0
01 Dec 2022
Lightweight Structure-Aware Attention for Visual Understanding
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Alahari Karteek
13
2
0
29 Nov 2022