2112.01526
Cited By
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
2 December 2021
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
Papers citing
"MViTv2: Improved Multiscale Vision Transformers for Classification and Detection"
50 / 395 papers shown
Title
Spatial-Temporal Alignment Network for Action Recognition
Jinhui Ye
Junwei Liang
3DPC
13
1
0
19 Aug 2023
Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention
Liang Shang
Yanli Liu
Zhengyang Lou
Shuxue Quan
N. Adluru
Bochen Guan
W. Sethares
11
1
0
10 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
22
9
0
10 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
16
16
0
08 Aug 2023
SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition
Xiao Wang
Zong-Yao Wu
Yao Rong
Lin Zhu
Bowei Jiang
Jin Tang
Yonghong Tian
ViT
64
14
0
08 Aug 2023
DiT: Efficient Vision Transformers with Dynamic Token Routing
Yuchen Ma
Zhengcong Fei
Junshi Huang
ViT
6
2
0
07 Aug 2023
A Hybrid CNN-Transformer Architecture with Frequency Domain Contrastive Learning for Image Deraining
Cheng-i Wang
Wei Li
34
0
0
07 Aug 2023
M2Former: Multi-Scale Patch Selection for Fine-Grained Visual Recognition
Ji-Hee Moon
Junseok K. Lee
Yu-Ling Lee
Seongsik Park
20
4
0
04 Aug 2023
Revisiting DETR Pre-training for Object Detection
Yan Ma
Weicong Liang
Bo-Ying Chen
Yiduo Hao
Bojian Hou
Xiangyu Yue
Chao Zhang
Yuhui Yuan
VLM
ViT
22
4
0
02 Aug 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
25
7
0
27 Jul 2023
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
Xiaochen Ma
Bo Du
Zhuohang Jiang
Ahmed Y. Al Hammadi
Jizhe Zhou
11
7
0
27 Jul 2023
Causal reasoning in typical computer vision tasks
Kexuan Zhang
Qiyu Sun
Chaoqiang Zhao
Yang Tang
CML
24
11
0
26 Jul 2023
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
Cheng Han
Qifan Wang
Yiming Cui
Zhiwen Cao
Wenguan Wang
Siyuan Qi
Dongfang Liu
VPVLM
VLM
12
46
0
25 Jul 2023
FlexiAST: Flexibility is What AST Needs
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
11
3
0
18 Jul 2023
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang
Hui Chen
Zijia Lin
Hengjun Pu
Guiguang Ding
21
169
0
18 Jul 2023
The Effects of Mixed Sample Data Augmentation are Class Dependent
Haeil Lee
Han S. Lee
Junmo Kim
14
1
0
18 Jul 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
21
23
0
17 Jul 2023
Multiscale Memory Comparator Transformer for Few-Shot Video Segmentation
Mennatullah Siam
R. Karim
Henghui Zhao
Richard P. Wildes
VOS
16
2
0
15 Jul 2023
Multimodal Distillation for Egocentric Action Recognition
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
13
22
0
14 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
F. Khan
ViT
46
19
0
13 Jul 2023
A Study on Differentiable Logic and LLMs for EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2023
Yi Cheng
Ziwei Xu
Fen Fang
Dongyun Lin
Hehe Fan
Yongkang Wong
Ying Sun
Mohan S. Kankanhalli
14
0
0
13 Jul 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Lučić
N. Houlsby
ViT
15
102
0
12 Jul 2023
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding
Hao Zheng
R. Lee
Yuqian Lu
VGen
17
16
0
09 Jul 2023
Efficient Online Processing with Deep Neural Networks
Lukas Hedegaard
16
0
0
23 Jun 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
30
14
0
20 Jun 2023
PaReprop: Fast Parallelized Reversible Backpropagation
Tyler Lixuan Zhu
K. Mangalam
9
1
0
15 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Aman Chadha
Srijan Das
ViT
20
4
0
15 Jun 2023
E2E-LOAD: End-to-End Long-form Online Action Detection
Shuyuan Cao
Weihua Luo
Bairui Wang
Wei Emma Zhang
Lin Ma
17
5
0
13 Jun 2023
Mitigating Transformer Overconfidence via Lipschitz Regularization
Wenqian Ye
Yunsheng Ma
Xu Cao
Kun Tang
16
13
0
12 Jun 2023
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Haoran You
Huihong Shi
Yipin Guo
Yingyan Lin
24
16
0
10 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
37
0
0
04 Jun 2023
Collect-and-Distribute Transformer for 3D Point Cloud Analysis
Haibo Qiu
Baosheng Yu
Dacheng Tao
3DPC
ViT
11
5
0
02 Jun 2023
Neural Ideal Large Eddy Simulation: Modeling Turbulence with Neural Stochastic Differential Equations
Anudhyan Boral
Z. Y. Wan
Leonardo Zepeda-Núñez
James Lottes
Qing Wang
Yi-fan Chen
John R. Anderson
Fei Sha
AI4CE
PINN
8
11
0
01 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
41
156
0
01 Jun 2023
MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding
Jun Chen
Ming Hu
D. Coker
M. Berumen
Blair R. Costelloe
Sara Beery
Anna Rohrbach
Mohamed Elhoseiny
14
20
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Z. Tan
12
7
0
01 Jun 2023
Vision Transformers for Mobile Applications: A Short Survey
Nahid Alam
Steven Kolawole
S. Sethi
Nishant Bansali
Karina Nguyen
ViT
16
3
0
30 May 2023
Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez
Teck-Yian Lim
Minh N. Do
Raymond A. Yeh
ViT
14
7
0
25 May 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Thanh-Dat Truong
Khoa Luu
EgoV
27
9
0
25 May 2023
Slovo: Russian Sign Language Dataset
A. Kapitanov
Karina Kvanchiani
A.M. Nagaev
Elizaveta Petrova
SLR
13
3
0
23 May 2023
Enhancing Next Active Object-based Egocentric Action Anticipation with Guided Attention
Sanket Thakur
Cigdem Beyan
Pietro Morerio
Vittorio Murino
Alessio Del Bue
20
6
0
22 May 2023
Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition
Junqiao Zhao
Fenglin Zhang
Yingfeng Cai
Geng Tian
Wenjie Mu
Chen Ye
Tiantian Feng
15
4
0
19 May 2023
Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models
Antoni Bigata Casademunt
Rodrigo Mira
Nikita Drobyshev
Konstantinos Vougioukas
Stavros Petridis
M. Pantic
DiffM
56
1
0
15 May 2023
CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers
Yunsheng Ma
Wenqian Ye
Xu Cao
Amr Abdelraouf
Kyungtae Han
Rohit Gupta
Ziran Wang
27
11
0
13 May 2023
M^2DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer
Yunsheng Ma
Liangqi Yuan
Amr Abdelraouf
Kyungtae Han
Rohit Gupta
Zihao Li
Ziran Wang
94
9
0
13 May 2023
OneCAD: One Classifier for All image Datasets using multimodal learning
S. Wadekar
Eugenio Culurciello
22
0
0
11 May 2023
A Survey on the Robustness of Computer Vision Models against Common Corruptions
Shunxin Wang
Raymond N. J. Veldhuis
Christoph Brune
N. Strisciuglio
OOD
VLM
21
11
0
10 May 2023
Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
Théophile Cabannes
Shreya Ghosh
Raphaël Marinier
Tom Gedeon
Alexandre M. Bayen
Munawar Hayat
68
21
0
03 May 2023
MMViT: Multiscale Multiview Vision Transformers
Yuchen Liu
Natasha Ong
Kaiyan Peng
Bo Xiong
Qifan Wang
...
Madian Khabsa
Kaiyue Yang
David C. Liu
Donald Williamson
Hanchao Yu
ViT
17
4
0
28 Apr 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
25
35
0
20 Apr 2023