ResearchTrend.AI
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

2 December 2021
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
    ViT

Papers citing "MViTv2: Improved Multiscale Vision Transformers for Classification and Detection"

50 / 395 papers shown
Masked Differential Privacy
David Schneider
Sina Sajadmanesh
Vikash Sehwag
Saquib Sarfraz
Rainer Stiefelhagen
Lingjuan Lyu
Vivek Sharma
28
1
0
22 Oct 2024
Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling
Hao Wu
Donglin Bai
Shiqi Jiang
Qianxi Zhang
Y. Yang
Ting Cao
Fengyuan Xu
Yunxin Liu
42
0
0
19 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
29
13
0
15 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
61
3
0
14 Oct 2024
MTFL: Multi-Timescale Feature Learning for Weakly-Supervised Anomaly Detection in Surveillance Videos
Yiling Zhang
Erkut Akdag
Egor Bondarev
Peter H. N. de With
AI4TS
ViT
19
1
0
08 Oct 2024
On Efficient Variants of Segment Anything Model: A Survey
Xiaorui Sun
J. Liu
H. Shen
Xiaofeng Zhu
Ping Hu
VLM
43
4
0
07 Oct 2024
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLM
LRM
33
0
0
04 Oct 2024
Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time
Chiao-An Yang
Ziwei Liu
Raymond A. Yeh
20
1
0
01 Oct 2024
Loose Social-Interaction Recognition in Real-world Therapy Scenarios
Abid Ali
Rui Dai
Ashish Marisetty
Guillaume Astruc
Monique Thonnat
J. Odobez
Susanne Thümmler
Francois Bremond
29
1
0
30 Sep 2024
REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke
Wiktor Mucha
Kentaro Tanaka
M. Kampel
22
0
0
30 Sep 2024
Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification
Xinrui Zhou
Yuhao Huang
Haoran Dou
Shijing Chen
Ao Chang
...
Jie Jessie Ren
Ruobing Huang
Jun Cheng
Wufeng Xue
Dong Ni
MedIm
57
0
0
25 Sep 2024
SoccerNet 2024 Challenges Results
A. Cioppa
Silvio Giancola
Vladimir Somers
Victor Joos
Floriane Magera
...
Yuan Li
Yuting Yang
Yuxuan Xiao
Zehua Cheng
Zhihao Li
21
2
0
16 Sep 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT
VLM
27
4
0
06 Sep 2024
MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
ViT
21
1
0
05 Sep 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
Zhuolin Tan
Chenqiang Gao
Anyong Qin
Ruixin Chen
Tiecheng Song
Feng Yang
Deyu Meng
14
0
0
02 Sep 2024
Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability
Chia-Yu Hsu
Wenwen Li
Sizhe Wang
29
11
0
31 Aug 2024
DEAR: Depth-Enhanced Action Recognition
Sadegh Rahmaniboldaji
Filip Rybansky
Quoc Vuong
Frank Guerin
Andrew Gilbert
18
0
0
28 Aug 2024
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships
Gracile Astlin Pereira
Muhammad Hussain
ViT
25
7
0
27 Aug 2024
CathAction: A Benchmark for Endovascular Intervention Understanding
Baoru Huang
Tuan Vo
Chayun Kongtongvattana
G. Dagnino
Dennis Kundrat
...
Francisco Vasconcelos
Danail Stoyanov
Daniel Elson
Ferdinando Rodriguez y Baena
Anh Nguyen
29
2
0
23 Aug 2024
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
Wentao Wu
Fanghua Hong
Xiao Wang
Chenglong Li
Jin Tang
VLM
41
1
0
23 Aug 2024
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Yunlong Tang
Gen Zhan
Li Yang
Yiting Liao
Chenliang Xu
VGen
DiffM
LRM
29
8
0
21 Aug 2024
DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training
Xi Chen
Qian Qiao
Jun Gao
Tianxiang Wu
Rahul Bhadani
...
Ziqiang Cao
Larry Head
Yue Zhang
Jielei Zhang
Huyang Sun
DiffM
21
5
0
01 Aug 2024
Exploring The Neural Burden In Pruned Models: An Insight Inspired By Neuroscience
Zeyu Wang
Weichen Dai
Xiangyu Zhou
Ji Qi
Yi Zhou
36
0
0
23 Jul 2024
Towards AI-Powered Video Assistant Referee System (VARS) for Association Football
Jan Held
A. Cioppa
Silvio Giancola
Abdullah Hamdi
Christel Devue
Bernard Ghanem
Marc Van Droogenbroeck
24
4
0
17 Jul 2024
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
Yunling Zheng
Zeyi Xu
Fanghui Xue
Biao Yang
Jiancheng Lyu
Shuai Zhang
Y. Qi
Jack Xin
39
0
0
16 Jul 2024
Human-Centric Transformer for Domain Adaptive Action Recognition
Kun-Yu Lin
Jiaming Zhou
Wei-Shi Zheng
26
6
0
15 Jul 2024
HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation
Guoan Xu
Wenjing Jia
Tao Wu
Ligeng Chen
Guangwei Gao
ViT
22
9
0
10 Jul 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
44
1
0
09 Jul 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
19
4
0
03 Jul 2024
Semantically Guided Representation Learning For Action Anticipation
Anxhelo Diko
D. Avola
Bardh Prenkaj
Federico Fontana
Luigi Cinque
AI4TS
41
2
0
02 Jul 2024
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads
Ali Khaleghi Rahimian
Manish Kumar Govind
Subhajit Maity
Dominick Reilly
Christian Kummerle
Srijan Das
A. Dutta
31
1
0
27 Jun 2024
Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition
Lan Chen
Dong Li
Xiao Wang
Pengpeng Shao
Wei Zhang
Yaowei Wang
Yonghong Tian
Jin Tang
68
2
0
27 Jun 2024
RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization
Mingshu Zhao
Yi Luo
Yong Ouyang
30
2
0
23 Jun 2024
GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement
Hao Wang
E. Ahn
Jinman Kim
28
0
0
19 Jun 2024
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
Ming Hu
Peng Xia
Lin Wang
Siyuan Yan
Feilong Tang
...
Xuelian Cheng
Jun Cheng
Chi Liu
Kaijing Zhou
Zongyuan Ge
33
17
0
11 Jun 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
29
0
0
11 Jun 2024
Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network
Manvik Pasula
Pramit Saha
18
0
0
10 Jun 2024
A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis
Leonardo F. S. Scabini
Andre Sacilotti
Kallil M. C. Zielinski
L. C. Ribas
B. De Baets
Odemir M. Bruno
ViT
25
2
0
10 Jun 2024
SMART: Scene-motion-aware human action recognition framework for mental disorder group
Zengyuan Lai
Jiarui Yang
Songpengcheng Xia
Qi Wu
Zhen Sun
Wenxian Yu
Ling Pei
35
2
0
07 Jun 2024
SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks
Yi Pan
Jun-Jie Huang
Zihan Chen
Wentao Zhao
Ziyue Wang
20
0
0
04 Jun 2024
Use of a Multiscale Vision Transformer to predict Nursing Activities Score from Low Resolution Thermal Videos in an Intensive Care Unit
Isaac YL Lee
Thanh Nguyen-Duc
Ryo Ueno
Jesse Smith
P. Chan
16
0
0
30 May 2024
MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Wenzhuo Liu
Fei Zhu
Shijie Ma
Cheng-Lin Liu
18
4
0
28 May 2024
Demystify Mamba in Vision: A Linear Attention Perspective
Dongchen Han
Ziyi Wang
Zhuofan Xia
Yizeng Han
Yifan Pu
Chunjiang Ge
Jun Song
Shiji Song
Bo Zheng
Gao Huang
Mamba
29
48
0
26 May 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
32
8
0
25 May 2024
SIAVC: Semi-Supervised Framework for Industrial Accident Video Classification
Zuoyong Li
Qinghua Lin
Haoyi Fan
Tiesong Zhao
David Zhang
21
0
0
23 May 2024
Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks
M. Prabhushankar
Ghassan AlRegib
UQCV
27
0
0
22 May 2024
Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
39
3
0
22 May 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
36
4
0
22 May 2024
Generative Artificial Intelligence: A Systematic Review and Applications
S. S. Sengar
Affan Bin Hasan
Sanjay Kumar
Fiona Carroll
MedIm
23
46
0
17 May 2024
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
Yingjie Zhai
Wenshuo Li
Yehui Tang
Xinghao Chen
Yunhe Wang
ViT
22
0
0
14 May 2024