ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.01526
  4. Cited By
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

2 December 2021
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
    ViT
ArXivPDFHTML

Papers citing "MViTv2: Improved Multiscale Vision Transformers for Classification and Detection"

50 / 395 papers shown
Title
MambaOut: Do We Really Need Mamba for Vision?
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu
Xinchao Wang
Mamba
37
46
0
13 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal
  Models with Open-Source Suites
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
49
513
0
25 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for
  Long Sequence Modelling: Methods, Applications, and Challenges
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
30
37
0
24 Apr 2024
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
Abhishek Aich
Yumin Suh
S. Schulter
Manmohan Chandraker
51
0
0
23 Apr 2024
Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature
  Processing
Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing
Yuang Liu
Zhiheng Qiu
Xiaokai Qin
ViT
25
0
0
20 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision
  Transformers via Masked Image Modeling Pre-Training
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
37
1
0
18 Apr 2024
Simultaneous Detection and Interaction Reasoning for Object-Centric
  Action Recognition
Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition
Xunsong Li
Pengzhan Sun
Yangcen Liu
Lixin Duan
Wen Li
30
3
0
18 Apr 2024
GeoAI Reproducibility and Replicability: a computational and spatial
  perspective
GeoAI Reproducibility and Replicability: a computational and spatial perspective
Wenwen Li
Chia-Yu Hsu
Sizhe Wang
Peter Kedron
AI4CE
18
5
0
15 Apr 2024
ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition
ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition
Otto Brookes
Majid Mirmehdi
H. Kühl
T. Burghardt
22
3
0
13 Apr 2024
X-VARS: Introducing Explainability in Football Refereeing with
  Multi-Modal Large Language Model
X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model
Jan Held
Hani Itani
A. Cioppa
Silvio Giancola
Bernard Ghanem
Marc Van Droogenbroeck
25
16
0
07 Apr 2024
Learning Correlation Structures for Vision Transformers
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
24
7
0
05 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jienneg Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
28
24
0
02 Apr 2024
LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause
  Pair Extraction as Sequence Labelling Task
LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task
Suyash Vardhan Mathur
Akshett Rai Jindal
Hardik Mittal
Manish Shrivastava
23
1
0
02 Apr 2024
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Hyeongjun Kwon
Jinhyun Jang
Jin-Hwa Kim
Kwonyoung Kim
Kwanghoon Sohn
21
1
0
01 Apr 2024
Slightly Shift New Classes to Remember Old Classes for Video
  Class-Incremental Learning
Slightly Shift New Classes to Remember Old Classes for Video Class-Incremental Learning
Jian Jiao
Yu Dai
Hefei Mei
Heqian Qiu
Chuanyang Gong
Shiyuan Tang
Xinpeng Hao
Hongliang Li
CLL
VLM
28
0
0
01 Apr 2024
Benchmarking Object Detectors with COCO: A New Path Forward
Benchmarking Object Detectors with COCO: A New Path Forward
Shweta Singh
Aayan Yadav
Jitesh Jain
Humphrey Shi
Justin Johnson
Karan Desai
19
5
0
27 Mar 2024
Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and
  Time-Series Analysis
Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis
Badri N. Patro
Suhas Ranganath
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
43
2
0
26 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
69
14
0
26 Mar 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
26
86
0
26 Mar 2024
Activity-Biometrics: Person Identification from Daily Activities
Activity-Biometrics: Person Identification from Daily Activities
Shehreen Azad
Y. S. Rawat
17
3
0
26 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided
  Training
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
27
1
0
24 Mar 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for
  Faster Inference
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
31
2
0
24 Mar 2024
VidLA: Video-Language Alignment at Scale
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul M. Chilimbi
VLM
AI4TS
43
4
0
21 Mar 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
49
7
0
21 Mar 2024
Don't Judge by the Look: Towards Motion Coherent Video Representation
Don't Judge by the Look: Towards Motion Coherent Video Representation
Yitian Zhang
Yue Bai
Huan Wang
Yizhou Wang
Yun Fu
25
0
0
14 Mar 2024
Pig aggression classification using CNN, Transformers and Recurrent
  Networks
Pig aggression classification using CNN, Transformers and Recurrent Networks
Junior Silva Souza
Eduardo Bedin
G. Higa
Newton Loebens
H. Pistori
19
0
0
13 Mar 2024
DiffSal: Joint Audio and Video Learning for Diffusion Saliency
  Prediction
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Jun Xiong
Peng Zhang
Tao You
Chuanyue Li
Wei Huang
Yufei Zha
DiffM
16
5
0
02 Mar 2024
FViT: A Focal Vision Transformer with Gabor Filter
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
47
4
0
17 Feb 2024
What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised
  Generic Event Boundary Detection
What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection
Sourabh Vasant Gothe
Vibhav Agarwal
Sourav Ghosh
Jayesh Rajkumar Vachhani
Pranay Kashyap
Barath Raj Kandur
20
2
0
15 Feb 2024
Subgraphormer: Unifying Subgraph GNNs and Graph Transformers via Graph
  Products
Subgraphormer: Unifying Subgraph GNNs and Graph Transformers via Graph Products
Guy Bar-Shalom
Beatrice Bevilacqua
Haggai Maron
AI4CE
20
6
0
13 Feb 2024
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li
Harkanwar Singh
Aditya Grover
Mamba
78
56
0
08 Feb 2024
Memory Consolidation Enables Long-Context Video Understanding
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balavzević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
92
22
0
08 Feb 2024
SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in
  Panchromatic Satellite Images
SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images
Pengming Feng
Mingjie Xie
Hongning Liu
Xuanjia Zhao
Guangjun He
Xueliang Zhang
Jian Guan
17
1
0
06 Feb 2024
SAM-based instance segmentation models for the automation of structural
  damage detection
SAM-based instance segmentation models for the automation of structural damage detection
Zehao Ye
Lucy Lovell
A. Faramarzi
Jelena Ninić
16
12
0
27 Jan 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other
  Modalities
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang
Xiaohan Ding
Kaixiong Gong
Yixiao Ge
Ying Shan
Xiangyu Yue
ViT
16
7
0
25 Jan 2024
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour
  Recognition
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition
Otto Brookes
Majid Mirmehdi
Colleen Stephens
Samuel Angedakin
Katherine Corogenes
...
Klaus Zuberbühler
Christophe Boesch
M. Arandjelovic
H. Kühl
T. Burghardt
22
12
0
24 Jan 2024
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
Shuokang Huang
Kaihan Li
Di You
Yichong Chen
Arvin Lin
Siying Liu
Xiaohui Li
Julie A. McCann
11
6
0
24 Jan 2024
UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain
  Adaptation
UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation
Hengjia Li
Yang Liu
Yuqi Lin
Zhanwei Zhang
Yibo Zhao
...
Tu Zheng
Zheng Yang
Yuchun Jiang
Boxi Wu
Deng Cai
DiffM
18
0
0
23 Jan 2024
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action
  Recognition
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Boyuan Jiang
Jun Chen
Jianbiao Mei
Xingxing Zuo
Guang Dai
Jingdong Wang
Yong-Jin Liu
VLM
26
3
0
22 Jan 2024
Segment Anything Model Can Not Segment Anything: Assessing AI Foundation
  Model's Generalizability in Permafrost Mapping
Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model's Generalizability in Permafrost Mapping
Wenwen Li
Chia-Yu Hsu
Sizhe Wang
Yezhou Yang
Hyunho Lee
...
Brendan M. Rogers
S. Arundel
Matthew B. Jones
Kenton McHenry
Patricia Solis
VLM
34
13
0
16 Jan 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video
  Classification
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
17
5
0
08 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for
  Audio-Video Classification
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
19
4
0
08 Jan 2024
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN
  Ticket
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
Zhaokun Zhou
Kaiwei Che
Wei Fang
Keyu Tian
Yuesheng Zhu
Shuicheng Yan
Yonghong Tian
Liuliang Yuan
ViT
31
27
0
04 Jan 2024
Detours for Navigating Instructional Videos
Detours for Navigating Instructional Videos
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
16
6
0
03 Jan 2024
SVFAP: Self-supervised Video Facial Affect Perceiver
SVFAP: Self-supervised Video Facial Affect Perceiver
Licai Sun
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Bin Liu
Jianhua Tao
38
14
0
31 Dec 2023
Multiscale Vision Transformers meet Bipartite Matching for efficient
  single-stage Action Localization
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
39
2
0
29 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic
  Visual-Linguistic Tasks
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
156
895
0
21 Dec 2023
Video Recognition in Portrait Mode
Video Recognition in Portrait Mode
Mingfei Han
Linjie Yang
Xiaojie Jin
Jiashi Feng
Xiaojun Chang
Heng Wang
23
3
0
21 Dec 2023
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Xiao Wang
Yao Rong
Shiao Wang
Yuan Chen
Zhe Wu
Bowei Jiang
Yonghong Tian
Jin Tang
ViT
76
3
0
18 Dec 2023
Tokenize Anything via Prompting
Tokenize Anything via Prompting
Ting Pan
Lulu Tang
Xinlong Wang
Shiguang Shan
VLM
18
22
0
14 Dec 2023
Previous
12345678
Next