MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning

24 November 2021

David Junhao Zhang

Shashwat Chandra

Yu Qiao

Mike Zheng Shou

Papers citing "MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning"

Title
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization Ling Xing Hongyu Qu Rui Yan Xiangbo Shu Jinhui Tang 42 0 0 12 Sep 2024
SpiralMLP: A Lightweight Vision MLP Architecture Haojie Mu Burhan Ul Tayyab Nicholas Chua 32 0 0 31 Mar 2024
A Simple Framework for Multi-mode Spatial-Temporal Data Modeling Zihang Liu Le Yu T. Zhu Lei Sun AI4TS 19 0 0 22 Aug 2023
M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition Hao Tang Jun Liu Shuanglin Yan Rui Yan Zechao Li Jinhui Tang 19 36 0 06 Aug 2023
UniFormer: Unifying Convolution and Self-attention for Visual Recognition Kunchang Li Yali Wang Junhao Zhang Peng Gao Guanglu Song Yu Liu Hongsheng Li Yu Qiao ViT 142 360 0 24 Jan 2022
MLP-Mixer: An all-MLP Architecture for Vision Ilya O. Tolstikhin N. Houlsby Alexander Kolesnikov Lucas Beyer Xiaohua Zhai ... Andreas Steiner Daniel Keysers Jakob Uszkoreit Mario Lucic Alexey Dosovitskiy 239 2,554 0 04 May 2021
VidTr: Video Transformer Without Convolutions Yanyi Zhang Xinyu Li Chunhui Liu Bing Shuai Yi Zhu Biagio Brattoli Hao Chen I. Marsic Joseph Tighe ViT 124 193 0 23 Apr 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions Wenhai Wang Enze Xie Xiang Li Deng-Ping Fan Kaitao Song Ding Liang Tong Lu Ping Luo Ling Shao ViT 263 3,538 0 24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius Heng Wang Lorenzo Torresani ViT 278 1,939 0 09 Feb 2021
Video Transformer Network Daniel Neimark Omri Bar Maya Zohar Dotan Asselmann ViT 193 375 0 01 Feb 2021
Bottleneck Transformers for Visual Recognition A. Srinivas Tsung-Yi Lin Niki Parmar Jonathon Shlens Pieter Abbeel Ashish Vaswani SLR 265 955 0 27 Jan 2021
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand M. Andreetto Hartwig Adam 3DH 948 20,214 0 17 Apr 2017
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Z. Tu Kaiming He 261 10,106 0 16 Nov 2016
Semantic Understanding of Scenes through the ADE20K Dataset Bolei Zhou Hang Zhao Xavier Puig Tete Xiao Sanja Fidler Adela Barriuso Antonio Torralba SSeg 243 1,817 0 18 Aug 2016