ResearchTrend.AI

arXiv: 2103.10619
Scalable Vision Transformers with Hierarchical Pooling
19 March 2021
Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai
ViT

Papers citing "Scalable Vision Transformers with Hierarchical Pooling"

50 / 78 papers shown
Disentangling Visual Transformers: Patch-level Interpretability for Image Classification
Guillaume Jeanneret, Loïc Simon, F. Jurie
ViT · 24 Feb 2025

Self-Satisfied: An end-to-end framework for SAT generation and prediction
Christopher R. Serrano, Jonathan Gallagher, Kenji Yamada, Alexei Kopylov, Michael A. Warren
18 Oct 2024

ED-ViT: Splitting Vision Transformer for Distributed Inference on Edge Devices
Xiang Liu, Yijun Song, Xia Li, Yifei Sun, Huiying Lan, Zemin Liu, Linshan Jiang, Jialin Li
15 Oct 2024

Hybrid Transformer for Early Alzheimer's Detection: Integration of Handwriting-Based 2D Images and 1D Signal Features
Changqing Gong, Huafeng Qin, M. El-Yacoubi
14 Oct 2024

ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer
Shihua Sun, Kenechukwu Nwodo, Shridatt Sugrim, Angelos Stavrou, Haining Wang
AAML · 20 Sep 2024

Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng, Di Fu, Baole Wei, Yong Cao, Liangcai Gao, Zhi Tang
ViT · 30 Aug 2024
HLogformer: A Hierarchical Transformer for Representing Log Data
Zhichao Hou, Mina Ghashami, Mikhail Kuznetsov, MohamadAli Torkamani
29 Aug 2024

Neural-based Video Compression on Solar Dynamics Observatory Images
Atefeh Khoshkhahtinat, Ali Zafari, P. Mehta, Nasser M. Nasrabadi, Barbara J. Thompson, M. Kirk, D. D. Silva
12 Jul 2024

Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation
Xuming Zhang, Naoto Yokoya, Xingfa Gu, Qingjiu Tian, Lorenzo Bruzzone
25 Jun 2024

TraceNet: Segment one thing efficiently
Mingyuan Wu, Zichuan Liu, Haozhen Zheng, Hongpeng Guo, Bo Chen, Xin Lu, Klara Nahrstedt
21 Jun 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks
Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, ..., Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai
06 Jun 2024

You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang, Hanpeng Liu, Stephen Lin, Kun He
01 Jun 2024
Data-independent Module-aware Pruning for Hierarchical Vision Transformers
Yang He, Joey Tianyi Zhou
ViT · 21 Apr 2024

The Need for Speed: Pruning Transformers with One Recipe
Samir Khaki, Konstantinos N. Plataniotis
26 Mar 2024

PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud, Burhaneddin Yaman, Chun-Hao Liu, Diana Marculescu
24 Mar 2024

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
Hancheng Ye, Chong Yu, Peng Ye, Renqiu Xia, Yansong Tang, Jiwen Lu, Tao Chen, Bo-Wen Zhang
23 Mar 2024

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao
05 Feb 2024

UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer
Ji Liu, Dehua Tang, Yuanxian Huang, Li Lyna Zhang, Xiaocheng Zeng, ..., Jinzhang Peng, Yu-Chiang Frank Wang, Fan Jiang, Lu Tian, Ashish Sirasao
ViT · 12 Jan 2024

TPC-ViT: Token Propagation Controller for Efficient Vision Transformer
Wentao Zhu
03 Jan 2024
GLIMPSE: Generalized Local Imaging with MLPs
AmirEhsan Khorashadizadeh, Valentin Debarnot, Tianlin Liu, Ivan Dokmanić
01 Jan 2024

PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, ..., Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao
27 Dec 2023

Explainability of Vision Transformers: A Comprehensive Review and New Perspectives
Rojina Kashefi, Leili Barekatain, Mohammad Sabokrou, Fatemeh Aghaeipoor
ViT · 12 Nov 2023

Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers
Xuwei Xu, Sen Wang, Yudong Chen, Jiajun Liu
ViT · 09 Oct 2023

Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres, Yossi Adi, Tu Nguyen, Jade Copet, Gabriel Synnaeve, Benoît Sagot, Emmanuel Dupoux
AuLLM · 08 Oct 2023

ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens
Yangyang Guo, Haoyu Zhang, Yongkang Wong, Liqiang Nie, Mohan S. Kankanhalli
VLM · 28 Sep 2023
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
Hyun-Seo Shin, Ju-Sung Heo, Ju-ho Kim, Chanmann Lim, Wonbin Kim, Ha-Jin Yu
15 Sep 2023

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
Matthew Dutson, Yin Li, M. Gupta
ViT · 25 Aug 2023

CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing
Jianwei Cui, David A. Araujo, Suman Saha, Md Faisal Kabir
BDL · 25 Aug 2023

Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning
H. Kim, Jiyoung Lee, S. Park, K. Sohn
CoGe · 08 Aug 2023

MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation
Lian Xu, Bennamoun, F. Boussaïd, Hamid Laga, Wanli Ouyang, Dan Xu
ViT · 06 Aug 2023

Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation
Dahyun Kang, Piotr Koniusz, Minsu Cho, Naila Murray
VLM · ViT · 07 Jul 2023
ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining
Dezhi Peng, Chongyu Liu, Yuliang Liu, Lianwen Jin
DiffM · 21 Jun 2023

SegT: A Novel Separated Edge-guidance Transformer Network for Polyp Segmentation
Feiyu Chen, Haiping Ma, Weijia Zhang
ViT · MedIm · 19 Jun 2023

Efficient Vision Transformer for Human Pose Estimation via Patch Selection
K. A. Kinfu, René Vidal
ViT · 07 Jun 2023

COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan
VLM · ViT · 26 May 2023

Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
Siyuan Wei, Tianzhu Ye, Shen Zhang, Yao Tang, Jiajun Liang
ViT · 21 Apr 2023

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Z. Chen, Ping Luo, J. Tenenbaum, Chuang Gan
ViT · 06 Apr 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang, Wenjie Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Raffay Hamid
25 Mar 2023

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie
21 Mar 2023

X-Pruner: eXplainable Pruning for Vision Transformers
Lu Yu, Wei Xiang
ViT · 08 Mar 2023

Stitchable Neural Networks
Zizheng Pan, Jianfei Cai, Bohan Zhuang
13 Feb 2023

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen
ViT · MLT · 12 Feb 2023

Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks
Haiyan Zhao, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
27 Jan 2023

Skip-Attention: Improving Vision Transformers by Paying Less Attention
Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, A. Habibian
ViT · 05 Jan 2023
Regularized Optimal Transport Layers for Generalized Global Pooling Operations
Hongteng Xu, Minjie Cheng
13 Dec 2022

Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko, Han-Gyu Kim, Byeongho Heo, Sangdoo Yun, Sanghyuk Chun, Geonmo Gu, Wonjae Kim
ViT · 08 Dec 2022

Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu, Ping Li
ViT · 25 Nov 2022

Token Transformer: Can class token help window-based transformer build better long-range interactions?
Jia-ju Mao, Yuan Chang, Xuesong Yin
11 Nov 2022

Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token
Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz
09 Nov 2022

Data Level Lottery Ticket Hypothesis for Vision Transformers
Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang
02 Nov 2022