Efficient Transformers with Dynamic Token Pooling

17 November 2022
Piotr Nawrot, J. Chorowski, Adrian Łańcucki, E. Ponti
arXiv: 2211.09761

Papers citing "Efficient Transformers with Dynamic Token Pooling"

All 32 citing papers are shown below.
Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer, Ivan Vulić, E. Ponti
25 Mar 2025

SuperBPE: Space Travel for Language Models
Alisa Liu, J. Hayase, Valentin Hofmann, Sewoong Oh, Noah A. Smith, Yejin Choi
17 Mar 2025

Neural Attention Search
Difan Deng, Marius Lindauer
21 Feb 2025

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs [VLM]
Qizhe Zhang, Aosong Cheng, Ming Lu, Zhiyong Zhuo, Minqi Wang, Jiajun Cao, Shaobo Guo, Qi She, Shanghang Zhang
02 Dec 2024

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini, Shikhar Murty, Christopher D. Manning, Christopher Potts, Róbert Csordás
28 Oct 2024

Rethinking Token Reduction for State Space Models [Mamba]
Zheng Zhan, Yushu Wu, Zhenglun Kong, Changdi Yang, Yifan Gong, Xuan Shen, Xue Lin, Pu Zhao, Yanzhi Wang
16 Oct 2024

Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Patrícia Schmidtová, Saad Mahamood, Simone Balloccu, Ondřej Dušek, Albert Gatt, Dimitra Gkatzia, David M. Howcroft, Ondřej Plátek, Adarsa Sivaprasad
17 Aug 2024

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Valentin Hofmann, Tomasz Limisiewicz, Yulia Tsvetkov, Noah A. Smith
11 Jul 2024

LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression [VLM]
Jieneng Chen, Luoxin Ye, Ju He, Zhao-Yang Wang, Daniel Khashabi, Alan Yuille
28 Jun 2024

Understanding and Mitigating Tokenization Bias in Language Models
Buu Phan, Marton Havasi, Matthew Muckley, Karen Ullrich
24 Jun 2024

Block Transformer: Global-to-Local Language Modeling for Fast Inference
Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun
04 Jun 2024

Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing
Dening Lu, Jun Zhou, Kyle Gao, Linlin Xu, Jonathan Li
23 May 2024

3D Learnable Supertoken Transformer for LiDAR Point Cloud Scene Segmentation [3DPC, ViT]
Dening Lu, Jun Zhou, K. Gao, Linlin Xu, Jonathan Li
23 May 2024

Zero-Shot Tokenizer Transfer [VLM]
Benjamin Minixhofer, E. Ponti, Ivan Vulić
13 May 2024

SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
22 Apr 2024

MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz, Terra Blevins, Hila Gonen, Orevaoghene Ahia, Luke Zettlemoyer
15 Mar 2024

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski, David Tarjan, E. Ponti
14 Mar 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification [VLM]
Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe
20 Feb 2024

Toucan: Token-Aware Character Level Language Modeling
William Fleshman, Benjamin Van Durme
15 Nov 2023

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi, H. Inaguma, Xutai Ma, Ilia Kulikov, Anna Y. Sun
04 Oct 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization
Lucas D. Lingle
28 Sep 2023

nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources [AI4CE]
Piotr Nawrot
05 Sep 2023

No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
12 Jul 2023

Hierarchical Attention Encoder Decoder [BDL]
Asier Mujika
01 Jun 2023

PuMer: Pruning and Merging Tokens for Efficient Vision Language Models [MLLM, VLM]
Qingqing Cao, Bhargavi Paranjape, Hannaneh Hajishirzi
27 May 2023

StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure [BDL]
Mattia Opper, Victor Prokhorov, N. Siddharth
09 May 2023

Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Zhanpeng Zeng, Cole Hawkins, Min-Fong Hong, Aston Zhang, Nikolaos Pappas, Vikas Singh, Shuai Zheng
07 May 2023

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference [BDL, AI4CE]
Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, ..., Vincent Zhao, Yuexin Wu, Bo-wen Li, Yu Zhang, Ming-Wei Chang
11 Apr 2023

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai
12 Jul 2021

Efficient Content-Based Sparse Attention with Routing Transformers [MoE]
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
12 Mar 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

Surprisal-Driven Zoneout
K. Rocki, Tomasz Kornuta, Tegan Maharaj
24 Oct 2016