ResearchTrend.AI
Efficient Transformers: A Survey

14 September 2020
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
    VLM

Papers citing "Efficient Transformers: A Survey"

50 / 633 papers shown
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
43
25
0
12 Apr 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng
Daniel Goldstein
Quentin G. Anthony
Alon Albalak
Eric Alcaide
...
Bingchen Zhao
Qihang Zhao
Peng Zhou
Jian Zhu
Ruijie Zhu
46
73
0
08 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
50
45
0
08 Apr 2024
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
32
23
0
06 Apr 2024
BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model
Chenwei Xu
Yu-Chao Huang
Jerry Yao-Chieh Hu
Weijian Li
Ammar Gilani
H. Goan
Han Liu
37
19
0
04 Apr 2024
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
David Raposo
Sam Ritter
Blake A. Richards
Timothy Lillicrap
Peter C. Humphreys
Adam Santoro
MoE
11
68
0
02 Apr 2024
Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng
Hebei Li
Yueyi Zhang
Xiaoyan Sun
Feng Wu
ViT
30
11
0
02 Apr 2024
DE-HNN: An effective neural model for Circuit Netlist representation
Zhishang Luo
Truong Son-Hy
Puoya Tabaghi
Donghyeon Koh
Michael Defferrard
Elahe Rezaei
Ryan Carey
William Rhett Davis
Rajeev Jain
Yusu Wang
14
5
0
30 Mar 2024
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang
Mu Cai
Bingxin Xu
Yong Jae Lee
Yan Yan
VLM
29
104
0
22 Mar 2024
Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service
Mirza Alim Mutasodirin
Radityo Eko Prasojo
Achmad F. Abka
Hanif Rasyidi
VLM
26
0
0
19 Mar 2024
tsGT: Stochastic Time Series Modeling With Transformer
Lukasz Kuciński
Witold Drzewakowski
Mateusz Olko
Piotr Kozakowski
Lukasz Maziarka
Marta Emilia Nowakowska
Lukasz Kaiser
Piotr Miłoś
33
1
0
08 Mar 2024
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen
Sebastián M. Palacio
Andreas Dengel
51
3
0
05 Mar 2024
NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function
Abdullah Nazhat Abdullah
Tarkan Aydin
25
0
0
04 Mar 2024
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Penghao Zhao
Hailin Zhang
Qinhan Yu
Zhengren Wang
Yunteng Geng
Fangcheng Fu
Ling Yang
Wentao Zhang
Jie Jiang
Bin Cui
3DV
110
215
0
29 Feb 2024
Deep learning for 3D human pose estimation and mesh recovery: A survey
Yang Liu
Changzhen Qiu
Zhiyong Zhang
3DH
30
5
0
29 Feb 2024
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami
Ali Ghodsi
VLM
31
6
0
28 Feb 2024
Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey
Xi Fang
Weijie Xu
Fiona Anting Tan
Jiani Zhang
Ziqing Hu
Yanjun Qi
Scott Nickleach
Diego Socolinsky
Srinivasan H. Sengamedu
Christos Faloutsos
LMTD
ALM
30
63
0
27 Feb 2024
Latent Attention for Linear Time Transformers
Rares Dolga
Marius Cobzarenco
David Barber
20
1
0
27 Feb 2024
Quantum linear algebra is all you need for Transformer architectures
Naixu Guo
Zhan Yu
Matthew Choi
Aman Agrawal
Kouhei Nakaji
Alán Aspuru-Guzik
P. Rebentrost
AI4CE
28
14
0
26 Feb 2024
Transformers are Expressive, But Are They Expressive Enough for Regression?
Swaroop Nath
H. Khadilkar
Pushpak Bhattacharyya
23
3
0
23 Feb 2024
User-LLM: Efficient LLM Contextualization with User Embeddings
Lin Ning
Luyang Liu
Jiaxing Wu
Neo Wu
D. Berlowitz
Sushant Prakash
Bradley Green
S. O’Banion
Jun Xie
37
32
0
21 Feb 2024
Structure-informed Positional Encoding for Music Generation
Manvi Agarwal
Changhong Wang
Gaël Richard
19
2
0
20 Feb 2024
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Kuang-Huei Lee
Xinyun Chen
Hiroki Furuta
John F. Canny
Ian S. Fischer
RALM
53
29
0
15 Feb 2024
Changes by Butterflies: Farsighted Forecasting with Group Reservoir Transformer
Md Kowsher
Abdul Rafae Khan
Jia Xu
19
0
0
14 Feb 2024
Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
A. Bambhaniya
Amir Yazdanbakhsh
Suvinay Subramanian
Sheng-Chun Kao
Shivani Agrawal
Utku Evci
Tushar Krishna
41
16
0
07 Feb 2024
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Chaojun Xiao
Pengle Zhang
Xu Han
Guangxuan Xiao
Yankai Lin
Zhengyan Zhang
Zhiyuan Liu
Maosong Sun
LLMAG
39
33
0
07 Feb 2024
Enhancing Transformer RNNs with Multiple Temporal Perspectives
Razvan-Gabriel Dumitru
Darius Peteleaza
Mihai Surdeanu
AI4TS
6
2
0
04 Feb 2024
Streaming Sequence Transduction through Dynamic Compression
Weiting Tan
Yunmo Chen
Tongfei Chen
Guanghui Qin
Haoran Xu
Heidi C. Zhang
Benjamin Van Durme
Philipp Koehn
11
1
0
02 Feb 2024
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
Lei Liu
Li Liu
Haizhou Li
18
6
0
31 Jan 2024
SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization
Sangwoo Cho
Kaiqiang Song
Chao Zhao
Xiaoyang Wang
Dong Yu
13
0
0
31 Jan 2024
Engineering A Large Language Model From Scratch
Abiodun Finbarrs Oketunji
22
0
0
30 Jan 2024
Zero-Shot Reinforcement Learning via Function Encoders
Tyler Ingebrand
Amy Zhang
Ufuk Topcu
OffRL
22
2
0
30 Jan 2024
TPC-ViT: Token Propagation Controller for Efficient Vision Transformer
Wentao Zhu
13
2
0
03 Jan 2024
Large Language Models for Conducting Advanced Text Analytics Information Systems Research
Benjamin Ampel
Chi-Heng Yang
J. Hu
Hsinchun Chen
21
7
0
27 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
19
1
0
18 Dec 2023
Learning Long Sequences in Spiking Neural Networks
Matei Ioan Stan
Oliver Rhodes
30
10
0
14 Dec 2023
MotherNet: Fast Training and Inference via Hyper-Network Transformers
Andreas Müller
Carlo Curino
Raghu Ramakrishnan
LMTD
33
10
0
14 Dec 2023
Spectral State Space Models
Naman Agarwal
Daniel Suo
Xinyi Chen
Elad Hazan
17
11
0
11 Dec 2023
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
Zhihang Yuan
Yuzhang Shang
Yue Song
Qiang Wu
Yan Yan
Guangyu Sun
MQ
29
41
0
10 Dec 2023
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
Aleksandar Terzić
Michael Hersche
G. Karunaratne
Zixiao Huang
Abu Sebastian
Abbas Rahimi
AI4TS
12
1
0
09 Dec 2023
SparQ Attention: Bandwidth-Efficient LLM Inference
Luka Ribar
Ivan Chelombiev
Luke Hudlass-Galley
Charlie Blake
Carlo Luschi
Douglas Orr
21
45
0
08 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
10
31
0
08 Dec 2023
Dimension Mixer: A Generalized Method for Structured Sparsity in Deep Neural Networks
Suman Sapkota
Binod Bhattarai
29
0
0
30 Nov 2023
On the Long Range Abilities of Transformers
Itamar Zimerman
Lior Wolf
22
7
0
28 Nov 2023
On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series
Rita Kuznetsova
Alizée Pace
Manuel Burger
Hugo Yèche
Gunnar Rätsch
AI4TS
27
5
0
15 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu
Hermann Kumbong
Eric N. D. Nguyen
Christopher Ré
VLM
31
28
0
10 Nov 2023
Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems
Huan Gui
Ruoxi Wang
Ke Yin
Long Jin
Maciej Kula
Taibai Xu
Lichan Hong
Ed H. Chi
38
2
0
10 Nov 2023
Legal-HNet: Mixing Legal Long-Context Tokens with Hartley Transform
Daniele Giofré
Sneha Ghantasala
AILaw
24
0
0
09 Nov 2023
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
In Gim
Guojun Chen
Seung-seob Lee
Nikhil Sarda
Anurag Khandelwal
Lin Zhong
25
71
0
07 Nov 2023
Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding
Jiali Zeng
Fandong Meng
Yongjing Yin
Jie Zhou
21
10
0
06 Nov 2023