Data Movement Is All You Need: A Case Study on Optimizing Transformers
arXiv:2007.00072 · 30 June 2020
Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler

Papers citing "Data Movement Is All You Need: A Case Study on Optimizing Transformers"

30 / 30 papers shown
Title
Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression
Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression
Samuel J. Kaufman
René Just
Rastislav Bodik
17
0
0
03 May 2025
Nonlinear Computation with Linear Optics via Source-Position Encoding
Nonlinear Computation with Linear Optics via Source-Position Encoding
N. Richardson
C. Bosch
R. P. Adams
37
0
0
29 Apr 2025
BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network
  Acceleration
BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
M. Rakka
Rachid Karami
A. Eltawil
M. Fouda
Fadi J. Kurdahi
MQ
32
1
0
03 Nov 2024
Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
Rya Sanovar
Srikant Bharadwaj
Renée St. Amant
Victor Rühle
Saravan Rajmohan
49
6
0
17 May 2024
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and
  Composition of Experts
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
R. Prabhakar
R. Sivaramakrishnan
Darshan Gandhi
Yun Du
Mingran Wang
...
Urmish Thakker
Dawei Huang
Sumti Jairath
Kevin J. Brown
K. Olukotun
MoE
39
12
0
13 May 2024
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
Wenqi Jiang
Marco Zeller
R. Waleffe
Torsten Hoefler
Gustavo Alonso
47
16
0
15 Oct 2023
OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs
OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs
Mikhail Khalilov
Marcin Chrapek
Siyuan Shen
Alessandro Vezzu
Thomas Emanuel Benz
Salvatore Di Girolamo
Timo Schneider
Daniele Di Sensi
Luca Benini
Torsten Hoefler
30
6
0
07 Sep 2023
Bridging Control-Centric and Data-Centric Optimization
Bridging Control-Centric and Data-Centric Optimization
Tal Ben-Nun
Berke Ates
A. Calotoiu
Torsten Hoefler
23
7
0
01 Jun 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive
  Transformers
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurélien Lucchi
Thomas Hofmann
32
53
0
25 May 2023
STen: Productive and Efficient Sparsity in PyTorch
STen: Productive and Efficient Sparsity in PyTorch
Andrei Ivanov
Nikoli Dryden
Tal Ben-Nun
Saleh Ashkboos
Torsten Hoefler
30
4
0
15 Apr 2023
Operator Fusion in XLA: Analysis and Evaluation
Operator Fusion in XLA: Analysis and Evaluation
Danielle Snider
Ruofan Liang
14
4
0
30 Jan 2023
Myths and Legends in High-Performance Computing
Myths and Legends in High-Performance Computing
Satoshi Matsuoka
Jens Domke
M. Wahib
Aleksandr Drozd
Torsten Hoefler
20
14
0
06 Jan 2023
Pex: Memory-efficient Microcontroller Deep Learning through Partial
  Execution
Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution
Edgar Liberis
Nicholas D. Lane
13
3
0
30 Nov 2022
Spatial Mixture-of-Experts
Spatial Mixture-of-Experts
Nikoli Dryden
Torsten Hoefler
MoE
24
9
0
24 Nov 2022
HammingMesh: A Network Topology for Large-Scale Deep Learning
HammingMesh: A Network Topology for Large-Scale Deep Learning
Torsten Hoefler
Tommaso Bonato
Daniele De Sensi
Salvatore Di Girolamo
Shigang Li
Marco Heddes
Jon Belk
Deepak Goel
Miguel Castro
Steve Scott
3DH
GNN
AI4CE
18
20
0
03 Sep 2022
Survey: Exploiting Data Redundancy for Optimization of Deep Learning
Survey: Exploiting Data Redundancy for Optimization of Deep Learning
Jou-An Chen
Wei Niu
Bin Ren
Yanzhi Wang
Xipeng Shen
21
24
0
29 Aug 2022
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models
  at Unprecedented Scale
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Reza Yazdani Aminabadi
Samyam Rajbhandari
Minjia Zhang
A. A. Awan
Cheng-rong Li
...
Elton Zheng
Jeff Rasley
Shaden Smith
Olatunji Ruwase
Yuxiong He
29
334
0
30 Jun 2022
SimA: Simple Softmax-free Attention for Vision Transformers
SimA: Simple Softmax-free Attention for Vision Transformers
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
16
25
0
17 Jun 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
56
2,020
0
27 May 2022
C-NMT: A Collaborative Inference Framework for Neural Machine
  Translation
C-NMT: A Collaborative Inference Framework for Neural Machine Translation
Yukai Chen
R. Chiaro
Enrico Macii
M. Poncino
Daniele Jahier Pagliari
16
0
0
08 Apr 2022
DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for
  Layer Fusion in DNN Accelerators
DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators
Sheng-Chun Kao
Xiaoyu Huang
T. Krishna
AI4CE
33
9
0
26 Jan 2022
Lifting C Semantics for Dataflow Optimization
Lifting C Semantics for Dataflow Optimization
A. Calotoiu
Tal Ben-Nun
Grzegorz Kwa'sniewski
Johannes de Fine Licht
Timo Schneider
Philipp Schaad
Torsten Hoefler
11
6
0
22 Dec 2021
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao
Suvinay Subramanian
Gaurav Agrawal
Amir Yazdanbakhsh
T. Krishna
30
57
0
13 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
47
94
0
01 Jul 2021
Improving the Efficiency of Transformers for Resource-Constrained
  Devices
Improving the Efficiency of Transformers for Resource-Constrained Devices
Hamid Tabani
Ajay Balasubramaniam
Shabbir Marzban
Elahe Arani
Bahram Zonooz
33
20
0
30 Jun 2021
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,453
0
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using
  Model Parallelism
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,817
0
17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,950
0
20 Apr 2018
Geometric deep learning: going beyond Euclidean data
Geometric deep learning: going beyond Euclidean data
M. Bronstein
Joan Bruna
Yann LeCun
Arthur Szlam
P. Vandergheynst
GNN
238
3,234
0
24 Nov 2016
Effective Approaches to Attention-based Neural Machine Translation
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
214
7,923
0
17 Aug 2015