arXiv:2304.01433
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
4 April 2023
N. Jouppi
George Kurian
Sheng R. Li
Peter C. Ma
R. Nagarajan
Lifeng Nai
Nishant Patil
Suvinay Subramanian
Andy Swing
Brian Towles
C. Young
Xiaoping Zhou
Zongwei Zhou
David A. Patterson
BDL
VLM
Papers citing
"TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings"
50 / 122 papers shown
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Y. Wang
Yuxuan Liu
Y. X. Wei
MoE
19
0
0
14 May 2025
Studying Small Language Models with Susceptibilities
Garrett Baker
George Wang
Jesse Hoogland
Daniel Murfet
AAML
73
1
0
25 Apr 2025
Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis
X. Zhang
Yaoyao Ding
Yang Hu
Gennady Pekhimenko
41
0
0
22 Apr 2025
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
Mingyu Liang
Hiwot Tadese Kassa
Wenyin Fu
Brian Coutinho
Louis Feng
Christina Delimitrou
16
0
0
12 Apr 2025
Nonuniform-Tensor-Parallelism: Mitigating GPU failure impact for Scaled-up LLM Training
Daiyaan Arfeen
Dheevatsa Mudigere
Ankit More
Bhargava Gopireddy
Ahmet Inci
G. R. Ganger
23
0
0
08 Apr 2025
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Minsu Kim
Seongmin Hong
RyeoWook Ko
S. Choi
Hunjong Lee
Junsoo Kim
J. Kim
Jongse Park
57
0
0
24 Mar 2025
Explainable AI-Guided Efficient Approximate DNN Generation for Multi-Pod Systolic Arrays
Ayesha Siddique
Khurram Khalil
K. A. Hoque
32
0
0
20 Mar 2025
Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation
Ioannis Zarkadas
Amanda Tomlinson
Asaf Cidon
Baris Kasikci
Ofir Weisse
48
0
0
18 Mar 2025
Synthesizing Privacy-Preserving Text Data via Finetuning without Finetuning Billion-Scale LLMs
Bowen Tan
Zheng Xu
Eric P. Xing
Zhiting Hu
Shanshan Wu
SyDa
85
0
0
16 Mar 2025
Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs
Qizhe Wu
Huawen Liang
Yuchen Gui
Zhichen Zeng
Z. He
...
Letian Zhao
Zhaoxi Zeng
W. Yuan
Wei Wu
Xi Jin
36
0
0
08 Mar 2025
Robust Multi-Objective Preference Alignment with Online DPO
Raghav Gupta
Ryan Sullivan
Yunxuan Li
Samrat Phatale
Abhinav Rastogi
32
0
0
01 Mar 2025
Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs
Zhantong Zhu
Hongou Li
Wenjie Ren
Meng Wu
Le Ye
Ru Huang
Tianyu Jia
35
0
0
01 Mar 2025
Climate And Resource Awareness is Imperative to Achieving Sustainable AI (and Preventing a Global AI Arms Race)
Pedram Bakhtiarifard
Pınar Tözün
Christian Igel
Raghavendra Selvan
37
0
0
27 Feb 2025
Analog In-memory Training on General Non-ideal Resistive Elements: The Impact of Response Functions
Zhaoxian Wu
Quan Xian
Tayfun Gokmen
Omobayode Fagbohungbe
Tianyi Chen
91
0
0
17 Feb 2025
KernelBench: Can LLMs Write Efficient GPU Kernels?
Anne Ouyang
Simon Guo
Simran Arora
Alex L. Zhang
William Hu
Christopher Ré
Azalia Mirhoseini
ALM
38
1
0
14 Feb 2025
Strassen Multisystolic Array Hardware Architectures
Trevor E. Pogue
N. Nicolici
71
0
0
14 Feb 2025
Life-Cycle Emissions of AI Hardware: A Cradle-To-Grave Approach and Generational Trends
Ian Schneider
Hui Xu
Stephan Benecke
David Patterson
Keguo Huang
Parthasarathy Ranganathan
Cooper Elsworth
65
2
0
01 Feb 2025
Ditto: Accelerating Diffusion Model via Temporal Value Similarity
Sungbin Kim
Hyunwuk Lee
Wonho Cho
Mincheol Park
Won Woo Ro
56
1
0
20 Jan 2025
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
Guoyu Li
Shengyu Ye
C. L. P. Chen
Yang Wang
Fan Yang
Ting Cao
Cheng Liu
Mohamed M. Sabry
Mao Yang
MQ
78
0
0
18 Jan 2025
Karatsuba Matrix Multiplication and its Efficient Custom Hardware Implementations
Trevor E. Pogue
N. Nicolici
54
0
0
15 Jan 2025
mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training
Xudong Liao
Yijun Sun
Han Tian
Xinchen Wan
Yilun Jin
...
Guyue Liu
Ying Zhang
Xiaofeng Ye
Yiming Zhang
Kai Chen
MoE
30
0
0
08 Jan 2025
Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation
Alireza Salemi
Cheng-rong Li
Mingyang Zhang
Qiaozhu Mei
Weize Kong
Tao Chen
Zhuowan Li
Michael Bendersky
Hamed Zamani
LRM
RALM
ReLM
52
6
0
07 Jan 2025
Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge
Yuhe Ji
Yilun Liu
Feiyu Yao
Minggui He
Shimin Tao
...
Xinhua Yang
Weibin Meng
Yuming Xie
Boxing Chen
Hao Yang
73
2
0
02 Dec 2024
SafeLight: Enhancing Security in Optical Convolutional Neural Network Accelerators
S. Afifi
Ishan G. Thakkar
S. Pasricha
59
0
0
22 Nov 2024
ML²Tuner: Efficient Code Tuning via Multi-Level Machine Learning Models
JooHyoung Cha
Munyoung Lee
Jinse Kwon
Jubin Lee
Jemin Lee
Yongin Kwon
31
0
0
16 Nov 2024
Running Markov Chain Monte Carlo on Modern Hardware and Software
Pavel Sountsov
Colin Carroll
Matthew D. Hoffman
BDL
24
2
0
06 Nov 2024
BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
M. Rakka
Rachid Karami
A. Eltawil
M. Fouda
Fadi J. Kurdahi
MQ
26
1
0
03 Nov 2024
Revisiting Reliability in Large-Scale Machine Learning Research Clusters
Apostolos Kokolis
Michael Kuchnik
John Hoffman
Adithya Kumar
Parth Malani
Faye Ma
Zachary DeVito
S.
Kalyan Saladi
Carole-Jean Wu
86
7
0
29 Oct 2024
Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small
Zhehui Wang
Tao Luo
Cheng Liu
Weichen Liu
Rick Siow Mong Goh
Weng-Fai Wong
18
1
0
21 Oct 2024
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI
Arya Tschand
Arun Tejusve Raghunath Rajan
S. Idgunji
Anirban Ghosh
J. Holleman
...
Rowan Taubitz
Sean Zhan
Scott Wasson
David Kanter
Vijay Janapa Reddi
62
3
0
15 Oct 2024
ACER: Automatic Language Model Context Extension via Retrieval
Luyu Gao
Yunyi Zhang
Jamie Callan
RALM
24
0
0
11 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
29
7
0
08 Oct 2024
DFabric: Scaling Out Data Parallel Applications with CXL-Ethernet Hybrid Interconnects
Xu Zhang
Ke Liu
Yisong Chang
Hui Yuan
Xiaolong Zheng
11
2
0
09 Sep 2024
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
Wei An
Xiao Bi
Guanting Chen
Shanhuang Chen
Chengqi Deng
...
Chenggang Zhao
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Yuheng Zou
29
5
0
26 Aug 2024
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers
Moritz Scherer
Luka Macan
Victor J. B. Jung
Philip Wiese
Luca Bompani
Alessio Burrello
Francesco Conti
Luca Benini
MoE
30
10
0
08 Aug 2024
In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models
Ayrton San Joaquin
Bin Wang
Zhengyuan Liu
Nicholas Asher
Brian Lim
Philippe Muller
Nancy Chen
24
0
0
07 Aug 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Peng Sun
69
8
0
29 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
V. Cevher
Yida Wang
George Karypis
37
3
0
12 Jul 2024
Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture
Mohammed E. Elbtity
Peyton S. Chandarana
Ramtin Zand
36
2
0
11 Jul 2024
VcLLM: Video Codecs are Secretly Tensor Codecs
Ceyu Xu
Yongji Wu
Xinyu Yang
Beidi Chen
Matthew Lentz
Danyang Zhuo
Lisa Wu Wills
45
0
0
29 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
37
278
0
24 Jun 2024
Towards Exact Gradient-based Training on Analog In-memory Computing
Zhaoxian Wu
Tayfun Gokmen
M. Rasch
Tianyi Chen
21
2
0
18 Jun 2024
New Solutions on LLM Acceleration, Optimization, and Application
Yingbing Huang
Lily Jiaxin Wan
Hanchen Ye
Manvi Jha
Jinghua Wang
Yuhong Li
Xiaofan Zhang
Deming Chen
37
12
0
16 Jun 2024
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
Jungi Lee
Wonbeom Lee
Jaewoong Sim
MQ
24
14
0
16 Jun 2024
UDON: Universal Dynamic Online distillatioN for generic image representations
Nikolaos-Antonios Ypsilantis
Kaifeng Chen
André Araujo
Ondřej Chum
30
3
0
12 Jun 2024
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models
Yujeong Choi
Jiin Kim
Minsoo Rhu
19
1
0
11 Jun 2024
USM RNN-T model weights binarization
Oleg Rybakov
Dmitriy Serdyuk
Chengjian Zheng
MQ
26
0
0
05 Jun 2024
An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging
Sulaiman Khan
Md. Rafiul Biswas
Alina Murad
Hazrat Ali
Zubair Shah
32
4
0
02 Jun 2024
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
40
21
0
29 May 2024
Wavelet-Based Image Tokenizer for Vision Transformers
Zhenhai Zhu
Radu Soricut
ViT
27
3
0
28 May 2024