TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings

4 April 2023
N. Jouppi
George Kurian
Sheng R. Li
Peter C. Ma
R. Nagarajan
Lifeng Nai
Nishant Patil
Suvinay Subramanian
Andy Swing
Brian Towles
C. Young
Xiaoping Zhou
Zongwei Zhou
David A. Patterson
    BDL
    VLM

Papers citing "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings"

50 / 122 papers shown
Multi-turn Reinforcement Learning from Preference Human Feedback
Lior Shani
Aviv Rosenberg
Asaf B. Cassel
Oran Lang
Daniele Calandriello
...
Bilal Piot
Idan Szpektor
Avinatan Hassidim
Yossi Matias
Rémi Munos
45
23
0
23 May 2024
SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures
Swapnil Gandhi
Mark Zhao
Athinagoras Skiadopoulos
Christos Kozyrakis
AI4CE
GNN
28
1
0
22 May 2024
eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
Aditya Agrawal
Matthew Hedlund
Blake A. Hechtman
MQ
13
4
0
22 May 2024
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information
Dongseong Hwang
ODL
24
4
0
21 May 2024
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
R. Prabhakar
R. Sivaramakrishnan
Darshan Gandhi
Yun Du
Mingran Wang
...
Urmish Thakker
Dawei Huang
Sumti Jairath
Kevin J. Brown
K. Olukotun
MoE
39
12
0
13 May 2024
Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
Jordan Dotzel
Yuzong Chen
Bahaa Kotb
Sushma Prasad
Gang Wu
Sheng R. Li
Mohamed S. Abdelfattah
Zhiru Zhang
24
7
0
06 May 2024
CALRec: Contrastive Alignment of Generative LLMs For Sequential Recommendation
Yaoyiran Li
Xiang Zhai
M. Alzantot
Keyi Yu
Ivan Vulić
Anna Korhonen
Mohamed Hammad
26
10
0
03 May 2024
Hardware Accelerators for Autonomous Cars: A Review
Ruba Islayem
Fatima Alhosani
Raghad Hashem
Afra Alzaabi
Mahmoud Meribout
25
1
0
26 Apr 2024
The Feasibility of Implementing Large-Scale Transformers on Multi-FPGA Platforms
Yu Gao
Juan Camilo Vega
Paul Chow
19
1
0
24 Apr 2024
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
27
12
0
14 Apr 2024
Toward Cross-Layer Energy Optimizations in Machine Learning Systems
Jae-Won Chung
Mosharaf Chowdhury
24
0
0
10 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
50
47
0
08 Apr 2024
Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen
Niansong Zhang
Shaojie Xiang
Zhichen Zeng
Mengjia Dai
Zhiru Zhang
41
14
0
07 Apr 2024
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Kangfu Mei
Zhengzhong Tu
M. Delbracio
Hossein Talebi
Vishal M. Patel
P. Milanfar
DiffM
50
12
0
01 Apr 2024
Human Alignment of Large Language Models through Online Preference Optimisation
Daniele Calandriello
Daniel Guo
Rémi Munos
Mark Rowland
Yunhao Tang
...
Michal Valko
Tianqi Liu
Rishabh Joshi
Zeyu Zheng
Bilal Piot
44
60
0
13 Mar 2024
Communication Optimization for Distributed Training: Architecture, Advances, and Opportunities
Yunze Wei
Tianshuo Hu
Cong Liang
Yong Cui
AI4CE
25
0
0
12 Mar 2024
Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision
Ahmed F. AbouElhamayed
Susanne Balle
Deshanand Singh
Mohamed S. Abdelfattah
3DH
19
0
0
02 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
53
117
0
29 Feb 2024
TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator
Meng Zhang
Dennis Yin
Nicholas Gangi
Amir Begović
Alexander Chen
Z. Huang
Jiaqi Gu
6
4
0
12 Feb 2024
ForestColl: Throughput-Optimal Collective Communications on Heterogeneous Network Fabrics
Liangyu Zhao
Saeed Maleki
Ziyue Yang
Hossein Pourreza
Aashaka Shah
18
0
0
09 Feb 2024
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
Shiwei Liu
Guanchen Tao
Yifei Zou
Derek Chow
Zichen Fan
Kauna Lei
Bangfei Pan
Dennis Sylvester
Gregory Kielian
Mehdi Saligane
18
7
0
31 Jan 2024
PartIR: Composing SPMD Partitioning Strategies for Machine Learning
Sami Alabed
Daniel Belov
Bart Chrzaszcz
Juliana Franco
Dominik Grewe
...
Michael Schaarschmidt
Timur Sitdikov
Agnieszka Swietlik
Dimitrios Vytiniotis
Joel Wee
26
3
0
20 Jan 2024
Mathematical Algorithm Design for Deep Learning under Societal and Judicial Constraints: The Algorithmic Transparency Requirement
Holger Boche
Adalbert Fono
Gitta Kutyniok
FaML
23
4
0
18 Jan 2024
Neural Rendering and Its Hardware Acceleration: A Review
Xinkai Yan
Jieting Xu
Yuchi Huo
Hujun Bao
3DH
24
6
0
06 Jan 2024
Gemini Pro Defeated by GPT-4V: Evidence from Education
Gyeong-Geon Lee
Ehsan Latif
Lehong Shi
Xiaoming Zhai
26
21
0
27 Dec 2023
GenCast: Diffusion-based ensemble forecasting for medium-range weather
Ilan Price
Alvaro Sanchez-Gonzalez
Ferran Alet
Tom R. Andersson
Andrew El-Kadi
...
Jacklynn Stott
Shakir Mohamed
Peter W. Battaglia
Rémi R. Lam
Matthew Willson
26
105
0
25 Dec 2023
DEAP: Design Space Exploration for DNN Accelerator Parallelism
Ekansh Agrawal
Xiangyu Sam Xu
6
1
0
24 Dec 2023
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
56
76
0
23 Dec 2023
Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production
Chandra Irugalbandara
Ashish Mahendra
Roland Daynauth
T. Arachchige
Jayanaka L. Dantanarayana
K. Flautner
Lingjia Tang
Yiping Kang
Jason Mars
ELM
21
14
0
20 Dec 2023
ESPN: Memory-Efficient Multi-Vector Information Retrieval
Susav Shrestha
Narasimha Reddy
Zongwang Li
19
6
0
09 Dec 2023
Training Chain-of-Thought via Latent-Variable Inference
Du Phan
Matthew D. Hoffman
David Dohan
Sholto Douglas
Tuan Anh Le
Aaron T Parisi
Pavel Sountsov
Charles Sutton
Sharad Vikram
Rif A. Saurous
BDL
ReLM
LRM
16
22
0
28 Nov 2023
A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
Kai Katsumata
D. Vo
Hideki Nakayama
3DGS
22
14
0
21 Nov 2023
Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators
Trevor E. Pogue
N. Nicolici
14
3
0
20 Nov 2023
Sparsity-Preserving Differentially Private Training of Large Embedding Models
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Pasin Manurangsi
Amer Sinha
Chiyuan Zhang
11
2
0
14 Nov 2023
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Bowen Tan
Yun Zhu
Lijuan Liu
Eric P. Xing
Zhiting Hu
Jindong Chen
ALM
LRM
16
7
0
12 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu
Hermann Kumbong
Eric N. D. Nguyen
Christopher Ré
VLM
31
29
0
10 Nov 2023
Practical Performance Guarantees for Pipelined DNN Inference
Aaron Archer
Matthew Fahrbach
Kuikui Liu
Prakash Prabhu
13
0
0
07 Nov 2023
Systematic AI Approach for AGI: Addressing Alignment, Energy, and AGI Grand Challenges
Eren Kurshan
11
1
0
23 Oct 2023
Read Disturbance in High Bandwidth Memory: A Detailed Experimental Study on HBM2 DRAM Chips
Ataberk Olgun
Majd Osseiran
A. G. Yaglikçi
Yahya Can Tugrul
Haocong Luo
Steve Rhyner
Behzad Salami
Juan Gómez Luna
Onur Mutlu
13
8
0
23 Oct 2023
Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads
Hongxiang Fan
Stylianos I. Venieris
Alexandros Kouris
Nicholas D. Lane
13
7
0
17 Oct 2023
Exponential Quantum Communication Advantage in Distributed Inference and Learning
H. Michaeli
D. Gilboa
Daniel Soudry
Jarrod R. McClean
FedML
16
0
0
11 Oct 2023
MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
Samuel Hsia
Alicia Golden
Bilge Acun
Newsha Ardalani
Zach DeVito
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
MoE
31
9
0
04 Oct 2023
Enabling Language Models to Implicitly Learn Self-Improvement
Ziqi Wang
Le Hou
Tianjian Lu
Yuexin Wu
Yunxuan Li
Hongkun Yu
Heng Ji
ReLM
LRM
6
5
0
02 Oct 2023
DeepPCR: Parallelizing Sequential Operations in Neural Networks
Federico Danieli
Miguel Sarabia
Xavier Suau
Yuan-Sen Ting
Luca Zappella
10
1
0
28 Sep 2023
Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies
P. Basu
Liangyu Zhao
Jason Fantl
Siddharth Pal
Arvind Krishnamurthy
J. Khoury
10
7
0
24 Sep 2023
LMDX: Language Model-based Document Information Extraction and Localization
Vincent Perot
Kai Kang
Florian Luisier
Guolong Su
Xiaoyu Sun
...
Zifeng Wang
Jiaqi Mu
Hao Zhang
Chen-Yu Lee
Nan Hua
48
29
0
19 Sep 2023
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
22
65
0
07 Aug 2023
HUGE: Huge Unsupervised Graph Embeddings with TPUs
Brandon Mayer
Anton Tsitsulin
Hendrik Fichtenberger
Jonathan J. Halcrow
Bryan Perozzi
GNN
8
1
0
26 Jul 2023
TPU as Cryptographic Accelerator
Rabimba Karanjai
Sangwon Shin
Xinxin Fan
Lin Chen
Tianwei Zhang
Taeweon Suh
W. Shi
Lei Xu
11
1
0
13 Jul 2023
EnergAt: Fine-Grained Energy Attribution for Multi-Tenancy
Hongyu Hè
Michal Friedman
Theodoros Rekatsinas
19
7
0
11 Jul 2023