KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
arXiv:2405.05329 · 8 May 2024
Minsik Cho, Mohammad Rastegari, Devang Naik
Papers citing "KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation" (3 papers shown)
Context Parallelism for Scalable Million-Token Inference (04 Nov 2024)
Amy Yang, Jingyi Yang, Aya Ibrahim, Xinfeng Xie, Bangsheng Tang, Grigory Sizov, Jeremy Reizenstein, Jongsoo Park, Jianyu Huang
Topics: MoE, LRM
Training language models to follow instructions with human feedback (04 Mar 2022)
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Topics: OSLM, ALM
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Topics: MoE