Speculative Streaming: Fast LLM Inference without Auxiliary Models
16 February 2024
Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi
arXiv: 2402.11131
Papers citing "Speculative Streaming: Fast LLM Inference without Auxiliary Models" (6 papers)
| Title | Authors | Published |
| --- | --- | --- |
| Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition | Artem Basharin, Andrei Chertkov, Ivan V. Oseledets | 23 Oct 2024 |
| Extracting Paragraphs from LLM Token Activations | Nicholas Pochinkov, Angelo Benoit, Lovkush Agarwal, Zainab Ali Majid, Lucile Ter-Minassian | 10 Sep 2024 |
| LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference | Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi | 19 Jul 2024 |
| S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs | Wei Zhong, Manasa Bharadwaj | 30 May 2024 |
| Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang | 03 Feb 2024 |
| Scaling Laws for Neural Language Models | Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei | 23 Jan 2020 |