Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.07528
Cited By
QuickLLaMA: Query-aware Inference Acceleration for Large Language Models
11 June 2024
Jingyao Li
Han Shi
Xin Jiang
Zhenguo Li
Hong Xu
Jiaya Jia
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
5 / 5 papers shown
Title
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
55
4
0
01 Mar 2025
SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li
Yingbing Huang
Bowen Yang
Bharat Venkitesh
Acyr F. Locatelli
Hanchen Ye
Tianle Cai
Patrick Lewis
Deming Chen
VLM
75
148
0
22 Apr 2024
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Amanda Bertsch
Uri Alon
Graham Neubig
Matthew R. Gormley
RALM
94
122
0
02 May 2023
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press
Noah A. Smith
M. Lewis
234
690
0
27 Aug 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
249
1,982
0
28 Jul 2020
1