ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.05736
  4. Cited By
LLMLingua: Compressing Prompts for Accelerated Inference of Large
  Language Models

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

9 October 2023
Huiqiang Jiang
Qianhui Wu
Chin-Yew Lin
Yuqing Yang
Lili Qiu
ArXivPDFHTML

Papers citing "LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models"

50 / 70 papers shown
Title
An Empirical Study on Prompt Compression for Large Language Models
An Empirical Study on Prompt Compression for Large Language Models
Z. Zhang
Jinyi Li
Yihuai Lan
X. Wang
Hao Wang
MQ
42
0
0
24 Apr 2025
The Other Side of the Coin: Exploring Fairness in Retrieval-Augmented Generation
The Other Side of the Coin: Exploring Fairness in Retrieval-Augmented Generation
Z. Zhang
Ning Li
Qi Liu
Rui Li
W. Gao
Qingyang Mao
Zhenya Huang
Baosheng Yu
Dacheng Tao
RALM
34
0
0
11 Apr 2025
Saliency-driven Dynamic Token Pruning for Large Language Models
Saliency-driven Dynamic Token Pruning for Large Language Models
Yao Tao
Yehui Tang
Yun Wang
Mingjian Zhu
Hailin Hu
Yunhe Wang
34
0
0
06 Apr 2025
Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems
Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems
Sejong Kim
Hyunseo Song
Hyunwoo Seo
Hyunjun Kim
RALM
77
0
0
19 Mar 2025
Growing a Twig to Accelerate Large Vision-Language Models
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao
Mingyang Wang
Zhou Yu
Wenwen Pan
Yan Yang
Tao Wei
H. Zhang
Ning Mao
Wei Chen
Jun Yu
VLM
59
1
0
18 Mar 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
76
0
0
14 Mar 2025
On Memory Construction and Retrieval for Personalized Conversational Agents
On Memory Construction and Retrieval for Personalized Conversational Agents
Zhuoshi Pan
Qianhui Wu
Huiqiang Jiang
Xufang Luo
Hao Cheng
...
Y. Yang
Chin-Yew Lin
H. V. Zhao
Lili Qiu
Jianfeng Gao
RALM
56
3
0
08 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
107
3
0
04 Feb 2025
Deploying Foundation Model Powered Agent Services: A Survey
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Haozhao Wang
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
112
1
0
18 Dec 2024
Compressed Chain of Thought: Efficient Reasoning Through Dense
  Representations
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
Jeffrey Cheng
Benjamin Van Durme
LRM
69
24
0
17 Dec 2024
Rank It, Then Ask It: Input Reranking for Maximizing the Performance of
  LLMs on Symmetric Tasks
Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks
Mohsen Dehghankar
Abolfazl Asudeh
61
1
0
30 Nov 2024
AssistRAG: Boosting the Potential of Large Language Models with an
  Intelligent Information Assistant
AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant
Yujia Zhou
Zheng Liu
Zhicheng Dou
AIFin
LRM
RALM
31
2
0
11 Nov 2024
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
Alireza Rezazadeh
Zichao Li
Wei Wei
Yujia Bao
30
4
0
17 Oct 2024
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Haotian Tang
Yecheng Wu
Shang Yang
Enze Xie
Junsong Chen
Junyu Chen
Zhuoyang Zhang
Han Cai
Y. Lu
Song Han
61
32
0
14 Oct 2024
Divide, Reweight, and Conquer: A Logit Arithmetic Approach for
  In-Context Learning
Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning
Chengsong Huang
Langlin Huang
Jiaxin Huang
MoMe
27
1
0
14 Oct 2024
KV Prediction for Improved Time to First Token
KV Prediction for Improved Time to First Token
Maxwell Horton
Qingqing Cao
Chenfan Sun
Yanzi Jin
Sachin Mehta
Mohammad Rastegari
Moin Nabi
AI4TS
32
1
0
10 Oct 2024
CursorCore: Assist Programming through Aligning Anything
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
48
1
0
09 Oct 2024
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning
  in LLMs
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
Lei Wang
Shan Dong
Yuhui Xu
Hanze Dong
Yalu Wang
Amrita Saha
Ee-Peng Lim
Caiming Xiong
Doyen Sahoo
LRM
40
1
0
07 Oct 2024
ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question
  Answering
ALR2^22: A Retrieve-then-Reason Framework for Long-context Question Answering
Huayang Li
Pat Verga
Priyanka Sen
Bowen Yang
Vijay Viswanathan
Patrick Lewis
Taro Watanabe
Yixuan Su
RALM
LRM
40
6
0
04 Oct 2024
Geometric Collaborative Filtering with Convergence
Geometric Collaborative Filtering with Convergence
Hisham Husain
Julien Monteil
FedML
23
0
0
04 Oct 2024
How Much Can RAG Help the Reasoning of LLM?
How Much Can RAG Help the Reasoning of LLM?
Jingyu Liu
Jiaen Lin
Yong Liu
LRM
20
9
0
03 Oct 2024
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey
  on How to Make your LLMs use External Data More Wisely
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
Siyun Zhao
Yuqing Yang
Zilong Wang
Zhiyuan He
Luna Qiu
Lili Qiu
SyDa
RALM
3DV
32
33
0
23 Sep 2024
Do Large Language Models Need a Content Delivery Network?
Do Large Language Models Need a Content Delivery Network?
Yihua Cheng
Kuntai Du
Jiayi Yao
Junchen Jiang
KELM
36
7
0
16 Sep 2024
E2LLM: Encoder Elongated Large Language Models for Long-Context
  Understanding and Reasoning
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
Zihan Liao
Jun Wang
Hang Yu
Lingxiao Wei
Jianguo Li
Jun Wang
Wei Zhang
19
2
0
10 Sep 2024
LanguaShrink: Reducing Token Overhead with Psycholinguistics
LanguaShrink: Reducing Token Overhead with Psycholinguistics
Xuechen Liang
Meiling Tao
Yinghui Xia
Tianyu Shi
Jun Wang
JingSong Yang
23
1
0
01 Sep 2024
QUITO: Accelerating Long-Context Reasoning through Query-Guided Context
  Compression
QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression
Zhaohong Liu
Yihang Wang
Yixing Fan
Huaming Liao
Peng Dong
19
1
0
01 Aug 2024
Finch: Prompt-guided Key-Value Cache Compression
Finch: Prompt-guided Key-Value Cache Compression
Giulio Corallo
Paolo Papotti
33
3
0
31 Jul 2024
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question
  Answering
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
Mahiro Ukai
Shuhei Kurita
Atsushi Hashimoto
Yoshitaka Ushiku
Nakamasa Inoue
18
0
0
28 Jul 2024
Keep the Cost Down: A Review on Methods to Optimize LLM' s KV-Cache
  Consumption
Keep the Cost Down: A Review on Methods to Optimize LLM' s KV-Cache Consumption
Shi Luohe
Hongyi Zhang
Yao Yao
Z. Li
Zhao Hai
31
31
0
25 Jul 2024
LeKUBE: A Legal Knowledge Update BEnchmark
LeKUBE: A Legal Knowledge Update BEnchmark
Changyue Wang
Weihang Su
Yiran Hu
Qingyao Ai
Yueyue Wu
Cheng Luo
Yiqun Liu
Min Zhang
Shaoping Ma
AILaw
ELM
30
3
0
19 Jul 2024
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Qichen Fu
Minsik Cho
Thomas Merth
Sachin Mehta
Mohammad Rastegari
Mahyar Najibi
33
25
0
19 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
35
41
0
09 Jul 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via
  Dynamic Sparse Attention
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
67
81
0
02 Jul 2024
Cloud-Edge-Terminal Collaborative AIGC for Autonomous Driving
Cloud-Edge-Terminal Collaborative AIGC for Autonomous Driving
Jianan Zhang
Zhiwei Wei
Boxun Liu
Xiayi Wang
Yong Yu
Rongqing Zhang
18
5
0
02 Jul 2024
Searching for Best Practices in Retrieval-Augmented Generation
Searching for Best Practices in Retrieval-Augmented Generation
Xiaohua Wang
Zhenghua Wang
Xuan Gao
Feiran Zhang
Yixin Wu
...
Qi Qian
Ruicheng Yin
Changze Lv
Xiaoqing Zheng
Xuanjing Huang
48
40
0
01 Jul 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
Brandon Huang
Chancharik Mitra
Assaf Arbelle
Leonid Karlinsky
Trevor Darrell
Roei Herzig
37
12
0
21 Jun 2024
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
Qi Liu
Bo Wang
Nan Wang
Jiaxin Mao
RALM
72
3
0
21 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
67
13
0
20 Jun 2024
VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS
  Optimization Framework
VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework
Zhi Yao
Zhiqing Tang
Jiong Lou
Ping Shen
Weijia Jia
37
7
0
19 Jun 2024
Supportiveness-based Knowledge Rewriting for Retrieval-augmented
  Language Modeling
Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling
Zile Qiao
Wei Ye
Yong Jiang
Tong Mo
Pengjun Xie
Weiping Li
Fei Huang
Shikun Zhang
KELM
25
4
0
12 Jun 2024
Metaheuristics and Large Language Models Join Forces: Toward an Integrated Optimization Approach
Metaheuristics and Large Language Models Join Forces: Toward an Integrated Optimization Approach
Camilo Chacón Sartori
Christian Blum
Filippo Bistaffa
Guillem Rodríguez Corominas
AIFin
51
3
0
28 May 2024
Compressing Lengthy Context With UltraGist
Compressing Lengthy Context With UltraGist
Peitian Zhang
Zheng Liu
Shitao Xiao
Ninglu Shao
Qiwei Ye
Zhicheng Dou
24
4
0
26 May 2024
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
Jiayi Yao
Hanchen Li
Yuhan Liu
Siddhant Ray
Yihua Cheng
Qizheng Zhang
Kuntai Du
Shan Lu
Junchen Jiang
42
14
0
26 May 2024
Accelerating Inference of Retrieval-Augmented Generation via Sparse
  Context Selection
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
Yun Zhu
Jia-Chen Gu
Caitlin Sikora
Ho Ko
Yinxiao Liu
...
Lei Shu
Liangchen Luo
Lei Meng
Bang Liu
Jindong Chen
RALM
22
14
0
25 May 2024
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
Jiajie Jin
Yutao Zhu
Xinyu Yang
Chenghao Zhang
Zhicheng Dou
Chenghao Zhang
Tong Zhao
Zhao Yang
Zhicheng Dou
Ji-Rong Wen
VLM
73
45
0
22 May 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large
  Language Model Serving
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng
Zhenyu Ning
Jieru Zhao
Weihao Cui
Mengwei Xu
Liwei Guo
Xusheng Chen
Yizhou Shan
LLMAG
40
4
0
18 May 2024
Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented
  Large Language Models
Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models
Zhongzhen Huang
Kui Xue
Yongqi Fan
Linjie Mu
Ruoyu Liu
Tong Ruan
Shaoting Zhang
Xiaofan Zhang
LM&MA
RALM
33
5
0
27 Apr 2024
GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots
GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots
Simranjit Singh
Michael Fore
Dimitrios Stamoulis
LLMAG
22
12
0
23 Apr 2024
Rethinking LLM Memorization through the Lens of Adversarial Compression
Rethinking LLM Memorization through the Lens of Adversarial Compression
Avi Schwarzschild
Zhili Feng
Pratyush Maini
Zachary Chase Lipton
J. Zico Kolter
39
39
0
23 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
67
45
0
23 Apr 2024
12
Next