QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization

Main: 12 pages · Bibliography: 4 pages · Appendix: 3 pages · 10 figures · 8 tables
Abstract
This technical report presents QwenLong-CPRS, a context compression framework designed for explicit long-context optimization. It addresses two challenges that large language models (LLMs) face during long-sequence processing: the prohibitive computation overhead of the prefill stage and the "lost in the middle" performance degradation. Implemented through a novel dynamic context optimization mechanism, QwenLong-CPRS performs multi-granularity context compression guided by natural language instructions, achieving both efficiency gains and improved performance.
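To make the abstract's interface concrete, here is a minimal conceptual sketch of instruction-guided, multi-granularity context compression. It is not the paper's method: QwenLong-CPRS is a trained neural compressor, whereas this toy uses simple lexical overlap as a stand-in relevance score, and every name here (`compress_context`, `granularity`, `keep_ratio`) is a hypothetical illustration of the idea, not the actual API.

```python
# Conceptual sketch only: QwenLong-CPRS is a trained neural compressor;
# this toy function merely illustrates the shape of instruction-guided,
# multi-granularity compression. All names below are hypothetical.
import re


def compress_context(context: str, instruction: str,
                     granularity: str = "sentence",
                     keep_ratio: float = 0.3) -> str:
    """Keep the fraction of context units most relevant to the instruction.

    granularity: "sentence" or "word" -- the unit of compression.
    keep_ratio:  fraction of units retained (a crude stand-in for the
                 model's learned, dynamic token-selection policy).
    """
    # Split the context into units at the requested granularity.
    if granularity == "sentence":
        units = re.split(r"(?<=[.!?])\s+", context.strip())
    else:
        units = context.split()

    # Toy relevance score: lexical overlap with the instruction.
    # The real system scores spans with a trained language model instead.
    query_terms = set(instruction.lower().split())
    scored = [(sum(w.lower().strip(".,?!") in query_terms for w in u.split()),
               i, u)
              for i, u in enumerate(units)]

    # Retain the top-scoring units, then restore their original order.
    n_keep = max(1, int(len(units) * keep_ratio))
    kept = sorted(sorted(scored, reverse=True)[:n_keep], key=lambda t: t[1])
    return " ".join(u for _, _, u in kept)


if __name__ == "__main__":
    doc = ("The model was trained on long documents. Compression reduces "
           "prefill cost. The weather was pleasant that day. Dynamic "
           "selection keeps only instruction-relevant spans.")
    print(compress_context(doc, "How does compression reduce prefill cost?"))
```

The sketch keeps the two properties the abstract highlights: the unit of compression is adjustable (multi-granularity), and what survives depends on the natural language instruction rather than a fixed truncation rule.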