PRISM: Efficient Long-Range Reasoning With Short-Context LLMs

Long-range tasks demand reasoning over long inputs. Current solutions require large compute budgets, training data, model weight access, or complex task-specific designs. We introduce PRISM, which processes information as a stream of chunks while maintaining a structured in-context memory specified with a typed hierarchical schema. PRISM outperforms baselines on diverse tasks while using contexts at least 4x shorter than long-context models. The approach is token-efficient: it produces concise outputs and leverages key-value (KV) caches to reduce costs by up to 54% compared to alternative short-context methods. PRISM scales down to tiny chunks (<500 tokens) without increasing encoding costs or sacrificing quality, and generalizes to new tasks with minimal effort by automatically generating schemas from task descriptions.
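To make the chunk-streaming idea concrete, here is a minimal sketch of the pattern the abstract describes: a typed hierarchical schema defining a structured memory, updated one chunk at a time so the prompt stays short regardless of total input length. The schema fields, the `call_llm` helper, and the prompt wording are all illustrative assumptions, not details from the paper.

```python
import json
from dataclasses import dataclass, field

# Illustrative typed hierarchical schema for the in-context memory.
# The concrete fields are hypothetical; PRISM uses a task-specific
# schema, optionally auto-generated from a task description.
@dataclass
class Event:
    description: str
    participants: list[str] = field(default_factory=list)

@dataclass
class Memory:
    summary: str = ""
    key_events: list[Event] = field(default_factory=list)

def call_llm(prompt: str) -> str:
    """Stand-in for a short-context LLM call; swap in a real client."""
    raise NotImplementedError

def process_stream(chunks: list[str], schema_hint: str) -> Memory:
    memory = Memory()
    for chunk in chunks:
        # Each prompt contains only the schema, the current structured
        # memory, and one new chunk, so the context stays short no
        # matter how long the full input is.
        prompt = (
            f"Schema:\n{schema_hint}\n"
            f"Current memory (JSON):\n{json.dumps(memory.__dict__, default=vars)}\n"
            f"New chunk:\n{chunk}\n"
            "Return the updated memory as JSON matching the schema."
        )
        updated = json.loads(call_llm(prompt))
        memory = Memory(
            summary=updated.get("summary", memory.summary),
            key_events=[Event(**e) for e in updated.get("key_events", [])],
        )
    return memory
```

A real implementation would presumably also order the prompt so that stable content (schema and memory) precedes the new chunk, which is what lets the KV cache be reused across chunk updates and drives the cost reductions the abstract reports.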
@article{jayalath2025_2412.18914,
  title={PRISM: Efficient Long-Range Reasoning With Short-Context LLMs},
  author={Dulhan Jayalath and James Bradley Wendt and Nicholas Monath and Sandeep Tata and Beliz Gunel},
  journal={arXiv preprint arXiv:2412.18914},
  year={2025}
}