PRISM: Efficient Long-Range Reasoning With Short-Context LLMs

Long-range tasks demand reasoning over long inputs. Current solutions require large compute budgets, training data, model weight access, or complex task-specific designs. We introduce PRISM, which processes information as a stream of chunks while maintaining a structured in-context memory specified with a typed hierarchical schema. PRISM outperforms baselines on diverse tasks while using contexts at least 4x shorter than long-context models. The approach is token-efficient: it produces concise outputs and leverages key-value (KV) caches to reduce costs by up to 54% compared to alternative short-context methods. PRISM scales down to tiny chunks (<500 tokens) without increasing encoding costs or sacrificing quality, and generalizes to new tasks with minimal effort by automatically generating schemas from task descriptions.
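To make the chunk-streaming idea concrete, here is a minimal sketch of the pattern the abstract describes: a typed hierarchical schema defining a structured memory, updated one chunk at a time so the prompt stays short regardless of total input length. The schema fields, the `call_llm` helper, and the prompt wording are all illustrative assumptions, not details from the paper.

```python
import json
from dataclasses import dataclass, field

# Illustrative typed hierarchical schema for the in-context memory.
# The concrete fields are hypothetical; PRISM uses a task-specific
# schema, optionally auto-generated from a task description.
@dataclass
class Event:
    description: str
    participants: list[str] = field(default_factory=list)

@dataclass
class Memory:
    summary: str = ""
    key_events: list[Event] = field(default_factory=list)

def call_llm(prompt: str) -> str:
    """Stand-in for a short-context LLM call; swap in a real client."""
    raise NotImplementedError

def process_stream(chunks: list[str], schema_hint: str) -> Memory:
    memory = Memory()
    for chunk in chunks:
        # Each prompt contains only the schema, the current structured
        # memory, and one new chunk, so the context stays short no
        # matter how long the full input is.
        prompt = (
            f"Schema:\n{schema_hint}\n"
            f"Current memory (JSON):\n{json.dumps(memory.__dict__, default=vars)}\n"
            f"New chunk:\n{chunk}\n"
            "Return the updated memory as JSON matching the schema."
        )
        updated = json.loads(call_llm(prompt))
        memory = Memory(
            summary=updated.get("summary", memory.summary),
            key_events=[Event(**e) for e in updated.get("key_events", [])],
        )
    return memory
```

A real implementation would presumably also order the prompt so that stable content (schema and memory) precedes the new chunk, which is what lets the KV cache be reused across chunk updates and drives the cost reductions the abstract reports.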
@article{jayalath2025_2412.18914,
  title={PRISM: Efficient Long-Range Reasoning With Short-Context LLMs},
  author={Dulhan Jayalath and James Bradley Wendt and Nicholas Monath and Sandeep Tata and Beliz Gunel},
  journal={arXiv preprint arXiv:2412.18914},
  year={2025}
}