v1v2v3 (latest)

GOLD PANNING: Strategic Context Shuffling for Needle-in-Haystack Reasoning

10 October 2025

Adam Byerly

Daniel Khashabi

ArXiv (abs)PDF HTML

Main:7 Pages

6 Figures

Bibliography:3 Pages

2 Tables

Appendix:5 Pages

Abstract

Large language models (LLMs) exhibit pronounced position bias in long-context needle-in-haystack problems, systematically prioritizing the location of information over its relevance. While current mitigations rely on white-box access, this is effectively impossible for many state-of-the-art models. We introduce GOLD PANNING, a black-box Bayesian framework that performs inference-time active search over long contexts by (i) reordering documents to concentrate high-belief items in highly diagnostic positions (signal anchoring) and (ii) updating beliefs over document relevance from model outputs. Unlike conventional active learning, which prioritizes uncertainty reduction, GOLD PANNING leverages anchoring -- once flagged, keep it in sight -- to preserve weak cues. We implement this using iterative assignment derived from the model's diagnosticity profile, which provably identifies a target among $N$ documents in $O(\log N)$ rounds, ensuring scalability to many-documentthis http URLneedle-in-a-haystack retrieval and long-context QA, GOLD PANNING matches Permutation Self-Consistency's target identification with $30--65%$ fewer queries and remains effective under calibration mismatch, suggesting coarse positional ordering drives performance gains. These results demonstrate that inherent model biases need not be failures, but can be used as tools for control.

View on arXiv

Comments on this paper