GOLD PANNING: Strategic Context Shuffling for Needle-in-Haystack Reasoning

Main: 7 Pages
6 Figures
Bibliography: 3 Pages
2 Tables
Appendix: 5 Pages

Abstract

Large language models (LLMs) exhibit pronounced position bias in long-context needle-in-haystack problems, systematically prioritizing the location of information over its relevance. Current mitigations rely on white-box access, which is effectively impossible for many state-of-the-art models. We introduce GOLD PANNING, a black-box Bayesian framework that performs inference-time active search over long contexts by (i) reordering documents to concentrate high-belief items in highly diagnostic positions (signal anchoring) and (ii) updating beliefs over document relevance from model outputs. Unlike conventional active learning, which prioritizes uncertainty reduction, GOLD PANNING leverages anchoring (once flagged, keep it in sight) to preserve weak cues. We implement this using iterative assignment derived from the model's diagnosticity profile, which provably identifies a target among $N$ documents in $O(\log N)$ rounds, ensuring scalability to many-document settings. On needle-in-a-haystack retrieval and long-context QA, GOLD PANNING matches Permutation Self-Consistency's target identification with 30-65% fewer queries and remains effective under calibration mismatch, suggesting that coarse positional ordering drives the performance gains. These results demonstrate that inherent model biases need not be failures, but can be used as tools for control.
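
The reorder-then-update loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes exactly one target document, models the diagnosticity profile as hypothetical per-position flag rates `tpr` (target placed there) and `fpr` (non-target placed there), and replaces the LLM query with a simulated stand-in. All function and parameter names are illustrative.

```python
import random


def assign_positions(belief, tpr, fpr):
    """Place the highest-belief documents in the most diagnostic positions.

    Diagnosticity is taken here to be tpr - fpr (an assumption of this sketch).
    Returns placement, where placement[doc] is the position given to that doc.
    """
    docs = sorted(range(len(belief)), key=lambda d: belief[d], reverse=True)
    slots = sorted(range(len(tpr)), key=lambda p: tpr[p] - fpr[p], reverse=True)
    placement = [0] * len(belief)
    for doc, pos in zip(docs, slots):
        placement[doc] = pos
    return placement


def update_belief(belief, placement, flagged, tpr, fpr):
    """Bayes update under the single-target assumption.

    For each document, the prior is multiplied by the likelihood ratio of the
    observation at its position. Requires 0 < fpr[p] < 1 for every position.
    """
    posterior = []
    for doc, prior in enumerate(belief):
        pos = placement[doc]
        if flagged[pos]:
            ratio = tpr[pos] / fpr[pos]
        else:
            ratio = (1 - tpr[pos]) / (1 - fpr[pos])
        posterior.append(prior * ratio)
    total = sum(posterior)
    return [x / total for x in posterior]


def simulate_model(placement, target, tpr, fpr, rng):
    """Hypothetical stand-in for the black-box LLM: flags each position at random."""
    flagged = [False] * len(placement)
    for doc, pos in enumerate(placement):
        prob = tpr[pos] if doc == target else fpr[pos]
        flagged[pos] = rng.random() < prob
    return flagged


def gold_panning_sketch(n_docs, target, tpr, fpr, rounds, seed=0):
    """Run the reorder / query / update loop from a uniform prior."""
    rng = random.Random(seed)
    belief = [1.0 / n_docs] * n_docs
    for _ in range(rounds):
        placement = assign_positions(belief, tpr, fpr)
        flagged = simulate_model(placement, target, tpr, fpr, rng)
        belief = update_belief(belief, placement, flagged, tpr, fpr)
    return belief
```

In this sketch, a document that has been flagged gains belief and is therefore moved into a more diagnostic position on the next round, which is one simple reading of "once flagged, keep it in sight". A real use would replace `simulate_model` with calls to the target model and would need an estimated position profile in place of the assumed `tpr` and `fpr`.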
