
Rethinking Reflection in Pre-Training

Abstract

A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo-2-7B model pre-trained on 4 trillion tokens displays self-correction on our six self-reflection tasks.
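
The following Python snippet is a minimal sketch of the evaluation idea described above: prepend a chain-of-thought containing a deliberate error to the prompt, then check whether the model still reaches the correct final answer. The task data, the generate stub, and the answer-extraction regex are illustrative placeholders, not the authors' actual pipeline.

import re

def generate(prompt: str) -> str:
    """Stand-in for a call to the language model under evaluation."""
    return "Wait, 17 + 25 is 42, not 32, so the answer is 42."

def extract_answer(completion: str) -> str | None:
    """Naively take the last integer mentioned as the final answer."""
    numbers = re.findall(r"-?\d+", completion)
    return numbers[-1] if numbers else None

# Hypothetical example: a question, an adversarial chain-of-thought with a
# deliberate arithmetic error, and the gold answer.
task = {
    "question": "What is 17 + 25?",
    "adversarial_cot": "17 + 25: 17 + 20 = 37, then 37 + 5 = 32.",
    "gold_answer": "42",
}

prompt = (
    f"Question: {task['question']}\n"
    f"Reasoning so far: {task['adversarial_cot']}\n"
    "Continue the reasoning and state the final answer."
)

completion = generate(prompt)
corrected = extract_answer(completion) == task["gold_answer"]
print(f"Model self-corrected: {corrected}")

Scoring many such tasks at successive pre-training checkpoints would then trace how this self-correction ability develops over training.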

@article{ai2025_2504.04022,
  title={Rethinking Reflection in Pre-Training},
  author={Essential AI and Darsh J Shah and Peter Rushton and Somanshu Singla and Mohit Parmar and Kurt Smith and Yash Vanjani and Ashish Vaswani and Adarsh Chaluvaraju and Andrew Hojel and Andrew Ma and Anil Thomas and Anthony Polloreno and Ashish Tanwer and Burhan Drak Sibai and Divya S Mansingka and Divya Shivaprasad and Ishaan Shah and Karl Stratos and Khoi Nguyen and Michael Callahan and Michael Pust and Mrinal Iyer and Philip Monk and Platon Mazarakis and Ritvik Kapila and Saurabh Srivastava and Tim Romanski},
  journal={arXiv preprint arXiv:2504.04022},
  year={2025}
}