On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery. International Conference on Learning Representations (ICLR), 2024
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis. International Conference on Learning Representations (ICLR), 2024
The pitfalls of next-token prediction. International Conference on Machine Learning (ICML), 2024