Out-of-distribution generalization via composition: a lens through induction heads in TransformersProceedings of the National Academy of Sciences of the United States of America (PNAS), 2024 |
Bayesian-guided Label Mapping for Visual ReprogrammingNeural Information Processing Systems (NeurIPS), 2024 |
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient DescentInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2024 |
Varying Shades of Wrong: Aligning LLMs with Wrong Answers OnlyInternational Conference on Learning Representations (ICLR), 2024 |