From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency. International Conference on Learning Representations (ICLR), 2024
Auto-Regressive Next-Token Predictors are Universal Learners. International Conference on Machine Learning (ICML), 2023
Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks. International Conference on Machine Learning (ICML), 2023
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit. Neural Information Processing Systems (NeurIPS), 2022
Learning a Single Neuron for Non-monotonic Activation Functions. International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
A Spectral-Based Analysis of the Separation Between Two-Layer Neural Networks and Linear Methods. Journal of Machine Learning Research (JMLR), 2021
Computational Separation Between Convolutional and Fully-Connected Networks. International Conference on Learning Representations (ICLR), 2020