On the Convergence of Encoder-only Shallow Transformers

2 November 2023

Papers citing "On the Convergence of Encoder-only Shallow Transformers"

Faster WIND: Accelerating Iterative Best-of-N Distillation for LLM Alignment. Tong Yang, Jincheng Mei, H. Dai, Zixin Wen, Shicong Cen, Dale Schuurmans, Yuejie Chi, Bo Dai. 20 Feb 2025.
Generalizable autoregressive modeling of time series through functional narratives. Ran Liu, Wenrui Ma, Ellen L. Zippi, Hadi Pouransari, Jingyun Xiao, ..., Behrooz Mahasseni, Juri Minxha, Erdrin Azemi, Eva L. Dyer, Ali Moin. Tag: AI4TS. 10 Oct 2024.
On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths. Quynh N. Nguyen. 24 Jan 2021.