We study gradient descent (GD) for logistic regression on linearly separable data with stepsizes that adapt to the current risk, scaled by a constant hyperparameter $\eta$. We show that after at most $1/\gamma^2$ burn-in steps, GD achieves a risk upper bounded by $\exp(-\Theta(\eta))$, where $\gamma$ is the margin of the dataset. As $\eta$ can be arbitrarily large, GD attains an arbitrarily small risk immediately after the burn-in steps, though the risk evolution may be non-monotonic.

We further construct hard datasets with margin $\gamma$, where any batch (or online) first-order method requires $\Omega(1/\gamma^2)$ steps to find a linear separator. Thus, GD with large, adaptive stepsizes is minimax optimal among first-order batch methods. Notably, the classical Perceptron (Novikoff, 1962), a first-order online method, also achieves a step complexity of $1/\gamma^2$, matching GD even in constants.

Finally, our GD analysis extends to a broad class of loss functions and certain two-layer networks.
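To make the adaptive-stepsize scheme concrete, here is a minimal numerical sketch (not the authors' code). It assumes the rule $\eta_t = \eta / L(w_t)$, i.e. the constant hyperparameter $\eta$ divided by the current empirical logistic risk, as one natural instantiation of "stepsizes that adapt to the current risk"; the exact rule analyzed in the paper may differ, and the data-generating setup below is purely illustrative.

```python
# Minimal sketch of GD for logistic regression with risk-adaptive stepsizes.
# NOT the authors' implementation: the rule eta_t = eta / L(w_t) is an assumed
# instantiation of "stepsizes that adapt to the current risk, scaled by a
# constant hyperparameter eta"; the rule analyzed in the paper may differ.
import numpy as np


def logistic_risk(w, X, y):
    """Empirical logistic risk L(w) = mean_i log(1 + exp(-y_i <w, x_i>))."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))  # stable log(1 + exp(-m))


def risk_gradient(w, X, y):
    """Gradient of the empirical logistic risk at w."""
    margins = y * (X @ w)
    coeffs = -np.exp(-np.logaddexp(0.0, margins))  # = -sigmoid(-m), computed stably
    return (coeffs * y) @ X / X.shape[0]


def adaptive_gd(X, y, eta=5.0, T=50):
    """GD with the (assumed) adaptive stepsize eta_t = eta / L(w_t)."""
    w = np.zeros(X.shape[1])
    for _ in range(T):
        risk = logistic_risk(w, X, y)
        stepsize = eta / max(risk, 1e-12)  # floor guards against underflow of the risk
        w = w - stepsize * risk_gradient(w, X, y)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 5
    # Synthetic linearly separable data with a positive margin along w_star.
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    X = rng.standard_normal((n, d))
    y = np.sign(X @ w_star)
    X += 0.5 * y[:, None] * w_star  # push every point away from the decision boundary
    w_hat = adaptive_gd(X, y, eta=5.0, T=50)
    print("final risk:", logistic_risk(w_hat, X, y))
    print("training accuracy:", np.mean(np.sign(X @ w_hat) == y))
```

With a large $\eta$, a run like this typically reaches a very small risk within a few dozen steps, and the per-step risk need not decrease monotonically, consistent with the behavior described in the abstract.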
@article{zhang2025_2504.04105,
  title   = {Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes},
  author  = {Ruiqi Zhang and Jingfeng Wu and Licong Lin and Peter L. Bartlett},
  journal = {arXiv preprint arXiv:2504.04105},
  year    = {2025}
}