
Improving Discrete Optimisation Via Decoupled Straight-Through Estimator

Main: 7 pages · 6 figures · 1 table · Bibliography: 2 pages · Appendix: 1 page
Abstract

The Straight-Through Estimator (STE) is the dominant method for training neural networks with discrete variables, enabling gradient-based optimisation by routing gradients through a differentiable surrogate. However, existing STE variants conflate two fundamentally distinct concerns: forward-pass stochasticity, which controls exploration and latent space utilisation, and backward-pass gradient dispersion, i.e., how learning signals are distributed across categories. We show that these concerns are qualitatively different and that tying them to a single temperature parameter leaves significant performance gains untapped. We propose Decoupled Straight-Through (Decoupled ST), a minimal modification that introduces separate temperatures for the forward pass ($\tau_f$) and the backward pass ($\tau_b$). This simple change enables independent tuning of exploration and gradient dispersion. Across three diverse tasks (Stochastic Binary Networks, Categorical Autoencoders, and Differentiable Logic Gate Networks), Decoupled ST consistently outperforms Identity STE, Softmax STE, and Straight-Through Gumbel-Softmax. Crucially, optimal $(\tau_f, \tau_b)$ configurations lie far off the diagonal $\tau_f = \tau_b$, confirming that the two concerns do require different answers and that single-temperature methods are fundamentally constrained.
