
Stochastic Thermodynamics of Learning Generative Models

Entropy, 2023
Abstract

We formulate generative machine learning problems as the time evolution of Parametric Probabilistic Models (PPMs), which inherently renders learning a thermodynamic process. We then study the thermodynamic exchange between the model's parameters, denoted Θ, and the model's generated samples, denoted X. We demonstrate that the training dataset and the action of the Stochastic Gradient Descent (SGD) optimizer serve as a work source that drives the time evolution of these two subsystems. Our findings reveal that the model learns through the dissipation of heat during the generation of samples X, leading to an increase in the entropy of the model's parameters Θ. The parameter subsystem thus acts as a heat reservoir, effectively storing the learned information. Furthermore, the role of the model's parameters as a heat reservoir provides valuable thermodynamic insight into the generalization power of over-parameterized models. This approach offers an unambiguous framework for computing information-theoretic quantities within deterministic neural networks by establishing connections with thermodynamic variables. To illustrate the utility of this framework, we introduce two information-theoretic metrics, Memorized-information (M-info) and Learned-information (L-info), which trace the dynamic flow of information during the learning process of PPMs.
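For context, the bookkeeping such a framework builds on is the standard first law and entropy balance of stochastic thermodynamics; the notation below is generic, and the specific partition into the parameter subsystem Θ and the sample subsystem X is the paper's construction, not reproduced here:

    \Delta U = W + Q, \qquad \Delta S_{\mathrm{tot}} = \Delta S_{\mathrm{sys}} - \frac{Q}{T} \;\geq\; 0,

where W is the work supplied to the system (in this setting, by the training data and the SGD updates), Q is the heat the system absorbs from its reservoir (here, heat dissipated into the parameter subsystem corresponds to Q < 0 for the sample subsystem), and \Delta S_{\mathrm{tot}} is the total entropy production, which is non-negative by the second law.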
