MSC

Theory Club Friday, June 22 at 12:00pm in room 454A. Grant Rotskoff: "Parameters as interacting particles: asymptotic scaling, convexity, and error of neural networks"

Unless otherwise stated, seminars and defences take place at 11:30 in room 454A of Condorcet building.


Parameters as interacting particles: asymptotic scaling, convexity, and error of neural networks

Grant Rotskoff

Abstract: The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters $n$ is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as $O(n^{-1})$. As a consequence, we rederive the universal approximation theorem for neural networks but we additionally prove that the optimal representation can be achieved through stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale $O(n^{-1})$, for suitable choices of the batch size. These fluctuations in the landscape demonstrate the necessity of the noise inherent in stochastic gradient descent and our analysis provides a precise scale for tuning this noise. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions. From our insights, we extract several practical guidelines for large scale applications of neural networks, emphasizing the importance of both noise and quenching, in particular.
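The parameters-as-particles picture from the abstract can be illustrated with a small numerical sketch. This is not the speaker's code; it is an independent toy example under assumed choices (a 1D target function $\sin(2\pi x)$, ReLU units, and an arbitrary learning rate). Each hidden unit's parameter triple $(c_i, a_i, b_i)$ plays the role of one particle, the output is scaled by $1/n$ as in the mean-field limit, and the mean-squared loss couples the particles like an interaction potential:

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Toy 1D target function (an arbitrary choice for illustration).
    return np.sin(2 * np.pi * x)

def train(n, steps=3000, batch=32, lr=0.05):
    """SGD on a single-hidden-layer net f(x) = (1/n) * sum_i c_i relu(a_i x + b_i).

    Each triple (c_i, a_i, b_i) is one 'particle'; n is the number of particles.
    """
    c = rng.normal(size=n)
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0, size=batch)
        pre = a[:, None] * x[None, :] + b[:, None]   # (n, batch) pre-activations
        act = np.maximum(pre, 0.0)                   # ReLU
        f = act.T @ c / n                            # mean-field output, (batch,)
        err = f - target(x)
        # Gradients of 0.5 * mean(err^2). The 1/n output scaling makes each
        # per-particle gradient O(1/n), so the step is rescaled by n below
        # to keep the particle dynamics O(1) as n grows.
        gc = act @ err / (batch * n)
        mask = (pre > 0.0).astype(float)
        gpre = (c[:, None] * mask) * err[None, :] / (batch * n)
        ga = gpre @ x
        gb = gpre.sum(axis=1)
        c -= lr * n * gc
        a -= lr * n * ga
        b -= lr * n * gb
    # Test mean-squared error on a grid.
    xs = np.linspace(-1.0, 1.0, 200)
    pred = np.maximum(a[:, None] * xs[None, :] + b[:, None], 0.0).T @ c / n
    return float(np.mean((pred - target(xs)) ** 2))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, train(n))
```

The rescaling of the learning rate by $n$ compensates for the $1/n$ output normalization; without it, the particles would freeze as $n$ grows. Running the loop for increasing $n$ gives a rough numerical feel for the error decay with the number of particles discussed in the talk, though this sketch makes no claim about reproducing the precise $O(n^{-1})$ rate.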

Friday, June 22 at 12:00pm in room 454A


Contact: Seminar team - Published on 13 June
