Learning zero, and what SLT gets wrong about it

This is the first in a pair of posts I'm hoping to write about Singular Learning Theory (SLT) and singularities as a model of data degeneracy. If I get to it, the second post will be aimed at a more general audience; this one is more technical.

Introduction

To me, SLT is an important source of toy models that point at an interesting class of new statistical phenomena in learning. It is also a valuable correction to an older and (at this point) largely defunct story in which learning is fully controlled by Hessian eigenvalues and "nonsingular basins". Practitioners of SLT have been instrumental in developing and refining the application of Bayesian sampling (used by physicists in papers like this one) to empirical models. And the theory's founder, Sumio Watanabe, is a once-in-a-generation genius who saw and mathematically justified crucial statistical and information-theoretic concepts in ML long before they appeared in "mainstream" theory.

However, there is a frequently repeated statement in SLT p

Source

https://www.lesswrong.com/posts/5hKgJy8rcqnM9ntp2/learning-zero-and-what-slt-gets-wrong-about-it