The500Feed.Live
Everything going on in AI - updated daily from 500+ sources
📄 Research | June 15, 2026
Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization
Matrix-based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow under standard constant decoupled weight decay. We propose Hyperball, a simple optimizer wrapper that addresses this is...