The500Feed.Live

Everything going on in AI - updated daily from 500+ sources

📄 Research June 15, 2026

Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

Matrix-based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow when using standard constant decoupled weight decay. We propose Hyperball, a simple optimizer wrapper that addresses this is...
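
For readers unfamiliar with the term, the following is a minimal sketch of what "constant decoupled weight decay" usually means in AdamW-style training: the weights are shrunk toward zero by a fixed factor at every step, separately from the optimizer's update direction. This illustrates only the baseline the summary mentions, not Hyperball itself, whose details are cut off above. The function name `decoupled_weight_decay_step` and its parameters are hypothetical, chosen for this example.

```python
def decoupled_weight_decay_step(weights, update, lr, weight_decay):
    """Apply one step with decoupled weight decay (illustrative sketch only).

    weights:      list of floats, the current parameters
    update:       list of floats, the direction produced by the base optimizer
                  (for example Adam's normalized moments); assumed precomputed
    lr:           learning rate
    weight_decay: constant decay coefficient, the same at every step
    """
    # The decay term (lr * weight_decay * w) is applied to the weights directly
    # and is not mixed into the gradient, which is what "decoupled" refers to.
    return [w - lr * u - lr * weight_decay * w for w, u in zip(weights, update)]


# With a zero update, each weight is simply scaled by (1 - lr * weight_decay).
shrunk = decoupled_weight_decay_step([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.1)
```

Because the coefficient is constant, every step pulls the weights toward zero by the same relative amount regardless of model size or training length.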

Source

http://arxiv.org/abs/2606.16899v1