The500Feed.Live
Everything going on in AI - updated daily from 500+ sources
📄 Research · June 15, 2026
Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count - dominated by the expert parameters - must be held in training and inference memory. To address this, we introduce Exper...