The500Feed.Live
Everything going on in AI - updated daily from 500+ sources
📄 ResearchMay 26, 2026
More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations
Feedforward network (FFN) layers account for a large fraction of parameters and nonlinear expressivity in Transformer-based large language models (LLMs). Despite the evolution from ReLU and GELU to gated variants such as SwiGLU, most FFN designs still use a single fixed activation function, applying...
Read Original Article →