The500Feed.Live

Everything going on in AI - updated daily from 500+ sources

📄 Research | May 21, 2026

Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation

Large language model post-training methods such as supervised fine-tuning (SFT), reinforcement learning (RL), and distillation are often analyzed through their loss functions: maximum likelihood, policy gradients, forward KL, reverse KL, or related objective-level variants. We study a complementary ...

Source

http://arxiv.org/abs/2605.22731v1