The500Feed.Live
Everything going on in AI - updated daily from 500+ sources
📄 Research · June 3, 2026
Sequential Data Poisoning in LLM Post-Training
LLM post-training proceeds through multiple stages, e.g., supervised fine-tuning (SFT) followed by reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO), where each stage draws data from different, potentially untrusted sources. Existing literature assumes data po...