The500Feed.Live
Everything going on in AI - updated daily from 500+ sources
📄 Research · June 17, 2026
Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning
Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable ...