The500Feed.Live

Everything going on in AI - updated daily from 500+ sources

← Back to The 500 Feed
📄 ResearchJune 17, 2026

Quantifying and Auditing LLM Evaluation via Positive--Unlabeled Learning

Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM--as--a--Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable ...

Read Original Article →

Source

http://arxiv.org/abs/2606.19057v1