New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

Researchers at Carnegie Mellon University built a new benchmark that measures how far AI agents can go when exploiting real vulnerabilities in Google's V8 engine. Mythos leads GPT-5.5 by a wide margin but costs twelve times as much. The article New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously appeared first on The Decoder .

Read Original Article →

Source

https://the-decoder.com/new-benchmark-shows-claude-mythos-and-gpt-5-5-can-develop-real-browser-exploits-autonomously/