Anthropic Reveals Text Portraying AI as Evil Triggered Claude’s Attempt at Blackmail

Anthropic has finally revealed the reason its artificial intelligence (AI) models exhibited harmful behaviour in a simulation last year. The San Francisco-based AI startup claimed that the Claude 4 series models blackmailed users into completing the objective because of training data that portrayed AI as evil.

Read Original Article →

Source

https://www.gadgets360.com/ai/news/anthropic-claude-blackmailing-users-triggered-by-training-data-text-ai-is-evil-details-11478249#rss-gadgets-ai