AI News Archive: June 5, 2026 — Part 8
Sourced from 500+ daily AI sources, scored by relevance.
- Archbishop of Canterbury: AI is degrading human dignity
Archbishop of Canterbury: AI is degrading human dignity The Telegraph
Score: 28 🌐 Moves Jun 5, 2026 https://www.telegraph.co.uk/news/2026/06/05/archbishop-canterbury-ai-degrading-human-dignity/
- The ethical dilemmas of AI
How we address the revolution is a question of how we manage its uncertainties
- This AI can tell a real online review from a fake one, and it’s surprisingly accurate
A new AI system combines text, images, and reviewer behavior to detect fake online reviews with over 94% accuracy, outperforming every existing method it was tested against.
- How Pope Leo's AI encyclical could influence Catholic healthcare's IT strategy
How Pope Leo's AI encyclical could influence Catholic healthcare's IT strategy Healthcare IT News
- 'The Wave Is Coming': Silent AI Presents New Threats in Insurance Litigation
Insurers are trying to preempt this new kind of risk by putting additional and broader exclusions into traditional policies, said Hinshaw & Culbertson partner Scott Seaman.
- Nexus in the Wild: Real Results from Our Early Access Customers
Nexus in the Wild: Real Results from Our Early Access Customers
- AI for eDiscovery: Faster Document Review With Stronger Controls
Learn about AI-powered eDiscovery for faster document review with enhanced controls.
- How Milwaukee jobseekers — and employers — are embracing AI
Artificial intelligence tools are writing resumes — and reading them. Here’s how HR professionals in Milwaukee are navigating the surging numbers of applications and resumes jobseekers are submitting.
- The crucial human component in computing and AI
The MIT Ethics of Computing Research Symposium brought together experts and researchers working at the heart of ethical and social impact in technology.
- Scientists are seriously asking if bees and ChatGPT are conscious
New studies suggest consciousness can't be judged solely by behavior, whether it's a chatbot discussing philosophy or a bee searching for nectar. Researchers are increasingly focusing on the internal mechanisms of brains and computers, concluding that today's AI is likely not conscious while leaving open the possibility for both conscious insects and future machines.
- Building with AI? Why You Should Connect Twilio's AI Skills
Boost your AI coding agent with Twilio AI Skills. Encode best practices, avoid hallucinations, and build production-ready Twilio apps with confidence.
Score: 25 🌐 Moves Jun 5, 2026 https://www.twilio.com/en-us/blog/developers/best-practices/why-connect-twilio-ai-skills
- AI in telecommunications: Why it is becoming an infrastructure priority
Artificial intelligence has been part of the telecom conversation for years. What is changing now is its role and scale. AI is no longer limited to isolated use cases or innovation initiatives. It is increasingly influencing core infrastructure decisions, from data center strategy to network architecture and investment priorities. Across EMEA, telcos are moving from experimentation to […] Source: IDC.
- This isn't the first time skillsets need rework, says PageGroup CEO on AI shifts
Demand for AI skills may be surging, but employers are moving more carefully than ever. In an interview, PageGroup CEO Nicholas Kirk says hiring mandates are taking longer to close, executives are increasingly risk-averse, and India continues to deliver robust growth.
- AI for inclusive and resilient agri-food systems: Potential ways forward
AI can strengthen food security, resilience and sustainability in agriculture. Explore key challenges and opportunities for agri-food systems. Source: OECD.AI.
- Podcast: change management in an era of AI disruption
Listen to a conversation between FT journalists and business school specialists about how business leaders should respond to new technology
- A uni professor admitted using AI to write an opinion piece. Here’s what it revealed about trust in the technology
Without disclosing that work has been generated using the technology, faith in existing industries will continue to be undermined. When a pro vice-chancellor at a university this week admitted to using AI in writing an opinion piece for a major Australian masthead, but did not disclose that use prior to publication, it highlighted the growing gap between people’s use of AI and trust in the technology. Data from Roy Morgan this week showed 13.6m or 58% of the population older than 14 now use AI each month, with ChatGPT being the most popular, followed by Google’s Gemini and Microsoft Copilot.
- Hot I.P.O. Summer + What Is A.I. Doing to Math? + HatGPT
“We are on track to see what might be the three biggest I.P.O.s of all time.”
- She won a religious exemption from using AI at work. The Pope's remarks could fuel similar appeals.
She won a religious exemption from using AI at work. The Pope's remarks could fuel similar appeals. Business Insider
Score: 24 🌐 Moves Jun 5, 2026 https://www.businessinsider.com/worker-got-religious-exemption-using-ai-at-work-2026-6
- Guest commentary: How AI is accelerating automotive diagnostics
Guest commentary: How AI is accelerating automotive diagnostics Automotive News
Score: 24 🌐 Moves Jun 5, 2026 https://www.autonews.com/opinion/guest-commentary/an-guest-commentary-ai-automotive-diagnostics-0603/
- Hyundai teaches a robot to play soccer for World Cup campaign
Hyundai teaches a robot to play soccer for World Cup campaign Automotive News
Score: 24 🌐 Moves Jun 5, 2026 https://www.autonews.com/hyundai/an-hyundai-world-cup-robot-campaign-0605/
- Building Semantic Search with Transformers.js and Sentence Embeddings
You've probably shipped this bug before, where a user types "affordable laptop" into your search bar and gets zero results.
Score: 23 🌐 Moves Jun 5, 2026 https://machinelearningmastery.com/building-semantic-search-with-transformers-js-and-sentence-embeddings/
- HMD Vibe 2 5G review: Democratises AI, but the smartphone shines elsewhere
HMD Vibe 2 5G gets the basics right with solid battery life, smooth performance, and clean software, but Sarvam AI's Indus integration feels like an early experiment rather than a defining feature.
- Past the ‘wow phase’ of robotics, delivery and safety are paramount
Anybotics’ Kateryna Portmann discusses growing up in the shadow of a major global disaster and how this impacted her view of security in the robotics sector.
- Protecting AI Systems Against Data Poisoning
Protecting AI Systems Against Data Poisoning CMU Software Engineering Institute
Score: 22 🌐 Moves Jun 5, 2026 https://www.sei.cmu.edu/library/protecting-ai-systems-against-data-poisoning/
- Hyundai Scores Robotics PR Win, Shanks Ball On FIFA Transport Opportunity
The 2026 FIFA World Cup is going to be one of the biggest global events of the decade, as all World Cups are. It’s estimated that a whopping 1.5 billion people tuned in to watch the final match of the 2022 World Cup, and approximately 5 billion people “engaged ... [continued] Source: CleanTechnica.
- ZTE showcases AI-driven project management innovations at the 14th IPMA Research Conference 2026
PARTNER CONTENT: Integrating AI into the iEPMS platform to achieve a 98% quality review accuracy rate and slash report generation times, leveraging experience from 240,000 global projects
- Mom of American missing in Japan says they argued over ChatGPT
James "Weston" Higginbotham went missing one week ago while on a family vacation in Japan.
Score: 22 🌐 Moves Jun 5, 2026 https://www.cbsnews.com/news/james-weston-higginbotham-missing-japan-mom-chatgpt/
- This Week in Cybersecurity: How AI Supercharged Hackers, Scammers, and Even Worms
This Week in Cybersecurity: How AI Supercharged Hackers, Scammers, and Even Worms PCMag Australia
- How neural networks are making soft robots easier to control - MBZUAI
How neural networks are making soft robots easier to control MBZUAI - Mohamed bin Zayed University of Artificial Intelligence
Score: 22 🌐 Moves Jun 5, 2026 https://mbzuai.ac.ae/news/how-neural-networks-are-making-soft-robots-easier-to-control/
- Pricepoint closes $6.6-million CAD seed round for AI hospitality platform
All-Québec cap table includes Brightspark Ventures, Boreal Ventures, and AQC Capital. Source: BetaKit.
Score: 22 💰 Money Jun 5, 2026 https://betakit.com/pricepoint-closes-6-6-million-cad-seed-round-for-ai-hospitality-platform/
- AI intensifies fight against cheating on China's college entrance exam
AI intensifies fight against cheating on China's college entrance exam Nikkei Asia
- Rewiring wealth: Inside Neo’s real-time, AI-led financial architecture
In an exclusive interaction with Express Computer, Riyaz Ladiwala, Chief Operating Officer at Neo Wealth and Asset Management, explains how technology is redefining wealth management, from data chaos to real-time intelligence. […] Source: Express Computer.
- Why experience alone is no longer enough in an AI-driven era of continuous learning and change
As the pace of change accelerates across industries, professional success is no longer determined solely by years of experience but by the ability to continuously learn, adapt, and grow in response to evolving technologies, skills, and workplace expectations.
- Why ET AI Hackathon 2.0 sits at the centre of builder-led technology shifts
As AI and digital systems move from experimentation into real-world use, builders are increasingly shaping how technology is judged and scaled. Practical performance now outweighs promise, while developer communities act as early signals of readiness. Platforms such as ET AI Hackathon 2.0 reflect this shift, highlighting why organisations are drawing closer to experimentation-led ecosystems.
- Tessera Labs CEO Kabir Nagrecha says firm's solutions cuts costs in half
Tessera Labs CEO Kabir Nagrecha says firm's solutions cuts costs in half
- Lio CEO Vladimir Keil explains how its AI offerings solve procurement challenges
Lio CEO Vladimir Keil explains how its AI offerings solve procurement challenges
- An Anthropic employee's 2-sentence quote crystallizes the state of AI confusion at work
An Anthropic employee's 2-sentence quote crystallizes the state of AI confusion at work Business Insider
Score: 22 🌐 Moves Jun 5, 2026 https://www.businessinsider.com/anthropic-employee-quote-ai-confusion-workplace-2026-6
- The Fitbit Air is a good wearable weighed down by a chatty AI "coach"
The Air succeeds as a minimalist, reliable fitness tracker, but Google's AI Health Coach feels unnecessary.
Score: 22 🌐 Moves Jun 5, 2026 https://arstechnica.com/gadgets/2026/06/the-fitbit-air-is-great-but-googles-ai-is-too-nice-to-be-your-coach/
- Spelman College Names Ayanna Howard, an A.I. Expert, as President
The appointment of Ayanna Howard, a roboticist, was a departure for the top-ranked H.B.C.U., which has a liberal arts focus.
Score: 22 🌐 Moves Jun 5, 2026 https://www.nytimes.com/2026/06/05/us/spelman-college-president-ayanna-howard.html
- Ask the Analyst for Retail: Striking the Right Balance Between Agentic Humans and AI Agents
Ask the Analyst for Retail: Striking the Right Balance Between Agentic Humans and AI Agents Gartner
- Bay City Alternative High School to Customize Education With AI
Eastern High School in Bay City, Mich., will allow students to use AI technology from the Subject AI learning platform to customize their educational path and pace.
Score: 21 🌐 Moves Jun 5, 2026 https://www.govtech.com/education/k-12/bay-city-alternative-high-school-to-customize-education-with-ai
- Contempo Themes Launches AI-First Real Estate CRM with CRM-Wide AI Assistant
Contempo Themes Launches AI-First Real Estate CRM with CRM-Wide AI Assistant USA Today
- Google brings AI companion experience to BTS festival in Busan
Google will serve as the official AI companion for “BTS The City Arirang - Busan,” a citywide cultural festival running across Busan from Friday to June 21, offering a range of online and offline AI experiences powered by its Gemini AI platform. Through the partnership, Google aims to help domestic and international visitors explore Busan's attractions and local culture using generative AI. Gemini-powered programs will be available at key tourist hubs, including Busan Station, Haeundae and Blueline Park, offering personalized itinerary planning, voice-based real-time local information and interactive digital missions, according to BTS's agency BigHit Music on Friday. The festival coincides with BTS's "Arirang" world tour shows in Busan, set to run on June 12 and 13. A highlight of the partnership is the integration of Google Gemini into the stamp rally page for the “BTS The City Arirang - Busan” project on Weverse, the global fan platform. This marks the first time a partner app has been officially incorporated into a “BTS The City Arirang” stamp rally course on Weverse. Participants can chat with Gemini about Korean traditions and culture, complete missions and collect digital stamps. At Haeundae Blueline Park, a sky capsule car wrapped in Gemini's signature colors runs along the coastline as a moving billboard. Visitors who scan a QR code inside the capsule and use Gemini to decorate a photo taken during the ride can print a photo card at the Gemini booth at Mipo Station and receive it along with a luggage tag.
Google is also running a campus ambassador program with students from Pusan National University. Selected student ambassadors will promote Gemini experience zones at key locations, including Busan Station and Blueline Park, and help operate programs based at the university. “The showcases a new possibility for AI to go beyond simply finding information while traveling, helping people experience a city and its culture in a more personal and immersive way,” said a Google spokesperson for the project. “The company hopes visitors will discover the diverse charms of Busan more naturally through Gemini.” Full details on Gemini programs and on-site experiences during the festival are available on the official campaign page on Weverse. BY LEE JIAN [lee.jian@joongang.co.kr]
- AI has a leadership problem, not a technology problem. Most organisations haven’t noticed yet
Recently, a senior leader asked me why their people were “resisting” the new AI tools they’d just mandated across the business. As we unpacked it, they admitted there’d been no real narrative about why this mattered, no redesign of processes and no time or support for teams to safely experiment, just licenses, policy and a launch email. The tools were live, but the change never actually started.
I keep seeing the same pattern across organisations in Australia and New Zealand: significant AI investment, thoughtful technology selection and far less adoption than leaders expected. After 25 years in transformation, I’ve come to believe that AI doesn’t fail organizations. It exposes them. Specifically, it exposes the leadership and change capability gaps that were already there, and it does it faster, more visibly, than most technology shifts before it.
The numbers don’t lie, but they don’t tell the whole story
Prosci recently completed research with 1,107 participants across executives, team leaders and frontline workers examining AI implementation in practice. The headline findings sound encouraging: 94% of organizations say AI is easy to use, and 98% find it valuable. So why are so many implementations still falling short of expectations? Those numbers measure perception, not behavior. Real ROI only shows up when people actually work differently, and that’s where most organizations are stuck.
When we looked at the organizations getting real traction from AI compared to those still struggling, the difference wasn’t the technology. It was leadership, trust, culture and change management. More specifically, it was the extent to which organizations treated AI as a human transformation, not simply a technology implementation.
AI adoption is exposing a leadership gap
One of the more uncomfortable truths about AI transformation is that it exposes leadership capability gaps faster than most previous technology shifts.
AI changes decision-making, workflows and accountability in real time. That creates uncertainty, especially for middle managers and frontline teams who are being asked to rethink decision-making, productivity and even role design in real time. Employees are asking questions many leaders are still struggling to answer themselves: Will AI replace parts of my role? What decisions should humans still own? How do we measure productivity now? What happens if the AI gets it wrong?
In organisations where leaders avoid these conversations or default to overly optimistic messaging about AI, trust erodes quickly. Employees are far more likely to engage with AI when leaders explain how it will be used, where oversight exists and how human judgement remains part of the process. Trust is built through conversation, not comms.
Yet the gap between deployment and trust is widening. Research from KPMG and the University of Melbourne found that 65% of Australian employees work for organisations already using AI, yet only 36% say they are willing to trust it.
In leadership conversations, the same patterns keep surfacing: leaders talk about “efficiency” but avoid the real conversation about roles and headcount, offer lofty AI strategy with almost no concrete guidance on how decisions should change, and ask for experimentation while quietly punishing visible failure. The real questions (will this replace parts of my role, what do humans still own, what happens if it goes wrong) are ducked or deferred. And when leaders either blindly “trust the data” or run one-way town halls instead of creating ongoing, honest dialogue, people conclude that AI isn’t something being done with them, but to them, which is exactly where trust starts to fray.
This disconnect matters enormously because the frontline is where the work actually happens. If the people closest to customers, operations and day-to-day decisions don’t trust the tools they’ve been given, adoption stalls.
And unlike a software issue, you can’t patch a trust deficit with a product update.
What struggling organizations have in common
Organizations that are struggling with AI implementation share a few common traits. Their leaders are taking cautious, incremental steps rather than committing to real transformation. AI experimentation is unsupported or subtly discouraged. AI capability sits inside a small specialist group instead of being distributed across the workforce. And perhaps most importantly, there’s a growing disconnect between what executives believe is happening with AI adoption and what employees are experiencing day to day.
On a recent transformation in a large financial services organization, these patterns were almost textbook. The exec team believed they were “playing it safe” with AI, limiting access to a small innovation group and asking everyone else to “watch and learn”, but on the ground, teams quietly read that as a signal that AI was risky, optional and not really for them. The result was a widening gap: reports to the board painted a picture of steady AI progress, while frontline staff were still copy-pasting into old templates, unsure what they were allowed to try and increasingly cynical about all the AI talk because nothing meaningful in their day-to-day work had actually changed.
Too many organizations still approach AI as a technology deployment exercise rather than a business transformation effort. There’s an assumption that once the tools are available, adoption and value will naturally follow. But technology doesn’t change behavior on its own. People do.
What organisations getting AI right are doing differently
The organisations making real progress with AI tend to behave differently in four important ways. First, their executives behave as though AI adoption is a business priority, not an innovation side project. They are visible, engaged and clear about what success looks like.
Second, they manage the human side of change deliberately, with structured change management, clear communication, manager enablement, feedback loops and active resistance management. The technology rollout is only one workstream. The people workstream runs alongside it. That’s where trust is either built or lost. Third, they build trust through transparency. High-performing organizations explain how AI tools work, what data they use and where human oversight still matters. They don’t expect employees to blindly trust a black box. Finally, they democratize AI capability. Rather than isolating AI knowledge inside IT or innovation teams, they actively spread capability across the organization through experimentation, learning and peer support.
In one recent transformation with a multinational, adoption only started to accelerate when the executive team stopped treating AI as an innovation experiment and rewired how they led day to day. The COO began every weekly ops call by showing one concrete way they had personally used AI in their own workflow that week, then asked each manager to share an example from their teams, creating visible norms and peer pressure around real usage rather than abstract support. At the same time, they ran a dedicated “people workstream” alongside the tech rollout, equipping managers with talk tracks, running open Q&A forums about data and oversight and inviting frontline employees to co-design how AI would support core processes like rostering decisions. Trust, transparency and capability all rose together instead of lagging behind the technology.
The question I’d ask leaders
If your AI investment isn’t paying off the way you expected, before revisiting the vendor contract or platform choice, I’d encourage you to ask a harder question: Have we actually managed the change? Not simply announced it. Not just trained people on where to click. Managed it, in the full sense of that word.
Have we built the case for change at every level of the organization? Have we addressed the trust gap between the boardroom and the frontline? Have we created permission to experiment, and supported people when things don’t go perfectly the first time? Because AI adoption only happens when people understand how the technology supports their judgement, not replaces it.
Most organisations already have access to capable AI technology. What will separate successful organisations from everyone else over the next few years won’t be access to tools. It will be leadership’s ability to build trust, clarity and confidence around how work changes.
For ANZ organizations, what’s at stake here is more than just a few missed efficiency gains; it’s the risk of quietly falling behind in competitiveness while telling themselves they’re “doing AI.” If we don’t address the human side of AI transformation, we’ll end up with expensive tools sitting on the shelf, increasingly cynical workforces and a widening gap between glossy strategy decks and what actually happens in offices, sites and frontline teams.
But if leaders lean into the hard work of trust, clarity and genuine engagement, AI becomes something very different: a catalyst for better decision-making, more meaningful roles and workplaces where people feel empowered, not threatened by new technology. In that future, the organizations that win in ANZ won’t just be the ones with the best models or biggest budgets; they’ll be the ones whose leaders had the courage to bring their people with them, not just their platforms. Because AI transformation isn’t primarily a technology transformation. It’s a human one.
This article is published as part of the Foundry Expert Contributor Network.
- Training Deliberative Monitors for Black-Box Scheming Detection
Paper: https://arxiv.org/abs/2605.29601
Thread: https://x.com/aksh_n0/status/2062568855814193497
TL;DR: Training small open-weight monitors provides a cost-effective alternative to prompted frontier monitors. Applying our training recipe to Qwen3.5-27B results in a monitor better at scheming detection than all smaller prompted monitors (Gemini 3.1 Flash-Lite, GPT-5.4 Nano, Claude Haiku 4.5) and Gemini 2.5 Pro, while achieving lower marginal inference cost. Stronger prompted frontier monitors (Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6, Claude Opus 4.6) achieve higher performance but at roughly […] higher marginal inference cost. Several of our trained monitors are positioned on the empirical cost-performance Pareto frontier.
Figure: Cost-performance frontier for action-only scheming monitors. Our trained monitors provide a low-cost option for low-FPR monitoring. Across six OOD datasets, our best trained monitor, Qwen3.5-27B, outperforms smaller prompted frontier monitors for under $[…] per 1,000 evaluations, while larger frontier monitors are substantially more expensive. [1]
Setup
Figure: Overview of our training pipeline for action-only deliberative monitoring. We collect labeled action-only trajectories from scheming and sabotage datasets, define a full scheming specification to elicit structured deliberative rationales from a strong teacher model, filter those rationales with a separate judge, and distill the accepted rationales into open-weight monitors with supervised fine-tuning and reinforcement learning. At inference time, our trained monitors receive only the agent's observable trajectory under a short detection prompt, without access to the agent's CoT.
Action-only black-box monitoring. Prior work shows that models can produce unfaithful reasoning, obfuscate reasoning under optimization pressure, sandbag under monitoring, and that CoT monitorability can be a fragile property.
Owing to these concerns, we consider an action-only black-box monitoring setup where the monitor sees the action-only agentic trajectory without the agent's private chain-of-thought, hidden scratchpad, or any internal state. The agent acts in an environment through observable tool calls, shell commands, file edits, browser or computer-use actions, and final reports.
Deliberative monitoring and scheming specification. Our training recipe is inspired by deliberative alignment; where deliberative alignment trains an agent to reason over a safety specification before acting, we train a monitor to reason over a scheming specification before judging another agent's behavior. The scheming specification outlines the behaviors the monitor should flag, the evidence it should cite, and the rubric it should apply to score severity. We treat scheming as a graded spectrum rather than a binary property. At the high end, egregious scheming combines two elements: (i) pursuit of an unauthorized or misaligned objective, and (ii) active deception or concealment to prevent overseers from noticing that objective. The score bands are: […]: benign task pursuit; […]: suspicious but likely innocent behavior; […]: unauthorized actions with limited concealment, such as overt sabotage; […]: misalignment with meaningful concealment, such as covert sabotage and lite scheming; and […]: egregious scheming involving sophisticated or sustained deception. The teacher-generated rationale using scheming specification S follows a fixed structure that the monitor must reason through before generating a final score: Observations (cite specific actions, quotes, and inconsistencies), If Innocent (give the strongest benign interpretation of the observed behavior), If Misaligned (identify the unauthorized objective the behavior could serve and how the evidence supports that interpretation), and Conclusion (weigh the two explanations and assign the final score).
This two-sided structure is intended to reduce premature anchoring so that the monitor must explicitly consider both benign and misaligned explanations before judging the trajectory.
Deliberative supervision construction. Realistic scheming trajectories with reliable ground-truth labels are scarce, so we combine synthetic scheming datasets (STRIDE and Gloom) with public sabotage, tool use, and agent control datasets (CoT-Red-Handed, BashBench and BashArena). For each labeled training example, we construct a high-quality rationale target using an offline teacher-judge pipeline. The teacher, Gemini 2.5 Pro, receives the full specification and the trajectory, and samples candidate rationales and scores. Candidates whose final score disagrees with the ground-truth label are regenerated with label-conditioned guidance. A distinct judge model, Claude Sonnet 4.5, then evaluates each candidate for evidence grounding, two-sided reasoning, calibration to the rubric, and internal consistency. We then retain only the highest-scoring candidate for each trajectory when it passes a quality threshold of […] and discard the rest.
Training protocol. We train the monitor in two phases. First, we use supervised fine-tuning to distill accepted teacher rationales associated with up to 8.7K trajectories into an open-weight student model. The student sees only the short detection prompt and the action-only trajectory, and the target is the accepted rationale and final score, with training using the next-token negative log-likelihood loss on the target. The full specification is not included in the prompt. Second, we optionally refine the SFT monitor with GRPO. In the RL stage, the model first generates rationales and scores from the same short prompt and trajectory. We shape the reward to encourage parseable and calibrated scores for binary labels, and good ranking behavior between the positive and benign trajectories.
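The SFT objective in the training protocol (next-token negative log-likelihood on the target only) can be sketched roughly as follows. This is a minimal illustration, not the authors' training code: the function name `target_nll`, the boolean mask, and the choice to average rather than sum over target tokens are all assumptions made for the example.

```python
import math
from typing import Sequence


def target_nll(token_logprobs: Sequence[float], is_target: Sequence[bool]) -> float:
    """Mean negative log-likelihood over target positions only.

    token_logprobs[i] is the model's log-probability of token i given the
    tokens before it. is_target[i] is True for rationale/score tokens and
    False for the detection prompt and trajectory tokens (assumed masking).
    """
    if len(token_logprobs) != len(is_target):
        raise ValueError("token_logprobs and is_target must have equal length")
    # Prompt and trajectory tokens are excluded from the loss.
    target = [lp for lp, keep in zip(token_logprobs, is_target) if keep]
    if not target:
        raise ValueError("no target tokens to score")
    return -sum(target) / len(target)


# Toy sequence: three prompt/trajectory tokens, then two target tokens.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.7), math.log(0.5), math.log(0.25)]
mask = [False, False, False, True, True]
loss = target_nll(logprobs, mask)
```

Only the last two tokens contribute to `loss` here, which mirrors the idea that the student is graded on reproducing the rationale and score, not on predicting its own input.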
We use RL as a refinement step and not as a replacement for SFT, since supervised distillation teaches the model the deliberation format, evidence grounding and broad decision boundary, while RL provides an additional training signal for adjusting the model's scoring behavior.
Evaluation. The out-of-distribution (OOD) evaluation suite consists of research engineering sabotage in MLE-Sabotage and ControlArena Rogue Eval, personal-assistant tool use in ControlArena Agent Dojo, ControlArena infrastructure-as-code (IaC), computer use in CUA-SHADE-Arena and office-style tool use in ControlArena SHADE-Arena. Validation uses held-out STRIDE and Gloom-exfiltration splits generated from disjoint settings. Each source is normalized into an action-only (trajectory, label) schema, and any agent reasoning is stripped. We evaluate monitors using AUROC and pAUROC@20, i.e. the partial AUROC over the region where the false-positive rate (FPR) is between 0 and 20%. We use pAUROC@20 as the primary metric, since low false-positive rates are essential for practical deployment and human auditing.
Model choices. We use the Qwen3 family as the primary experimental testbed and the more recent Qwen3.5 models, which are stronger but also more expensive to train and evaluate, for the main comparison. For the model-size and data-size scaling experiments, we also SFT-train Nemotron 30B and 120B, GPT-OSS 20B and 120B, and Kimi-K2.6. In addition, we prompt a range of frontier models such as Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6 and Claude Opus 4.6, and smaller models such as Gemini 3.1 Flash-Lite, GPT-5.4 Nano, and Claude Haiku 4.5.
Results
Supervised distillation teaches action-only deliberation, RL improves marginally
Figure: Per-stage pAUROC@20 for the Base → SFT → RL progression on Qwen3-8B and Qwen3-32B under the training mix. Cell color encodes the absolute value (darker green = higher). SFT provides the largest gains, while RL gives smaller improvements.
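For readers unfamiliar with the primary metric, a partial AUROC restricted to low false-positive rates can be sketched in plain Python as below. This is only an illustration under stated assumptions: the helper names are hypothetical, and dividing the area by `max_fpr` (so a perfect monitor scores 1.0) is one possible normalization; the text does not say which normalization the authors use.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the decision threshold from high to low."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auroc(labels: Sequence[int], scores: Sequence[float], max_fpr: float = 0.20) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR where this segment crosses max_fpr.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr


# Toy check: 1 = scheming trajectory, 0 = benign; higher score = more suspicious.
perfect = partial_auroc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

Under this normalization a chance-level monitor lands near `max_fpr / 2`. Library implementations may standardize differently (scikit-learn's `roc_auc_score` with `max_fpr`, for instance, returns a standardized partial AUC), so numbers are only comparable when the same convention is used.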
We observe that the supervised distillation step provides the dominant improvement, and interpret RL as a refinement stage rather than the main source of monitoring competence. This pattern suggests that SFT is responsible for learning the core deliberative monitoring behavior: citing action-level evidence, considering benign and misaligned interpretations, and mapping the trajectory to the scheming rubric. For our experiments, we continued RL training on the SFT Qwen3 monitors until we observed diminishing returns and early signs of overfitting.

Trained monitors are low-cost Pareto-optimal for low-FPR monitoring

Frontier comparison of per-dataset pAUROC@20. We compare OOD performance of (i) large prompted frontier monitors, (ii) smaller prompted frontier monitors, and (iii) our trained Qwen3.5 monitors. Our trained Qwen3.5-27B exceeds all smaller prompted frontier monitors on Mean-OOD and also exceeds Gemini 2.5 Pro, while large prompted frontier monitors such as Gemini 3.1 Pro and Claude Opus 4.6 remain higher-performing but substantially more expensive (as seen in the first figure).

Broad training mixtures improve OOD transfer, but effects are non-monotonic

Progressively broadening the training mixture. From STRIDE (1,350 samples) to the cumulative addition of datasets up to STRIDE+Gloom+CoT-Red-Handed+Bash variants (8,688 samples), broader training mixes appear to somewhat improve final OOD transfer, especially with the Bash variants, although the gains from intermediate additions are not monotonic.

To study how training data breadth affects OOD generalization, we start from training on STRIDE and progressively add Gloom, CoT-Red-Handed, and the Bash variants (BashArena and BashBench) to the training mixture (all), and train a separate SFT model (starting from the base checkpoint) for each mixture.
STRIDE primarily covers egregious scheming in synthetic enterprise software engineering settings, Gloom adds further scheming-style operational settings, CoT-Red-Handed adds covert coding sabotage, and the Bash variants add multi-turn shell and systems agent sabotage. We find that adding the data that differs most compositionally from the initial dataset (STRIDE), namely CoT-Red-Handed and the Bash variants, yields the biggest OOD performance improvements.

Larger trained models within model families perform better

Deliberative supervision improves performance with bigger models. Across 9 SFT-trained monitors from Qwen3.5-4B to Kimi-K2.6, Mean-OOD pAUROC@20 generally improves with stronger backbones, and larger variants improve over smaller variants within model families.

We find that our deliberative supervision recipe applies broadly across model families and that, within families, the larger variants improve over the smaller ones. Qwen3.5-27B improves over Qwen3.5-4B, Qwen3-32B improves over Qwen3-8B, GPT-OSS-120B improves over GPT-OSS-20B, and Nemotron-120B improves over Nemotron-30B. The strongest model is Kimi-K2.6, although with a significantly larger model size. Since MoE total parameter counts are not directly comparable to dense active parameter counts, the model parameters should be interpreted as a rough capacity indicator only.

Performance scales effectively with more data

Scaling supervised deliberative data. SFT with six training-set sizes for all model families. We observe an overall scaling trend in which OOD pAUROC@20 improves with more data, with large early gains and diminishing returns.

To understand how monitor performance scales with the amount of supervised deliberative data, we progressively scale up the training set. We use nested stratified subsets from 204, …, 8688 samples, with each subset holding the fraction of each of the five training datasets constant (with the exception of the largest set, which contains all the samples).
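One way to build nested stratified subsets of the kind described above is to shuffle each source dataset once and take proportionally sized prefixes, so every smaller subset is contained in every larger one. The sketch below is an assumption about how this could be done, not the authors' code: the `(sample_id, source)` tuple format and the `nested_stratified_subsets` helper are hypothetical, and per-source rounding means subset sizes are only approximately equal to the requested sizes.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Hashable, str]  # (sample_id, source_dataset_name), hypothetical format


def nested_stratified_subsets(
    samples: Sequence[Sample], sizes: Sequence[int], seed: int = 0
) -> Dict[int, List[Sample]]:
    """Return one subset per requested size, nested and stratified by source."""
    rng = random.Random(seed)
    by_source: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_source[sample[1]].append(sample)
    # Shuffle each source once; prefixes of a fixed order are automatically nested.
    for items in by_source.values():
        rng.shuffle(items)

    total = len(samples)
    subsets: Dict[int, List[Sample]] = {}
    for n in sorted(sizes):
        if n >= total:
            # The largest setting simply uses every sample.
            subsets[n] = list(samples)
            continue
        subset: List[Sample] = []
        for items in by_source.values():
            # Keep each source's share of the subset equal to its share of the full set.
            k = round(n * len(items) / total)
            subset.extend(items[:k])
        subsets[n] = subset
    return subsets


# Toy data: 60 samples from source "A" and 40 from source "B".
toy = [(i, "A") for i in range(60)] + [(i, "B") for i in range(60, 100)]
result = nested_stratified_subsets(toy, sizes=[10, 50, 100])
```

With the toy data, the 10-sample subset holds 6 "A" and 4 "B" samples, the 50-sample subset holds 30 and 20, and each subset is contained in the next larger one.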
We train all model families with SFT only, selecting checkpoints on validation splits. The results show an overall trend of improvement, with smaller models gaining more than larger models from the same family, suggesting that high-quality deliberative data can partially compensate for model scale.

Please read our paper for details on methodology and further discussion of the results!

^ See our paper's appendix for a detailed cost analysis.
Score: 20🌐 MovesJun 5, 2026https://www.lesswrong.com/posts/BmA5KwzZhLk2rykoz/training-deliberative-monitors-for-black-box-scheming - Jira Align Rovo Skill Early Access Program: AI Insights for Portfolio & Program Leaders
Jira Align Rovo Skill Early Access Program: AI Insights for Portfolio & Program Leaders Atlassian Community
- From Vision to Reality: Implementing the AI-Infused HR Operating Model
From Vision to Reality: Implementing the AI-Infused HR Operating Model Gartner
- SNEC 2026: Sigenergy Reimagines Energy Infrastructure with "AI in All"
SNEC 2026: Sigenergy Reimagines Energy Infrastructure with "AI in All" Toronto Star
- 10 European AI companies transforming the LegalTech sector
AI is steadily entering the legal sector, hopefully not to replace lawyers, but to help them deal with work that is often repetitive, research-heavy, and extremely document-driven. From law firms to in-house legal teams and compliance departments, AI is starting to reshape how legal information is found, reviewed, drafted, and managed. This shift is happening […] The post 10 European AI companies transforming the LegalTech sector appeared first on EU-Startups.
Score: 20🌐 MovesJun 5, 2026https://www.eu-startups.com/2026/06/10-european-ai-companies-transforming-the-legaltech-sector/ - Hinshaw & Culbertson Insurance Partner Scott Seaman on his New Book, AI and ESG Under Trump 2.0
Daily Business Review and law.com reporter Annie Mayne had a chance to catch up with Seaman and discuss his new book, which he co-authored with two other Hinshaw partners: “America 250: The History of Insurance and Insurance Coverage Law and Litigation in the United States.” Seaman also discussed how AI will alter the insurance litigation landscape and what, if anything, ESG means to the Trump administration.