AI News Archive: June 4, 2026 — Part 6
Sourced from 500+ daily AI sources, scored by relevance.
- AI gets a body, and capital changes its gaze (KOR)
Lee Soo-hwa The author is a research professor of Big Data Innovation Convergence College at Seoul National University and the head of the AI Center at DLG Law Corporation. AI’s next battleground is not the data center but the living room. The valuation of Boston Dynamics, a robotics company under Hyundai Motor Group, has reportedly climbed to around 30 trillion won ($19.6 billion). Meanwhile, LG Electronics recently launched Axium, its robotics actuator series, and announced plans to become a vertically integrated robotics company. Hyundai Motor Group has released a campaign video ahead of the 2026 FIFA World Cup, in which Atlas, the humanoid robot developed by its subsidiary Boston Dynamics, practices football skills. In the five-part training series, Atlas successfully performs a “Ghost Rabona” kick, deceiving defenders with the trick move. [PHOTO BY HYUNDAI MOTOR GROUP] “The appliance of the future is a humanoid,” Seoul National University Prof. Jang Byung-tak said, capturing a shift that is reshaping investor sentiment. This is not just another technology stock rally. Investors in western Seoul’s Yeouido and on Wall Street are placing a premium on the moment when AI, once confined to generating text on screens, enters the physical world through machines equipped with motors and metal frames. Behind this shift is a new understanding of intelligence. Intelligence is no longer viewed as the product of calculation alone. Humans do not record the world as cameras do; instead, they create meaning through interactions shaped by perception, experience and emotion — a process that cognitive science refers to as embodied cognition. Industrial robotic arms operate in highly controlled settings, where they repeat predefined tasks. Future humanoids, however, must function in far less predictable environments. A family living room, where children leave their toys scattered across the floor, presents challenges that do not exist on an assembly line. 
As a result, the humanoid race involves more than computing power. People constantly create unexpected situations. They can alter the location of furniture and other household items. They can dirty the floor. Even the seemingly simple act of washing dishes could require adaptation. The longstanding AI ambition of categorizing the world and governing it through fixed rules is insufficient for such conditions. Humanoids must learn by interacting with reality rather than relying solely on preprogrammed responses. Competition has already become global. China’s Unitree is pursuing aggressive expansion with relatively affordable humanoid robots priced in the 10-million-won range. Tesla and major U.S. technology companies have begun deploying robots in their own factories. Korean firms are responding with advanced manufacturing capabilities. The key question is not who possesses the most sophisticated algorithm but who can build machines capable of functioning in imperfect human environments. The standards used to assess corporate value are also changing. Investors increasingly reward companies that move beyond controlled settings and into everyday life. The premium goes to those capable of recovering after failure and adapting to unexpected variables. Models that fall back on human intervention in moments of crisis, by contrast, are unlikely to escape a steep market discount. Machines are no longer digital ghosts confined to screens. They are entering daily life in physical form. The future will belong to machines and organizations that can withstand real-world shocks and continue learning from them. A machine that never falls can never learn how to walk.
This article was originally written in Korean and translated by a bilingual reporter with the help of generative AI tools. It was then edited by a native English-speaking editor. All AI-assisted translations are reviewed and refined by our newsroom.
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
At our latest Snorkel AI Reading Group, Yijia Shao (Stanford NLP) stopped by our San Francisco office to present Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration. As LLM agents get better at automating tasks on their own, a large class of real-world problems still needs a human in the loop – for their preferences, their domain expertise, or simply for control... The post Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration appeared first on Snorkel AI.
- AIs like ChatGPT fall apart in classic 'Stroop' psychological test — and that could stand in the way of achieving artificial general intelligence
New research is causing quite some controversy on Reddit — but it makes some very interesting points.
- Goldman lifts MSCI EM target on AI boost, flags Iran deal relief for forex, bonds
The brokerage raised its benchmark index target to 2,000 from 1,850, implying a nearly 12% upside from its last close of 1,787.88
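The "nearly 12%" figure can be checked from the numbers quoted above. The following is a minimal Python sketch of that arithmetic; the function and variable names are illustrative and do not come from the source.

```python
def implied_upside(target: float, last_close: float) -> float:
    """Return the percentage gain needed to move from last_close to target."""
    return (target / last_close - 1.0) * 100.0


# Figures quoted in the item above.
new_target = 2000.0
old_target = 1850.0
last_close = 1787.88

upside_new = implied_upside(new_target, last_close)
upside_old = implied_upside(old_target, last_close)

print(f"upside at new target: {upside_new:.2f}%")
print(f"upside at old target: {upside_old:.2f}%")
```

The new target sits a little under 12% above the last close, consistent with the item's wording, while the previous target would have implied only a low single-digit gain.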
- Rethinking risk in the age of AI
Join senior risk and payments leaders in Seattle to explore how AI is reshaping fraud strategy. Seats are limited.
- Qlik and Starburst Turn Fragmented Enterprise Data into Governed, AI-Ready Intelligence
Qlik recently announced a strategic partnership with Starburst to help enterprises turn fragmented data into governed, AI-ready intelligence. The collaboration will pair Qlik’s data integration, replication, analytics and agentic workflows with Starburst’s federated query engine, context layer and agentic capabilities, giving customers more choice in how they query, move, prepare and use data across cloud, on-premises […] The post Qlik and Starburst Turn Fragmented Enterprise Data into Governed, AI-Ready Intelligence appeared first on CXOToday.com.
- Fight back faster: Why AI-powered defense is no longer optional for enterprise security
The new AI-powered threat environment has already changed in ways that security teams cannot address by working harder or adding head count. According to the Unit 42 Global Incident Response Report 2026, which draws on more than 750 major incidents, attackers can move from initial access to data exfiltration in as little as 72 minutes, four times as fast as in the prior year. What’s more, exploit scans begin within 15 minutes of a vulnerability disclosure. But AI has not created new categories of attack so much as it has removed the friction from existing ones, compressing defenders’ response timelines from days to minutes. New frontier AI models present a step change in capabilities. Trained to write code, they are remarkably good at finding vulnerabilities, combining multiple lower-severity issues into critical-level exploit paths and analyzing the full exposure surface of applications, including SaaS and public-facing platforms. As more capable frontier AI models become widely accessible, attackers will increasingly be able to automate reconnaissance, vulnerability discovery, phishing campaigns, and lateral movement at a level previously impossible for individual operators or small teams. As Palo Alto Networks Chairman and CEO Nikesh Arora writes in Weaponized Intelligence, frontier AI models are now capable of methodically cataloging every weakness in an organization’s technology infrastructure, at scale and without pause. Aided by frontier AI, a single threat actor will be able to run campaigns that once required entire teams. What makes this moment especially dangerous is that most organizations are not losing ground to exotic, novel exploits. Instead, AI-powered attacks are rapidly taking advantage of conditions that CIOs have already had the ability to fix. In more than 90% of the incidents Unit 42 investigated, preventable gaps in security coverage materially enabled the intrusion.
Misconfigurations, inconsistently applied controls, and excessive identity trust were more decisive than any zero-day vulnerability. The structural problem runs deeper than any individual gap. Arora writes that in 75% of breaches, the logging that should have flagged the anomalous behavior already existed. The warning signs were there, but they were buried across fragmented, disconnected tools where no one could see the full picture. This gap was arguably manageable when attacks moved at human speed. At the speed that AI will soon enable, it becomes a critical liability. Siloed security environments operating at human speed cannot keep pace with threats that move in minutes. Consolidating that infrastructure is now a prerequisite for an effective defense.
Fighting AI with AI
The same AI capabilities that are amplifying attacker speed and scale can be deployed in defense, but only within the right architecture. As Arora argues, models alone cannot provide sufficient enterprise security without an underlying infrastructure that includes sensors across endpoints, networks, identity, cloud, and browsers, along with AI-enabled data lakes that give models the context they need. Agentic defenses operationalize such an architecture. Rather than waiting for a human analyst to correlate signals across multiple tools, autonomous systems investigate alerts at machine speed, correlate data across the entire environment, and rapidly execute containment. Revoking a compromised credential, isolating an affected workload, or blocking lateral movement no longer depends on an analyst being available at the right moment.
What this looks like in practice
Palo Alto Networks has built this architecture into Cortex XSIAM, its AI-driven security operations platform.
In a 15-minute keynote, Lee Klarich, chief product and technology officer, describes how Cortex ingests raw data from any source, applies 2,900 machine learning models to detect attack behaviors, including previously unseen ones, and executes 1.9 billion automated actions per year through more than 1,300 built-in playbooks. The result for organizations using the platform has been roughly a quarter of the previous manual work and mean time to remediation measured in minutes rather than days. With AI agents now being embedded into the automation engine, Klarich expects that performance to improve further still. The window to act is open. Security teams that consolidate their infrastructure, invest in AI-driven detection, and build agentic response capability now will be far better positioned than those that wait for the threat landscape to force their hand.
Score: 41 🌐 Moves Jun 4, 2026 https://www.cio.com/article/4181320/fighting-ai-with-ai-the-time-to-act-is-now.html
- Cambridge AI tool promises new era of Earth Intelligence from space
Cambridge AI tool promises new era of Earth Intelligence from space cst.cam.ac.uk
Score: 41 🌐 Moves Jun 4, 2026 https://www.cst.cam.ac.uk/research/eeg/cambridge-ai-tool-promises-new-era-earth-intelligence-space
- Kaggle is making AI benchmark creation effortless
Score: 41 🌐 Moves Jun 4, 2026 https://blog.google/innovation-and-ai/technology/developers-tools/build-kaggle--benchmarks-locally/
- AI isn’t just a compute race. It’s a data race – and storage will decide the winners, WD shares
[The content of this article has been produced by our advertising partner.] For years, the industry has framed AI as a compute problem: more GPUs, denser clusters, faster interconnects. That framing is now incomplete. At scale, AI is fundamentally a data system. Every training run, every inference, every agent interaction generates new data that must be stored, retained, and revisited. Compute is elastic and cyclical. It can be repurposed. Data is cumulative and permanent, making storage demands...
- Find it, fix it: Seattle startup Emphere raises $2.1M to automate software vulnerability patching
Emphere, a Seattle startup that emerged from the AI2 Incubator, raised $2.1 million in pre-seed funding to automatically fix vulnerabilities found in popular open-source distributions, catering to software companies that sell to regulated industries.
- Using Scikit-LLM with Open-Source LLMs
This article will teach you how to perform a language task like text classification by integrating locally hosted large language models (LLMs) of manageable size, like Mistral, Gemma, and Llama 3: all for free thanks to Ollama — a free repository for local LLMs — and the Scikit-LLM Python library.
Score: 40 🌐 Moves Jun 4, 2026 https://machinelearningmastery.com/using-scikit-llm-with-open-source-llms/
- These LLMs are the best at resisting Russian propaganda
Estonian government benchmark shows how dozens of models combat Russia's "strategic narratives."
Score: 40 🌐 Moves Jun 4, 2026 https://arstechnica.com/ai/2026/06/these-llms-are-the-best-at-resisting-russian-propaganda/
- The 2026 IPO Wave Will Dwarf 1999 in Real Dollars. But 1 Metric Will Prove If AI Is a Bubble
SpaceX, Anthropic and OpenAI are all set to rattle public markets.
Score: 40 🌐 Moves Jun 4, 2026 https://www.inc.com/phil-rosen/spacex-ipo-anthropic-openai-nvda-jensen-huang-stock-market/91355458
- Countries attracting AI talent
Countries attracting AI talent The National
Score: 40 🌐 Moves Jun 4, 2026 https://www.thenationalnews.com/video/kBYJc8ah/countries-attracting-al-talent/
- How next-gen agricultural robots are helping farmers analyze crops – plant by plant
Robots equipped with computer vision can take in data about plant health and inform decision-making
- Walmart's Code Puppy was born from rage at AI's lock-in trap
Walmart's Code Puppy was born from rage at AI's lock-in trap Business Insider
Score: 40 🌐 Moves Jun 4, 2026 https://www.businessinsider.com/walmart-code-puppy-ai-anthropic-claude-code-openai-codex-2026-6
- Upwind, the next-gen wiz, now secures every corner of the AI stack
Upwind just dropped a new product announcement today, and it signals a fundamental shift in how the company thinks about AI risk. CEO Amiram Shachar published a lengthy post this morning laying out Upwind’s “Security for AI” thesis, the companion piece to their earlier push around agentic AI capabilities. The core argument is simple: AI security isn’t a standalone product […] This story continues at The Next Web
Score: 39 🌐 Moves Jun 4, 2026 https://thenextweb.com/news/upwind-the-next-gen-wiz-now-secures-every-corner-of-the-ai-stack
- Synology brings private AI and enterprise-grade management to the next generation of DSM
Synology today announced the roadmap for the next generation of DiskStation Manager (DSM), expanding it from a storage operating system into an intelligent data platform for governed, on-premises AI workflows that transforms data and system metrics into actionable insights without the privacy risks or costs associated with cloud providers. “Enterprise AI adoption is no longer […] The post Synology brings private AI and enterprise-grade management to the next generation of DSM appeared first on CXOToday.com.
- IRCTC disables 3cr user IDs, flags 6cr; scales up AI-based kitchen monitoring
IRCTC has taken significant steps to combat ticket booking fraud by deactivating over three crore suspicious user IDs and placing another six crore under verification. To enhance food safety, the railway's catering arm has expanded its AI-based kitchen monitoring system, utilizing over 2,300 cameras to detect hygiene violations.
- Five Ways to Fine-Tune Chronos-2, the Time Series Foundation Model
In Part 1 of this series, we introduced Chronos-2, a time-series foundation model. We got our hands dirty by walking through a real case study and saw what Chronos-2 can do straight out of the box, with no training. But as we noted at the end of Part 1, zero-shot isn’t always enough. In cases […] The post Five Ways to Fine-Tune Chronos-2, the Time Series Foundation Model appeared first on Towards Data Science.
Score: 39 🌐 Moves Jun 4, 2026 https://towardsdatascience.com/five-ways-to-fine-tune-chronos-2-the-time-series-foundation-model/
- Unified Discrete Diffusion for Categorical Data
Unified Discrete Diffusion for Categorical Data Carnegie Mellon University Computer Science Department
- Designing the hf CLI as an agent-optimized way to work with the Hub
- Antler backs AI robotics recycling startup Oscorp Energy in $1.3 million pre-Seed
Oscorp Energy is building AI-powered vision and robotic systems to detect and remove lithium-ion batteries from waste streams.
- Why one person + AI is becoming a serious workforce model
For years, the conversation around AI has revolved around one question: Will AI replace jobs? It is understandable. Change creates uncertainty, and few technologies have moved as quickly or visibly as artificial intelligence. But this may be the wrong question. The more interesting one is: how does AI change the structure of work itself? From […] The post Why one person + AI is becoming a serious workforce model appeared first on e27.
Score: 38 🌐 Moves Jun 4, 2026 https://e27.co/why-one-person-ai-is-becoming-a-serious-workforce-model-20260601/
- BHASHINI CEO Amitabh Nag Outlines Voice-First Vision After DPIIT Multilingual AI Pact
The new partnership aims to make government services, startup programmes, investment opportunities and industrial resources accessible across India's linguistic landscape through AI-powered language technologies. The post BHASHINI CEO Amitabh Nag Outlines Voice-First Vision After DPIIT Multilingual AI Pact appeared first on Express Computer.
- Humanoid robots won’t be the future: purpose-built robots will
Humanoid robots grab headlines, but specialized machines will power real factory automation.
Score: 38 🌐 Moves Jun 4, 2026 https://www.techradar.com/pro/humanoid-robots-wont-be-the-future-purpose-built-robots-will
- Domain AI Is Taking Over: How Tech Providers Capture Value Beyond Generic Models
Domain AI Is Taking Over: How Tech Providers Capture Value Beyond Generic Models Gartner
- From Shanghai to the World: UCloud and China's AI-Driven Cloud Export
On June 2, Shanghai-headquartered UCloud cut the ribbon on a new cloud computing node in Uzbekistan, pushing its global network to 36 nodes across 28 regions. The move might once have been a routine infrastructure announcement. In 2025, it is some...
- How The Hindware-Google Verdict Will Shape Advertising In The AI Age
In a landmark judgement, the Delhi High Court (HC) imposed a fine of ₹30 Lakh on search and AI giant…
Score: 38 🌐 Moves Jun 4, 2026 https://inc42.com/features/how-the-hindware-google-verdict-will-shape-advertising-in-the-ai-age/
- CrowdStrike reports higher operating expenses as AI investments gain pace
CrowdStrike expects 2027 revenue to be between $5.91 billion and $5.96 billion, compared with its prior expectations of $5.87 billion to $5.93 billion.
- Google is pledging to replenish more water than its data centers consume by 2030
The company announced $17 million in new stewardship projects and a $500 million commitment to public water infrastructure
- Scarcity is driving AI innovation outside Silicon Valley
New AI infrastructure is emerging in India, Brazil, the UAE, and Africa, where local stacks are designed to get around compute scarcity.
- Walmart says token limit on employee AI tool is about cutting down on duplicative vibe coding
Walmart says token limit on employee AI tool is about cutting down on duplicative vibe coding Business Insider
Score: 38 🌐 Moves Jun 4, 2026 https://www.businessinsider.com/walmart-ai-coding-tool-limit-duplicative-requests-2026-6
- How some data center operators are tackling their water use problems
Hyperscalers have come under scrutiny for their impact on water quality and availability.
Score: 38 🌐 Moves Jun 4, 2026 https://arstechnica.com/ai/2026/06/how-data-center-operators-are-tackling-their-water-use-problems/
- Nvidia CEO mounts charm push in South Korea with TV talk show and baseball appearances
Nvidia CEO mounts charm push in South Korea with TV talk show and baseball appearances The Japan Times
- The channel is heading straight for an AI infrastructure wall
The channel is heading straight for an AI infrastructure wall IT Pro
- JioHotstar bets on AI for streaming, production and audience engagement
JioHotstar bets on AI for streaming, production and audience engagement Techcircle
- Build an agent that writes its own tools
The third post from Build Club, our weekly live build session. The companion GitHub repo can be found here, docs here and you can try the agent live in the hosted playground. Your agent framework is not the bottleneck. The bottleneck is that every new external system your agent needs to talk to requires another... The post Build an agent that writes its own tools appeared first on DataRobot.
- Finetuning CLIP to Reason about Pairwise Differences
Finetuning CLIP to Reason about Pairwise Differences Carnegie Mellon University Computer Science Department
- Is Microsoft 365 Premium worth it? What $20 a month gets you - and how it compares to ChatGPT Plus
Microsoft is offering a 50% discount to 365 subscribers who want more AI Copilot features. Here's what's included.
Score: 37 🌐 Moves Jun 4, 2026 https://www.zdnet.com/article/is-microsoft-365-premium-worth-it-vs-chatgpt-plus/
- Awinic Positions Itself as the 'Audio King' of AI Glasses with New Chip Solution
Shanghai-listed Awinic Technology (688798.SH) has unveiled its latest audio solution purpose-built for AI glasses, the AW88188, at the Songshan Lake IC Forum. The move positions the 18-year-old semiconductor company, which has shipped over 36 billion chips to date, as a dominant player in the rapidly emerging AI wearable audio market.
Score: 37 🌐 Moves Jun 4, 2026 https://pandaily.com/awinic-positions-itself-as-the-audio-king-of-ai-glasses-with-jun2026
- Q2 2026 Building, Backing, and Buying AI
Q2 2026 Building, Backing, and Buying AI PitchBook
Score: 37 🌐 Moves Jun 4, 2026 https://pitchbook.com/news/reports/q2-2026-building-backing-and-buying-ai
- Can free AI for everyone be sustained?
Kim Won-bae The author is an editorial writer at the JoongAng Ilbo. A notable exchange during a Cabinet meeting last month highlighted a growing debate over the government’s proposed “AI for Everyone” initiative. The project is part of President Lee Jae Myung’s campaign pledge to build an “AI basic society,” aimed at guaranteeing a minimum level of access to AI for all citizens. President Lee Jae Myung, right, speaks with Deputy Prime Minister and Science and ICT Minister Bae Kyung-hoon during a meeting with Presidential Science Scholarship recipients and members of Korea’s International Youth Olympiad teams, titled “A Conversation with Future Scientists,” at the Blue House on Feb. 5. [JOINT PRESS CORPS] When Lee asked about the program’s progress, Deputy Prime Minister and Science and ICT Minister Bae Kyung-hoon replied that preparations were underway, with a target launch in November or December. Bae explained that the service would be provided free of charge through 2028, after which private companies would lead its operation. Lee offered a different perspective. If users are required to pay after becoming accustomed to free access, many may stop using the program, he said. While acknowledging that not everyone needs the same level of service, Lee suggested guaranteeing a minimum level of AI access for all citizens while charging for upgraded features. He also reminded Bae, a former business executive, that efficiency and fairness must be balanced. The exchange revealed two distinct approaches. Bae’s comments reflected an industrial policy view in which the government creates initial demand before allowing the private sector to lead. Lee emphasized access to AI as a basic social right. As the technology becomes increasingly important in daily life, concerns about access are understandable. But the government’s plans may be moving too quickly. At a press briefing last month, Bae announced a goal of providing every citizen with an AI agent. 
Unlike chatbots, which simply answer questions, AI agents are designed to perform tasks on behalf of users. The government also plans to offer specialized services for older adults and socially vulnerable groups. The challenge is that expanding free access does not automatically create a sustainable service model. During the internet era, user data became the foundation of targeted advertising, and this allowed technology companies to generate substantial revenue. Generative AI operates differently. User interactions may help improve services, but they do not automatically create enough revenue to offset the significant costs of computation. AI agents are even more expensive because they must understand requests, gather information and repeatedly carry out multiple tasks. Questions of quality and accountability also deserve careful consideration. For AI for Everyone to become a nationwide program, it must first provide quality responses that users who are familiar with commercial AI services find acceptable. But the program’s performance and operational stability have not been publicly verified. Promising advanced agent functions before these basics are proven may be premature. The risks increase if AI agents become linked to public services. Incorrect information or inaccurate guidance could cause administrative problems. The more strongly the government promotes the program as a free national service, the more likely citizens are to regard it as a public service. If errors occur, responsibility will inevitably fall on the government. Another unresolved issue is who determines the scope of free services.
Private AI providers normally decide where to draw the line between free and paid features. Under the government’s model, however, public funds would support free access for all citizens. Deputy Prime Minister and Science and ICT Minister Bae Kyung-hoon delivers a keynote speech titled “People Who Change AI” during the AI College Vision Declaration Ceremony at KAIST in Yuseong District, Daejeon, on June 1. [NEWS1] If participating companies limit usage because of rising costs, public dissatisfaction is likely to be directed at the government. On the other hand, if the government demands broader functionality to satisfy users, it risks interfering with private-sector pricing and service design. A better approach would be to foster competition among multiple AI providers while limiting government intervention afterward. Extending de facto free-service requirements beyond 2028 could distort the market and weaken innovation. The program could become dependent on government subsidies rather than competition. If expanding AI access is the goal, alternative approaches deserve consideration. Rather than directly supporting a universal AI service, the government could help citizens gain access to AI tools already on the market. Specialized support could be directed toward vulnerable groups and public-service applications while allowing ordinary users to choose among competing private services. One of the principles that helped Korea become an information technology powerhouse was to support the market without unnecessarily intervening in it. That principle remains relevant in the AI era. For AI for Everyone to succeed, building a sustainable structure matters more than promising free access. This article was originally written in Korean and translated by a bilingual reporter with the help of generative AI tools. It was then edited by a native English-speaking editor. All AI-assisted translations are reviewed and refined by our newsroom.
- Profitable French scale-up Innovorder raises €20 million to accelerate AI-first restaurant digitalisation
Innovorder, a Paris-based scale-up specialising in the digitalisation of the restaurant industry, has raised €20 million in a funding round to accelerate its “AI-first” transformation. The round was led by UL Invest, the family office of tech entrepreneur Laurent Useldinger. This deal combines a capital increase and the buyout of shares from historical investors. Evolem, […] The post Profitable French scale-up Innovorder raises €20 million to accelerate AI-first restaurant digitalisation appeared first on EU-Startups.
- Building Better Activation Oracles
Work done for our MATS 10.0 Sprint project, mentored by Neel Nanda and Adam Karvonen. Huggingface, Github.
TL;DR: We improved the original Activation Oracle (AO) training regime by training on on-policy rollouts, improving the conversational dataset, feeding more layers (following the approach by Niclas Luick) and making a small change to the injection formula. We also open-source our evals, called AObench, which we believe are currently the most comprehensive evaluation of AO quality. The capability improvements are marginal, but the quality-of-life improvements are quite substantial. If you want to play around with the new AOs, we recommend you use this one. If you want to try our new Activation Oracles live, we will host them for a week on ao.celeste.computer. Alternatively, you can self-host our web interface.
Activation Oracles (AOs) by Karvonen et al. are fine-tuned LLMs that receive the original target LLM’s activations as input and answer natural language questions about them. However, they are plagued by various issues, which limit their usefulness as an off-the-shelf tool for interpretability research. For our MATS Sprint, we set out to work on these issues.
Issues with current Activation Oracles
In "Current activation oracles are hard to use", Arya Jakkli demonstrates scenarios where AOs are hard to work with. We focus on addressing two of the issues he points out. Hallucinations: the AO outputs false information. Vagueness: the AO output is generic (and therefore unfalsifiable) and does not answer the user’s question. In addition, AOs are difficult to evaluate because of the problem of text inversion: the model infers the surrounding text and answers based on that, just as any black-box oracle (i.e. a method that only receives text) could, rather than extracting specific information from activations. Our evaluations focus on some of his specific tasks; details are in the Appendix.
Our approaches to improving AO training

A better conversational finetune than LatentQA

For the Activation Oracle to answer natural language questions, you need a dataset consisting of questions and answers about activations. The original paper used LatentQA to this end. However, we found that this dataset was of low quality, likely incentivizing vagueness:
The model is given a complicated prompt, and then a specific question is asked about this prompt. We think the answers to the questions LatentQA poses are often not easily retrievable from activations. This makes the task difficult for the AO, incentivizes little beyond text inversion, and may even directly incentivize hallucinations/guessing if the relevant info is not present.
The questions are not about on-policy data, but about specifics of a user prompt: this does not target the model’s internal reasoning.
It was generated by o1, a now outdated model.

Our solution: Questions about unarticulated completions

We constructed a new conversational dataset that attempts to address all of these concerns. Because we don’t want the learned questions to be trivially answerable from adjacent tokens (text inversion), we construct QA pairs as follows: a separate LLM (Sonnet 4.6) is given the target model’s chain-of-thought (CoT), and is instructed to split the chain of thought into a prefix and suffix, and to write a question about the suffix. It is instructed to do this in a way such that the question is hard to answer purely from the text of the prefix (i.e. to avoid text inversion), but plausibly answerable from the prefix’s activations (solvability). You can explore our dataset here. We ablate the effect of this task by replacing only LatentQA in Adam's recipe (leaving everything else the same) and notice a significant uplift across the board on our AObench evaluations.
We find that the responses are more specific, and that the resulting model is less vague and responds better to instructions.

Layer choice/feeding multiple layers to the AO

As Niclas Luick demonstrated, feeding multiple layers at a time during training and inference increases Activation Oracle capability. Adam originally fed activations from a layer randomly selected at 25%, 50% or 75% of total model depth. Since most features live in roughly the 55-80% depth range, we suspected a layer sweep could be important. Indeed, we find that AO performance peaks at layer 22 (62%). Feeding 5 contiguous layers, layers 21-25, causes further uplift. Interestingly, the largest uplift is on model diffing tasks. We’d like to point out that training a multi-layer Activation Oracle can cause an increase in training time due to longer context, and that most gains can be had by simply choosing a layer at ~65% depth.

Training on on-policy data

To train Activation Oracles, we need scalable unsupervised training tasks. A common way to achieve this is to predict past and/or future tokens from the activations, known as past or future-lens. This requires a corpus to source activations from, whose tokens then serve as the prediction targets. Adam’s original paper only used pre-training data (fineweb). However, this has a problem: to predict future tokens in pre-training data, you don't necessarily need to know much about what the model is thinking, just what the prior text is. The model’s activations may contain useful information, so the training signal is not zero, but it’s considerably harder for the AO. We think that the on-policy data we use (i.e., generations from the model we are trying to interpret) is better training data for two reasons. First, it is a more solvable task, by virtue of targeting what the model is actually representing in its activations. Second, in practice we will mostly care about using the AO on a model in an on-policy setting, e.g. for studying agent traces.
While the above explanation is plausible, we only notice minor uplift in evaluations.

Steering strength

Natural Language Autoencoders (NLAs) inject their activations by replacing the token embedding entirely and using a fixed scalar. We use additive, norm-matched injection after the second transformer layer, as in Adam’s paper. We do not have a formal ablation, but on Qwen3-8B, every run that did NLA-style injection performed significantly worse than Adam’s formula. The NLA paper sweeps injection strength and reports that it is a quite sensitive hyperparameter. We did the same starting from Adam’s formula, and found that increasing the injection strength marginally increases performance. This difference may look small, and indeed it is, but on the hallucination evaluation it is considerable (79% -> 85%), which is particularly important, so we do recommend using this.

Our hypothesis for why Adam-style injection does better than NLAs is that the first residual stream layer has a very small cosine similarity to previous layers, a property unique to the first layer. After the first layer, cosine similarities remain pretty similar layer to layer. Because of this, it seems sensible that injecting after the second layer, when the residual stream lies in the “correct basis”, would work better. The reason a stronger injection strength might do better is that language models have a strong prior to weight tokens roughly equally, and that it’s rare for one token to be load-bearing for the entire explanation. Language model priors can be hard to overcome, so manually enforcing a stronger norm for the activation can help overcome this. We are slightly less certain about this intervention than others, but this is a hyperparameter that matters, and 2.0x is a better default.

Summary of AObench evaluations

The evaluations we constructed, which we call AObench, aim to measure what an ideal Activation Oracle should be able to do.
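To make the injection rule from the steering-strength discussion concrete, here is a minimal sketch in plain Python. It assumes that "additive, norm-matched" means the injected activation is rescaled to the norm of the residual-stream vector at that position and then added, and that the strength (e.g. 2.0x) is a scalar multiplier on that rescaled vector. The function names and toy vectors are hypothetical and are not taken from the authors' code.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def inject(residual, activation, strength=1.0, eps=1e-8):
    """Additive, norm-matched injection (assumed form, see lead-in).

    The activation is rescaled so its norm equals strength * ||residual||,
    then added to the residual stream vector at the placeholder position.
    """
    scale = strength * l2_norm(residual) / (l2_norm(activation) + eps)
    return [r + scale * a for r, a in zip(residual, activation)]

# Toy 2-d example: residual has norm 5, activation has norm 10.
residual = [3.0, 4.0]
activation = [0.0, 10.0]

baseline = inject(residual, activation, strength=1.0)
stronger = inject(residual, activation, strength=2.0)

# Norm of the added component relative to the residual norm.
added = [s - r for s, r in zip(stronger, residual)]
ratio = l2_norm(added) / l2_norm(residual)
print(baseline, stronger, round(ratio, 3))
```

By contrast, an NLA-style injection as described above would overwrite the token embedding instead of adding to a later-layer residual; that variant is not shown here.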
AObench is a work in progress, but we recommend you start from there when making a new activation oracle. It evaluates the main frustrations in Arya’s blogpost and some of the model organisms from the original paper. We find that the above changes result in an AO with marginally improved capability, marginally reduced hallucination rates and significantly reduced vagueness, which generally scores better on the majority of our benchmarks. The full evaluation results can be found in the Appendix and exact data/prompts used can be found in our repository.

We find a significant improvement in performance through our interventions. In addition, our new AOs hallucinate less and are significantly less vague. Hallucination evaluates whether the AO invents specific but unsupported details about the model’s reasoning. Vagueness evaluates whether the AO will commit to a precise answer instead of something that is hard to falsify. The AO’s output is judged with respect to these criteria.

Note that we have a bit of FUD around our evals. In particular, the on-policy future/pastlens ablation is not entirely clean, because the new conversational data also uses on-policy data. More information on AObench can be found in the Appendix.

Outlook

After spending several weeks working on AOs, we believe they are a useful interpretability technique for specific use cases: they are best used for complex open-ended questions about activations. AOs/NLAs might be particularly useful to interpret latent reasoning models, and when there is complex computation already happening in a single forward pass. However, even with our improvements, clear limitations remain. First, their outputs may still be hallucinated, though this generally improves with the number of activations supplied, and uncertainty can be estimated by resampling (see Appendix). Second, in many settings (but not all) it is possible to just read the chain-of-thought directly and arrive at the same insight as the AO.
Still, we think there may be significant room for improvement by scaling up the conversational data we used, both in amount and kind. A second route is to include more narrow tasks in a “post-training” stage, though we did not find improvements at the current level of capability of our AO. Another exciting path forward is to come up with more evaluations that target something an ideal AO could plausibly do, while being robust to text inversion concerns. If such tasks are scalable, they can be used for training as well.

We think of AOs as part of the family of scalable, end-to-end interpretability. Very recently, Natural Language Autoencoders (NLAs) have been proposed by Anthropic as an exciting technique to verbalize activations. In contrast to AOs, NLAs are unsupervised, auto-encoding activations-to-activations across a natural-language bottleneck, which seems like a more faithful way to convert activations to natural language. The NLA paper trains their AV (the encoder part, act -> text) on LatentQA to turn it into an AO. Due to the aforementioned issues with LatentQA, we believe this is hampering the AV’s performance. We suspect our other 2 interventions, extending NLAs to use multiple layers and training NLAs on on-policy data, are also applicable here. On the other hand, one might use NLAs as a source of ground truth to augment AO training. We remain excited to pursue further research in the field.

Further advice for training Activation Oracles

These were not the only things we tried during our sprint. Our initial impression was that we could improve AOs by training on narrow tasks. Specifically, we singled out the tasks from Riya Tiagi and Daria Ivanova’s “Test your best methods on our hard CoT interp tasks” (datasets can be found here), but did not have much success, probably due to limited training data. We found that we could quite consistently match the performance of linear probes when training narrowly, but never significantly exceeded it.
Some advice if you are interested in further improving AOs:
Make evals for tasks that you think a good AO should be able to do (solvability) and that are hard for a black box monitor (text inversion; you can explicitly check this). Then try to find training tasks that would make the oracle better at this. You should generally aim to at least match the performance of probes.
A good training task causes broad uplift, and is scalable.
Loss graphs going down does not always translate to capability: in particular, future/pastlens demonstrates a very strong scaling law, but there is a risk of just fitting surface statistics that will not translate to any meaningful uplift in evals.
We observed the majority of uplift on the evals after 10% of training (~200K tokens), so you generally don’t need to train to convergence to know if your task is causing uplift.
Be careful when changing learning rate, LoRA rank or LoRA alpha, as they can destabilize training.
We experimented with scheduling training tasks one by one (unshuffled) to locate uplift, but encountered catastrophic forgetting on tasks not included in the group. Therefore, we recommend you have at least 10% of data at every stage come from other tasks. An interesting way forward would be to have a broad “pre-training” stage, say of verbatim and conversational data, and then a shorter “post-train” on specialized tasks.
Read your datasets, your oracle outputs and your evaluation traces! Language models are not very good at generating/discriminating good AO questions/responses, so reading them yourself is a good way to test if your pipeline is doing what you want it to be doing.

Some future work we are excited about:
Increasing corpus diversity on the unsupervised learning task.
Feeding even more, or all, layers/positions.
Training directly on activations from finetunes, to optimize for model diffing tasks.
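The recommendation above to keep at least 10% of each stage's data from other tasks can be sketched as a small mixing routine. This is an illustration under stated assumptions, not the authors' training pipeline: `build_stage`, the task names and the example pools are hypothetical, and examples are sampled with replacement for simplicity. Only the 10% floor comes from the text.

```python
import math
import random

def build_stage(focus, pools, stage_size, replay_fraction=0.10, seed=0):
    """Return a shuffled list of (task, example) pairs for one training stage.

    `focus` is the task emphasised in this stage. At least `replay_fraction`
    of the stage is drawn from the other tasks, which is the safeguard
    against catastrophic forgetting described in the text.
    """
    rng = random.Random(seed)
    # Flatten every non-focus pool into one replay pool.
    others = [(task, ex) for task, exs in pools.items()
              if task != focus for ex in exs]
    n_replay = math.ceil(stage_size * replay_fraction) if others else 0
    n_focus = stage_size - n_replay
    # Sampling with replacement keeps the sketch short.
    stage = [(focus, rng.choice(pools[focus])) for _ in range(n_focus)]
    stage += [rng.choice(others) for _ in range(n_replay)]
    rng.shuffle(stage)
    return stage

# Hypothetical task pools standing in for real datasets.
pools = {
    "future_lens": [f"fl-{i}" for i in range(50)],
    "conversational": [f"qa-{i}" for i in range(50)],
    "narrow_task": [f"nt-{i}" for i in range(50)],
}

stage = build_stage("narrow_task", pools, stage_size=200)
replayed = sum(task != "narrow_task" for task, _ in stage)
print(f"{replayed}/{len(stage)} examples come from other tasks")
```

Shuffling within the stage matters here: appending the replay examples at the end would recreate the unshuffled schedule that caused forgetting in the first place.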
Appendix

Practical notes on evaluating AOs

While working on our evaluations, we discovered several practical lessons that significantly affect measured AO performance.

Use AUC, not accuracy, for binary classification. In his blog post, Arya found that AOs performed poorly on tasks like sycophancy detection and missing information identification. When we investigated, we found that part of the problem was that Qwen had a biased default answer, such as always answering "No" when asked "Is this response sycophantic?" This makes fixed-threshold accuracy look near chance, but the results are much stronger when instead using the difference between the Yes and No logits: on a sycophancy detection task using activations from the chain of thought, the Original AO scored 0.50 accuracy but 0.83 ROC AUC. In our experience, Qwen AOs seem to particularly suffer from this bias towards always answering "No". Additionally, AUC makes evaluation far less sensitive to prompt wording. With accuracy, asking "Is this sycophantic?" vs "Is this response somewhat sycophantic?" can easily swing results by 20 percentage points, because each phrasing shifts the model's Yes/No calibration differently. With AUC, these prompt variations produce relatively stable results.

Sweep context window size. AOs receive activations from some window of the target model's generation, and the size of this window is a significant variable. In a Qwen3-8B backtracking evaluation, the Original AO scored only 1.26/5 mean correctness when given activations from the final token alone. But performance rose steadily with more context: 1.54 at 5 tokens, 1.94 at 20 tokens, and 2.10 at 50 tokens. At 20 tokens, the AO roughly matched a baseline of simply asking Qwen3-8B the same question with full text context. At 50 tokens, the AO exceeded this baseline (Figure N).
For complex reasoning questions like "why is the model about to backtrack?", the relevant information appears to be spread across dozens or hundreds of tokens of internal computation, not concentrated in a single activation. These are also the questions where we believe AOs are most useful relative to alternatives, as answering a question like "what is the model uncertain about" using SAEs across dozens of tokens could be quite challenging.

Consensus sampling can mitigate hallucination. Open-ended AO answers can confidently hallucinate incorrect answers. A simple mitigation is to sample multiple answers (we use 10 samples at temperature 1) and only trust answers where the samples agree. On the taboo secret-word extraction task, unfiltered single-token accuracy was 46.6%. Requiring consensus >= 0.8 retained 19.4% of examples at 94.3% precision, with a clean trade-off between precision and recall. This is a simple inference-time strategy which can significantly mitigate hallucination.

Experiments using post-training

We experimented heavily with DPO, inspired by Introspection Adapters and upcoming similar work, to increase performance on the following metrics: instruction following, hallucination rate and vagueness. Results were hard to stabilize, and we frequently ran into mode collapse. We found it hard to make the judge correctly label “good” or “bad” activation oracle outputs, even with a rubric. We also attempted using Dr. GRPO with the following rubric:
Does the answer pass the “swap test”?
Is the answer specific and falsifiable?
Does the response add any meaningful insight?
Is it not obviously wrong?
Is the oracle following instructions?
Surprisingly, none of these led to significant gains in performance. Using RL with an LLM as a judge for Activation Oracles seems to be a bit doomed. Problems remain in getting an LLM judge to understand which AO outputs are desirable.
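The consensus-sampling mitigation described above can be sketched in a few lines. This assumes that "consensus" means the fraction of samples that match the most common answer, and that answers are short enough (e.g. a single secret word) to compare after trimming whitespace and lowercasing. Both are assumptions, and `consensus_answer` is a hypothetical helper rather than part of the authors' code. Longer free-form answers would need some form of semantic matching instead.

```python
from collections import Counter

def consensus_answer(samples, threshold=0.8):
    """Return (answer, agreement) for a list of sampled AO answers.

    `answer` is the most common normalised answer if its share of the
    samples reaches `threshold`; otherwise it is None (abstain).
    """
    if not samples:
        raise ValueError("need at least one sample")
    normalised = [s.strip().lower() for s in samples]
    answer, count = Counter(normalised).most_common(1)[0]
    agreement = count / len(normalised)
    return (answer if agreement >= threshold else None), agreement

# Ten hypothetical samples per question, mirroring the 10-sample setup.
agreeing = ["Smile"] * 9 + ["grin"]
split = ["smile"] * 5 + ["grin"] * 3 + ["laugh"] * 2

print(consensus_answer(agreeing))  # ('smile', 0.9)
print(consensus_answer(split))     # (None, 0.5)
```

Raising `threshold` trades recall for precision: fewer questions receive an answer, but the answers that survive are more likely to be correct.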
Environments with verifiable statements about activations are hard to come by, but we remain excited about them as a way to increase faithfulness and achieve better performance than LLM-prompted explanations about the context.

Other differences compared to Karvonen et al.

The number of activations we feed, and which ones, is sampled as follows: 20% chance to feed 1 activation, 20% chance to feed 2, 20% chance to feed 3 and a 50% chance to feed a random subset of the positions, since our input is long. This is notably different from Adam’s “sometimes contiguous n, sometimes 1 activation”. We ablated for this, and did not find a significant change in eval performance.

AObench details

Higher is better for all of these scores.
Activation sensitivity is a benchmark that checks if outputs are different depending on context, if activations are fed from the same tokens.
Hallucination is a simple benchmark that checks if the AO is obviously confabulating, based on the context the activation comes from.
Missing info is the task from Jakkli et al. (simplebench-type questions where the model is missing information and doesn't know it; these may be impossible for AOs to pull out though).
Sycophancy is a benchmark based on the scruples dataset, where we check if the model can detect if someone is sycophantic.
Taboo is the model organism from Bartosz Cywiński.
PersonaQA is questions inquiring about the system prompt that is used, to approximate a steering vector.
MMLU prediction is predicting the answer the model would give to an MMLU question if it were interrupted right now and forced to answer.
Number prediction is predicting the numbers involved in an arithmetic equation, from activations of said equation.
Vagueness measures how vague the responses are (Claude has terrible vibes for this, so this might not be very useful).
Domain confusion benchmarks how often the model is completely wrong about the domain the activations are from (e.g. confusing ice cubes with hen houses).
Backtracking is predicting if the model is about to backtrack.

We consider 3 narrow tasks Arya identified to illustrate these issues:
Identifying why a model is about to backtrack (generally outputs are extremely vague).
Identifying the number the model is about to produce in a math calculation (keeps hallucinating the same couple of numbers).
Identifying the current topic domain of the text on unrelated activations (hallucinates irrelevant topics like hens).
Identify Persona and Detect Taboo are taken from the original AO paper. Our evaluation tasks can be found here. We reiterate that evaluating AOs is hard, mainly due to controlling for text inversion, and that getting a judge to classify vagueness requires cautious inspection. We recommend qualitative analysis of your oracle.
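The appendix's advice to score binary tasks with AUC rather than fixed-threshold accuracy can be illustrated with a toy example. The sketch below assumes each example is scored by logit(Yes) minus logit(No); the numbers are made up to mimic an oracle biased towards "No", and they are not results from the post. `roc_auc` uses the standard pairwise definition, the probability that a random positive outranks a random negative.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_at_zero(scores, labels):
    """Accuracy when answering Yes iff logit(Yes) - logit(No) > 0."""
    correct = sum((s > 0) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)

# Hypothetical logit differences: every score is negative, so the
# oracle always says "No", yet positives are mostly ranked higher.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [-0.5, -1.0, -1.5, -3.0, -2.0, -2.5, -3.5, -4.0]

print("accuracy:", accuracy_at_zero(scores, labels))  # 0.5
print("ROC AUC:", roc_auc(scores, labels))            # 0.875
```

Because AUC depends only on how examples are ranked, adding a constant to every score (which is roughly what a change in prompt wording does to Yes/No calibration) leaves it unchanged, while accuracy at a fixed threshold can swing widely.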
Score: 37 🌐 Moves Jun 4, 2026 https://www.lesswrong.com/posts/heXwuDRfbQQgB5JLP/building-better-activation-oracles
- The Sequence Opinion #872: The Cake Is a Battlefield: Who Really Controls the AI Stack
Full stacks vs layer specialists. That's the AI race.
- Fault Tolerance in LangGraph: Retries, Timeouts, and Error Handlers
Implementing fault tolerance in LangGraph
- AI video generation startup TrueFan AI raises $10M Series A funding
AI video generation startup TrueFan AI raises $10M Series A funding YourStory.com
Score: 36 💰 Money Jun 4, 2026 https://yourstory.com/2026/06/ai-video-generation-startup-truefan-ai-raises-series-a-funding
- How to Navigate the Shift from Prompt-Based Tools to Workflow-Driven AI
Abacus.AI and the case for unified AI workflows
Score: 36 🌐 Moves Jun 4, 2026 https://towardsdatascience.com/how-to-navigate-the-shift-from-prompt-based-tools-to-workflow-driven-ai/