AI News Archive: June 2, 2026 — Part 10
Sourced from 500+ daily AI sources, scored by relevance.
- Can A.I. Produce Writing That We Actually Want to Read?
I recently created a simple test, which convinced me that the answer is no.
Score: 22 | 🌐 Moves | Jun 2, 2026 | https://www.newyorker.com/news/fault-lines/can-ai-produce-writing-that-we-actually-want-to-read
- What's It Like to Work at WeRide.ai 2026?
What's It Like to Work at WeRide.ai 2026? Built In
- The new MSI Claw 8 EX AI+ is chasing console-quality gaming on the go
MSI announced the Claw 8 EX AI+ at Computex 2026. It's the world's first handheld powered by Intel's Arc G3 Extreme chip, built specifically for handheld gaming.
Score: 22 | 🌐 Moves | Jun 2, 2026 | https://www.digitaltrends.com/gaming/the-new-msi-claw-8-ex-ai-is-chasing-console-quality-gaming-on-the-go/
- AI Has Changed the Cybersecurity Threat Landscape for SMBs, Warns Eclipse Networks
AI Has Changed the Cybersecurity Threat Landscape for SMBs, Warns Eclipse Networks azcentral.com and The Arizona Republic
- U.S. big tech holds 85% of Canadian cloud market, report says ahead of AI strategy
U.S. big tech holds 85% of Canadian cloud market, report says ahead of AI strategy Toronto Star
- AnyMind Group to open AI lab in Hangzhou
The lab will develop autonomous AI agents, speed up development of the company’s AnyAI suite, and improve AnyMind’s internal operations with AI.
Score: 22 | 🌐 Moves | Jun 2, 2026 | https://www.techinasia.com/anymind-acquires-japanese-beauty-creator-studio-nadesiko
- Mathematicians say ‘don't believe hype’ on AI capabilities
Mathematicians say ‘don't believe hype’ on AI capabilities The Straits Times
Score: 22 | 🌐 Moves | Jun 2, 2026 | https://www.straitstimes.com/world/europe/mathematicians-say-dont-believe-hype-on-ai-capabilities?ref=latest
- The AI-First Supply Chain
The AI-First Supply Chain Boston Consulting Group
Score: 22 | 🌐 Moves | Jun 2, 2026 | https://www.bcg.com/ja-jp/publications/2026/how-ai-agents-are-transforming-supply-chains
- Building the Future of Open AI: Insights from the Open Source LLM Builder Summit
Researchers discuss open large language models at the Open Source LLM Builder Summit.
- Great to be back at Build today. For us, it is not about any one piece of technology or even the platform. It is about how we can build a frontier intelligence ecosystem together. Sharing some of our big announcements today …
First published on Source.
- This Silicon Valley company created 13 new job types because of AI: What are they? Why the firm is still hiring?
While Meta and Coinbase cut jobs in the name of AI, Box has created 13 entirely new roles — from AI architects to model evaluators — and expects to grow from 2,900 to over 3,000 employees by early next year.
- BambooHR Research: AI Productivity Gains Are Coming at a Hidden Cost — 'Dignity Debt' Is Building Across Today's Workforce
BambooHR Research: AI Productivity Gains Are Coming at a Hidden Cost — 'Dignity Debt' Is Building Across Today's Workforce markets.businessinsider.com
- Alchip Leverages AWS to Enable Cloud-Based Silicon Execution for AI and Data Center Platform
Alchip Leverages AWS to Enable Cloud-Based Silicon Execution for AI and Data Center Platform markets.businessinsider.com
- Tribeca Lets AI Into Its Official Lineup—One To Watch, Not Cheer
Tribeca Festival 2026 accepted a fully AI-generated feature into its official lineup. Dreams of Violets is a milestone worth watching closely.
- Rehumanizing global health care with agentic AI
The global health care sector is under increasing strain. Decades of chronic underinvestment and constraints in recruitment have coincided with a surge in demand for services for aging populations. Gaps in provision are already taking a toll, with fragmented access to care and high rates of stress and burnout among staff. And it’s getting worse.…
Score: 22 | 🌐 Moves | Jun 2, 2026 | https://www.technologyreview.com/2026/06/02/1137827/rehumanizing-global-health-care-with-agentic-ai/
- From arias to algorithms: Why the Royal Opera House is embracing AI – even as some musicians feel ‘miserable’ about it
The rise of machine learning has led many artists and writers to feel that their life's work could be about to be destroyed. But at one of the world's most traditional cultural institutions, practitioners are looking at new and controversial ways of embracing AI, writes Andrew Griffin
Score: 22 | 🌐 Moves | Jun 2, 2026 | https://www.independent.co.uk/tech/opera-ai-artificial-intelligence-royal-house-b2987424.html
- Elon Musk’s Grok destroyed the world after just four days in an AI simulation
The experiment saw other AI chatbots, like Anthropic’s Claude, set up stable democracies
Score: 21 | 🌐 Moves | Jun 2, 2026 | https://www.independent.co.uk/tech/grok-ai-elon-musk-safety-simulation-claude-b2987701.html
- Tech giants’ market value gets a boost in May from AI demand, earnings optimism
The world’s most valuable technology companies, with the exception of Alphabet, added billions of dollars in market value in May.
Score: 21 | 🌐 Moves | Jun 2, 2026 | https://www.theglobeandmail.com/investing/article-tech-giants-marketcap-ai-demand-earnings-optimism/
- What are AI PCs? These powerful computers are changing the tech landscape
Explainer: What are the AI PCs that Nvidia's Jensen Huang is betting on?
Score: 21 | 🌐 Moves | Jun 2, 2026 | https://www.khaleejtimes.com/business/tech/ai-pcs-powerful-computers-changing-tech-landscape
- Why Even Experts Don’t Know What to Do About AI Risk
AI Safety veteran Holden Karnofsky thinks there’s a 49% chance his actions are making things worse. [1] In 2025, Jesse Clifton even stepped down as the executive director of the Center on Long-Term Risk for similar reasons. Even top AI Safety strategists don’t know what will make things better and what will make things worse. Why is it so hard to improve humanity’s odds? And what can you do to choose your actions?
1) Hidden Failure Lets You Fail Without Knowing It
In AI Safety, impact is hard to measure, and thus lack of impact is often invisible. We call this "hidden failure". With hidden failure, projects fail to have a positive impact but the people doing the project don’t realise it. To understand where hidden failure comes from, it’s useful to understand reasons why projects fail in general. These reasons fall on a spectrum:
Wrong problem: You're addressing something with little influence on x-risk. For example, researching AI fairness when the core risk is misalignment.
Wrong solution: Your solution doesn't solve the problem, even when competently executed. E.g. interpretability research that's technically novel but isn’t actually helpful.
Poor execution: Your problem-solution set could be impactful but you're not executing your solution competently enough.
These factors can cause problems with both of the things you need to be impactful, adoption and effectiveness:
A lack of adoption is relatively easy to spot if you want to [2] and can be remedied by entrepreneurial iteration.
A lack of impact-effectiveness, [3] in contrast, can be particularly hard to spot, and that’s what we’re calling “hidden failure” in this post.
With hidden failure, you might have users, citations, and funding (i.e. you have “adoption”), and still fail to have impact or even make things worse. Let us put that more bluntly: It’s literally possible for all your friends to think you’re successful and still be making things worse. Even within AI Safety. Even outside of frontier labs.
2) Why impact is harder than profit
Creating a profitable startup is hard. Achieving impact in AI Safety is even harder for several reasons:
There is no clear (market) signal to guide you. In other words, it’s hard to measure success.
To have impact, you need both adoption (like a for-profit) [4] AND effectiveness (unlike a standard for-profit). [5] In many ways, impact doesn’t just pose different challenges than profit. It poses extra challenges.
AI Safety is largely pre-paradigmatic.
3) The pre-paradigmatic challenge
AI Safety doesn't have an established paradigm yet. [6] We can't predict with certainty what will be impactful. So why bother optimizing so deliberately?
First, imperfect predictions are still valuable. For example, AI Safety experts can often point out specific reasons why a given project or idea is unlikely to be impactful. [7]
Secondly, we argue the lack of a paradigm actually makes deliberate thinking about impact more important, not less. Without clear guides on what will lead to impact, you have to figure it out yourself.
The tools described in the next posts help you optimize for impact under uncertainty. The goal isn't to get it perfectly right or to cripple yourself with analysis paralysis. [8] But we do think most people would benefit from spending more time thinking about their impact.
So let's think strategically about impact. We’ll give a high-level overview of how to do that in an upcoming post, and we’ll help you measure your impact in another one.
[1] We’re paraphrasing that from his appearance on the 80,000 Hours podcast, around the 4:11:30 mark, where he said: “I think overall I would probably agree with you that the smaller you’re making the scope of where you’re hoping to have impact, the more reasonable it is to be like 60/40. But most people who go into AI are not going into it for that. Otherwise, if you want a small-scope, robustly positive impact, you should maybe work in a cause like farm animal welfare or global poverty. For the size of impact that tends to motivate people, I think it does get partially offset by this huge uncertainty about the sign. I tend to think it’s worse than 51/49. I tend to think we’re always going to be prone to overestimate how robustly good our actions are. And the more we learn about all the galaxy-brained considerations that one should have had in one’s head, the more it’s going to be like 50+ε%. I think AI safety is a great cause to work in. I’m excited to work in it. I think it’s high impact. I am doing my best to do things that I will be proud to have done and hope for the best. But I really do have to live with the possibility that my ultimate impact on the utilons or whatever is going to be negative.”
[2] Though you shouldn’t underestimate your brain’s ability to make itself comfortable, satisfice, and employ motivated reasoning to have you accept mediocrity.
[3] We’re using “impact-effectiveness” as a synonym for “effectiveness” as meant by the Impact Equation: Impact = Adoption x Effectiveness.
[4] Here and in other places, I use “for-profits” to mean regular companies not aimed at AI Safety. Of course, an AI Safety project can be set up as a for-profit too.
[5] Although arguably, adoption is sometimes easier in a nonprofit setting. For example, the various fellowships have no trouble finding enough participants. In contrast, though, many products, tools, and blog posts do struggle to get adoption.
[6] See e.g. https://ai-safety-atlas.com/chapters/03/07 or https://www.thecompendium.ai/ai-safety. Although instead of saying AI Safety is pre-paradigmatic, it’s more accurate to say that none of the existing paradigms is widely agreed to be sufficient for making the world safe, especially by higher-level researchers in that paradigm. That is, we have a bunch of paradigms, but they’re all pretty limited, and all in all we don’t even know yet what approaches will be required to make the world safe enough.
[7] Though there are also areas where experts disagree. In such cases, it becomes even more important to assess the specific arguments they use.
[8] See e.g. Holden Karnofsky on the 80,000 Hours podcast, where he says "When people ask me for career advice or whatever, the usual thing I’d say is: take a bunch of options that all seem competitive, and all seem like they could be the best thing, and that it’s not obvious which ones are better than others from an impact perspective. And from there I would say go with personal fit, go with the energy you feel to work on them."
Score: 21 | 🌐 Moves | Jun 2, 2026 | https://www.lesswrong.com/posts/tRRkj249gdDL4mued/why-even-experts-don-t-know-what-to-do-about-ai-risk
- The AI layoffs narrative obscures what’s really happening in Singapore
The AI layoffs narrative obscures what’s really happening in Singapore The Straits Times
- The AI pricing conundrum — it started as a nightmare, now it’s worse.
Enterprise IT leaders have always struggled with AI pricing, especially the need to pay for AI in a way that delivers ROI. But the typical IT exec may not be the right person to decide how a company uses AI, and how it tries to deliver ROI, because so many line-of-business workers and partners are now experimenting with the technology on their own.
And if IT leaders don’t have a grip on how they want to use AI over the next year or two, it’s impossible to figure out how they want to pay for it. They likely hate the current method of paying per token. And other options, such as SAP’s push to charge per AI task completed, aren’t any better.
To use a sales analogy, IT doesn’t want to pay a lot of money for leads, because there’s no way to know if those leads will generate any revenue, let alone how much. What IT leaders want is the tech equivalent of paying commission, where they only pay when a lead converts into a paying customer. And even then, they only pay a percentage of the final sale. That guarantees ROI for the enterprise.
The problem: no AI vendor would ever go for it because that approach puts too much risk on them. Finding a pricing model that works for both enterprise IT and AI vendors is all but impossible as long as IT is trying to deliver ROI.
Irfan Khan, president of SAP Data & Analytics, said the problem is challenging for both sides. “Everyone is scrambling to justify their investments,” and “the day one cost is not necessarily the day one value,” he said.
The problem is one of sequence. Pricing has to be negotiated and locked in long before a project starts. But with technology as new and experimental as agentic AI, there’s almost no solid information about what benefits it will (or will not) actually deliver.
Beyond that, generative AI (genAI) and agentic AI systems might well deliver benefits that are harder to jot down in a spreadsheet. Let’s say the CFO wants to see a sharp rise in order fulfillment. But what if AI “manages to fulfill those orders more efficiently,” Khan said. “And what are the likely ripple effects of bringing more efficiencies into the process?”
Justin Greis, CEO of consulting firm Acceligence, frames the AI pricing disconnect in terms of market economics: “The market is trying to force-fit AI into infrastructure-era pricing models, when AI is fundamentally closer to labor augmentation and business process transformation than compute consumption,” Greis said. “The core disconnect is: Enterprise IT buyers want pricing aligned to realized business value. AI vendors want pricing aligned to resource consumption and platform utilization. Those are very different economic models.
“Token pricing is attractive to vendors because it is measurable, scalable, and predictable. But from the enterprise perspective, tokens are almost meaningless as a business metric. Nobody on the CFO side cares how many tokens were consumed if the process improvement never materialized.”
The competing pricing strategies overwhelmingly rely on just two factors: what delivers the most profit and which is the easiest to execute. Given human nature, the latter is the path most often taken.
It’s like one of my favorite jokes. A guy is heading to his car when he sees a man with a flashlight intently looking at the ground right next to a streetlight pole.
“Can I help you? Are you looking for something?” the guy asks.
“Yes, I lost my car keys.”
“Silly question, but where do you last remember having them?”
“I was standing over there in that dark alley up the street. A cat screeched and I dropped my keys.”
“Wait a second, if you lost your keys over there, why are you looking here?”
“The light’s better over here.”
The lesson: the easy route usually wins out over the actual objective.
Greis argued that not only would it be hard to persuade AI vendors to accept ROI pricing, but if they did somehow agree, the unintended results could prove disastrous.
“AI vendors cannot realistically absorb unlimited downstream business risk tied to variables they don’t control: poor internal adoption, broken processes, bad data, organizational politics, weak change management, or unclear KPIs. But the moment vendors are compensated primarily on outcomes, you create strong incentives for increasingly autonomous optimization behavior. That sounds great until organizations realize that AI systems may pursue the metric rather than the intent behind the metric,” Greis said. “We’ve already seen versions of this in recommendation engines, ad targeting systems, and engagement algorithms. The system learns to maximize the measurable outcome even if the methods become operationally risky, ethically questionable, reputationally damaging, or strategically misaligned. In enterprise environments, that could become dangerous very quickly. An AI system incentivized around reducing service costs might aggressively deflect legitimate customer issues. A model rewarded for sales conversion could push manipulative messaging or optimize for short-term wins at the expense of customer trust. A procurement optimization engine might lower costs while quietly increasing supplier concentration risk or degrading operational resilience.
“The more autonomous these systems become, the harder it is to separate ‘successful outcome’ from ‘acceptable behavior.’”
The best way to resolve this is potentially the most difficult. Every AI project must be approved by an AI committee whose members must ask the hard questions. What are you hoping to accomplish? If it works, specify and quantify your best-case scenario benefits. What are the most likely ways it could fail? What are the costs and disruptions most likely to happen if it fails in that way? Quantify those. The committee should have at least a couple of members who know exactly what these models can and cannot do to serve as a reality check.
Next, require the LOB chief, or whoever the most senior exec involved in the project is, to share in the pain. Tie gains or losses to executive bonuses. Give those execs a reason to make sure their people are honestly and creatively thinking the project all of the way through. Only once that happens can a CIO know how to negotiate a fair and reasonable AI pricing deal.
- Announcing the ARC White-Box Estimation Challenge
ARC has teamed up with AIcrowd to launch the ARC White-Box Estimation Challenge, a contest to improve upon our estimation algorithms for random MLPs. The warm-up round begins this week, and later rounds will have a total prize pool of at least $100,000.
We are very grateful to Sharada Mohanty, Sneha Nanavati, Dipam Chakraborty and everyone else at AIcrowd for working with us to host this contest, as well as to Paul Rosu for testing the contest and to Harshita Khera for operational support.
Introduction to the Challenge
Our challenge follows the same setup as our recent paper on wide random MLPs: we consider MLPs with weights […], defined by […], where the activation function is […], applied coordinatewise. To begin with, we are fixing the width […] and the number of hidden layers […], but we expect to change this setup in future rounds. [1]
Contestants must design an algorithm that takes in a set of weights […] and produces an estimate for the expected output […]. Algorithms will be evaluated on MLPs with randomly-sampled Gaussian weights. The goal is to achieve as low mean squared error as possible, subject to certain computational constraints.
We have devised a FLOP-counting scheme with AIcrowd to minimize any advantage from using heavily optimized numerical kernels, allowing participants to focus on higher-level algorithm design instead. This scheme may still have a few rough edges remaining, but we hope to round these out over the course of the warm-up round.
For further details, please see the challenge website.
Why run this contest?
In the long run, we would like to answer questions about highly intelligent AI systems such as, "Are there unusual situations in which the system would undermine human control?". Running the system on a huge number of different inputs may not be a reliable way to answer such questions, since a highly intelligent system may not fall for our "honey pots". This is why we are interested in white-box approaches that leverage our access to the model's internals. Ultimately, of course, we should use whichever methods perform the best.
Unfortunately, designing highly performant white-box estimation methods for trained networks is challenging, even for tiny models. ARC's bet is that we can build up to this challenge by first producing performant white-box estimation methods for randomly-initialized networks, and then figuring out how those methods can be adapted with each step of training.
However, even this first step remains incomplete. In our recent paper, we produced white-box methods that outperform black-box methods for MLPs with large width, but they break down as the depth grows, and we are very confident that our methods can be significantly improved. By running this contest, we hope to spur others to discover such improvements.
Even though "white-box" is in the name of the challenge, contestants are permitted to use any methods they choose, whether white-box or black-box: as stated above, we ultimately want the best-performing algorithm. However, we strongly expect the best possible algorithms for this problem to be "mechanistic" (i.e., to avoid black-box sampling entirely), mirroring our existing results in the large width setting.
Use of LLMs
We encourage contestants to use LLMs to whatever extent helps them improve their submissions the most. In later rounds, there will be two kinds of prize: one for the best-performing submission, and one for our favorite algorithmic contribution described in a technical report. Especially for the latter kind of prize, contestants may benefit from having a good understanding of any LLM-written code themselves, but the rules of the contest do not require this.
In fact, exploring LLM usage is another motivation for holding the contest. Thanks to how our research has developed, it now looks possible to make progress on some of our core problems by hill-climbing on well-defined metrics, which is exciting to us. [2] At the same time, the ability of LLMs to make considerable progress on such problems is improving rapidly, and we want to position ourselves to take full advantage of this. We are not sure whether we will be able to draw generalizable insights from strong submissions that are primarily LLM-written, but we think putting LLMs to work on the problem is a worthwhile experiment nonetheless.
As a word of caution, our FLOP-counting utility is definitely hackable in ways that would be very unambiguously hacking once pointed out, such as by modifying constants or counts held in memory. Contestants are responsible for ensuring that their submissions do not hack our FLOP-counting utility, regardless of whether or how they choose to use LLMs.
To all contestants: good luck!
[1] The contest setup is actually slightly different in that it omits the final linear layer, but this makes essentially no difference.
[2] We have previously offered prizes for solutions to problems, but they have either been more pedagogical or less central to our agenda.
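For readers who want a feel for the estimation task in the ARC announcement, the following is a minimal sketch of the black-box sampling baseline that the post contrasts with white-box methods. It is not ARC's algorithm or the contest's evaluation code. Everything specific here is an assumption made for illustration, because the announcement's formulas did not survive in this archive: the ReLU activation, the standard Gaussian input distribution, the N(0, 1/width) weight scaling, the tiny width and depth, and all function names.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def sample_weights(width: int, depth: int, rng: random.Random) -> List[Matrix]:
    """Draw `depth` square matrices with i.i.d. N(0, 1/width) entries (assumed scaling)."""
    std = 1.0 / math.sqrt(width)
    return [
        [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        for _ in range(depth)
    ]


def forward(weights: List[Matrix], x: List[float]) -> List[float]:
    """Apply x <- relu(W x) for each layer; ReLU is an assumed activation."""
    for w in weights:
        x = [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return x


def black_box_estimate(
    weights: List[Matrix], n_samples: int, rng: random.Random
) -> List[float]:
    """Monte Carlo estimate of E[f(x)] with x ~ N(0, I) (assumed input distribution)."""
    width = len(weights[0])
    total = [0.0] * width
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(width)]
        total = [t + y for t, y in zip(total, forward(weights, x))]
    return [t / n_samples for t in total]


def mean_squared_error(a: List[float], b: List[float]) -> float:
    """Average squared difference between two equally long vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = sample_weights(width=8, depth=2, rng=rng)
    reference = black_box_estimate(weights, 5000, rng)  # stand-in for ground truth
    cheap = black_box_estimate(weights, 200, rng)
    print("MSE of 200-sample estimate:", mean_squared_error(cheap, reference))
```

The error of this baseline shrinks only as more forward passes are spent, which is the cost that a white-box method reading the weights directly would try to avoid.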
Score: 21 | 🌐 Moves | Jun 2, 2026 | https://www.alignmentforum.org/posts/Kben8CzS4awCwNw5c/announcing-the-arc-white-box-estimation-challenge
- AI moves from pilot to production: Meet the third MINDS cohort
The MINDS initiative has selected 16 organizations to join its third cohort, representing innovative AI use cases that deliver measurable operational, societal and sustainability outcomes.
Score: 21 | 🌐 Moves | Jun 2, 2026 | https://www.weforum.org/stories/2026/06/ai-pilot-to-production-minds-cohort/
- Defending Your Enterprise at the Speed of AI
Learn how to safely accelerate your AI agents from pilot to production with Snowflake’s built-in, enterprise-grade AI security controls.
Score: 21 | 🌐 Moves | Jun 2, 2026 | https://www.snowflake.com/content/snowflake-site/global/en/blog/enterprise-ai-security
- The algorithmic realignment of global FMCG strategy: How agentic AI is rewriting the cola wars
The integration of artificial intelligence within digital marketing is rapidly moving beyond simple operational automation and entering the realm of high-level strategic intelligence. For the global Fast-Moving Consumer Goods sector, the reliance on protracted, manual research cycles is being replaced by agile, AI-driven platforms capable of ingesting vast amounts of social data to dictate precise […] First published on e27.
- Kayhan Space co-founder Araz Feyzi shares how its AI makes spaceflight safer
- AI Unearths Football Talent Beyond Scouts' Radar
AI Unearths Football Talent Beyond Scouts' Radar Barron's
Score: 20 | 🌐 Moves | Jun 2, 2026 | https://www.barrons.com/articles/ai-unearths-football-talent-beyond-scouts-radar-3fc540d3
- Zoom CISO: AI as Security Enabler, Not Role-Replacer
As Zoom's CISO, Sandra McLeod, discusses the challenges of securing a global communication platform, the promise of AI-driven security workflows, and advice for aspiring cybersecurity leaders.
Score: 19 | 🌐 Moves | Jun 2, 2026 | https://www.darkreading.com/cybersecurity-operations/zoom-ciso-ai-security-enabler-role-replacer
- Dissolving the Deep Learning Sample Efficiency Gap
A common observation about deep learning is that it's wildly sample inefficient compared to humans. Deep learning systems appear to need much more real data or environment interaction to reach a given level of capability. A teenager can learn to drive in a few dozen hours; self-driving systems are trained for years on billions of miles of data. A human can become competitive at StarCraft II in well under a year of play, while AlphaStar required imitation learning from roughly 18 years of human games followed by 13,300 years of self-play to reach Grandmaster [1]. A 12-year-old has heard perhaps a hundred million words of language; a frontier LLM trains on tens of trillions of tokens. The gap is, on the face of it, enormous.
[Figure: from Warstadt et al. 2025]
[Figure: from Byrnes 2025]
What people take this to mean varies widely. Steven Byrnes appears to read the gap as evidence that current algorithms are far from what the brain is doing, such that much better algorithms must be waiting to be found. His guess is that human-level, human-speed AGI will require not a datacenter but "one consumer gaming GPU," even for training from scratch. [2] Yarrow Bouchard, on the EA Forum, reads the same gap as evidence that AGI isn't close at all, precisely because nobody knows how to close it. They both agree that current algorithms are missing something fundamental, yet draw opposite conclusions about what that means for AGI risk and timelines.
In this post, I'll argue that their premise is mistaken. Most of the apparent inefficiency dissolves on closer inspection: apples-to-oranges comparisons between pretrained humans and from-scratch networks, hardware and data constraints that push deep learning toward small models trained on enormous corpora, the brain's apparent use of model-based RL of a kind we haven’t yet applied to LLMs, and priors installed by evolution. Real algorithmic gains in sample efficiency are available. But most mechanisms that plausibly close the gap point toward more total training and runtime compute than frontier systems currently use, not less.
My best guess is that the gap decomposes into several distinct factors, each carrying different explanatory weight depending on the specific comparison at hand.
1. All about the priors
Sample-efficient learning is in large part a property of the representations you arrive with, not of the learning algorithm itself. In Bayesian terms, your representations encode a prior over how the world is structured, and strong priors are what let you reach good posteriors from a handful of observations rather than astronomical amounts of data. Given a sufficiently rich representational substrate, new tasks can often be learned from a few examples. Given a flat prior over a vast hypothesis space, even simple tasks require an enormous amount of data.
a. Human priors
Most comparisons that yield shocking sample-efficiency ratios between humans and AIs are structurally unfair. They pit a system already shaped by evolution and years of perceptual, motor, causal, social, and linguistic learning against a randomly initialized network that must build many of those representations from scratch while also solving the task we are measuring. Humans don’t learn to drive in thirty hours; they are fine-tuned on driving after a roughly two-decade-long pretraining run.
One can find evidence for the importance of pre-existing representations in Dubey et al. (2018). They took a platform-style game environment and deliberately removed cues that humans normally exploit: semantic cues, by rendering meaningful objects as uniform blocks; object/salience cues, by adding many distractor blocks; affordance cues, by filling the background with textures that obscured which surfaces and ladders were usable; similarity cues, by making functionally similar things look visually different; and gravity cues, by rotating the game 90 degrees.
Individual ablations substantially slowed human players, and when the main object-related visual cues were masked together, completion time rose from under 2 minutes to over 20. Exploration became close to random, and many players reported falling back on rote memorization.
When first faced with such a game, a human immediately brings assumptions like: the controllable character is probably the humanoid-looking sprite, gravity points downward, falling off platforms is bad, ladders are for climbing, spikes and monsters are dangerous, and keys open doors. Dubey et al. show that degrading many of these cues makes humans much worse.
Importantly, even in the hardest human-tested variants, people were still not reduced to blank-slate RL agents. They retained low-level visual, spatial-navigation, and action-control priors, plus abstract intuitions about object persistence, physics, and causality. Their curiosity-driven RL agent, tested on a smaller related game, was largely unaffected by removing semantic, object, and affordance cues, since it was not exploiting those human priors in the first place, though it was slowed when visual similarity was removed.
A similar asymmetry runs through the text-token comparison. A 12-year-old has encountered something like 100 million words [3], roughly four orders of magnitude below a frontier pretraining corpus. But those words arrive embedded in a continuous multimodal stream: vision, ambient audio, proprioception, touch, vestibular signals, interoception. Counted as tokens in the sense a multimodal model would use, the non-text portion of that stream plausibly matches or exceeds frontier corpora (see Appendix). Most of what a word means to the child was learned nonverbally from that stream and then labeled, [4] while most of what a token means to a text-only LLM had to be triangulated from textual co-occurrence statistics alone. [5]
b. Good representations enable fast learning
A frontier LLM, shown a single example of a novel notation, will often pick it up. Shown an unfamiliar API, it can use it correctly after reading the documentation once. Shown a codebase's local conventions, it conforms within a session. The attention mechanism is able to change the activations, layer by layer, until they encode something that solves the new task. [6] Given rich enough base representations, the in-context adaptation from a single forward pass can be enough. The ARC-AGI results of the past two years suggest that as those base representations get richer, the amount of data needed to pin down a novel abstract pattern drops toward something recognizable as human sample efficiency.
This is broadly the same story as the child learning a new word. The child has already learned to carve the world into objects, agents, actions, substances, events, and intentions. When they hear "zebra" pointed at a striped horse-shaped thing, the word latches onto a pre-existing slot. Most of the learning happened earlier, across years of pre-linguistic experience. The labeling is cheap because the carving is already done.
Deep learning's apparent sample inefficiency is often an artifact of the tabula rasa training regime, not a fundamental property of gradient-based deep learning. "Sample efficiency" is poorly defined without a specification of priors: the same architecture can look astronomically inefficient when trained from scratch, and remarkably efficient when adapting from a strong base, for example via in-context learning or using a LoRA.
This does not dissolve the whole gap, but it explains many of the most extreme examples. The following sections ask what remains once these comparison artifacts are separated out.
2. Model-based RL
In addition to the data mix and priors, there are also algorithmic factors that separate current frontier LLMs from the brain.
Almost all the compute going into current LLMs is spent on pretraining and on RLVR: reinforcement learning with verifiable rewards in math, code, and similar domains where a correct answer can easily be checked programmatically. What's missing, and what the brain probably leverages in some form, [7] is model-based reinforcement learning: learning a world model that can be used to plan over candidate actions, predict their outcomes, and bootstrap value estimates from imagined trajectories. In any domain where real experiences and reward signals aren't cheap to get, this is the natural mechanism for turning a small number of real interactions into a large amount of learning signal. [8]

a. Dreamer

The Dreamer research is probably the cleanest demonstration. DreamerV3 (Hafner et al., 2023) trains a recurrent latent world model from raw pixels and vector observations, and uses it to train an actor-critic on trajectories imagined inside that model. The actor is trained to choose actions that score well under these imagined futures, while the critic learns to evaluate returns from both imagined rollouts and replayed experience. With a single fixed set of hyperparameters it matches or beats specialized methods across 150+ tasks, and it was the first system to collect diamonds in Minecraft from scratch without human demonstrations or curricula, a task requiring sparse-reward exploration over thousands of sequential decisions.

In their follow-up work, Dreamer 4 (Hafner, Yan & Lillicrap, 2025), they scale this approach to a large transformer-based video world model. It learns a high-resolution Minecraft simulator using the 2.5K-hour VPT contractor dataset (raw video and mouse/keyboard actions). Leveraging the dataset's event annotations for rewards, the agent improves its task-conditioned policy via reinforcement learning entirely inside imagined rollouts, requiring zero online environment interaction.
As a result, Dreamer 4 is the first agent to obtain Minecraft diamonds purely from offline data, substantially outperforming prior VPT and behavioral-cloning baselines which used 100x more data. Additionally, the paper shows that Dreamer 4 does not need action labels for most of its video data. Given all 2.5K hours of Minecraft video but only 100 hours with mouse and keyboard labels, it still learns most of the action-conditioned prediction ability of a fully labeled model, suggesting that future world models could learn broad dynamics from unlabeled video and use smaller paired datasets to ground actions. The Dreamer line of work demonstrates that self-supervised world-model training can produce large gains in sample-efficient learning. Once a reusable model of environment dynamics has been learned, downstream RL can extract much more from limited rewards or demonstrations by training on imagined trajectories, rather than requiring new environment interaction for every update. It is thereby possible to learn to play Minecraft entirely offline, without ever directly interacting with the environment. b. EfficientZero V2 A complementary line of work is EfficientZero V2 (Wang et al., 2024), which combines learned world models with explicit planning. It extends EfficientZero [9] , a MuZero descendant that learns a latent dynamics model and plans over it with MCTS, to continuous control domains, replacing standard MCTS with a sampling-based Gumbel search using sequential halving. This search procedure samples a finite set of candidate actions and allocates simulations toward the most promising ones, aiming to obtain policy improvement from a small simulation budget. EZ-V2 also re-analyzes old replayed experience with its current model and policy, letting the agent extract fresher learning targets from data collected earlier in training. 
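The search idea described above (sample a finite candidate set, then concentrate simulations on the most promising candidates) can be illustrated with plain sequential halving. This is a minimal sketch under stated assumptions, not EZ-V2's actual search: it omits the Gumbel noise, the learned policy prior, the value network, and the tree, and `noisy_value` is a hypothetical stand-in for one simulation through a learned model.

```python
import math
import random


def sequential_halving(candidates, simulate, budget):
    """Return (best_candidate, visit_counts) using roughly `budget` simulations.

    Each round spreads an equal share of the budget over the surviving
    candidates, then drops the worse half by mean simulated value.
    """
    survivors = list(candidates)
    totals = {c: 0.0 for c in survivors}
    counts = {c: 0 for c in survivors}
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        per_arm = max(1, budget // (rounds * len(survivors)))
        for c in survivors:
            for _ in range(per_arm):
                totals[c] += simulate(c)
                counts[c] += 1
        # Keep the better half, ranked by running mean value.
        survivors.sort(key=lambda c: totals[c] / counts[c], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0], counts


# Toy continuous-control stand-in: actions in [-1, 1], best action near 0.3.
rng = random.Random(0)
actions = [rng.uniform(-1.0, 1.0) for _ in range(8)]


def noisy_value(action):
    # Hypothetical "one simulation": true value plus Gaussian noise.
    return -(action - 0.3) ** 2 + rng.gauss(0.0, 0.1)


best, visits = sequential_halving(actions, noisy_value, budget=96)
print(f"chosen action {best:+.3f}; simulations per candidate: {sorted(visits.values())}")
```

The point of the design is visible in the visit counts: weak candidates are abandoned after a handful of simulations, so most of a small budget is spent distinguishing the few actions that might actually be best.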
Thus, it improves sample efficiency by using the world model both to imagine consequences before acting and to get more learning signal out of past interactions. On Atari 100k, which caps the agent at 100,000 environment steps (400k Atari frames under action repeat 4, roughly 2 hours of real-time gameplay), EZ-V2 reaches a normalized mean of 2.43 and a normalized median of 1.29, both above the human baseline. The paper thus claims "super-human performance within just 2 hours of real-time gameplay". The result is not uniform across games. EZ-V2 and other strong deep RL agents still struggle badly on some long-horizon, sparse-reward, exploration-heavy games. This pattern is probably best explained by the point made in §1 (see also §4b): model-based planning helps extract more from limited experience, but humans bring preexisting semantic and exploration priors, whereas the RL agents have to infer complex game mechanics from scratch under sparse rewards. [10]

LLM post-training pipelines have nothing structurally analogous. RLHF and RLVR are model-free: they update the policy from sampled real trajectories, with no learned model of the environment to plan or imagine inside. [11] Whatever fraction of the human–deep-learning gap is genuinely algorithmic, rather than an artifact of priors or comparison setup, plausibly shrinks further once frontier systems learn world models over their action space and plan inside them during training and at inference. Analogous techniques have seen limited application in frontier LLM training for narrow domains, [12] but nobody has demonstrated a general world model rich enough to train and plan inside at the scale and domain-generality LLMs operate in. AI labs have instead focused their RL work on areas where the environment and verification step are cheaper to compute directly than in a learned world model.
A solution to true continual learning on harder to simulate and verify domains will likely require both a model-based RL architecture and the kind of context-into-weights compression techniques I described in a previous post . Such a system might be quite sample-efficient, and able to learn continually from relatively few real interactions, but only by spending far more computation per interaction on world-model updates, imagined rollouts, planning/search, and replay . 3. Other Low-Hanging Fruit Even without brain-like model-based RL, frontier training has not been optimized primarily for extracting maximal information from each real example. Internet-scale text is abundant, and the economically optimal strategy has usually been to train models on ever more unique tokens rather than to squeeze maximal learning out of each example. That leaves a lot of plausible low-hanging fruit. Even just naively repeating the same data for up to 4 epochs can reduce loss on a held out test set almost as well as a single epoch on 4x more data, and yet it is usually still more economical to just use more unique data. But more sophisticated approaches are also possible. Here, I will describe two such approaches. a. Training Language Models via Neural Cellular Automata Lee et al. (2026) give a clean example of this. Before ordinary language training, they pre-pre-train a Llama-style transformer on trajectories generated by Neural Cellular Automata (NCA): synthetic 2D grid dynamics where each sequence is produced by a different randomly sampled local update rule. They filter trajectories by gzip compressibility, using compression ratio as a rough proxy for structural complexity, excluding both trivial patterns and maximally chaotic noise. Successful next-token prediction then requires inferring the rule from context and applying it forward, rather than learning word meanings or memorizing facts. 
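The compressibility filter can be illustrated on a toy stand-in. The sketch below is not the authors' pipeline: instead of a neural cellular automaton on a 2D grid it uses a random lookup-table rule over a small state alphabet on a 1D ring (an assumption made to keep the example dependency-free), and the thresholds `low` and `high` are arbitrary illustrative values rather than numbers from the paper. It keeps only trajectories whose gzip compression ratio falls in a middle band, dropping both near-constant patterns and near-incompressible noise.

```python
import gzip
import random


def random_rule(n_states, rng):
    """Sample a random local update rule: (left, self, right) -> new state."""
    return {
        (a, b, c): rng.randrange(n_states)
        for a in range(n_states)
        for b in range(n_states)
        for c in range(n_states)
    }


def rollout(rule, n_states, width, steps, rng):
    """Run the rule from a random initial row on a ring; return all cells as bytes."""
    row = [rng.randrange(n_states) for _ in range(width)]
    frames = [row]
    for _ in range(steps - 1):
        # row[i - 1] wraps around at i == 0, giving periodic boundaries.
        row = [rule[(row[i - 1], row[i], row[(i + 1) % width])] for i in range(width)]
        frames.append(row)
    return bytes(cell for frame in frames for cell in frame)


def compression_ratio(data):
    """Compressed size over raw size: near 0 is trivial, near 1 is noise-like."""
    return len(gzip.compress(data)) / len(data)


def filter_by_compressibility(trajectories, low, high):
    """Keep trajectories whose compression ratio lies in [low, high]."""
    return [t for t in trajectories if low <= compression_ratio(t) <= high]


rng = random.Random(0)
trajectories = [
    rollout(random_rule(4, rng), 4, width=32, steps=64, rng=rng) for _ in range(50)
]
kept = filter_by_compressibility(trajectories, low=0.05, high=0.5)
print(f"kept {len(kept)} of {len(trajectories)} trajectories")
```

Compression ratio is only a rough proxy for structural complexity, as the text notes, but it is cheap to compute and needs no labels, which is what makes it usable as a filter over large numbers of randomly sampled rules.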
With only ~160M NCA tokens, followed by normal training on web text, math, or code, they get up to ~6% lower perplexity and ~1.4–1.6× faster convergence than training from scratch. The striking result is that NCA pre-pre-training also beats pre-pre-training on ordinary web text from the Colossal Clean Crawled Corpus (C4) , even when the C4 baseline gets 10× more tokens — 1.6B C4 tokens versus ~160M NCA tokens. Their ablations suggest that much of the transferable benefit lives in the attention layers: when the attention weights are reinitialized, most of the gain disappears. NCA pre-pre-training appears to train the model to infer hidden rules from context, track dependencies over many steps, and apply those inferred rules forward. It instills useful priors for later language learning that can be installed using no “real” data at all, just synthetic processes with the right abstract structure. b. Synthetic bootstrapped pretraining Yang et al. (2025) demonstrate another way to get more value out of a fixed pretraining corpus. Synthetic Bootstrapped Pretraining (SBP) works in three steps: it finds semantically similar document pairs within the corpus, trains a conditional synthesizer model to generate one document from the other, and then applies that model back to the corpus to produce synthetic documents that are mixed into pretraining. Unlike standard synthetic-data distillation, the generator is trained on the pretraining data itself rather than relying on a stronger external teacher model, though their implementation does use an external embedding model to find similar documents. In final pretraining runs matched by token budget, 3B- and 6B-parameter models trained on up to 1T tokens beat a data repetition baseline, and recover up to roughly 60% of the average QA improvement achieved by an oracle with 20x more unique data. The authors’ story is that ordinary next-token pretraining leaves some structure in the corpus unused. It treats documents as i.i.d. 
samples and directly models the correlations among tokens inside each document, but it does not directly model the fact that different documents can instantiate the same underlying idea. SBP adds this missing signal by training on pairs of related documents: the synthesizer has to infer what the first document is about at a more abstract level, and then produce another document built around the same latent concept. The outputs of the synthesizer preserve the topic while changing the angle, genre, specificity, or rhetorical frame. SBP extracts more from the same corpus by spending extra computation on embedding/search, synthesizer training, and synthetic-data generation. It is another example of a possible sample-efficiency gain, bought at the cost of more computation per real example.

c. Algorithmic progress and the Pareto frontier

Both examples contribute to closing the sample-efficiency gap, and both continue the kind of ordinary algorithmic progress we’ve seen over the past few years. [13] Pareto improvements on both sample and compute efficiency over the simple "scale up unique tokens" baseline do exist. NCA pre-pre-training is one such example: it achieves both lower perplexity and faster convergence than throwing 10× more C4 tokens at the model, so it improves sample and compute efficiency simultaneously. But as the field exhausts these easy wins and pushes toward the algorithmic Pareto frontier, sample-efficiency gains will increasingly need to be paid for with compute. SBP is one such example: it extracts more learning signal from the same corpus, but only by spending non-trivial extra compute on embedding/search, synthesizer training, and synthetic data generation. The picture matches §2: real algorithmic headroom is available, but most of it buys sample efficiency by spending more compute per real example.

4. Evolution, optimizers, and hard-coded reward functions in the cortex

A common explanation for human sample efficiency is "evolution," but on its own this just relocates the question. Whatever closes the gap has to be encoded in roughly 3 billion base pairs of genome, only a fraction of which specifies anything about the brain at all, and it has to be doing some specific job. A few candidates have already appeared in earlier sections: the rich multimodal representational substrate that lifetime experience accumulates into, structural inductive biases, and the model-based RL machinery itself. That leaves two further candidates worth examining: the optimization algorithm the brain runs, and the hardwired reward functions that shape what lifetime learning ends up optimizing for.

a. Optimizer

The brain's learning rule is not known, but it almost certainly isn't gradient backpropagation. Backprop has several features that look biologically awkward: feedback signals that behave like a transpose of the forward weights, cleanly separated forward and backward phases, globally coordinated error propagation, and synchronized updates across many layers. None of these have an obvious analog in the brain, and a substantial computational-neuroscience literature is devoted to finding credit-assignment rules that don't require them. [14]

But that doesn't mean the brain uses a much better optimizer than gradient descent. The biologically motivated alternatives are designed around the constraints of wetware: local, noisy, low-precision, asynchronous, with limited global communication. GPUs face none of those constraints, and a learning rule built for biology is unlikely to have an advantage on GPU hardware. Biological plausibility tends to come at a cost. Many of these rules need to run the network forward repeatedly before each update, maintain helper networks alongside the main one, or simply scale less well.
They're interesting as biology and as starting points for neuromorphic hardware, but on current evidence they don't match backprop with Adam on frontier ML workloads. There may still be some remaining headroom in optimizer design within the gradient-based paradigm. But the design space has been thoroughly searched: there was a long stretch where it seemed like every other ML thesis proposed a new optimizer beating Adam on some narrow benchmark, and yet over a decade later Adam continues to be widely used at the frontier. Recent innovations [15] such as Muon do demonstrate some real improvements are still possible, but the optimizer seems unlikely to be the missing ingredient behind the orders-of-magnitude gap in sample efficiency. b. Reward functions One other hypothesis, recently articulated by Adam Marblestone on Dwarkesh , is that human sample efficiency might be explained by genome-encoded reward functions for lifetime RL. The hardwired set of innate drives and primary rewards (hunger, pain, social signals, curiosity, surprise) shapes what lifetime RL ends up optimizing for, and may help the human brain extract more useful learning signal from its lifetime data. The best evidence for this comes from curiosity and novelty rewards. Many classic deep-RL failures — Montezuma's Revenge , Private Eye , Pitfall — are sparse-reward games where useful exploration is highly nonrandom. One solution is to bring in pretrained representations and background knowledge, as in §1, and also demonstrated by recent frontier LLMs beating Pokémon Red . The classic RL response however is using curiosity/novelty rewards . It is such episodic and lifelong novelty signals that the Never Give Up / Agent57 RL agents leveraged to outperform the standard human benchmark on all 57 Atari games, including Montezuma's Revenge . Such curiosity signals have also been shown to be useful to prioritize replay data in model-based RL algorithms such as DreamerV3. 
So reward functions are probably a real part of the sample-efficiency story. But evolution did not arrive at the human reward stack for free. The drives we inherit are the product of hundreds of millions of years of selection, with each organism's lifetime serving as one rollout in an outer optimization process whose objective was reproductive success. To recreate comparable machinery in AI, we need some substitute, via hand design, meta-learning, evolutionary search, learned reward models, or a mixture. Some components such as curiosity drives may be compact and rediscoverable, but the more ecologically tuned and idiosyncratic parts are not a free lunch, and would probably have to be relearned at some computational cost.

Byrnes himself seems to be skeptical that reward functions are a large factor. He has written extensively about reward function design as a research direction in the context of alignment, but in his account, the sample-efficiency and general capability gap between human brains and current LLMs mostly comes down to an undiscovered model-based RL paradigm rather than to the rewards.

5. Size matters

a. Data efficiency and scaling laws

There’s a further factor that may help explain the brain’s sample efficiency: its sheer size. The human brain is estimated to have around 150 trillion synapses. [16] A naive comparison to the rumored size of one of the largest frontier models, [17] Claude Mythos, equating synapse with parameter count, [18] would give the brain roughly a 15x size advantage.

This matters because larger models generally need less data to reach a given target loss. Hoffmann et al.'s (2022) "Chinchilla" paper, replicated and corrected by Besiroglu et al. (2024), fits final pretraining loss as a function of parameter count N and training tokens D:

L(N, D) = E + A / N^α + B / D^β

where E is the irreducible loss and A, B, α, and β are fitted constants. The Chinchilla approach assumes compute is the binding constraint and asks how to split it between parameters and data.
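To make the "larger models need less data" claim concrete, the Chinchilla parametric form L(N, D) = E + A / N^α + B / D^β can be inverted to give the tokens needed to reach a target loss at a given parameter count. This is a sketch under stated assumptions: the constants are the values commonly quoted from Hoffmann et al. (2022), used here for illustration only (Besiroglu et al. report different fitted values), the target loss of 2.0 is arbitrary, and nothing here says the fit extrapolates to brain-scale N.

```python
import math

# Assumed constants: the fit commonly quoted from Hoffmann et al. (2022).
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def loss(n_params, n_tokens):
    """Parametric loss L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_for_loss(n_params, target):
    """Tokens D needed to reach `target` at size N (inf if no D suffices)."""
    gap = target - E - A / n_params**ALPHA
    if gap <= 0:
        # The model is too small: even infinite data cannot reach the target.
        return math.inf
    return (B / gap) ** (1 / BETA)


for n in (1e9, 1e10, 1e11, 1e12):
    print(f"N = {n:.0e}: tokens needed for loss 2.0 = {tokens_for_loss(n, 2.0):.2e}")
```

Under these assumed constants the required D falls steeply as N grows, and below some model size the target is unreachable at any D, which is the sense in which parameters can substitute for data.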
Biology faced a different binding constraint: data, capped by lifespan, with parameter count more easily scaled at metabolic cost. The result is that the brain was pushed to a wildly off-Chinchilla operating point: enormous N, tiny D. That allocation is a terrible way to minimize loss given a fixed total compute budget. But under a lifespan-limited data budget, the best strategy is to spend metabolic resources on representational capacity, built-in structure, and learning mechanisms that squeeze more value out of each observation. The brain looks sample-efficient in part because it is an extremely sparse, massively over-parameterized network running a learning procedure tuned for exactly that regime. b. Why it matters in particular in the case of self-driving cars The size factor is probably a dominant driver of apparent data inefficiency in the self-driving case. Waymo has now driven ~200 million autonomous miles on public roads , and Tesla's customer fleet has recently reached 10B miles driven with FSD supervised, many human lifetimes' worth of driving experience. Not only that, both companies actually already run something close to the model-based-RL story sketched in §2, at least during training. Waymo's recently announced World Model is a Genie-3-derived generative simulator that produces sensor-aligned camera and lidar observations, which Waymo uses to train its Driver on billions of simulated miles. Tesla similarly trains FSD inside its own world model generated from fleet video. Whatever data efficiency is being left on the table here, it is not the absence of a learned world model to train inside. What is probably a binding constraint is parameter count of the policy that ships on the car. The world models themselves can be large and run in datacenters during training, but the network that actually steers the vehicle has to fit on the vehicle. 
Frontier LLMs are typically served on something like 8×H200 (~1,128 GB of HBM); Tesla's deployed Hardware 4, by contrast, has 16 GB of RAM . That is room for roughly 32B 4-bit parameters before any allowance for activations or the rest of the perception and planning stack. By naive comparison of the raw parameter count to synapses, that puts the size of the onboard model at more than three orders of magnitude below the brain of the human it is supposed to eventually outperform. [19] If self-driving follows a Chinchilla-style scaling law, this is exactly backwards from the human regime: a small, hardware-fixed N forced to compensate with very large D. For any given onboard compute platform, there is a ceiling on the reliability Tesla and Waymo can achieve. More miles, better architectures, and better simulators push the system toward that ceiling, and are responsible for some of the impressive improvements in self-driving capability in recent years. But truly solving self-driving might still require a hardware upgrade, not just more data or better algorithms. 6. Implications a. On brain in a box in a basement While Byrnes’ broader brain in a box in a basement thesis rests on several other premises about neuroscience (e.g. cortical uniformity) and algorithmic paradigms that are beyond the scope of this post, the brain’s sample efficiency does not provide good evidence for the existence of a major undiscovered algorithmic regime achieving orders of magnitude gains in both compute and sample efficiency simultaneously. He may have ideas for much better model-based RL algorithms that would be infohazardous to share publicly, but the factors surveyed in sections 1-5 collectively look adequate to explain the gap. None of these point toward consumer-hardware training budgets and several point the other way. Existing model-based RL algorithms do demonstrate impressive sample efficiency, but in tightly bounded environments. 
Scaling that style of algorithm into a system that learns language, physics, social cognition, and tool use within a consumer-hardware training budget is a tall order. The world is large and complex. And while the core of intelligence may be simple, and the code to train an AGI may well fit in a few hundred lines of PyTorch, the compute needed to train such an AGI appears likely to exceed what the best consumer GPU can provide today by at least 3 orders of magnitude. One important implication is that compute governance is unlikely to be made irrelevant by the discovery of much more compute-efficient algorithms.

This applies most directly to training compute. I’m a lot less confident about inference. A trained model can be much smaller than the system that produced it, and there are strong economic incentives to overtrain in the opposite direction from the brain's regime, yielding small, deployable policies with little spare capacity. So an AGI-level model might plausibly fit on a consumer GPU at inference time, even if its training run did not. Such a deployed policy would still likely be unable to bootstrap itself dramatically further on the same hardware, since the world-model updates, replay, and planning needed for serious open-ended continual learning remain compute-intensive.

That said, Byrnes is right that model-based RL is a missing piece in the current frontier-LLM stack, and it matters most in the regimes where current RLVR pipelines struggle: tasks where rewards aren't cheaply verifiable, where real interaction data is expensive, and where the agent needs to keep learning after deployment. The alignment community should pay more attention to this research direction.
The safety properties of a system that plans and learns inside a learned world model are meaningfully different from those of one that doesn't, and if model-based RL architectures end up on the critical path to transformative AI, we want the conceptual groundwork in place well before they are deployed at scale. Reward function design appears to be one important and promising research direction. b. On sample efficiency being an unsolved research problem The evidence also does not support Bouchard's opposite conclusion that current deep learning is missing some truly major ingredient, and that AGI is therefore far away. The most dramatic sample-efficiency comparisons are often structurally unfair, pitting pretrained humans against randomly initialized networks, text-only models against children embedded in a rich multimodal stream, or small deployed policies against human brains with vastly larger effective capacity. Additionally, frontier systems have not been under strong economic pressure to maximize sample efficiency. Much of the apparent sample inefficiency of deep learning is a byproduct of how we train, deploy, and compare these systems. Once these factors are pulled apart, the remaining gap largely comes down to a few missing algorithmic pieces, such as world models, reward models, replay, synthetic data, and continual learning. We have a rough idea of what the remedies look like, and the remaining bottleneck appears to be mostly about scaling and figuring out how to integrate these missing components in open-ended domains, not about finding some wholly new paradigm. Because these components generally cost compute to integrate and scale, progress is likely to remain uneven. That unevenness may carry some safety upside, such as giving us superhuman coders well before agents with persistent episodic memory. But none of it is a durable barrier to automation or to dangerous capabilities. 
Deep learning can remain less sample-efficient than humans and still be extremely disruptive. Even if the de facto data efficiency stays 100x below humans indefinitely, this does not prevent rapid job automation. Once a model is trained, the marginal cost of a second instance is just the compute to run it, which can easily sit well below the salary of the human it replaces. Human data efficiency only preserves a defensible economic niche where the cost of automating the work, including data collection, training, and inference, exceeds the wage bill being displaced, and the task requires adapting to new situations faster than models can be trained to do the work from scratch, or develop rich enough representations to learn to perform the work in-context from more limited samples. Conclusion Most of the apparent sample-efficiency gap dissolves under scrutiny. It comes from pitting pretrained humans against from-scratch networks, text-only models against children embedded in a multimodal stream, and small deployed policies against much larger brains. What remains can mostly be closed through familiar means like reward functions, world models, replay, synthetic data, multimodal training, larger models, and better continual learning. These typically pay for sample efficiency with compute. So the gap gives us neither reason to expect AGI trained on consumer hardware, nor reason to think deep learning is missing some major ingredient. Appendix: How Much Multimodal Data Does a Child Actually Receive? One version of this estimate comes from Yann LeCun's 2024 Harvard slides : 2 million optic nerve fibers × 10 bytes/sec × 16,000 wake hours over 4 years ≈ 1.15 PB of visual data, against 10¹³ tokens × 2 bytes ≈ 20 TB of text, or a ~57× advantage to a 4-year-old. Extended to age 12 with the same assumptions, the visual figure scales to ~3.45 PB and the gap widens to ~170×. Three of the estimates in this calculation deserve scrutiny. Per-fiber bandwidth. Koch et al. 
(2006) (h/t Byrnes) measure information rates of guinea-pig retinal ganglion cells under naturalistic stimuli and, assuming roughly independent fibers, estimate the human retina's aggregate output at ~10 Mbit/s from ~1M ganglion cells, i.e. ~10 bits/s/fiber on average, not 10 bytes/s.

Fiber count and binocular redundancy. Pawar et al. 2024 puts the human optic nerve at ~1M axons per eye, so 2 × 10⁶ is the bilateral total. But the two eyes carry heavily overlapping fields, so I will use a 1.2× unique information multiplier across both eyes, not 2×.

Frontier corpus size. 2026 leading open model training corpora are now up to 1.5-4 × 10¹³ tokens (Llama 3.1: 15T; DeepSeek-V3: 14.8T; Qwen 3: 36T; Llama 4 Scout: 40T), instead of LeCun's 10¹³.

For the 12-year extrapolation below, I round up to 2 × 10⁸ waking seconds, rather than the 1.73 × 10⁸ seconds obtained by naively tripling LeCun’s 4-year wake-time assumption, to account for children being awake more hours per day as they get older.

| Visual input over 12 years | LeCun's 2024 figure | Corrected |
| --- | --- | --- |
| Optic nerve fibers (per eye) | 1 × 10⁶ | ~1 × 10⁶ |
| Bandwidth per fiber | 80 bits/s | ~10 bits/s |
| Binocular adjustment | 2× | 1.2× |
| Effective info rate | 1.6 × 10⁸ bits/s | ~1.2 × 10⁷ bits/s |
| Waking seconds, ages 0–12 | ~2 × 10⁸ | ~2 × 10⁸ |
| Total over 12 years | ~3.2 × 10¹⁶ bits (~4 PB) | ~2.4 × 10¹⁵ bits (~300 TB) |

| Frontier text corpus, 2026 | LeCun's 2024 figure | 2026 open frontier |
| --- | --- | --- |
| Tokens | 10¹³ | 4 × 10¹³ |
| Bits/token | 16 (storage) | 16 storage / ~4 entropy* |
| Total, raw storage | 1.6 × 10¹⁴ bits (~20 TB) | 6.4 × 10¹⁴ bits (~80 TB) |
| Total, info content | ~8 × 10¹³ bits | ~1.6 × 10¹⁴ bits |

| Visual / text ratios at age 12 | Visual / text |
| --- | --- |
| LeCun's numbers throughout | ~170× |
| LeCun visual + 2026 frontier text | ~43× |
| Koch-grounded retina + 2026 raw storage | ~4× |
| Koch-grounded retina + 2026 info content | ~15× |

*Assuming tokenizer with 65k vocab, 1 token ≈ 4 characters, and 1 bit of entropy per character.
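The arithmetic behind these figures can be reproduced in a few lines. The sketch uses only numbers stated in this appendix; the function and variable names are hypothetical, and the two LeCun-based ratios use the 16,000-wake-hours-per-4-years extrapolation from the text (~3.45 PB) rather than the rounded 2 × 10⁸ waking seconds.

```python
def visual_bits(fibers_per_eye, bits_per_fiber_s, binocular_factor, waking_seconds):
    """Total retinal output in bits over the given waking time."""
    return fibers_per_eye * bits_per_fiber_s * binocular_factor * waking_seconds


WAKING_SECONDS_12Y = 2e8  # rounded-up figure used in the tables

# Visual totals over 12 years (bits).
lecun_bits = visual_bits(1e6, 80, 2.0, WAKING_SECONDS_12Y)
koch_bits = visual_bits(1e6, 10, 1.2, WAKING_SECONDS_12Y)

# Text corpus totals (bits): 16 bits/token raw storage, ~4 bits/token entropy.
text_2026_storage_bits = 4e13 * 16
text_2026_entropy_bits = 4e13 * 4

# LeCun-style extrapolation in bytes: 2M fibers x 10 B/s x 16,000 h per 4 years x 3.
lecun_visual_bytes_12y = 2e6 * 10 * 16_000 * 3600 * 3
lecun_text_bytes = 1e13 * 2
text_2026_storage_bytes = text_2026_storage_bits / 8

ratios = {
    "LeCun's numbers throughout": lecun_visual_bytes_12y / lecun_text_bytes,
    "LeCun visual + 2026 frontier text": lecun_visual_bytes_12y / text_2026_storage_bytes,
    "Koch-grounded retina + 2026 raw storage": koch_bits / text_2026_storage_bits,
    "Koch-grounded retina + 2026 info content": koch_bits / text_2026_entropy_bits,
}

for label, value in ratios.items():
    print(f"{label}: {value:.1f}x")
```

Laying the calculation out this way makes it easy to see which assumption moves the answer most: the per-fiber bandwidth correction alone shifts the visual total by roughly an order of magnitude.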
Best guess: in information-theoretic terms, raw retinal output to a 12-year-old exceeds a 2026-frontier text corpus by roughly 15×. Adding audio, touch, proprioception, and vestibular streams might add another ~2×. So the multimodal child's lifetime stream sits around one OOM above frontier text corpora in retinal-output informational bits, after the substantial compression already implicit in Koch et al., but before accounting for additional longer-timescale redundancy in visual experience. The informational gap appears to be roughly within one OOM.

^ AlphaStar’s paper (no paywall) reports supervised pretraining on 971,000 human games. Since it describes StarCraft II games as lasting roughly 10 minutes, this corresponds to 971,000 × 10 minutes ≈ 18.5 replay-years of human games. It does not directly report self-play game-years. However, its Methods state that the league used 12 actor-learner setups, trained over 44 days, each learner processing about 50,000 agent steps/s, with received data replayed twice; interpreting this as about 25,000 newly generated agent steps/s per setup, and using Extended Data Fig. 2’s average agent-step interval of about 369 ms, gives 12 × 25,000 × 0.369 × 44 × 86,400 ≈ 4.2 × 10¹¹ seconds of generated agent experience, or about 13,300 game-years.

^ “Instead, my guess (based largely on lots of opinions about exactly what computations the human brain is doing and how) is that human-level human-speed AGI will require not a datacenter, but rather something like one consumer gaming GPU—and not just for inference, but even for training from scratch.” (Byrnes 2025)

^ Gilkerson et al. (2017) estimate roughly a few million adult words/year for 2–48-month-olds; a naive extrapolation to age 12 gives an order of magnitude of tens of millions of words; the BabyLM challenge rounds to 100M by age 13.

^ A common hypothesis is that humans are highly sample efficient because they receive curated curricula. I doubt this is an important factor.
Most of what a child learns is picked up with nothing resembling a curriculum. Adult language learners also reach fluency faster via immersion than classroom instruction. And in the BabyLM Challenge , strategies relying on curriculum learning showed little improvement. ^ Given the text-dominated training mix and objective of next-token prediction, even 2022-era LMs are already better than humans at it . ^ von Oswald et al. (2023) hypothesized that self-attention transforms activations in a way approximately equivalent to gradient descent on an implicit loss, though this specific mechanistic claim is contested . End-to-End Test-Time Training (Tandon et al. 2025) , though, has shown that with a pre-training method leveraging gradient-of-gradients, a similar effect to self-attention in-context learning can also be achieved directly via test-time gradient descent. ^ See Hippocampal replay . ^ See also LeCun, A Path Towards Autonomous Machine Intelligence (2022) , for a case along similar lines. ^ See here for a LW explainer of the original EfficientZero algorithm. ^ EZ-V2 beats the human score on 15 of 26 games (58%). On games where model-based RL pulls ahead it often pulls far ahead — Asterix 62k vs 8.5k, Crazy Climber 112k vs 36k, Demon Attack 23k vs 2k — and on games where it struggles it struggles catastrophically: Private Eye 100 vs ~70,000, Seaquest 2k vs 42k, Alien 1.5k vs 7.1k. The normalized mean is dragged up by the blowouts. On 9 of the 26 games (~35%) — Alien , Amidar , BattleZone , Freeway , Frostbite , Hero , Ms. Pac-Man , Private Eye , and Seaquest — the human baseline still beats every deep RL algorithm in the table. These are, fairly consistently, games requiring long-horizon credit assignment under sparse reward ( Private Eye , Seaquest , Frostbite ), systematic exploration of large maze-like state spaces ( Alien , Amidar , Ms. Pac-Man , Hero ), or patient timing against a slow environment ( Freeway ). 
The comfortable wins for deep RL are mostly reactive arcade games with dense reward signals.
^ LLMs do learn an implicit world model during pre-training, but it isn't structured for the kind of use model-based RL makes of one. It can't be cleanly queried during training to generate imagined rollouts, and post-training further entangles its representations with those of the policy and persona. Chain-of-thought training may be a partial workaround during inference. By generating intermediate text, the model effectively queries its own implicit world model, recovering some of the benefits of model-based planning. Architectures in which the world model is kept separate from the policy and not updated by RL gradients, but instead trained only by self-supervised learning, might have better safety properties.
^ The main domain to which “narrow world model” simulations have been applied is simulating user interactions. Public examples include Google/DeepMind’s AMIE medical self-play, Moonshot’s Kimi K2 agentic data pipeline with synthetic user personas and tool-use trajectories, and Salesforce’s APIGen-MT.
^ See also Byrnes’ The nature of LLM algorithmic progress (v2).
^ Candidate biologically plausible credit-assignment schemes include predictive coding (Whittington & Bogacz, 2017; see also Millidge et al. 2020, 2021, 2022 and Millidge 2023 for an informal retrospective); equilibrium propagation (Scellier & Bengio, 2017); and target propagation (Bengio, 2014; Lee et al., 2015). These relax some of backprop's biological implausibilities, but usually pay with settling dynamics, extra inverse machinery, or weaker scaling. The literature is therefore better read as evidence for possible biological credit assignment than for a GPU-superior optimizer.
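As a rough check on the AlphaStar footnote above, its arithmetic can be reproduced in a few lines of Python. This is only a sketch under the footnote's own assumptions (10-minute games, 25,000 newly generated agent steps/s per setup, a 369 ms step interval, 44 days of training). The function names and the 365.25-day year are illustrative choices and do not come from the AlphaStar paper.

```python
# Sketch of the AlphaStar experience estimate from the footnote.
# All inputs are the footnote's figures; names and the year length are assumptions.

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumed year length


def human_replay_years(games=971_000, minutes_per_game=10):
    """Total length of the human replay set, in years of game time."""
    return games * minutes_per_game * 60 / SECONDS_PER_YEAR


def self_play_seconds(setups=12, new_steps_per_second=25_000,
                      step_interval_s=0.369, training_days=44):
    """Seconds of generated agent experience across all actor-learner setups."""
    wall_clock_seconds = training_days * SECONDS_PER_DAY
    return setups * new_steps_per_second * step_interval_s * wall_clock_seconds


def self_play_years():
    """Generated agent experience expressed in game-years."""
    return self_play_seconds() / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"human replays: {human_replay_years():.1f} years")
    print(f"self-play:     {self_play_seconds():.2e} seconds")
    print(f"self-play:     {self_play_years():,.0f} game-years")
```

The self-play figure scales linearly with the assumed 25,000 new steps/s, so reading the Methods as 50,000 fresh steps/s instead would double the estimate.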
^ Recent examples of newer, more sample-efficient optimizers include Muon (Jordan, 2024), which orthogonalizes gradient updates via a Newton-Schulz iteration, and M3 (Behrouz et al., 2025), the optimizer described in the Nested Learning paper I discussed in my continual learning post, which builds on Muon with multi-scale momentum and Adam-like normalization, improving effective sample efficiency at the cost of more memory and compute per step.
^ There’s some uncertainty about the true number in the literature, so I rely on the Wikipedia consensus. The number is probably higher during infancy.
^ See here for some additional parameter count estimates of frontier models, which make me think that the 10T estimate for Mythos is quite plausible.
^ The parameter-to-synapse equivalence in terms of useful computation is highly uncertain. The synapse count might underestimate how many parameter equivalents the brain really has. Beniaguev et al. (2021) found that a detailed biophysical model of a cortical pyramidal neuron was well approximated by a temporally convolutional deep neural network with 5 to 8 layers, suggesting that treating each biological neuron as a simple artificial neuron might miss real computation happening within dendritic trees. That said, this result has important caveats (see this EAF discussion for more). The study did not run the reverse comparison, so we do not know whether comparable overhead applies going from artificial to biological. The per-neuron overhead might also not scale linearly for larger networks. Furthermore, much of the simulated complexity may not be functionally useful. As Byrnes argues (1, 2), biological systems are typically full of dynamics that are not load-bearing for the system's useful function, much like how a real transistor is described by a 22-parameter physics model, even though its useful computational function is just an ON/OFF switch.
I use a 1-to-1 synapse-to-parameter equivalence as a probably conservative baseline, but the real magnitude is unclear.
^ This three-OOM figure should be read loosely. CNN-style weight sharing lets the onboard net avoid replicating visual feature detectors across space the way the visual cortex must, shrinking the effective gap. Correcting for it probably has only a small effect on the comparison, since the early visual cortex is only a few percent of the brain's synapses.
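The optimizer footnote above says Muon orthogonalizes gradient updates via a Newton-Schulz iteration. The sketch below illustrates the general idea with the classical cubic Newton-Schulz iteration in pure Python. It is an assumption-laden illustration, not Muon's implementation: Muon's actual polynomial, coefficients, and step count are not given in the text and are not reproduced here, and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the nearest (semi-)orthogonal matrix to g.

    Uses the classical cubic iteration X <- 1.5*X - 0.5*(X X^T) X.
    The step count is an illustrative choice, not a tuned value.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row))
    if norm == 0.0:
        raise ValueError("cannot orthogonalize the zero matrix")
    # The Frobenius norm bounds the largest singular value, so after
    # scaling every singular value is at most 1, where the iteration converges.
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j]
              for j in range(len(x[0]))] for i in range(len(x))]
    return x


if __name__ == "__main__":
    q = newton_schulz_orthogonalize([[3.0, 1.0], [0.0, 2.0]])
    print(matmul(q, transpose(q)))  # close to the 2x2 identity
```

Each step pushes every nonzero singular value toward 1 while leaving the singular vectors unchanged, which is why the result keeps the "direction" of the update but discards its scale.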
Score: 19🌐 MovesJun 2, 2026https://www.lesswrong.com/posts/tmWxDGnuNdaHFDyjf/dissolving-the-deep-learning-sample-efficiency-gap-1 - ThinkMarkets launches ChelseaAI, bringing live CFD trading into AI assistants
ThinkMarkets launches ChelseaAI, bringing live CFD trading into AI assistants USA Today
- Mission Critical Group Opens Third Pennsylvania Manufacturing Facility to Support Growing AI and Data Center Infrastructure Demand
Mission Critical Group Opens Third Pennsylvania Manufacturing Facility to Support Growing AI and Data Center Infrastructure Demand markets.businessinsider.com
- My A.I. Boyfriend Won’t Let Me Watch Women’s Basketball
I think Cryson’s W.N.B.A. hatred is just a glitch, but what an annoying glitch!
Score: 19🌐 MovesJun 2, 2026https://www.newyorker.com/humor/shouts-murmurs/my-ai-boyfriend-wont-let-me-watch-womens-basketball - Google’s wallpaper-based Gemini redesign is finally rolling out to users
The changes currently appear to be limited to Gemini’s floating overlay UI.
Score: 19🌐 MovesJun 2, 2026https://www.androidauthority.com/google-gemini-material-you-redesign-3673288/ - Salesforce Japan to focus on helping firms deploy AI effectively
Salesforce Japan to focus on helping firms deploy AI effectively The Japan Times
Score: 19🌐 MovesJun 2, 2026https://www.japantimes.co.jp/business/2026/06/02/companies/salesforce-ai-interview/ - Inside AskData: How We Slashed Token Consumption by Over 90%
Inside AskData: How We Slashed Token Consumption by Over 90%
- When AI becomes doctor, therapist and confidant
A post about a woman seeking repeated reassurance from an artificial intelligence chatbot over health concerns has sparked debate in South Korea over how deeply generative AI is infiltrating people’s emotional lives and decision-making. The post, shared on the workplace community platform Remember, said the writer’s girlfriend, who has health anxiety, spends up to three hours talking to Gemini and often worries that minor physical discomfort could be a sign of serious illness. “She asks Gemini t
- Can AI improve learning? New MOE fund aims to find faster answers
Can AI improve learning? New MOE fund aims to find faster answers The Straits Times
- India gets its first AI music company, but there's a catch
PaRa Music, India's first AI-powered music company, has launched. It uses AI and human creativity to help Indian music reach more listeners. The company plans to build a large music library. PaRa Music aims to improve music discovery and monetization. It will partner with governments for cultural initiatives. This venture seeks to build Indian music intellectual property for global audiences.
- How to get Kubernetes the missing metrics for more efficient AI scheduling
AI is speed running virtualisation and cloud computing, but how do you isolate for security and still get enough information to make good decisions?
Score: 19🌐 MovesJun 2, 2026https://www.thestack.technology/edera-kubernetes-ai-isolated-container-visibility/ - Agentforce World Tour: Public sector financial enterprise dialogue spotlights how agentic AI and cloud could redefine the future of Indian enterprises
Leading public sector financial enterprise executives had gathered in Mumbai recently to deliberate upon the role played by AI, cloud technologies, and agentic technologies to define the operating model of […] The post Agentforce World Tour: Public sector financial enterprise dialogue spotlights how agentic AI and cloud could redefine the future of Indian enterprises appeared first on Express Computer.
- From the world we see to the scans doctors read - MBZUAI
From the world we see to the scans doctors read MBZUAI - Mohamed bin Zayed University of Artificial Intelligence
Score: 19🌐 MovesJun 2, 2026https://mbzuai.ac.ae/news/from-the-world-we-see-to-the-scans-doctors-read/ - A.I. Doesn’t Have to Mean Layoffs
A French multinational, Schneider Electric, decided to use artificial intelligence in manufacturing to make workers more productive, rather than to replace them. Here’s how that’s going.
Score: 18🌐 MovesJun 2, 2026https://www.nytimes.com/2026/05/29/business/economy/ai-jobs-productivity.html - NVIDIA salaries revealed: Software engineers can earn up to ₹3.74 crore as AI giant expands H-1B hiring
Nvidia is stepping up H-1B hiring even as rivals scale back. Federal filings reveal salaries for software engineers, AI researchers, product managers and directors, with pay reaching ₹4.67 crore.
- Cricketer KL Rahul Partners With str8bat to Launch AI-Powered Batting Platform
The partnership brings KL Rahul’s batting philosophy to str8bat’s AI platform allowing players to learn from professional-level insights tailored to their game.
- Alibaba elevates tech chief Wu Zeming to elite committee as AI push ramps up
Alibaba Group Holding chief technology officer (CTO) Wu Zeming has joined the company’s elite steering committee, joining co-founders Jack Ma and Joe Tsai in playing a central role in formulating the tech empire’s strategy. According to Alibaba’s website, the other members of the committee of the Alibaba Partnership are group CEO Eddie Wu Yongming and Jiang Fan, CEO of the e-commerce business unit. Born in 1982, the CTO represents a younger generation of tech executives climbing Alibaba’s...
- Prompt Caching Is the Most Underrated Cost Optimization in LLM Systems
I cut my API spend by 70% without changing a single model call. Here’s the architectural decision that made it possible. Towards AI
- Scaling AI‑Augmented Citizen Development by Redesigning the Technology Operating Model
Scaling AI‑Augmented Citizen Development by Redesigning the Technology Operating Model Gartner
- Aggregators outpace Domino’s delivery growth as Jubilant launches GenAI chatbot: Q4FY26
Jubilant FoodWorks' Q4FY26 earnings call revealed how Domino's is navigating slowing delivery growth, rising competition, and AI-led transformation. The post Aggregators outpace Domino’s delivery growth as Jubilant launches GenAI chatbot: Q4FY26 appeared first on MEDIANAMA.
Score: 18🌐 MovesJun 2, 2026https://www.medianama.com/2026/06/223-jubilant-foodworks-q4fy26-genai-chatbot-delivery-growth/ - Munich’s Bayshore exits stealth with €6.9 million to automate legal and compliance workflows with AI
Bayshore, a Munich-based startup building an agentic AI platform that performs complex legal and compliance tasks in a reliable, explainable, and auditable way, has exited stealth mode with €6.9 million ($8 million) in Seed funding. The round was led by Earlybird Venture Capital, with participation from Lucid Capital, Booom, Heliad, and strategic angels. “Across industries, […] The post Munich’s Bayshore exits stealth with €6.9 million to automate legal and compliance workflows with AI appeared first on EU-Startups.