What Is a World Model? Inside the AI Idea Behind 2026’s $1 Billion Bet
Everything you need to understand the AI category that everyone is suddenly funding, explained slowly, with sources

Hello DataChefs! 👩‍🍳

In March 2026, a Paris-based AI company you may not have heard of raised $1.03 billion at a $3.5 billion pre-money valuation. It had no product. It had roughly a dozen employees. It had been operating for a few months.

The company is called Advanced Machine Intelligence Labs, or AMI Labs (the name is pronounced like the French word for friend). Its co-founder is Yann LeCun, one of the three scientists who shared the 2018 Turing Award. And its entire pitch can be summarized in four words: build a world model.

If that phrase felt like it appeared overnight, you are reacting normally. “World model” went from a technical term most data scientists had never heard of to a billion-dollar funding category in roughly six months.

I work in physical AI and scientific machine learning, which is a fancy way of saying “AI that has to deal with the actual physical world.” For the last six months, almost every conversation about my research has started with the same question: but what even is a world model?

So in this piece, I want to walk you through what a world model actually is, where the term comes from, who is building one in 2026, and what it might mean for your career. No buzzwords. With sources. At a pace you can follow.

(PS: this connects to my earlier piece, Model Recovery vs Model Learning. If you have read it, this is the same divide showing up in a much hotter setting.)

🥄 What a world model actually is

Let me give you the definition first, then unpack it.

A world model is an AI system that learns to predict how the environment around it will change when an agent takes an action.

Compare that to what most people already know:

A large language model, like ChatGPT or Claude, predicts the next word.

A world model predicts the next state of the world.

The phrase has a precise academic origin.
In a 2018 paper titled simply “World Models,” researchers David Ha and Jürgen Schmidhuber defined a world model as a system that learns a “compressed spatial and temporal representation of the environment.” The agent then uses that internal representation to plan and act.

The most famous result in their paper was that they trained an agent to play VizDoom: Take Cover entirely inside its own “hallucinated dream” generated by the world model, and then transferred that learned policy back into the real game, where it scored well above the threshold needed to solve it.

That last sentence is the foundation. The agent learned in its imagination. The imagination was accurate enough that the policy worked in reality.

The term itself is older. Jürgen Schmidhuber first described this idea in 1990, in a technical report on using recurrent neural networks to predict future states. The 2018 paper made the idea practical and widely cited. The 2026 funding cycle made it famous.

🍳 The kitchen analogy

If that definition still feels abstract, try this. Imagine three different kinds of cooks.

Cook number one has memorized a million recipes. You ask her what comes after sautéing the onions, and she tells you, because she has read it written down a thousand times. She is fluent in describing food. But she has never actually cooked. She knows the words, not the warmth of the pan. This cook is a large language model. It is fluent in text about the world without ever having interacted with the world.

Cook number two follows a thick rulebook. Page 47 says sugar caramelizes at 320 degrees Fahrenheit. Page 312 says cold butter and warm flour produce flaky pastry. He follows the rules exactly. The food is reliable, but the rulebook had to be written by a human who already understood the kitchen. This cook is a physics simulator, like MuJoCo or NVIDIA Isaac Sim. It models the world correctly because someone hand-coded every law.
Cook number three has spent a million hours watching real kitchens. Nobody taught her the rules. She figured them out by observing. Hand her a raw onion and say “chop and sauté.” Before she even lifts the knife, she can mentally play the next thirty seconds: the sizzle, the color change, the smell, the texture of the onion once it softens. She predicts what will happen next, given what she does. This cook is a world model. The rules were learned, not programmed. And the prediction is action-conditioned, which is the technical way of saying her prediction depends on what she chooses to do next.

The third cook is what almost every major AI lab is now chasing.

🧂 The kitchen test (how to spot a real world model)

The phrase “world model” is being used loosely in marketing copy right now. Here is the simple test I use to tell a real one from a lookalike.

A real world model takes in a state (the current situation) and an action (what you decide to do), and gives you back the next state (what happens as a result).

The action input is the dividing line. If a system only takes text and produces a video, that is video generation. Beautiful, useful, but not the same thing. If a system takes “this is what the world looks like now, and this is what I will do,” and predicts what the world will look like a moment later, that is a world model.

This matters because some impressive video models (like OpenAI’s Sora or Google’s Veo) sometimes get described as world models in casual conversation, but you cannot pause them mid-clip and say “now turn left.” A world model lets you do exactly that. The action is part of the input, not an afterthought.

🍲 The 2026 family tree (one table, everything in it)

Here is the table I wish someone had handed me three months ago. I have seen “world model” used to describe at least five different kinds of systems. Only some of them actually qualify.

The cleanest takeaway is in the second column. Does the system accept an action as input?
If yes, it has the basic ingredients of a world model. If no, it is something else, even if it is impressive on its own terms.

🥘 Who is actually building one in 2026

Here is the short list, as of the date this article was written.

AMI Labs. Yann LeCun’s new company in Paris. Building world models using JEPA, an architecture LeCun proposed in 2022. JEPA stands for Joint Embedding Predictive Architecture, and the short explanation is that it learns abstract representations of how the world changes, rather than trying to predict every pixel. AMI Labs is still in the research phase.

NVIDIA Cosmos. A family of “world foundation models” designed for robotics and autonomous vehicles. Companies like Agility Robotics, Figure AI, Waabi, and Uber have adopted Cosmos for training their physical AI systems with synthetic data.

Genie 3. A Google DeepMind system that generates interactive 3D environments in real time at 24 frames per second and 720p resolution. Waymo adopted a specialized version called the Waymo World Model in February 2026 for self-driving simulation.

Marble. A commercial product from World Labs, the company co-founded by Stanford computer scientist Fei-Fei Li. Generates 3D scenes you can walk through. Pricing ranges from free to $95 per month.

V-JEPA 2. Meta’s video-based world model, developed by LeCun’s former team at Meta’s FAIR lab. Still being actively published on after his departure.

Notice that these systems do not share one architecture. JEPA, diffusion models, transformer-based video models, and 3D scene generators are all sitting under the same label. The category is real. The blueprint is not finished.

🌶️ The honest part

Three caveats I think are important, because the marketing tends to skip them.

One. Long-horizon coherence is still hard. “Long-horizon” just means the model has to keep things consistent over a long sequence of predictions.
If you roll most of today’s world models forward for thirty seconds, objects start drifting, disappearing, or quietly contradicting earlier frames. This is an open research problem, not a solved one.

Two. Learning to predict frames that look physical is not the same as understanding physics. A model can produce video that obeys gravity on every test the researchers ran, and still break the moment it sees something outside its training distribution. This is the question I wrote about in Model Recovery vs Model Learning, and it is going to define which world models actually hold up under real-world deployment.

Three. Training is genuinely expensive. Frontier-scale world models cost tens of millions of dollars to train and require massive GPU clusters. This is not a category most teams can replicate on their own hardware.

Even AMI’s own CEO, Alex LeBrun, predicted that within six months, “every company will call itself a world model to raise funding.” That is your warning label. When that happens, run the kitchen test.

🥄 My personal take (the one no one else is saying out loud)

Here is what I actually think will happen.

The companies that will own this category in 2027 and 2028 are not going to be the ones with the biggest pure learned models. They are going to be the ones who figure out how to pair learned world models with the physics-aware methods that researchers like me have been quietly building for the last decade. Methods like SINDy and PINNs, Koopman operators, equation discovery, neural ODEs.

The reason is simple. Pure learned models are great at imitating physics until they encounter something outside their training distribution, and then they fall apart in ways that are hard to predict. Physics-aware methods are bad at scaling, but they encode constraints that the universe actually obeys. The most useful world models are going to combine both. The companies that figure out that recipe will eat the next decade.
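
To make the hybrid idea a little more concrete, here is a minimal toy sketch, not anyone’s production method. It assumes a made-up “real world” (a point mass with drag), a hand-coded physics prior that forgets the drag, and a learned correction fitted to the prior’s mistakes. Every name and number in it (`true_step`, `physics_prior`, `hybrid_step`, the 0.8 drag coefficient, the 0.05 s time step) is hypothetical and chosen only for illustration, and a linear least-squares fit stands in for what would normally be a neural network. Note that each step function passes the kitchen test: it takes a state and an action and returns the next state.

```python
import numpy as np

DT = 0.05  # time step in seconds (illustrative value)

def true_step(state, action):
    """Hypothetical 'real world': a unit point mass with drag the modeler does not know about."""
    pos, vel = state
    acc = action - 0.8 * vel  # thrust minus drag (0.8 is an arbitrary choice)
    return np.array([pos + DT * vel, vel + DT * acc])

def physics_prior(state, action):
    """Hand-coded physics: the same point mass, but the drag term is missing."""
    pos, vel = state
    return np.array([pos + DT * vel, vel + DT * action])

def features(state, action):
    """Inputs to the learned correction: position, velocity, action, and a bias term."""
    return np.array([state[0], state[1], action, 1.0])

# Collect (state, action, next_state) transitions from the "real world".
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(500, 2))
actions = rng.uniform(-1.0, 1.0, size=500)
X = np.stack([features(s, a) for s, a in zip(states, actions)])

# Residual target: whatever the physics prior gets wrong on each transition.
R = np.stack([true_step(s, a) - physics_prior(s, a) for s, a in zip(states, actions)])

# Fit the correction. A linear model is a stand-in for a neural network here.
W, *_ = np.linalg.lstsq(X, R, rcond=None)

def hybrid_step(state, action):
    """Action-conditioned world model: physics prior plus learned residual."""
    return physics_prior(state, action) + features(state, action) @ W

def rollout(step_fn, state, action_seq):
    """Apply a step function repeatedly, feeding each prediction back in."""
    for a in action_seq:
        state = step_fn(state, a)
    return state

start = np.array([0.0, 0.0])
plan = np.ones(200)  # push in one direction for 200 steps (10 simulated seconds)
truth = rollout(true_step, start, plan)
err_prior = np.linalg.norm(rollout(physics_prior, start, plan) - truth)
err_hybrid = np.linalg.norm(rollout(hybrid_step, start, plan) - truth)
print(f"physics-only error: {err_prior:.3f}, hybrid error: {err_hybrid:.2e}")
```

In this toy the missing physics happens to be exactly linear, so the correction recovers it almost perfectly and the hybrid rollout stays close to the truth while the physics-only rollout drifts. Real systems are not this kind: the residual is nonlinear, the data is noisy, and the learned part can still fail outside its training distribution. The sketch only shows the shape of the recipe, which is to let the known physics carry what it can and to learn only the gap.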
If you are a researcher reading this and wondering whether to pivot, that is the lane I would point you to. Not pure end-to-end learned dynamics. Not pure classical physics. The hybrid.

🥢 What this means if you are building an AI career right now

If you have spent the last two years learning about LLMs, you have not wasted your time. LLMs are not going anywhere. They are extraordinary at language, code, and reasoning over text, and that capability is being woven into more software, not less.

But the next wave of compute, capital, and hiring is moving toward systems that have to reason about physical consequences. That is a slightly different skill set. A few areas that are suddenly worth investing in:

Modeling sequences that are not text. Video, sensor data, robot trajectories.

Self-supervised learning, especially the JEPA family. The core idea is learning to predict useful representations of future states without needing labels.

Reinforcement learning. It quietly stopped being trendy. World models put it back in the center, because the whole purpose of having a world model is to train an agent inside it.

Sim-to-real transfer. “Sim-to-real” means taking a policy that was trained in simulation and getting it to work in the real world. This is a hard problem, and it is the bottleneck for most robotics deployments.

Physics-aware machine learning. SINDy, PINNs, Koopman operators, and Kolmogorov-Arnold Networks. This is the lane I write in, and the people who have been quietly working in it for a decade are suddenly relevant.

If I were starting over today, I would pick one of these and go deep, not wide. The “I touched every framework” résumé is going to age the same way “I tried every JavaScript framework” did in the 2010s.

📖 A small glossary, for later

A few terms I used in this piece, defined clearly so you can come back if you forget.

Action-conditioned prediction. Predicting the next state given both the current state and the action the agent takes.
This is the defining feature of a world model.

JEPA (Joint Embedding Predictive Architecture). An architecture proposed by Yann LeCun in 2022 that learns abstract representations of the world by predicting representations of future states, rather than predicting raw pixels.

Latent space. A compressed numerical summary of what is in an image, video, or scene. World models usually work in latent space instead of on raw pixels because it is far more efficient.

Long-horizon coherence. Whether a model keeps things consistent over many predicted steps.

Sim-to-real. The challenge of transferring a model trained in simulation to the real world without it breaking.

World foundation model. A large pretrained world model designed to be adapted for many downstream tasks, similar to how GPT-style models are foundation models for language.

🎤 Final Mic Drop

If I had to compress this whole shift into one sentence, here it is.

2017’s bet was that AI could see. 2022’s bet was that AI could talk. 2026’s bet is that AI can predict what happens next.

That is a harder problem. It is also, honestly, a more useful one. Most of the things we want machines to do, from folding laundry to driving trucks to assisting in surgery, are problems about what happens next, not problems about words.

Whether the bet pays off will take years to know. In the meantime, the marketing will get louder, and you will see “world model” used to describe many things that are not quite that. When that happens, run the test. Does it take in a state? Does it take in an action? Does it give back the next state? If yes, that is the real recipe.

See you in the next one. 👩‍🍳

Sources used in this piece

Ha, D., & Schmidhuber, J. (2018). World Models. arXiv:1803.10122.
AMI Labs funding coverage. TechCrunch, March 2026.
NVIDIA Cosmos and robotics adoption. NVIDIA Newsroom, 2025.
Genie 3 specifications. Google DeepMind.
Waymo World Model announcement. Waymo Blog, February 2026.
Marble launch and pricing. TechCrunch, November 2025.
V-JEPA 2 announcement. Meta AI Blog, June 2025.
LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. OpenReview.
Schmidhuber’s 1990 origin of the term. Wikipedia: World model (artificial intelligence).

What Is a World Model? Inside the AI Idea Behind 2026’s $1 Billion Bet was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.