AI News Archive: June 1, 2026 — Part 19
Sourced from 500+ daily AI sources, scored by relevance.
- Nvidia ramps up production of Vera Rubin, the foundation of the next generation of AI factories
Nvidia Corp. said early Monday at the Computex conference in Taipei that it’s gearing up production of its forthcoming Vera Rubin platform, which is set to become the foundation of a new generation of artificial intelligence factories that will dominate the enterprise infrastructure story for years to come. The company unveiled Vera Rubin for […] (Source: SiliconANGLE)
- Taiwan’s industry titans turbocharge world’s AI infrastructure buildout with NVIDIA
Semiconductor and electronics manufacturing leaders are using NVIDIA AI to speed manufacturing from fabs to factory floors as they ramp up the production of NVIDIA Vera Rubin NVL72 infrastructure for agentic AI factories. Taiwan is home to more than 500 NVIDIA ecosystem partners. More than 1 million NVIDIA MGX rack components for NVIDIA Vera Rubin infrastructure come together in Taiwan, from across 25 factory sites. As Vera Rubin ramps into full production to power agentic AI factories worldwide, that ecosystem spans the full supply chain, from key wafer and chip partners such as TSMC, SPIL, Kinsus, KYEC and UMTC to manufacturing and systems leaders including Foxconn, Pegatron, Quanta Cloud Technology (QCT), Wistron and Inventec. But these partners are doing more than building AI factories. They are applying accelerated computing, simulation, AI agents and physical AI to their own operations, creating a model for how AI can make advanced manufacturing faster, more efficient and more adaptive.
Taiwan’s manufacturing leaders build the future of AI with NVIDIA AI
Across chipmaking, server assembly and factory operations, Taiwan’s manufacturing leaders are applying NVIDIA technologies to reshape how AI infrastructure is designed, built, tested and scaled. TSMC is applying NVIDIA CUDA-X libraries and AI models across computational lithography, transistor and process simulation, advanced process control, yield analysis, fab operations and inspection. NVIDIA cuLitho improves cost-effectiveness or cycle time by 20-50% over CPU-based computational lithography at the same cost of ownership, while the NVIDIA cuEST library improves semiconductor material simulation by 50x on average. The cuML library, Metropolis platform and TAO Toolkit help accelerate material simulations, improve process control and strengthen rare-defect inspection.
Foxconn is using the new NVIDIA Factory Operations Blueprint and NemoClaw blueprints to build MoMClaw, its manufacturing operations management agent, connecting sensor and machine signals with specialized agents that give plant managers and operators real-time answers and action plans through a natural language interface with NVIDIA OpenShell privacy controls and safety guardrails. Foxconn estimates an 80% speed up in root-cause analysis time, a 15% increase in labor productivity and a 10% decrease in machine failure rates. Foxconn also uses DeepHow’s SOP Verification vision AI system using NVIDIA Cosmos and the NVIDIA Metropolis Blueprint for video search and summarization (VSS) to gain greater visibility into complex manufacturing processes, resulting in improved manufacturing efficiency and boosting first pass yield by 3%. The company is also applying NVIDIA Isaac Teleop, Isaac Sim, Isaac Lab and ROS 2 to wheeled humanoid robots operating in its factories, supporting precision assembly tasks such as pick and place, dual-arm collaboration and force-controlled screw fastening. Foxconn’s $1.4 billion AI cloud supercomputing center in Taiwan — powered by 10,000 NVIDIA GPUs — is being built with the NVIDIA GB300 NVL72 hybrid cooling architecture. Quanta Cloud Technology (QCT) is using NVIDIA Omniverse-based digital twins to accelerate factory planning, giving engineering, operations and logistics teams shared access to design data for faster layout feedback, optimized workflows and improved space utilization. QCT is also working with its subsidiary Techman Robot on a physical AI developer kit that uses QuantaGrid systems for data generation and model training. Techman Robot is using NVIDIA Jetson Thor and the Isaac GR00T platform to support the development of its next-generation robots, including the TM Xplore I humanoid, for advanced industrial tasks such as server fan assembly. 
Wistron is using the NVIDIA Omniverse DSX Blueprint, the NVIDIA PhysicsNeMo framework and Cadence Reality DC Design to simulate burn-in environments for stress-testing across global manufacturing sites and to optimize AI server manufacturing. Running on Wistron’s NVIDIA AI infrastructure with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, NVIDIA Omniverse and NVIDIA Metropolis libraries, these workflows speed layout analysis by as much as 70% and cut facility power demand by 20% through dynamic rack optimization. Pegatron is adopting the NVIDIA Omniverse DSX Blueprint, developing simulation-ready assets, and connecting design data, thermal simulation, digital twins and physical qualification — accelerating the design and deployment of AI factories. Pegatron is also using NVIDIA’s Defect Image Generation physical AI agent skill with NVIDIA Cosmos world foundation models and Isaac Sim to generate synthetic defect data, reducing AI visual inspection deployment time by 67% and operational effort by 10%. Inventec is using the Defect Image Generation agent skill in its Observation Agent to generate synthetic defect data for automated optical inspection. In notebook cosmetic inspection, internal validation produced more than 10,000 synthetic defect images and showed the potential to reduce real-world data collection and manual labeling by about 30%, shorten AI deployment time by about 25% and improve anomaly detection by about 10%. As NVIDIA Vera Rubin ramps into full production, Taiwan’s manufacturing leaders are showing how AI infrastructure becomes part of its own manufacturing engine — using accelerated computing, simulation, agents and physical AI to build the next generation of AI systems.
- NVIDIA factory operations blueprint gives factories a new AI brain
At GTC Taipei and at COMPUTEX, NVIDIA announced the NVIDIA Factory Operations Blueprint (FOX), a reference design for building an autonomous factory manager agent that continuously monitors and reasons across real-time data and orchestrates a fleet of specialized agents and machines to quickly resolve issues at scale. FOX helps developers build secure, centralized factory manager agents for orchestrating and optimizing specialized industrial AI agents for quality control, material transport and worker safety. Built with NVIDIA NemoClaw, AI-Q Blueprint and NVIDIA Nemotron open models, the blueprint provides a customizable foundation for connecting factory systems, automating model development and running intelligent operations at scale. The blueprint is optimized to run on NVIDIA DGX Station, the ultimate deskside AI supercomputer companion for factory managers. DGX Station is powered by the NVIDIA GB300 Grace Blackwell Ultra Desktop Superchip, featuring 20 petaflops of FP4 performance and 748GB of coherent memory, and is capable of running large AI models up to 1 trillion parameters, making it ideal for developing and running powerful AI agents locally. The superchip features the NVIDIA Blackwell Ultra GPU connected to a high-performance NVIDIA Grace CPU using the NVIDIA NVLink-C2C interconnect to deliver best-in-class system communication and performance, ideal for lightning-fast interactions between NemoClaw and AI models.
Key capabilities of the FOX blueprint include:
* Connecting factory systems and agents: FOX integrates with industrial data sources, machines, applications and robot fleets, and can connect to specialized agents from leading software developers through standard application programming interfaces and agent skills.
* Automating AI model training: Using NVIDIA TAO skills, factory manager agents can automate the full model-training lifecycle: identifying accuracy gaps, sourcing or synthetically generating training data, fine-tuning models and redeploying them into production.
* Operating intelligent factory workflows: Visual inspection, process compliance and material transport agents can be managed with NVIDIA open models and blueprints, including the NVIDIA Metropolis Blueprint for video search and summarization (VSS). Real-time factory data can also be visualized in an operational twin built with NVIDIA Omniverse libraries.
Taiwan manufacturers Advantech, Foxconn, Pegatron and Wistron are the first to deploy autonomous factory manager agents using the NVIDIA FOX blueprint and NemoClaw. Foxconn, the world’s largest electronics manufacturer, is using the FOX blueprint and NemoClaw to build MoMClaw, a manufacturing operations multi-agent system. Running alongside live production work, MoMClaw connects sensors, machine signals and other digital systems with hundreds of specialized agents in a single agentic layer, giving plant managers and operators real-time answers and action plans through a natural language interface with NVIDIA OpenShell privacy controls and safety guardrails. With MoMClaw, Foxconn projects an 80% improvement in root cause analysis time, a 15% increase in labor productivity and a 10% decrease in machine failure rates. Pegatron is using the FOX blueprint and NemoClaw to build a factory manager agent that orchestrates specialized agents for material transport, AI inspection, standard operating procedure guidance and machine-to-machine coordination. With the factory manager agent, Pegatron can orchestrate robot utilization more efficiently, eliminating the need for expensive standby equipment, with an estimated 15% reduction in asset redundancy costs.
Advantech has introduced the AI Factory Brain, an intelligent multi-agent system led by a factory manager agent built with the FOX blueprint and NemoClaw. Advantech has deployed the factory manager agent in its own factories to autonomously manage energy across specialized HVAC and lighting agents, and projects a 10% cut in energy consumption. Wistron is adopting the FOX blueprint and using NVIDIA Cosmos, NVIDIA Nemotron open models and the NVIDIA Metropolis VSS blueprint to build surface-mount technology agents that analyze and orchestrate production-line operations, enabling real-time root-cause analysis and quality control.
To monitor manufacturing operations, improve quality, verify standard operating procedures and improve worker safety, companies including DeepHow, Overview AI, Roboflow and Spingence are building specialized agents powered by NVIDIA AI and the NVIDIA VSS blueprint:
* DeepHow is using the Metropolis VSS Blueprint and Cosmos 3 to develop a standard operating procedure agent for Foxconn that supports assembly of Bianca boards for NVIDIA GB300 servers. Running on NVIDIA RTX PRO Servers, the agent accurately understands complex assembly motions to help improve first-pass yield by 3%, minimizing rework and production waste.
* Spingence is using the NVIDIA Defect Image Generation skill, NVIDIA Cosmos open vision language model and NVIDIA TAO Toolkit for fine-tuning to develop a factory manager agent for Cooler Master that connects automated optical inspection and model-building agents, achieving 99.6% defect recall, reducing defect escapes by 78% and increasing inspection capacity by 3x.
* Overview AI is using an NVIDIA agent skill for defect image generation and NVIDIA Cosmos to help Amphenol improve manufacturing efficiency with its Advanced GenAI Toolkit. The toolkit generates synthetic defect data and deploys visual inspection AI models 12x faster, reducing time to first inference to under 30 minutes across more than 300 products.
* Roboflow is using NVIDIA Cosmos to develop a model-building agent for Corning Fiber Optics that generates synthetic defect images when training data is limited, delivering near-perfect detection rates and demonstrating the potential to reduce daily manual image review.
- NVIDIA DSX gives infrastructure builders playbook for AI Factories
NVIDIA has announced the NVIDIA DSX platform, which gives infrastructure builders a complete playbook to create AI factories. NVIDIA DSX brings together open source, modular software libraries, application programming interfaces, reference designs, NVIDIA accelerated computing platforms and partner technologies into a common, codesigned platform for AI factory design, deployment and operations. NVIDIA is the only company that builds the full AI factory. By aligning every layer of the stack across compute, software, facilities and partner technologies, DSX provides infrastructure builders with a proven framework to design, deploy and operate AI factories at scale. The integrated platform accelerates deployment, improves operational reliability and resiliency at scale and enables a broad ecosystem of solutions designed to turn every megawatt into more intelligence at the lowest token cost. “We’re not just shipping chips – we’re giving every infrastructure builder a complete playbook to build AI factories,” said Jensen Huang, founder and CEO of NVIDIA. “With the DSX platform, you can simulate the entire factory before you spend a dollar, validate performance before a single rack is installed and operate with the kind of reliability that production AI demands.”
DSX platform elements
DSX now spans the full stack, from silicon and systems to infrastructure software, facilities and partner technologies. The latest additions to the platform include new open source software:
* DSX MaxLPS: A suite of technologies to maximize token performance per megawatt within a fixed power budget, enabling lowest token cost for AI factories. Combining 45-degrees-Celsius liquid cooling with in-rack technologies that optimize performance per watt, DSX MaxLPS lets operators run up to 40% more GPUs at their most energy-efficient operating point with minimal impact on workload performance.
* DSX OS: Open source, modular software purpose-built for AI factory operations, providing lifecycle management, intelligent scheduling, runtime consistency, health automation, resiliency, multi-tenant operations and platform services.
DSX MaxLPS and DSX OS join an existing set of features under the DSX platform:
* DSX Reference Design: Generation-specific, validated AI factory architectures covering compute, networking, storage, hardware cluster design and facilities infrastructure, including power, cooling and controls, as well as civil, structural and architectural design.
* DSX Sim: High-fidelity simulation layer for the AI factory lifecycle, helping NVIDIA, partners and customers to model, validate and optimize infrastructure decisions from planning and design through deployment and operations.
* DSX Flex: Connects AI factories to power-grid services, enabling dynamic workload adaptation to grid signals such as load shedding, demand response and pricing events, and orchestrating renewable and hybrid power across utility, onsite renewables and storage.
* DSX Exchange: Enables scalable, secure integration of compute, network, energy, power and cooling plant signals between IT, operational technology and operations agents.
Growing DSX ecosystem
NVIDIA is partnering with industry-leading Taiwan system manufacturers to expand the DSX ecosystem, supporting the buildout of AI factories with extreme codesign at their core. NVIDIA cloud partners CoreWeave, Crusoe, Firmus, IREN, Lambda, Nebius, Nscale and Yotta Data Services are deploying core components of the DSX platform stack (DSX Sim, DSX MaxLPS and DSX OS) to reduce risk, improve GPU utilization and bring AI cloud capacity online faster.
Dell Technologies, HPE, Lenovo and Supermicro, together with ASUS, Foxconn, GIGABYTE, Pegatron, Quanta Cloud Technology (QCT), Wistron and Wiwynn, are building NVIDIA DSX-ready systems and contributing simulation-ready assets that enable customers to deploy complete, full-stack AI factory solutions at global scale. Within the ecosystem, model-based systems engineering serves as the bridge from rack design to facility deployment, for an AI infrastructure optimized for token performance per megawatt. Quanta Cloud Technology (QCT) and Pegatron are working with Dassault Systèmes to create a live AI factory digital twin configurator to automate rack-to-facility design with increased quality and reduced workload. The adoption of DSX Sim by system manufacturers expands the NVIDIA Omniverse DSX Blueprint ecosystem, deepening integration with software partners Cadence, PTC and Siemens. DSX Flex is powering a commercial, multi-megawatt pilot with Emerald AI and Silicon Valley Power to demonstrate grid-responsive AI factories that can dynamically adjust power consumption in response to utility signals while protecting AI workload performance, helping safeguard grid reliability and affordability for customers while unlocking additional power capacity to support AI growth. Partners are adopting various DSX OS software components for lifecycle management, multi-tenancy, security, health automation, resilience and platform services. Ecosystem partners adopting DSX OS components include Aible, BeyondAI, Bhashini, DCAI, Mirantis, OpenNebula Systems, Rafay, Red Hat, Sarvam, Simplismart, Spectro Cloud, Supermicro, vCluster and Vultr.
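The DSX MaxLPS figure in this item is, at bottom, fixed-budget arithmetic: if facility power is capped, fitting 40% more GPUs implies each GPU runs at roughly 71% (1/1.4) of its baseline draw. A minimal Python sketch of that relationship follows. The 10 MW budget and 1,000 W per-GPU baseline are hypothetical numbers chosen for illustration, not NVIDIA figures, and cooling and networking overhead are ignored.

```python
def gpus_within_budget(budget_watts: int, watts_per_gpu: int) -> int:
    """GPUs that fit in a fixed power budget (overheads ignored)."""
    return budget_watts // watts_per_gpu


def scaled_gpu_count(baseline_gpus: int, uplift_percent: int) -> int:
    """GPU count after an uplift such as the 'up to 40% more GPUs' claim."""
    return baseline_gpus * (100 + uplift_percent) // 100


def per_gpu_power_after_uplift(watts_per_gpu: float, uplift_percent: int) -> float:
    """Per-GPU draw that keeps total power unchanged after the uplift."""
    return watts_per_gpu * 100 / (100 + uplift_percent)


BUDGET_W = 10_000_000   # hypothetical 10 MW facility budget
BASELINE_W = 1_000      # hypothetical per-GPU draw at the default operating point
UPLIFT = 40             # percent, from the article

baseline_gpus = gpus_within_budget(BUDGET_W, BASELINE_W)
more_gpus = scaled_gpu_count(baseline_gpus, UPLIFT)
capped_w = per_gpu_power_after_uplift(BASELINE_W, UPLIFT)

print(f"baseline: {baseline_gpus} GPUs at {BASELINE_W} W each")
print(f"uplift:   {more_gpus} GPUs at about {capped_w:.0f} W each")
```

Whether throughput per GPU holds up at the lower operating point is the part the article attributes to MaxLPS; the sketch only shows the power side of the trade.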
- Nemotron 3 Ultra announced: high-speed, leading US open weights intelligence
NVIDIA announces Nemotron 3 Ultra, a high-speed open weights model.
- Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action
- Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3
Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what's...
- How Cosmos 3 Helps Physical AI Think Before It Acts
- NVIDIA DGX Station for Windows Puts a Trillion-Parameter AI Supercomputer on Every Enterprise Desk
NVIDIA today announced NVIDIA DGX Station™ for Windows, the world’s most powerful deskside AI supercomputer designed to build, run and connect always-on AI agents to Windows applications and workflows, capable of running frontier AI models of up to 1 trillion parameters locally.
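As a rough sanity check on the 1-trillion-parameter figure: the Factory Operations Blueprint item earlier in this digest lists DGX Station at 748GB of coherent memory and quotes FP4 performance. The sketch below assumes the Windows variant has the same memory, that weights are stored at 4 bits per parameter, and that KV cache, activations and runtime overhead are ignored. All three are assumptions for illustration, not NVIDIA statements.

```python
def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


PARAMS = 1e12      # 1 trillion parameters, from the announcement
MEMORY_GB = 748    # coherent memory quoted for DGX Station in this digest

for bits in (4, 8, 16):
    need = weight_footprint_gb(PARAMS, bits)
    verdict = "fits" if need <= MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: {need:.0f} GB -> {verdict} in {MEMORY_GB} GB")
```

Under these assumptions only the 4-bit case (about 500GB) leaves headroom, which is consistent with the announcement pairing the 1-trillion-parameter claim with FP4.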
- NYT: Senator Sanders Proposes Gov't Take 50% Ownership of AI Labs
Quoting from Senator Bernie Sanders's op-ed in the New York Times today:
(...) I will soon be introducing the American A.I. Sovereign Wealth Fund Act. This legislation would give the public a direct ownership stake in the largest A.I. companies in our country. How? It would create a sovereign wealth fund through a one-time 50 percent tax – not on the profits of OpenAI, Anthropic, xAI and other companies, but paid with something far more valuable than that: the stock. If passed, this legislation would do two crucial things. First, it would give the public a direct role in determining the future of this technology. No longer would the future of A.I. and the transformation of human life that it will bring be dictated by a handful of Big Tech oligarchs. The federal government would have the power, through its voting shares and an equal representation on each company’s board, to block decisions that hurt our citizens and to push for policies that help them. Second, this legislation would guarantee that the trillions of dollars potentially generated by A.I. are used to improve the lives of all of us – not simply to make the richest people in the world even richer. If the big A.I. companies continue to grow as rapidly as many analysts expect, then the value of the sovereign wealth fund will grow as well – and the benefits to the American people will grow along with it.
As you may know, Senator Bernie Sanders has recently started taking the idea of AGI/ASI much more seriously. Now he's proposing partial nationalization on the premise that AI is a uniquely valuable and important technology. [1] While this particular upcoming bill of his is rather unlikely to pass, I like that ideas are being proposed which are at least somewhat commensurate to the problem.
[1] Quote from article: "Artificial intelligence will almost certainly be the most transformational technology in the history of the world."
- Intel puts agentic AI to work with Xeon 6+, networking, and AI systems
Intel expands its AI‑ready platform across data center, network, and edge—showing why the CPU is at the heart of agentic AI orchestration, scale and data movement
- Japan stations, facilities using AI system to prevent suicide by jumping
KYODO — About 40 stations and commercial buildings in Japan have introduced an artificial intelligence (AI) system to prevent suicide that has helped to save the lives of at least 2 people, according to its developer.
- New York Times Publisher Slams AI Companies' 'Brazen Theft' From News Outlets
New York Times Publisher Slams AI Companies' 'Brazen Theft' From News Outlets Barron's
- New York Times publisher slams AI companies' 'brazen theft' from news outlets
The New York Times publisher on Monday slammed artificial intelligence companies for "brazen theft of intellectual property," warning they threaten the future of journalism during a speech at the World News Media Congress in the French city of Marseille.
- What are AI PCs and what can they do that your computer can’t?
What are AI PCs and what can they do that your computer can’t? The Straits Times
- ASUS Unveils Revolutionary ProArt PCs Powered by NVIDIA RTX Spark at COMPUTEX 2026
ASUS Unveils Revolutionary ProArt PCs Powered by NVIDIA RTX Spark at COMPUTEX 2026 The Straits Times
- Connected vehicle data ‘can have intelligence value’ to adversaries: federal document
Connected vehicle data ‘can have intelligence value’ to adversaries: federal document Automotive News
- This AI Kidnapping Scam Is Every Parent's Worst Nightmare
This AI Kidnapping Scam Is Every Parent's Worst Nightmare PCMag UK
- Cruise giant says six million customers’ personal information was exposed in breach
- VinDynamics Debuts Its First Humanoid Robot At Two Of The World’s Leading Technology Events
VinDynamics Debuts Its First Humanoid Robot At Two Of The World’s Leading Technology Events USA Today
- AI helping build better AI: How agents accelerate model experimentation
How agents accelerate model experimentation with AI
- Young and unemployed? Remote work, not AI, may be the problem, study finds
Young and unemployed? Remote work, not AI, may be the problem, study finds Austin American-Statesman
- Young and unemployed? Remote work, not AI, may be the problem, study finds
Young and unemployed? Remote work, not AI, may be the problem, study finds Boston Herald
- As the Pentagon pushes for battlefield AI, some military leaders urge caution
As the Pentagon pushes for battlefield AI, some military leaders urge caution The Boston Globe
- As the Pentagon pushes for battlefield AI, some military leaders urge caution
As the Pentagon pushes for battlefield AI, some military leaders urge caution Dallas News
- As the Pentagon pushes for battlefield AI, some military leaders urge caution
As the Pentagon pushes for battlefield AI, some military leaders urge caution Boston Herald
- As the Pentagon Pushes for Battlefield AI, Some Military Leaders Urge Caution
AI’s use in the military is part of the administration’s larger push to grow the capability it sees as a unique American advantage. (Source: SecurityWeek)
- Around 1 in 5 young people use AI chatbots for mental health advice, survey finds
Nearly 1 in 5 adolescents and young adults are turning to AI chatbots for advice when they’re sad, angry, nervous or stressed, according to a new study.
- AI chatbot use and disclosure for mental health among US adolescents and young adults
AI chatbot use and disclosure for mental health among US adolescents and young adults EurekAlert!
- OpenAI’s next legal battle is against states who claim its models are dangerous
- Your phone screen doesn't have the same color range as the human eye, and AI widens the gap
A peacock feather in sunlight shifts from blue to green to bronze as you turn it. Photograph it, and this shimmer collapses into one angle, one exposure, one compromise.
- What's the Company Culture Like at Dropzone AI 2026?
What's the Company Culture Like at Dropzone AI 2026? Built In
- What's It Like to Work at Dropzone AI 2026?
What's It Like to Work at Dropzone AI 2026? Built In
- Dropzone AI Company Growth, Stability & Outlook 2026
Dropzone AI Company Growth, Stability & Outlook 2026 Built In
- What's the Work-Life Balance Like at Dropzone AI 2026?
What's the Work-Life Balance Like at Dropzone AI 2026? Built In
- MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost
Big news in enterprise AI broke over the weekend as Chinese AI startup MiniMax released its highly anticipated M3 large language model on Sunday evening Eastern time, pairing frontier-tier coding and agentic performance with a 1-million-token context window and native multimodality for a fraction of the cost of leading proprietary models, with pricing starting at just $20 per month under its new subscription token plans. The company's leadership also announced plans to deliver the model under an open source license including "open weights," allowing for full enterprise downloading and customizability free-of-charge, coming sometime in the next 10 days. For now, it is available via the MiniMax API at a special discounted price of $0.3 per 1 million input tokens and $1.20 per million output tokens (on fresh cache) for the next week — beating proprietary U.S. giants like Google, OpenAI and Anthropic handily on cost, while also eclipsing the performance of the latest models from the former two on selected benchmarks. Even at its full price of $0.6/$2.40 per million input/output tokens, MiniMax-M3 remains at just 8-20% the cost of the leading, proprietary U.S. models. The traditional matrix governing large language model development has long dictated a rigid choice: software developers can either access top-tier closed-source intelligence behind restrictive APIs, or deploy nimble, cost-effective open models that falter on multi-step reasoning, dense coding tasks, and massive data sequences. MiniMax-M3 fundamentally upends this paradigm. By unifying these two historically separated frontier capabilities, M3 introduces a level of comprehensive utility previously restricted to expensive, closed-source ecosystems, effectively shifting the baseline of open-weights systems while drastically minimizing the operational compute footprint required to execute complex development loops. 
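The "8-20%" figure can be checked against list prices. The sketch below uses MiniMax-M3's full price from the paragraph above and three proprietary list prices taken from the VentureBeat pricing snapshot published with this article. Which models count as "leading" is an assumption here, since the article does not say.

```python
# USD per 1M tokens. M3 full price is from the article text.
M3_FULL = {"input": 0.60, "output": 2.40}

# USD per 1M tokens, from the article's pricing snapshot.
# The choice of these three as "leading" models is an assumption.
RIVALS = {
    "GPT-5.5": {"input": 5.00, "output": 30.00},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
    "Gemini 3.1 Pro Preview (<=200K)": {"input": 2.00, "output": 12.00},
}


def cost_ratio_percent(kind: str, rival: str) -> float:
    """M3's full price as a percentage of a rival's price for one token type."""
    return round(100 * M3_FULL[kind] / RIVALS[rival][kind], 1)


for name in RIVALS:
    print(
        f"{name}: input {cost_ratio_percent('input', name)}%, "
        f"output {cost_ratio_percent('output', name)}%"
    )
```

On output tokens the ratios come out between 8% and 20%, which matches the quoted range; on input tokens they are higher, between 12% and 30%.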
VentureBeat Frontier AI Model API Pricing Snapshot

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Total Cost | Source |
|---|---|---|---|---|
| MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi MiMo |
| deepseek-v4-flash | $0.14 | $0.28 | $0.42 | DeepSeek |
| deepseek-v4-pro | $0.435 | $0.87 | $1.305 | DeepSeek |
| MiniMax-M3 | $0.30 | $1.20 | $1.50 (limited time only) | MiniMax |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | Google |
| MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi MiMo |
| Grok 4.3 low context | $1.25 | $2.50 | $3.75 | xAI |
| GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai |
| Kimi-K2.6 | $0.95 | $4.00 | $4.95 | Moonshot/Kimi |
| GLM-5.1 | $1.40 | $4.40 | $5.80 | Z.ai |
| Grok 4.3 high context | $2.50 | $5.00 | $7.50 | xAI |
| Qwen3.7-Max | $2.50 | $7.50 | $10.00 | Alibaba Cloud |
| Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 | Google |
| Gemini 3.1 Pro Preview ≤200K | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI |
| Gemini 3.1 Pro Preview >200K | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.8 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI |

New MiniMax Sparse Attention (MSA) technique helps keep the model's cost low
At the core of the model's efficiency lies an architectural departure from classic Transformer networks. Standard attention mechanisms scale quadratically ($O(N^2)$), meaning computational and financial costs explode as text inputs lengthen. To combat this "inherent flaw," the engineering team implements MiniMax Sparse Attention (MSA), a clean, extensible sparse attention blueprint. To visualize this innovation, think of traditional full attention as an editor reading an entire library from scratch every time they need to verify a single sentence. MSA acts as an intelligent indexing clerk, using a pre-filtering phase to partition Key-Value (KV) matrices into highly precise blocks. At the operator level, MSA uses a "KV outer gather Q" approach. The system treats KV blocks as an outer loop, dynamically aggregating only the specific queries that hit them. Because each data block is read exactly once and memory access remains strictly contiguous, hardware utilization skyrockets.
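The "KV outer gather Q" loop order described above can be illustrated with a toy, single-head version in plain Python. This is only a sketch of the general idea, not MiniMax's kernel: the article does not describe how blocks are selected, so `select_blocks` (top-k blocks by mean-key score) is a hypothetical stand-in, and the online-softmax accumulator is a standard technique swapped in here to merge partial results across blocks. All function names are invented for the example.

```python
import math


def dense_attention(Q, K, V):
    """Reference softmax attention: every query attends to every key."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[d] for wi, v in zip(w, V)) / z
                    for d in range(len(V[0]))])
    return out


def select_blocks(Q, K, block, top_k):
    """Hypothetical pre-filter: each query keeps its top_k blocks, ranked by
    the dot product between the query and the block's mean key."""
    n_blocks = (len(K) + block - 1) // block
    means = []
    for b in range(n_blocks):
        rows = K[b * block:(b + 1) * block]
        means.append([sum(col) / len(rows) for col in zip(*rows)])
    chosen = []
    for q in Q:
        ranked = sorted(range(n_blocks),
                        key=lambda b: -sum(x * y for x, y in zip(q, means[b])))
        chosen.append(set(ranked[:top_k]))
    return chosen


def kv_outer_sparse_attention(Q, K, V, block, top_k):
    """Block-sparse attention with KV blocks as the outer loop."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    chosen = select_blocks(Q, K, block, top_k)
    n_blocks = (len(K) + block - 1) // block
    # Invert the selection: for each KV block, gather the queries that hit it.
    hits = [[i for i, blocks in enumerate(chosen) if b in blocks]
            for b in range(n_blocks)]
    # Per-query online-softmax state: running max, normalizer, weighted sum.
    run_max = [-math.inf] * len(Q)
    run_sum = [0.0] * len(Q)
    acc = [[0.0] * len(V[0]) for _ in Q]
    for b in range(n_blocks):              # each KV block is sliced once
        k_blk = K[b * block:(b + 1) * block]
        v_blk = V[b * block:(b + 1) * block]
        for i in hits[b]:                  # only the queries that selected it
            scores = [scale * sum(x * y for x, y in zip(Q[i], k)) for k in k_blk]
            new_max = max(run_max[i], max(scores))
            rescale = math.exp(run_max[i] - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            run_sum[i] = run_sum[i] * rescale + sum(weights)
            acc[i] = [a * rescale + sum(w * v[d] for w, v in zip(weights, v_blk))
                      for d, a in enumerate(acc[i])]
            run_max[i] = new_max
    return [[a / run_sum[i] for a in acc[i]] for i in range(len(Q))]


# Tiny deterministic demo: 3 queries, 8 keys/values, 4 dims, blocks of 2 keys.
Q = [[math.sin(1 + i * 4 + d) for d in range(4)] for i in range(3)]
K = [[math.cos(2 + j * 4 + d) for d in range(4)] for j in range(8)]
V = [[math.sin(3 + j * 4 + d) for d in range(4)] for j in range(8)]
sparse_out = kv_outer_sparse_attention(Q, K, V, block=2, top_k=2)
full_out = kv_outer_sparse_attention(Q, K, V, block=2, top_k=4)  # all blocks kept
```

When every block is kept (`top_k` equal to the number of blocks), the result should match `dense_attention` up to floating-point error, which is a convenient correctness check for the loop order. The contiguous-memory benefit the article mentions is a property of a GPU kernel and is not something this toy version can demonstrate.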
In internal trials, MSA runs more than 4x faster than alternative open-source solutions like Flash-Sparse-Attention or flash-moba. When managing a maxed-out context length of 1 million tokens, M3’s per-token compute demand drops to just 1/20th of the previous-generation model's, translating into a 9x acceleration in the prefilling stage and a 15x boost during decoding.
Rather than taking a pretrained text network and fusing it with a separate vision model, MiniMax engineered M3 as a natively multimodal system from "Step Zero". The company overhauled its data ingest machinery to blend naturally interleaved sequences of text, images, and visual components, scaling the total pretraining corpus beyond 100 trillion tokens. This deep data alignment enables the model to translate complex visual geometries, such as programming charts or coordinate maps, into structural code without losing contextual fidelity.
On standardized assessments, M3 validates this engineering path. The model records 59.0% on SWE-Bench Pro, an autonomous agent metric, positioning it ahead of closed models like GPT-5.5 and Gemini 3.1 Pro. It achieves 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas and 83.5 on BrowseComp, outstripping Claude Opus 4.7’s score of 79.3 in autonomous browsing and information retrieval.
However, when contrasted with Claude Opus 4.8, the premium frontier model Anthropic released last week, the competitive ceiling of M3's efficient sparse-attention footprint becomes evident across directly comparable, tool-intensive agent benchmarks. In pure code modification on SWE-Bench Pro, M3’s 59.0% falls behind Opus 4.8’s leading 69.2%. A similar gap appears in automated system environments on Terminal-Bench 2.1: while M3’s 66.0% runs neck-and-neck with the previous-generation Opus 4.7 baseline of 66.1%, it trails the upgraded Opus 4.8, which achieves 74.6%.
Furthermore, evaluations tracking continuous GUI interaction on the OSWorld-Verified sandbox place M3’s automated computer use at 70.0%, compared with 83.4% for Opus 4.8. These standardized evaluations illustrate the structural trade-offs currently defining the ecosystem: closed-source systems like Opus 4.8 maintain clear leads on the most complex reasoning tasks, yet M3 delivers a highly capable baseline of local, tier-one automated operation without the compounding premium of closed-door API subscription fees.
When positioned alongside the newly released, fellow open-weights model DeepSeek-V4 Pro Max, M3 holds its ground across core agentic categories while asserting narrow advantages in specialized code synthesis. On SWE-Bench Pro, M3's 59.0% edges past DeepSeek-V4 Pro Max’s 55.4%. However, the gap reverses in command-line environments: on Terminal-Bench, DeepSeek-V4 Pro Max pulls slightly ahead with 67.9% against M3’s 66.0%. In web orchestration and open-world browsing simulations, the two models are effectively tied, with M3 registering 83.5% on BrowseComp compared to DeepSeek's 83.4%. Similarly, on the MCP Atlas tool-use framework, M3 secures a narrow lead at 74.2% against DeepSeek’s 73.6%. This close alignment demonstrates that while DeepSeek handles a massive 1.6-trillion total parameter footprint with specialized high-effort reasoning modes, MiniMax's block-filtered sparse attention mechanism yields directly competitive execution efficiencies without requiring extensive parameter activation scaling.
MiniMax Code AI agent offers Agentic Team capabilities
MiniMax translates these architectural gains into immediate utility through an updated product suite divided between standalone applications, customizable subscription tiers, and raw developer infrastructure. For end-user orchestration, the flagship implementation is MiniMax Code, an AI agent product designed to maximize M3's multi-step capabilities. Operating via web or native desktop apps, MiniMax Code runs an "Agent Team" capable of breaking massive engineering tasks into multi-stage, concurrent workflows. The system relies on a "Producer + Verifier" adversarial harness loop. As one agent instance generates code, a secondary verifier instance aggressively tests and reflects upon execution outputs, allowing the system to self-correct and operate autonomously for days without human oversight.
Because of its native visual grounding, MiniMax Code supports direct computer use. A developer can issue a cross-application voice prompt via their phone to have the model open a localized enterprise ERP client and batch-populate data tables directly from an open Excel spreadsheet. For custom setups, developers can pipeline M3 directly into existing workflows using an API key (sk-cp) compatible with common alternative IDE environments like Claude Code, Cursor, Roo Code, and Cline. The API introduces a toggleable "thinking mode". When enabled, M3 routes processing power into deep reasoning and long-horizon planning; when disabled, the model runs at minimal latency for quick text completion.
The companion Token Plan takes an aggressive pricing approach structured around shared multimodal quotas. Billed annually, three options are available:
Plus ($20/month): Supplies ~1.7B tokens per month and handles 3–4 concurrent agents.
Max ($50/month): Supplies ~5.1B tokens per month, manages 4–5 concurrent agents, and adds 3 automated video clips per day via Hailuo 2.3.
Ultra ($120/month): Supplies ~9.8B tokens per month, facilitates 6–7 concurrent agents, and extends video capacity to 5 daily clips.
Open weights makes M3 much more attractive for enterprise use
MiniMax's pledge to release M3 under an open-weights license model, with weights and technical documentation launching on HuggingFace and GitHub within 10 days, carries significant strategic weight for enterprise infrastructure managers. However, it is still to be determined precisely which license the weights will be available under (e.g. MIT, Apache 2.0 or the new OpenMDW license), and whether or not it will be permissible for consumer usage. If so, the calculus looks like this:
| Feature / Model Attribute | Closed API Providers (e.g., GPT-5.5, Opus 4.7) | Open-Weights Frontier (MiniMax M3) |
| Data Privacy & Boundaries | Requires external API requests; potential data ingestion vectors. | Total local isolation; runs entirely inside private user clusters. |
| Custom Optimization | Limited to basic fine-tuning wrappers or prompt engineering. | Full pipeline control; architecture allows deep adapter/weights customization. |
| Cost Vector Consistency | Bound to perpetual per-token API pricing models. | Computational demands cut to 1/20th; mitigates hardware ceiling. |
By shipping the underlying model weights directly to the community, MiniMax departs from the closed-door approach favored by major American AI labs. For enterprise users bound by strict compliance and privacy rules, open weights mean they can run M3 locally on internal hardware. This setup removes the data-leakage risk that comes with sending data to public APIs. Furthermore, it permits engineering teams to run bespoke fine-tuning passes, modify internal architectures, or embed specialized system prompts deep within the model layers, transforming an off-the-shelf system into a highly targeted proprietary asset.
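The "Producer + Verifier" loop described earlier for MiniMax Code can be illustrated with a small control-flow sketch. The article does not document the product's internals, so everything here is an assumption for illustration: the `Producer` and `Verifier` type aliases, the `produce_and_verify` function, and the toy stand-ins that replace real model calls.

```python
from typing import Callable, Optional

# Hypothetical interfaces; not MiniMax's actual API.
Producer = Callable[[str, Optional[str]], str]  # (task, feedback) -> candidate code
Verifier = Callable[[str], Optional[str]]       # candidate -> None if accepted, else feedback


def produce_and_verify(task: str, producer: Producer, verifier: Verifier,
                       max_rounds: int = 5) -> tuple[Optional[str], int]:
    """Alternate generation and checking until the verifier accepts or rounds run out."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = producer(task, feedback)  # producer sees the last critique, if any
        feedback = verifier(candidate)        # verifier tests the candidate
        if feedback is None:
            return candidate, round_no
    return None, max_rounds                   # no candidate was accepted


def toy_producer(task: str, feedback: Optional[str]) -> str:
    # Toy stand-in for a model: the first attempt has a deliberate bug,
    # which is corrected once any feedback arrives.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_verifier(candidate: str) -> Optional[str]:
    # Toy stand-in for a test harness. Executing generated code like this is
    # only acceptable in a toy; real harnesses need sandboxing.
    namespace: dict = {}
    exec(candidate, namespace)
    got = namespace["add"](2, 3)
    return None if got == 5 else f"add(2, 3) returned {got}, expected 5"


if __name__ == "__main__":
    code, rounds = produce_and_verify("write add(a, b)", toy_producer, toy_verifier)
    print(f"accepted after {rounds} round(s)")
```

The round cap is a design choice of this sketch: it bounds the loop when the verifier never accepts, which matters for any agent expected to run unattended.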
Initial community reactions are resoundingly positive
The developer ecosystem reacted immediately to M3's operational benchmarks, singling out its long-horizon autonomous behavior and cost-to-performance profile. A major focal point of discussion is a 12-hour automated verification test where M3 was tasked with reproducing an ICLR 2025 Outstanding Paper Award winner, titled "Learning Dynamics of LLM Finetuning". As MiniMax's own researcher @MikaStars39 highlighted on X: "M3 ran autonomously for nearly 12 hours, producing 18 commits and 23 experimental figures on its own, and got the core experiments working: it matched the predicted probability trends in the SFT stage; clearly observed the squeezing effect central to the DPO experiments; validated the Extend mitigation method proposed in the original paper."
Simultaneously, creators of developer tools highlighted the practical economic advantages of the model's new attention mechanism. The official team behind the agentic AI coding harness Cline posted an alert confirming day-one compatibility, stating: "The new MiniMax-M3 is their first model to have 1m context, multimodal, and agentic coding capability. Congratulations to @MiniMax_AI for the breakthrough in sparse-attention architecture cutting compute & cost to 1/20th their previous generation."
This sharp drop in execution costs shifts how developers view the relationship between financial investment and capability. Tech commentator @jumperz mapped out this disruption, noting how M3 breaks a historical pattern in machine learning pricing.
By addressing context scaling limitations through fundamental attention-level optimizations rather than brute-force hardware scaling, MiniMax has established a highly efficient open-source baseline. M3 demonstrates that the next phase of agent development will not just be driven by larger datasets, but by efficient architectural choices that make frontier-level performance accessible to the broader open-source community.
For enterprises building autonomous software development or agent infrastructure, MiniMax M3 provides the ultimate "bang for the buck." While DeepSeek-V4 Pro holds a microscopic price advantage of $0.195 per million tokens, MiniMax M3 justifies its marginal premium by delivering superior autonomous software engineering resolution rates (59.0% SWE-Bench Pro). More importantly, because M3 is an open-weights model, the calculation extends far beyond the API chart. By deploying M3's weights locally inside private enterprise clouds, organizations completely bypass cloud data egress tracking, eliminate structural vendor lock-in, and can implement custom prefix-caching models on internal hardware. This technical approach transforms a highly efficient runtime budget into a permanent, privately owned corporate asset.
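As a rough cost check, the Token Plan tiers quoted earlier in this article can be converted into an effective price per million tokens. This is a back-of-the-envelope sketch, not MiniMax's published pricing: the quotas are the approximate figures from the article, the calculation assumes the whole monthly quota is consumed, and the `PLANS` table and function name are illustrative.

```python
# Token Plan tiers as quoted in the article (annual billing, approximate quotas).
PLANS = {
    "Plus":  {"usd_per_month": 20,  "tokens_per_month": 1.7e9},
    "Max":   {"usd_per_month": 50,  "tokens_per_month": 5.1e9},
    "Ultra": {"usd_per_month": 120, "tokens_per_month": 9.8e9},
}


def usd_per_million_tokens(plan: str) -> float:
    """Effective price per million tokens if the full monthly quota is used."""
    p = PLANS[plan]
    return p["usd_per_month"] / (p["tokens_per_month"] / 1e6)


if __name__ == "__main__":
    for name in PLANS:
        print(f"{name:<6} ${usd_per_million_tokens(name):.4f} per million tokens")
```

Under these assumptions the Max tier works out cheapest per token, and every tier lands near one cent per million tokens. Because the quotas are shared across modalities and unused quota is effectively wasted, the real per-token cost will be higher, and the result is not directly comparable to per-token API list prices.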
- China’s MiniMax Launches New Model as Open-Source AI Coding Battle Heats Up
China’s MiniMax Launches New Model as Open-Source AI Coding Battle Heats Up The Information
- MiniMax debuts AI model built for long and complex coding tasks
Chinese artificial intelligence start-up MiniMax has unveiled its latest flagship AI model, M3, designed to anchor the company’s push into coding agents and automated workflows. The Shanghai-based company said on Monday that the model’s redesigned architecture reduced computational requirements to as little as one-twentieth of previous levels, slashing inference costs while boosting response speeds. Notably, MiniMax said M3 could process up to 1 million tokens of data at once – five times more...
- MiniMax Launches M3 Model With 1M Context and Native Multimodal Capabilities
MiniMax released its M3 flagship model, claiming it as the first domestic AI model to combine frontier coding, agentic capabilities, 1M-token context windows, and native multimodal processing in a single architecture.
- MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding
MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding MarkTechPost
- eMoT: evolving Memory-of-Thought via Symbolic Anchoring and Memory Corrosion
While Large Language Models (LLMs) achieve impressive performance on multi-step reasoning tasks, their reliability is persistently hindered by critical limitations such as unconstrained hallucinations and poor numerical computation. Fundamentally, these issues arise because standard models treat rea...
- Explainable Data-driven Deep Reinforcement Learning Methods for Optimal Energy Management in Buildings
The increasing integration of renewable energy sources into power systems, particularly in buildings equipped with photovoltaic (PV) panels and energy storage systems, introduces significant complexity in energy systems. Volatile power generation, varying electricity tariffs, and increased entities,...
- Attention mechanisms and transfer learning for robust peach leaf damage classification under domain shift
Artificial intelligence provides a practical framework for crop damage assessment from imagery data, supporting early decision-making in agricultural management. In peach orchards, climate change increases abiotic stress and biotic pressures, including pests and diseases, which often produce visuall...
- RL-ACRGNet: Reinforcement Learning-Based Chest Radiology Report Generation Network
Medical imaging interpretation is a foundational pillar of modern clinical diagnostics, yet the manual generation of radiology reports remains a time-consuming process prone to interpretation inconsistencies. Within the field of medical AI, automating these descriptions through deep learning promise...
- OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large coll...
- Ranking vs. Assignment: The Metric Mismatch in Multi-View Object Association
Multi-view object association is an important computer vision problem that underlies many multi-camera perception tasks. While this task is naturally formulated as a constrained one-to-one matching problem, recent works heavily rely on pairwise ranking metrics like AP and FPR-95 for model evaluation...
- PlanarBench: Evaluating LLM Spatial Reasoning via Planar Graph Drawing
PlanarBench tests whether LLMs can draw planar graphs as ASCII art given only an edge list -- a spatial reasoning task that resists memorization because edge order, edge orientation, and node labels are all permutable. We evaluate 91 models on the 199 simplest non-isomorphic connected planar graph...
- Why Do Time Series Models Need Long Context Windows?
Modern deep learning models for forecasting groups of time series rely on increasingly longer observation windows. However, the benefit of increasing the window size is often simply attributed to capturing long-range dependencies, and broader discussion on how global forecasting models leverage inpu...
- MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?
Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes human executors, making it difficult to use directly as the skills required by agents. To bridge the g...
- A Structured Benchmark for Text-Guided Anomaly Detection: When Language Stops Conditioning the Decision
Industrial anomaly detection has historically been a unimodal task. Recent multimodal vision-language models have produced systems that admit textual input alongside the image and are presented as enabling text-guided zero- and few-shot inspection. Yet these methods are evaluated with protocols inhe...