AI News Archive: May 28, 2026 — Part 2
Sourced from 500+ daily AI sources, scored by relevance.
- Elon Musk Is Already Preparing to Evict Anthropic from SpaceX’s Data Center
Sounds like those CPUs are available to the highest bidder... unless Elon figures out how to make a successful AI model.
Score: 55 🌐 Moves May 28, 2026 https://gizmodo.com/elon-musk-is-already-preparing-to-evict-anthropic-from-spacexs-data-center-2000764631
- Tencent bets on AI agents, smaller models in race with Alibaba, ByteDance
Nikkei Asia
- Amazon plans to make SpaceX's Grok models available on its flagship AI service
Business Insider
- Why Tesla’s AI trainers don’t trust its self-driving tech – or its safety stats
Reuters
- Researchers automated LLM reasoning strategy design and cut token usage by 69.5%
Test-time scaling (TTS) has emerged as a proven method to improve the performance of large language models in real-world applications by giving them extra compute cycles at inference time. However, TTS strategies have historically been handcrafted, relying heavily on human intuition to dictate the rules of the model’s reasoning.
To address this bottleneck, researchers from Meta, Google, and several universities have introduced AutoTTS, a framework that automatically discovers optimal TTS strategies. This automated approach allows enterprise organizations to dynamically optimize compute allocation without manually tuning heuristics.
By implementing the optimal strategies discovered by AutoTTS, organizations can directly reduce the token usage and operational costs of deploying advanced reasoning models in production environments. In experimental trials, AutoTTS managed inference budgets efficiently, reducing token consumption by up to 69.5% without sacrificing accuracy.
The manual bottleneck in test-time scaling
Test-time scaling enhances LLMs by granting them extra compute when generating answers. This extra compute allows the model to generate multiple reasoning paths or evaluate its intermediate steps before arriving at a final response. The primary challenge in designing TTS strategies is determining how to allocate this extra computation optimally.
Historically, researchers have designed these strategies manually, relying on guesswork to build rigid heuristics. Engineers must hypothesize the rules and thresholds for when a model should branch out into new reasoning paths, probe deeper into an existing path, prune an unpromising branch, or stop reasoning altogether. Because this manual tuning process is constrained by human intuition, a vast number of possible approaches remain unexplored. This often results in suboptimal trade-offs between model accuracy and computing costs.
Current TTS algorithms can be mapped to a width-depth control space, where "width" is the number of reasoning branches explored and "depth" is how far each one develops. Self-consistency (SC) samples a fixed number of trajectories and majority-votes the answer. Adaptive-consistency (ASC) saves compute by stopping early once a confidence threshold is hit. Parallel-probe takes a more granular approach, pruning unpromising branches while deepening the rest. All three are hand-crafted, and that's the constraint AutoTTS is designed to break. Some more advanced methods employ richer structures like tree search or external verifiers, but they too are meticulously hand-crafted. This manual approach restricts the scope of strategy discovery, leaving a massive portion of the potential resource-allocation space untouched.
Automating strategy discovery with AutoTTS
AutoTTS reframes the way test-time scaling is optimized. Instead of treating strategy design as a human task, AutoTTS approaches it as an algorithmic search problem within a controlled environment.
This framework redefines the roles of both the human engineer and the AI model. Rather than hand-crafting specific rules for when an LLM should branch, prune, or stop reasoning, the engineer constructs the discovery environment. The human defines the boundaries, including the control space of states and actions, optimization objectives balancing accuracy versus cost, and the specific feedback mechanisms.
An explorer LLM, such as Claude Code, designs the strategy. This explorer acts as an autonomous agent that iteratively proposes TTS “controllers.” These controllers are code-defined policies or algorithms that dictate how an AI model allocates its computational budget during inference. The explorer tests and refines these controllers based on feedback until it discovers an optimal resource-allocation policy.
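To make the hand-crafted baselines mentioned above concrete, here is a minimal Python sketch of self-consistency and an adaptive early-stopping variant. It is an illustration only, not code from the AutoTTS authors: the `replay` helper, the `threshold` and `min_n` parameters, and the simple vote-share stopping rule are all assumptions chosen for brevity, and published adaptive-consistency methods use their own stopping criteria.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample: Callable[[], str], n: int) -> str:
    """SC: draw a fixed number of answers and return the majority vote."""
    votes = Counter(sample() for _ in range(n))
    return votes.most_common(1)[0][0]


def adaptive_consistency(
    sample: Callable[[], str],
    max_n: int,
    threshold: float = 0.8,  # hypothetical vote-share cutoff
    min_n: int = 4,          # hypothetical minimum number of samples
) -> Tuple[str, int]:
    """ASC-style early stop: return (answer, samples_used)."""
    votes: Counter = Counter()
    for used in range(1, max_n + 1):
        votes[sample()] += 1
        answer, count = votes.most_common(1)[0]
        # Stop as soon as the leading answer holds a large enough share.
        if used >= min_n and count / used >= threshold:
            return answer, used
    return votes.most_common(1)[0][0], max_n


def replay(answers):
    """Deterministic stand-in for a model: replays a fixed answer list."""
    it = iter(answers)
    return lambda: next(it)


sc_answer = self_consistency(replay(["42", "42", "17", "42"]), n=4)
asc_answer, asc_used = adaptive_consistency(replay(["42"] * 64), max_n=64)
print(sc_answer, asc_answer, asc_used)
```

In this toy run the adaptive variant stops after 4 samples instead of 64 because every sample agrees. Both the sample count (width) and the stopping threshold are fixed by hand here, which is the kind of manual choice the article says AutoTTS searches over automatically.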
To make this automated search computationally affordable, AutoTTS relies on an “offline replay environment.” If the explorer LLM had to invoke a base reasoning model to generate new tokens every time it tested a new strategy, the compute costs would be astronomical. Instead, it relies on thousands of reasoning trajectories pre-collected from the base LLM. These trajectories include "probe signals," which are intermediate answers that help the controller evaluate progress across different reasoning branches.
During the discovery loop, the explorer agent proposes a controller and evaluates it against this offline data. The agent observes the execution traces of the proposed controller, which show how it allocated compute over time. By analyzing these traces, the agent can diagnose specific failure modes, such as a controller pruning branches too aggressively in a specific scenario. This provides an advantage over just viewing a final result. The agent then iteratively rewrites its code to improve the accuracy-cost tradeoff.
Inside the AI-designed controller
Because the explorer agent is not constrained by human intuition, it can discover highly coordinated, complex rules that a human engineer would likely never hand-code. One optimal controller discovered by AutoTTS, named the Confidence Momentum Controller, leverages several non-obvious mechanisms to manage compute:
Trend-based stopping: Hand-crafted strategies often instruct the model to stop reasoning once it hits a certain instantaneous confidence threshold. The AutoTTS agent discovered that instantaneous confidence can be misleading due to temporary spikes. Instead, the controller tracks an exponential moving average (EMA) of confidence and only stops if the overall confidence level is high and the trend is not actively declining.
Coupled width-depth control: Manually designed algorithms usually treat the "widening" of new reasoning paths and the "deepening" of current paths as separate decisions. AutoTTS discovered a closed feedback loop where the two actions are linked. If the confidence of the current branches stalls or regresses, the controller automatically triggers the spawning of new branches.
Alignment-aware depth allocation: Instead of giving all active reasoning branches an equal computation budget, the controller dynamically identifies which branches agree with the current leading answer. It then gives those branches priority "bursts" of extra computation. This concentrates the computational budget on the emerging consensus to quickly verify if it is correct.
Cost savings and accuracy gains in real-world benchmarks
To test whether an AI could autonomously discover a better test-time scaling strategy, researchers set up a rigorous evaluation framework. The core experiments were conducted on Qwen3 models ranging from 0.6B to 8B parameters. The researchers also tested the system's ability to generalize on a distilled 8B version of the DeepSeek-R1 model.
The explorer AI agent was initially tasked with discovering an optimal strategy using the AIME24 mathematical reasoning benchmark. This discovered strategy was then tested on two held-out math benchmarks, AIME25 and HMMT25, as well as the graduate-level general reasoning benchmark GPQA-Diamond.
The AutoTTS-discovered controller was pitted against four manually designed test-time scaling algorithms used in the industry. These baselines included Self-Consistency with 64 parallel reasoning paths (SC@64), Adaptive-Consistency (ASC), Parallel-Probe, and Early-Stopping Self-Consistency (ESC). ESC is a hybrid approach that generates trajectories in parallel and stops early when an answer seems stable.
When set to a balanced, cost-conscious mode, the AutoTTS-discovered controller reduced total token consumption by approximately 69.5% compared to SC@64. At the same time, the controller maintained the same average accuracy across the four Qwen models.
When the inference budget was turned up, AutoTTS pushed peak accuracy beyond all handcrafted baselines in five out of eight test cases.
This efficiency translated to other tasks. On the GPQA-Diamond benchmark, the balanced AutoTTS variant slashed the inference token cost from 510K tokens down to just 151K tokens, while slightly improving overall accuracy. On the DeepSeek model, AutoTTS achieved the highest overall accuracy on the HMMT25 benchmark while cutting the token spend nearly in half.
For practitioners building enterprise AI applications, these experiments highlight two major operational benefits:
Raising peak performance: AutoTTS doesn't just save money on token consumption. It actively raises the peak attainable performance of the base model. The AI-designed controller is remarkably good at detecting noisy or unproductive reasoning branches on the fly and continuously redirecting its compute budget toward the branches generating the most useful reasoning signals.
Cost-effective custom development: Because the framework relies on an offline replay environment, the entire discovery process cost only $39.90 and took 160 minutes. For enterprise teams, that means optimized reasoning strategies tailored to proprietary models and internal tasks are now within reach without a dedicated research budget.
Both the AutoTTS framework and the Confidence Momentum Controller are available on GitHub; the CMC can be used as a drop-in replacement for other TTS controllers.
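The trend-based stopping rule described earlier in this article (stop only when a smoothed confidence is high and not declining) can be sketched in a few lines of Python. This is a hedged illustration, not the released Confidence Momentum Controller: the function names and the `alpha`, `level`, and `max_drop` values are hypothetical, chosen only to show how an EMA differs from an instantaneous threshold.

```python
from typing import Optional, Sequence


def instant_stop(confidences: Sequence[float], level: float = 0.8) -> Optional[int]:
    """Baseline rule: stop at the first step whose raw confidence >= level."""
    for step, c in enumerate(confidences):
        if c >= level:
            return step
    return None


def trend_based_stop(
    confidences: Sequence[float],
    alpha: float = 0.3,      # hypothetical EMA smoothing factor
    level: float = 0.8,      # hypothetical required EMA level
    max_drop: float = 0.01,  # hypothetical tolerated per-step EMA decline
) -> Optional[int]:
    """Stop at the first step where the EMA is high and not clearly falling."""
    ema = None
    for step, c in enumerate(confidences):
        prev = ema
        ema = c if ema is None else alpha * c + (1 - alpha) * ema
        if prev is None:
            continue  # a trend needs at least two observations
        trend = ema - prev
        if ema >= level and trend >= -max_drop:
            return step
    return None


spike = [0.2, 0.95, 0.3, 0.3]       # one temporary confidence spike
steady = [0.9, 0.9, 0.9]            # consistently high confidence
declining = [1.0, 0.9, 0.85, 0.8]   # high but steadily falling

print(instant_stop(spike), trend_based_stop(spike))
print(trend_based_stop(steady), trend_based_stop(declining))
```

On the spiky trace the instantaneous rule stops at step 1, while the smoothed rule never stops because the average stays low. On the declining trace the smoothed level is high but the trend check blocks the stop.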
- YouTube will let you ask AI to make a custom video feed
Just describe the kind of videos you want to see, from workout guides to hobby inspiration.
Score: 55 🌐 Moves May 28, 2026 https://www.theverge.com/streaming/938759/youtube-custom-ai-feed-prompt-availability
- Mistral strikes deals with BMW and Airbus in ‘industrial AI’ push
- QumulusAI and Shadeform Deploy Two NVIDIA H200 Clusters Totaling 680 GPUs for Leading AI Inference Platforms
USA Today
- Majestic Labs Raises $100M for Memory Pooling AI Server
Server architecture will offer up to 100 TB of DRAM per accelerator.
Score: 55 💰 Money May 28, 2026 https://www.eetimes.com/majestic-labs-raises-100m-for-memory-pooling-ai-server/
- Calling Doctor GPT: AI responses to healthcare queries are nearly 76% accurate
EurekAlert!
- Humanoid Robots Are Now Part of the War Machine—And America’s Newest ‘Soldier’ Is Ready for Action
Will humanoids redefine how battles are fought and won?
Score: 54 🌐 Moves May 28, 2026 https://www.popularmechanics.com/military/a71423388/humanoid-robots-are-already-in-combat/
- Swiss AI mouse aims to replace live animal testing
Swiss researchers have created an artificial intelligence (AI) version of a mouse that would replace live animal testing by assessing the effects of new active compounds on a computer. The Swiss Federal Laboratories for Materials Science and Technology (Empa) said the model could be used as a decision-making aid in drug development. It calculates how nanoparticles with certain properties are distributed in the organism. This is particularly relevant for the development of therapies against brain tumours, as nanoparticles can cross the blood-brain barrier. Researcher Jimeng Wu developed the so-called physiologically based pharmacokinetic model. She used 18 previous studies with mice as a data basis. With the help of machine learning, the model can adapt its parameters to the properties of the respective nanoparticle. The AI tool makes it possible to virtually test which particles are suitable for a task ...
- Cognizant deploys Anthropic’s Claude AI to modernise Travelport’s travel platform
Techcircle
- First Look at AI Camera and Photos Features in iOS 27 Revealed
Apple is planning sweeping AI-driven upgrades to its Camera and Photos apps in iOS 27, Bloomberg's Mark Gurman reports. The report offers a first look at the appearance of several major iOS 27 features that Apple plans to announce at its Worldwide Developers Conference on June 8. The images are based on information viewed by Bloomberg and people said to be familiar with Apple's plans.
The Camera app is set to gain a dedicated Siri mode, positioned alongside existing options like Photo and Video. Gurman reports the feature would replace the current Visual Intelligence experience, allowing users to photograph objects and have them analyzed by a third-party AI agent or run through a Google reverse image search. By elevating the capability directly into the Camera app rather than restricting it to the Camera Control button, Apple reportedly aims to increase adoption and help acclimate users to visual AI ahead of future products, including smart glasses and camera-equipped AirPods.
The Camera app is also said to be getting a new "Add Widgets" panel that makes the interface more customizable. The top row of shortcuts currently displayed across capture modes would become replaceable, letting users prioritize more professional controls such as depth adjustments, or surface tools like timers and Night mode more prominently in the interface. Gurman says the changes are aimed at making Apple's camera software more appealing to advanced photographers.
The Photos app is set to receive new Apple Intelligence tools called "Reframe" and "Extend." Reframe would allow users to change the perspective of a photo, while Extend uses AI to generate additional portions of an image, such as filling in the lower half of a building that was cut off in the original shot.
Apple is also said to be testing natural language prompt-based photo editing, which would let users request specific edits by voice or text, such as cropping or adjusting colors. Gurman notes that this specific feature may not arrive in the first version of iOS 27.
Elsewhere in iOS 27, the Shortcuts app is said to be getting a significant overhaul that would allow users to create automations using natural language. Instead of manually building workflows step by step, users can describe what they want to happen; Gurman's example has a user setting up a routine that automatically starts a music playlist and sends a spouse an ETA when they begin driving home from work.
Bloomberg previously reported on AI-created wallpapers, a systemwide grammar checker for text input, and a revamped Image Playground app offering improved quality for AI-generated pictures and Genmoji custom emoji. See Bloomberg's full report for more information.
Score: 54 🌐 Moves May 28, 2026 https://www.macrumors.com/2026/05/28/ios-27-ai-camera-and-photos-features/
- Investigating the potential use of frontier AI models for offensive cyberattacks: A human uplift study
The UK Artificial Intelligence Security Institute (UK AISI) commissioned RAND to conduct a human uplift study to evaluate the impact of AI access on offensive cyber operations among lower-skilled threat actors.
- Anthropic Says Its Latest Claude Model Is the ‘Most Honest’ Yet
Hallucinations are a big problem for AI. Anthropic says Opus 4.8 is ‘less likely’ to make stuff up.
Score: 53 🤖 Models May 28, 2026 https://www.inc.com/ben-sherry/anthropic-says-its-latest-claude-model-is-the-most-honest-yet/91351657
- Naver to pump 1 trillion won into content to power AI evolution
Naver Chief Data and Contents Officer Kim Kwang-hyun speaks during a media roundtable at the Plaza Hotel in Jung District, central Seoul, on May 28. [NAVER]
In the midst of a fierce AI race, Korean internet giant Naver plans to invest 1 trillion won ($665 million) over the next five years in content that will power its AI search engines, ads and recommendations.
The company plans to roll out a new rewards program for content creators and expand its AI features to strengthen its platform ecosystem and differentiate its AI search services through creator-generated content and proprietary data, as competition intensifies among global tech companies developing generative AI.
The initiative will support Naver’s user-generated content (UGC) ecosystem, where around 20 million creators produce 630 million pieces of content each year, according to the company, which unveiled the plan during a media roundtable on its AI-era data and content strategy at a hotel in the Jung District, central Seoul, on Thursday.
The move comes as competition among global Big Tech companies in generative AI continues to intensify and the performance rankings for the latest AI models frequently change within just a few months. As it becomes harder for companies to stand out based on AI performance alone, Naver is positioning the quality of creator-generated content and data as a key competitive advantage.
“Naver has built its own creator ecosystem since the early days of search engines," said Kim Kwang-hyun, the chief data and content officer at Naver. "We believe [our creator ecosystem] is our biggest asset for sustainable growth in the AI era."
The company plans to introduce Naver Mate, a new content creator rewards program, scheduled for launch in June. Through the new program, the company will select around 3,000 outstanding content creators every month across its UGC services, such as blogs and online community forums.
Naver headquarters in Seongnam, Gyeonggi, on Nov. 5, 2025 [YONHAP]
Selection criteria will include how frequently a creator’s content is cited by AI Briefing, Naver’s AI-powered search summary feature. Selected creators will receive greater exposure across major Naver services, such as its integrated search and AI Briefing, along with a monthly budget of at least 300,000 won each.
Among those selected, Naver will choose the top 10 creators across 10 categories such as travel, lifestyle and tech for a total of 100 creators who will each receive 3 million won per month. The top-ranked creator in each category will receive 10 million won per month. Naver plans to expand the selection pool to include creators on its short-form video platform Clip in the second half of the year.
Naver also plans to expand its AI search services using the vast amount of content accumulated on its platform. The offerings include AI Briefing, the company’s AI-powered search summary feature introduced in March last year, and the conversational AI search service AI Tab, which was launched in beta last month.
The platform will launch a newly enhanced version of its Smart Lens search service by the end of next month. The service lets users point their camera at an object to instantly access information and take direct action, such as making a payment or reservation.
Although Naver continues to post solid earnings, the company faces growing pressure to distinguish itself as global Big Tech firms intensify competition in AI platforms.
Naver argues that it is uniquely positioned to offer everything from search and shopping to reservations within a single platform. The key, however, is how effectively the company can deliver a differentiated experience to users throughout that process.
“Our goal is to deliver a seamless AI search experience,” said Kim Sang-bum, head of Naver’s search platform division. "Our goal is to provide fully optimized AI search by leveraging product-native large language models that learn from users’ journeys from search to purchase, extensive data tools and ‘harness engineering’ that efficiently manages them.”
This article was originally written in Korean and translated by a bilingual reporter with the help of generative AI tools. It was then edited by a native English-speaking editor. All AI-assisted translations are reviewed and refined by our newsroom.
BY HONG SANG-JI [lee.jiwon10@joongang.co.kr]
- Vance warns AI shouldn’t outrank humans in war
Vice President JD Vance said he worries about how emerging AI will be used in warfare, urging graduating Air Force cadets not to allow technology to supersede their judgment.
Score: 52 🌐 Moves May 28, 2026 https://www.nbcnews.com/politics/jd-vance/vance-warns-ai-not-outrank-humans-war-rcna347357
- Elite law firm to splash £370m on building own AI tool
The world’s largest law firm by revenue, Kirkland & Ellis, is investing £370m ($500m) to develop its own custom-built AI platform, challenging competitors that rely on widely used third-party AI tools. The US-based firm, which has a large office in the City, said it plans to spend hundreds of millions over the next three to [...]
Score: 52 🌐 Moves May 28, 2026 https://www.cityam.com/elite-law-firm-to-splash-370m-on-building-own-ai-tool/
- Alibaba’s new AI model climbs coding rankings
Code Arena, previously called WebDev Arena, uses blind votes on anonymized model outputs to compare how well AI systems build web applications from user prompts.
- Exclusive: Orbital Industries, startup using AI to discover exotic new materials, raises $50 million Series B funding round
Fortune
- How DeepSeek’s radical architecture is shattering Silicon Valley's token moat
DeepSeek’s announcement over the weekend that it has made the 75% price cut on its flagship V4 Pro model permanent is a disruptive assault on the capital-heavy business models of Silicon Valley’s frontier labs.
The reduction on DeepSeek V4 Pro directly undercuts comparable Western models used as workhorses for enterprise production. It is 7x cheaper on inputs and 17x cheaper on outputs than Anthropic’s Claude Sonnet or OpenAI’s GPT 5.5-Med, while the lightweight DeepSeek V4 Flash undercuts entry-tier alternatives like Claude Haiku by 10x to 25x.
The price cuts are enabled by a series of hardware-software innovations, especially around cache, that make DeepSeek's models radically more efficient to run. When hosted natively in China, DeepSeek’s cache-read pricing is a whopping 87x cheaper than Western clouds, a deflationary floor so aggressive that handset giant Xiaomi just moved to match the exact pricing tier for its newly deployed MiMo architecture.
DeepSeek V4 Pro’s performance is ranked almost on par with Western frontier models, hitting 80.6% on coding-agent tasks via the SWE-bench Verified leaderboard and an elite reasoning score of 87.5 on the advanced MMLU-Pro technical index. Both V4 Pro and V4 Flash (a hyper-optimized speedy version for developers) are open-weight and issued under a permissive MIT license. This gives enterprises complete flexibility over deployment.
This dual-model strategy allows technical teams to route their heaviest, multi-step autonomous agent workloads to the lightning-fast Flash model, while reserving the heavy Pro model for deep reasoning tasks, drastically lowering costs at a time when budget concerns have grown considerably. This also comes at a time when the closed Western labs, in particular OpenAI and Anthropic, face intense return-on-investment scrutiny for their multi-billion dollar general-purpose hardware infrastructure investments.
This deflationary collapse will not affect all Silicon Valley labs equally, signaling a permanent bifurcation of the enterprise AI market. While a premium, deterministic tier will endure for mission-critical engineering workflows, the high-volume background agentic layer is being completely commoditized by open weights. Ultimately, it creates a much more dangerous exposure for OpenAI, whose revenue mix relies heavily on general-purpose commodity API streams, than for software-insulated peers like Anthropic.
The token cost crisis
Uber says it burned through its entire 2026 budget for Claude Code and Cursor in just the first four months of the year; its COO said that the cost related to high token usage by some of its engineers was getting “harder to justify” without better products to show for it. Airbnb's Brian Chesky said last year that while the company uses OpenAI's latest models, it doesn't rely on them heavily in production, favoring faster, cheaper alternatives like Alibaba's Qwen.
And in the latest episode of VentureBeat’s podcast Beyond the Pilot, Pinterest CTO Matt Madrigal confirmed that the company went all-in on an open-source AI strategy, post-training Alibaba’s open Qwen model on the company’s proprietary "taste graph" to drive Pinterest’s assistant and achieving frontier-like quality at a 90% reduction in costs. DeepSeek’s subsequent price drop makes the possibility of such cost differences even greater.
Geopolitical headwinds and compliance defenses
Widespread enterprise adoption of Chinese models faces massive geopolitical headwinds in the West. For highly regulated U.S. giants in finance, healthcare, and defense, getting comfortable with DeepSeek will take time. Even though an open-weights architecture under an MIT license allows a company to self-host the model locally and prevent active data exfiltration to foreign servers, corporate compliance boards remain deeply paranoid over software supply chain risks, potential hidden backdoors, and the legal threat of sudden federal sanctions.
Smaller, more nimble software teams, on the other hand, face far less bureaucratic gridlock. Free from multi-month security review cycles, these fast-moving organizations view the immediate 75% infrastructure savings as a massive competitive edge worth deploying right now.
The OpenRouter clearinghouse: mapping global token traffic
Take the token usage metrics on OpenRouter, a leading public proxy for which models are the most popular among developers. OpenRouter gives developers an easy way to compare and deploy models, and while its data is by no means a full proxy for real model popularity, it confirms that this structural migration is already taking place within company data pipelines.
DeepSeek's V4 Flash model has captured the No. 1 position on the OpenRouter leaderboard over the past week, surging 48% in token usage. Its advanced counterpart, V4 Pro, sits at No. 6. DeepSeek’s top three models processed nearly 6 trillion tokens on OpenRouter over the past week, giving it a huge lead over other competitors. For example, OpenAI’s premium model, GPT-5.5, has slipped down to No. 15 at 470B tokens.
It’s not clear exactly how much of the world’s token traffic is on OpenRouter. Conservative estimates put it at about 3%.
It does not show the massive amounts of tokens being served by the APIs offered directly to developers by companies like Anthropic, OpenAI and Google. But recent estimates suggest OpenRouter processes between 15% and 40% of each of OpenAI’s and Google’s token usage, a share that is growing, making it a significant indicator of relative trends regardless of the exact percentage it represents.
While skeptics often dismiss aggregator traffic as an indie developer signal rather than a reflection of Fortune 500 IT spend, the corporate pipeline reality is shifting. An infrastructure analysis by a leading venture capital firm, Andreessen Horowitz, revealed that enterprise production environments deploy a median of 14 different models simultaneously to price-route workloads and avoid single-vendor lock-in.
This structural architecture shift is why OpenRouter recently secured a massive $113 million Series B funding round backed directly by the big enterprise data and software vendors that serve corporate America, including ServiceNow Ventures, Snowflake Ventures, Databricks Ventures, Nvidia's NVentures, and Google’s CapitalG. Stripe also cited OpenRouter’s enterprise customers in its decision to partner closely with the company.
That’s why DeepSeek’s surge on this leaderboard is so eye-opening. DeepSeek itself offers an API directly to developers, and so it too delivers more token traffic than OpenRouter's figures show.
Beyond chatbots: the rise of multi-step autonomous agents
The DeepSeek spike on OpenRouter indicates a deeper structural shift in how automated software architectures consume machine intelligence. Technical teams are moving beyond trivial, single-turn chatbots and starting to deploy more sophisticated autonomous agents that persist for hours at a time, recursively looping through codebases and data lakes. Their huge number of tool calls, and continuous rereading of long context histories, mean AI token consumption expands exponentially.
Running these recursive loops on closed, premium Western APIs quickly creates unsustainable infrastructure costs. While corporate tech teams spent last year experimenting freely with early, single-turn prototypes without worrying about budgets, the onset of token-prolific autonomous agents has triggered an enterprise line-item crisis. VentureBeat's Q1 2026 research, which surveyed enterprise users at organizations with over 100 employees (n=65, in the U.S. software, finance and healthcare industries), confirms the shift: “Cost per token or licensing model” jumped from 25.4% in January to 36.7% in March, trailing only raw performance as the primary selection criterion for enterprise buyers. DeepSeek target-optimized its weights for this specific trend of agentic high-token use. It has locked in on a standard input cost of $0.435 per million tokens and a standard output rate of $0.87 per million tokens, alongside a rock-bottom prefix-cached read cost of $0.003625 per million. It's this third cost item — for cache — which is arguably the most significant. “If you measure how all of these agents now are using tokens, 80 to 90% of the tokens are cache-read tokens,” said Val Bercovici, Chief AI Officer at WEKA, a company that provides fast storage for much of this cache. “Which means that [that price] is almost by far the most important price, making the others irrelevant — nearly a rounding error. So what DeepSeek did is not just say we're going to be 5% cheaper, 10% cheaper, 20% cheaper. They're like 87x cheaper on that cache-read price with DeepSeek V4 Pro. So that's really set the industry on notice.” The infrastructure coup: Decoupling HBM from Context DeepSeek's core innovations are around hardware-software alignment. This is where we get a little technical. While Western frontier labs like OpenAI have prioritized performance at all cost, they’ve invested billions into uncompressed "dense" neural architectures. 
DeepSeek, by contrast, has systematically sought to extract maximum intelligence from lower-grade hardware, given that it has lacked access to Nvidia’s top-tier GPUs. By pioneering deep software optimizations as early as its V2 architectures in 2024, the lab engineered a series of four interconnected hardware-software alignment breakthroughs that decoupled a model's operational context from expensive computing overhead:
Breakthrough 1: Sequence Dimension Compression via CSA and HCA
The transformer architecture that most LLMs use is bottlenecked by something called the Key-Value (KV) cache. As an agent executes long, multi-step sessions, historical context keys clog the high-bandwidth memory (HBM) on the GPU, causing severe latency spikes and an expensive infrastructure tax. DeepSeek resolved this structural bottleneck by introducing a hybrid attention mechanism, documented in the DeepSeek V4 Architecture Paper, that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to cut overall KV-cache usage by 90% across its 1-million-token context window. While traditional models try to keep a unique memory log for every individual word, DeepSeek compresses the rows of its memory cache. CSA acts as a local filter, condensing small windows of text into concise, indexable blocks so the model doesn't sweat the fine-grained details. HCA acts as an aggressive global index, crushing massive spans of text deep within a session's history into high-density summaries. By interleaving these layers, DeepSeek shrinks millions of memory rows down to a fraction of their size.
Breakthrough 2: Native memory offloading via Multi-head Latent Attention (MLA)
Using something called Multi-head Latent Attention (MLA), DeepSeek strips the active memory footprint of its context history down to a fraction of that of standard models. It achieves this by running a physical division of labor between hardware chips.
While traditional models force expensive GPUs to hold a session's entire history, DeepSeek’s architecture keeps only the tiny, highly compressed search index tags (the Keys) on the GPU. Meanwhile, it offloads the heavy data payloads (the Values) entirely into cheaper system memory and local storage tiers. Once the GPU handles the high-speed matching to find relevant data, it calls the values from storage only on an as-needed basis. DeepSeek’s architecture is so different that the inference engines that load an AI model's weights into GPU memory, in order to be ready for prompting, are being stretched. The three most popular engines (Nvidia TensorRT-LLM, UC Berkeley's SGLang and the widely used vLLM) “are all being stretched to keep up with being able to offer it, which is not normal,” explains WEKA's Bercovici. "Every other open model has had some similarity to other open models. This one from DeepSeek is just built different." DeepSeek's software engineering means its massive 1.6-trillion parameter model requires just 5.48 GB of HBM to hold a 1-million-token context loop in production, according to calculations by an analyst using hardware modeling benchmarks. For comparison, smaller models utilizing standard Western architectures consume up to 89 GB of HBM under the exact same context load.
Model Framework / Metric Tier | Active HBM Needed (1M Context) | Context Length Capacity | Multi-Step Cached Economics
DeepSeek V4-Pro (1.6T MoE) | 5.48 GB | 1,000,000 tokens | 80% to 90% of workflow tokens
Qwen3-235B-A22B (GQA Standard) | 89.00 GB | 1,000,000 tokens | Subject to steep hardware tax
GPT-5.5 / Claude 4.7-class (Western Frontier / MoE) | 180+ GB | 1,000,000 tokens | Prohibitive premium infrastructure tax
DeepSeek’s extreme compression of the KV cache down to 5.48 GB of HBM is also a calculated geopolitical strategy to bypass U.S. export bans on top-tier Nvidia GPUs.
By reducing the need for HBM and Nvidia’s CUDA ecosystem, DeepSeek’s software design allows frontier AI to run efficiently on domestic, lower-cost, and unsanctioned Chinese storage tiers like NAND flash, commodity SSDs, and LPDDR memory (produced by domestic giants like YMTC and CXMT).
Breakthrough 3: Ultra-Low Footprint Inference via FP4 Quantization-Aware Training (QAT)
To keep compute costs low over massive context windows, DeepSeek moved away from the old approach of scanning bulky, uncompressed numbers every time the model searches its memory. Instead, as detailed in the DeepSeek V4 Technical Report, the architecture applies an advanced form of data compression, during training, directly to the active pathways it uses to find information. This compression slashes memory demands to deliver a 2x hardware speedup, yet it maintains 99.7% accuracy in how the system targets and indexes specific data blocks. This engineering win allows enterprise workflows to process massive, multi-step agent tasks smoothly while keeping 83.5% retrieval accuracy on extreme, million-token "needle-in-a-haystack" benchmarks, eliminating performance lags without draining expensive GPU power.
Breakthrough 4: Ultra-scale training stability via manifold-constrained hyper-connections (mHC)
Training a 1.6-trillion parameter model creates instability risk: signals passing through its many data pathways can cascade out of control and crash the run. DeepSeek resolved this with a framework called Manifold-Constrained Hyper-Connections (mHC), which uses a balancing routine to force the model's internal data tables to always sum to one, a mathematical safety valve that lets complex data move through deep networks without runaway spikes.
The infrastructure pivot: rebuilding corporate plumbing
DeepSeek’s significant architectural cache efficiency alters the underlying unit economics for the cloud platforms hosting these models.
On developer aggregators like OpenRouter, where third-party providers routinely offer advanced endpoints at a loss to capture developer mindshare, this hardware-software decoupling alters the balance sheet. DeepSeek's extremely low serving costs likely let it turn a profit, at least when it comes to serving the model in China, Bercovici said. This transformation in provider-side unit economics is mirrored on the buy side, where a structural change is under way across enterprise IT budgets. VentureBeat's Q1 2026 AI Infrastructure and Compute tracker survey, which tracks enterprise technology buyers at organizations with over 100 employees (n=53 in January, n=39 in February) across software, financial services, healthcare, and manufacturing sectors, revealed that enterprise adoption of custom, self-managed inference stacks utilizing open-source frameworks like Triton, vLLM, Ray, and Kubernetes rose from 11.3% to 17.9%. Because these software layers allow corporate engineering teams to deploy open-weights architectures natively across their own clusters, they act as an operational escape hatch from closed cloud ecosystems. This software shift is paired with an aggressive hardware migration: enterprise workloads moving to specialized, inference-first AI clouds like CoreWeave, Lambda, and Crusoe grew from 30.2% to 35.9% in the latest survey window. These infrastructure metrics indicate that corporate technology leaders are no longer just prototyping with open alternatives; they are actively laying down the physical plumbing required to host architectures like DeepSeek V4 independently, increasingly pricing away the premium markup of Western API gatekeepers.
The strategic split for Western labs
This baseline cost reduction could soon fracture the competitive field in Silicon Valley by rewriting the expectations for labs attempting to earn a return on massive infrastructure investments. For now, though, the Silicon Valley music is unlikely to stop anytime soon.
Anthropic remains on an extraordinary enterprise trajectory, driven by widespread adoption of Claude Code and its codebase-aware terminal execution. For enterprise engineering teams, paying a premium for Anthropic's deterministic accuracy makes perfect sense for core production software development. Yet even an elite frontier lab scaling at this pace must watch DeepSeek with caution: an open-weights architecture under an MIT license offering near-frontier utility at a 75% cost reduction places downward pricing pressure on the high-volume operational layers of any multi-agent system. The primary structural margin squeeze may land more squarely on OpenAI, despite its aggressive pivot toward a multi-cloud footprint. To support its staggering consumer and API token volumes, OpenAI fundamentally altered its historic seven-year exclusive alliance with Microsoft, unbundling its distribution so it can serve models across Azure, Oracle, AWS, and Google Cloud. Yet this multi-cloud strategy, while providing raw capacity at scale, leaves the company intensely exposed to infrastructure commodity pressure. Unlike Anthropic, which has successfully insulated its margins by embedding its models into premium, high-utility software environments like Claude Code, a massive portion of OpenAI's enterprise revenue relies on high-volume, general-purpose API token streams. To be fair, Western labs have already begun quietly retreating from this territory — aggressively launching deep batch API discounts, prompt caching features, and lightweight entry models to stem the bleed. Yet this tactical retreat only reinforces the structural crisis: Silicon Valley is actively conceding the high-volume commodity layer because they know they cannot defend its margins. When those exact same automated background workflows can be handled natively by highly intelligent open weights like DeepSeek V4, defending a premium price point for raw cloud text completion ceases to be a defensible strategy. 
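The pricing pressure described here can be made concrete with simple arithmetic on the per-million-token rates quoted earlier in this article. The following is a minimal sketch, not a billing model: the function name `blended_cost_per_million` and the 85% / 10% / 5% token mix are assumptions chosen for illustration (the article only states that 80 to 90% of agent tokens are cache reads), and the comparison case simply multiplies the cache-read price by the quoted "87x" while holding the other prices fixed.

```python
# Per-million-token prices (USD) quoted in the article for DeepSeek.
INPUT_PER_M = 0.435
OUTPUT_PER_M = 0.87
CACHE_READ_PER_M = 0.003625


def blended_cost_per_million(cache_read_share, output_share,
                             cache_read_price=CACHE_READ_PER_M,
                             input_price=INPUT_PER_M,
                             output_price=OUTPUT_PER_M):
    """Average USD cost per million tokens for a given token mix.

    Shares are fractions of all tokens; whatever is left over after
    cache reads and output is treated as fresh (uncached) input.
    """
    fresh_input_share = 1.0 - cache_read_share - output_share
    if min(cache_read_share, output_share, fresh_input_share) < 0:
        raise ValueError("token shares must be non-negative and sum to at most 1")
    return (cache_read_share * cache_read_price
            + fresh_input_share * input_price
            + output_share * output_price)


# Hypothetical agent workload (editor's assumption): 85% cache reads,
# 5% output, and the remaining 10% fresh input.
deepseek_cost = blended_cost_per_million(0.85, 0.05)

# Same mix with a cache-read price 87 times higher, per the quoted "87x" gap.
# Keeping input and output prices identical is a simplifying assumption.
pricier_cache_cost = blended_cost_per_million(
    0.85, 0.05, cache_read_price=CACHE_READ_PER_M * 87)

print(f"blended cost at quoted prices: ${deepseek_cost:.4f} per million tokens")
print(f"with 87x cache-read price:     ${pricier_cache_cost:.4f} per million tokens")
```

Changing the shares shows how sensitive the blended figure is to the cache-read fraction, which is the lever Bercovici's quote points to.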
More significantly, unlike OpenAI or Anthropic, DeepSeek has much less interest in urgently building consumer wrappers or locking developers into subscription frameworks. Instead, DeepSeek is positioned for a longer-term ecosystem play. Supported by a massive state-backed funding round led by China’s "Big Fund" (which has pushed the startup's targeted valuation into the $10 billion to $45 billion range), the lab’s more likely objective is to prove the viability of a self-sufficient, independent Chinese AI hardware stack that could one day be worth up to $10 trillion.
Premium deterministic tier (Anthropic / OpenAI / Google) | High-volume agentic tier (DeepSeek / open ecosystems)
• Core Codebase Refactoring | • Recursive Multi-Agent Loops
• Strict Corporate Compliance & Guardrails | • Prefix-Cached Autonomous Tool Swarms
• Mission-Critical Financial/Legal Precision | • Massive Real-Time Ingestion Logs
• High CapEx / R&D Premium Margins | • Bare-Metal / Optimized HBM Economics
The operational division between Western labs and models like DeepSeek V4 Pro is already showing up. Financial company Ramp benchmarked automated cybersecurity agent swarms and showed that while DeepSeek V4 Pro completely flatlines on the most complex security logic, it achieves a flawless 100% detection rate on high-volume baseline tasks like cloud configuration triage, significantly outperforming OpenAI’s GPT-5.5 (44%). For an enterprise CISO, the strategy is clear: You offload the high-volume token burn of routine background noise to cheap open weights, and reserve premium frontier models strictly for the high-level reasoning required to catch the most sophisticated flaws.
The enterprise verdict
For IT operations directors and data pipeline managers, the choice to migrate to an open architecture like DeepSeek V4-Pro is a smart governance decision. The open model gives companies total architecture control, allowing them to host it on-premise or via any specialized cloud layer they choose.
Crucially, it provides enterprise infrastructure leads with a strategic operational fallback that closed vendors can’t match: the power to download raw model weights and execute them privately for zero marginal token cost if public cloud pricing or API access conditions change. The assumption that closed frontier labs hold a permanent monopoly on useful enterprise reasoning has collapsed. While engineering directors will continue to pay a premium to protect specialized, deterministic workflows, the financial foundation of the frontier lab model has fundamentally shifted. By diverting the immense, day-to-day token volume of recursive background agents onto highly optimized, open-source clusters, enterprise teams are starving proprietary clouds of their highest-margin fuel. Silicon Valley’s multi-billion dollar token moat didn't just narrow — it was completely drained from the bottom up.
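For teams weighing the self-hosting option, the HBM figures in the comparison table earlier in this article can be sanity-checked with back-of-envelope arithmetic. The sketch below uses the textbook KV-cache size formula for a standard attention stack; it does not model DeepSeek's compression or offloading. The function name `kv_cache_gib` and every configuration value (94 layers, 4 KV heads, head size 128, 8-bit cache entries) are illustrative assumptions, not numbers taken from the article, which does not say how its figures were derived.

```python
def kv_cache_gib(num_layers, num_kv_heads, head_dim,
                 context_tokens, bytes_per_value):
    """Approximate KV-cache size in GiB for a standard attention stack.

    Each token stores one key vector and one value vector (the factor
    of 2) per layer and per KV head.
    """
    total_bytes = (2 * num_layers * num_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 2**30


# Illustrative configuration (editor's assumption, not from the article):
# 94 layers, 4 KV heads (grouped-query attention), head size 128,
# 1-byte (8-bit) cache entries, and a 1,000,000-token context.
gqa_gib = kv_cache_gib(94, 4, 128, 1_000_000, 1)

# Same stack with 16-bit cache entries doubles the footprint.
gqa_16bit_gib = kv_cache_gib(94, 4, 128, 1_000_000, 2)

print(f"8-bit cache:  {gqa_gib:.1f} GiB")
print(f"16-bit cache: {gqa_16bit_gib:.1f} GiB")
```

Under these assumed values the 8-bit case lands in the same range as the table's figure for a standard GQA model, which gives a rough sense of why a cache measured in single-digit gigabytes changes the hardware a 1-million-token session needs.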
- AI part of another tech layoff as Wix CEO announces 20% workforce cut
Israel-based web development company Wix is cutting about 20% of its workforce, CEO Avishai Abrahami said.
- Humanoids dance and thread needles as Japanese robotics developers look to outdo Chinese
The Humanoids Summit Tokyo showcases advanced robotics, highlighting China's growing influence
- IBM, Red Hat Launch Project Lightwell to Secure Open Source Software from Frontier Models
IBM, Red Hat Launch Project Lightwell to Secure Open Source Software from Frontier Models DevOps.com
Score: 51🌐 MovesMay 28, 2026https://devops.com/ibm-red-hat-launch-project-lightwell-to-secure-open-source-software-from-frontier-models/ - Torc Robotics and Mila team up on physical AI for autonomous trucks
The Torc Robotics-Mila partnership advances physical AI for autonomous trucking. Torc is now Mila's only autonomous trucking company with dedicated research space in Montreal. The post Torc Robotics and Mila team up on physical AI for autonomous trucks appeared first on FreightWaves.
Score: 51🌐 MovesMay 28, 2026https://www.freightwaves.com/news/torc-robotics-mila-partnership-physical-ai-autonomous-trucks - Merom Coal Plant to Power Google & Amazon Data Centers; Hoosiers at Risk for Footing the Bill
INDIANAPOLIS — NIPSCO GenCo recently signed a 12-year contract with Halldor Energy Company to power new Google (Michigan City) and Amazon data centers with its Merom coal-fired power plant. The Michigan City data center would be Google’s first in the country to contract for a coal plant’s capacity, to Sierra Club knowledge– ... [continued] The post Merom Coal Plant to Power Google & Amazon Data Centers; Hoosiers at Risk for Footing the Bill appeared first on CleanTechnica .
Score: 50🌐 MovesMay 28, 2026https://cleantechnica.com/2026/05/27/merom-coal-plant-to-power-google-hoosiers-at-risk-for-footing-the-bill/ - Should AI companies be legally obligated to report a human user contemplating violence?
On Feb. 10, 2026, an 18-year-old woman, Jesse Van Rootselaar, killed eight people and herself in a mass shooting in Tumbler Ridge, British Columbia. OpenAI had previously flagged her ChatGPT conversations as having a disturbing fascination with extreme violence, and suspended her account, but reportedly the company did not notify law enforcement. On Oct. 2, 2025, a young man named Jonathan Gavalas in Jupiter, Florida, took his own life after developing what his father’s lawsuit described as a romantic attachment to Google’s Gemini chatbot. The suit claimed that Gemini coached Gavalas to shed his own body. The suit said Google had flagged Gavalas’s account 38 times over five weeks for sensitive content, but didn’t restrict or cut off the account. These tragedies and others show that generative AI can potentially play a role in harming people, organizations and the environment. I’m a legal scholar who has focused on AI liability for nearly a decade and explored new ways of analyzing AI companies’ responsibilities. In my view, cases like these force questions the legal community has not come to terms with: If an AI company becomes aware of warning signs about harm, does it have a legal obligation to at least warn the appropriate authorities? And if the company doesn’t intervene, should its failure to act be considered negligence?
A need to raise red flags
U.S. tort law provides a framework for thinking about this type of responsibility. In 1969 a University of California psychiatric patient named Prosenjit Poddar told his therapist he intended to kill a woman named Tatiana Tarasoff. The therapist notified campus police, who briefly detained Poddar but eventually let him go. Nobody warned Tarasoff, and Poddar killed her shortly after. Her family sued the university, arguing that its lack of warning amounted to negligence.
In 1976 the California Supreme Court ruled that when a mental health professional has good reason to believe a client poses a serious danger to an identifiable person, they have a legal duty to take reasonable steps to protect that person, including warning them or notifying law enforcement. Today, most U.S. states recognize some version of the Tarasoff duty to protect or warn. The logic is simple: If you have special knowledge of a serious threat and are in a position to address it, even if only to warn the authorities or the potential victim, the law may require you to act. But does that logic apply to AI companies? The argument for yes is appealing. AI platforms interact with millions of users daily, often about deeply personal matters such as mental health struggles, relationship problems and violent thoughts. Most companies have systems to detect conversations that raise red flags. Requiring a response might be less controversial for AI than for a human therapist. Therapists are bound by strict confidentiality obligations that make warning third parties ethically and legally complicated. AI companies operate under much weaker rules, at least in the U.S., where no comprehensive federal privacy law exists. That lesser restriction makes it easier to justify requiring AI companies to act when it seems that someone’s life may be at risk. But balancing that with protecting privacy is still important.
Who to warn, and when
The first challenge in applying the Tarasoff framework to the AI world is accuracy. Predicting violence is hard, even for trained mental health professionals. AI systems, or human moderators who review flagged content, are not clinicians. Requiring them to judge who poses a genuine threat could lead to numerous false positives, with real consequences for people whose accounts are suspended or whose information is shared with authorities based on misread signals. The second challenge is scale. A therapist sees dozens of patients.
AI platforms have hundreds of millions of users. Imposing a duty to monitor and act on worrisome content could create perverse incentives. AI companies might reduce their monitoring to avoid acquiring knowledge that would trigger a legal duty, reasoning that what they do not know cannot make them liable. The third challenge is identifying who is at risk. In the 1969 case, Poddar had named Tarasoff as a potential victim. But in many AI interactions, violent or self-destructive language is diffuse and doesn’t identify a target. Courts will need to develop clear standards for when a threat is specific enough to trigger a duty to warn, and to whom any warning or protective action should be directed.
Growing urgency
The AI industry is expanding rapidly, yet the legal rules governing what AI companies owe their users and the public are deeply unclear. Courts are beginning to grapple with questions case by case, such as whether OpenAI bears any responsibility for a gunman accused of killing two students at Florida State University on April 17, 2025. The gunman in that case was armed with a semi-automatic pistol and allegedly had extensive conversations with ChatGPT about how to use the weapon most effectively. A narrow, carefully defined duty to warn, triggered only when an AI system flags a user’s behavior and it is reviewed by humans, would be a meaningful step forward. And it could focus initially on the most serious and credible threats. The practice could also shift the conversation away from thorny technical debates about whether AI chatbots are products, services or media, which complicate legal claims, toward a more human question: Did this company know someone was in danger, and did it do enough to warn them and authorities?
Anat Lior is an assistant professor of law at Drexel University. This article is republished from The Conversation under a Creative Commons license. Read the original article.
- AlphaFold-predicted structures of selected de novo designed PET hydrolases in comparison with their naturally evolved counterparts.
AlphaFold-predicted structures of selected de novo designed PET hydrolases in comparison with their naturally evolved counterparts. EurekAlert!
- A New Era of Innovation: Google Research at I/O 2026
General Science
Score: 50🌐 MovesMay 28, 2026https://research.google/blog/a-new-era-of-innovation-google-research-at-io-2026/ - Daywatch: Groupon laying off nearly 25% of its workforce in AI shift
Daywatch: Groupon laying off nearly 25% of its workforce in AI shift Chicago Tribune
Score: 50🌐 MovesMay 28, 2026https://www.chicagotribune.com/2026/05/28/daywatch-groupon-laying-off-nearly-25-of-its-workforce-in-ai-shift/ - Sam Altman and Anthropic’s CEO Just Walked Back Their Dire AI Layoff Warnings
New data, and reassurances from leading AI developers, suggest the work automating tech isn’t generating feared job destruction—yet, anyway.
- NC AI wins defense AI project with Hyundai Rotem
NC AI said Thursday it had been selected as the final contractor for a government-led defense R&D project ordered by South Korea's Agency for Defense Development in a consortium with Hyundai Rotem. The project aims to develop a physical AI-based integrated simulator and modular robotic system designed to improve the operational efficiency of future manned-unmanned combat systems. Under the project, NC AI will lead development of a “world model,” a core technology regarded as essential for next-g
- Thai police used AI to fake drag arrest image
Thai police used AI to fake drag arrest image The Telegraph
- Some AI mental health apps are harmful for kids, says report—what experts say parents should keep in mind
The organization found that school-based mental health apps were safer than direct-to-consumer apps.
- Dell Shares Soar 40% After Outlook Tops Estimates on AI Boom
Dell Technologies Inc. shares surged 38% in premarket trading Friday after the hardware maker gave an outlook for annual sales that far surpassed analysts’ estimates, fueled by demand for servers that power artificial intelligence work.
- Asana acquires no-code agent-builder StackAI
Asana will incorporate StackAI into its growing suite of AI workflow tools.
Score: 50💰 MoneyMay 28, 2026https://techcrunch.com/2026/05/28/asana-acquires-no-code-agent-builder-stack-ai/ - Robots will trigger the real AI revolution – and China is in the lead
Robots will trigger the real AI revolution – and China is in the lead The Telegraph
- How AI is helping make flying even safer in the United States
How AI is helping make flying even safer in the United States USA Today
Score: 49🌐 MovesMay 28, 2026https://www.usatoday.com/videos/travel/2026/05/27/how-ai-is-helping-the-faa-improve-flight-safety/90271615007/ - Top Japanese banks to use OpenAI's new model against cyberattacks
Top Japanese banks to use OpenAI's new model against cyberattacks Nikkei Asia
- Private equity firm EQT partners with Google Cloud for AI rollout
Private equity firm EQT partners with Google Cloud for AI rollout Reuters
- AI storage excitement pushes three companies with local ties past $1 trillion market cap
Three storage technology companies with local operations have each passed the $1 trillion market capitalization threshold, largely because of investor interest in artificial intelligence chips and data center storage.
Score: 49🌐 MovesMay 28, 2026https://www.bizjournals.com/sacramento/news/2026/05/28/sk-hynix-micron-samsung.html?ana=brss_6150 - Snowflake surges 36% for best day ever on AI frenzy, fueling software rally
The Snowflake rally also lifted shares of ServiceNow, Oracle and Palantir, while Salesforce bucked the trend.
Score: 48🌐 MovesMay 28, 2026https://www.cnbc.com/2026/05/28/snowflake-snow-software-stock-rally.html - Amazon Thinks the Future of Data Centers Depends on a Technical Problem It Just Solved
The tech giant says a breakthrough in data center networking has dramatically accelerated the flow of information through its massive cloud infrastructure.
- SK chief to meet Nvidia’s Jensen Huang again in Taiwan next week
SK chief to meet Nvidia’s Jensen Huang again in Taiwan next week 매일경제
- Spotify’s AI Remix Looks To Turn Fans Into Revenue
Spotify is testing a potentially lucrative idea that fans should not just stream songs, but also should remix them and pay for the privilege.
Score: 48🌐 MovesMay 28, 2026https://www.forbes.com/sites/ronschmelzer/2026/05/28/spotifys-ai-remix-looks-to-turn-fans-into-revenue/ - AI boom squeezes optical tech and Huawei makes a chip comeback
The inside story on the Asia tech trends that matter, from Nikkei Asia and the Financial Times
- World’s appetite for AI makes China less afraid of stronger yuan
World’s appetite for AI makes China less afraid of stronger yuan The Japan Times
Score: 48🌐 MovesMay 28, 2026https://www.japantimes.co.jp/business/2026/05/28/tech/ai-china-stronger-yuan/ - Dell and H2O.ai target the token-cost problem with vertical AI models
As artificial intelligence adoption accelerates inside enterprises, the economics of generative AI are forcing a fundamental rethink. Runaway token costs, data sovereignty demands and a growing gap between AI pilots and production ROI are pushing organizations to reconsider where their models run, and what kind of vertical AI models they actually need. The answer […] The post Dell and H2O.ai target the token-cost problem with vertical AI models appeared first on SiliconANGLE.
Score: 48🌐 MovesMay 28, 2026https://siliconangle.com/2026/05/28/vertical-ai-models-enterprise-delltechworld/ - Newsom signs order focused on AI’s workforce impacts
The executive order directs state agencies to evaluate a range of approaches, including “safety net” options for displaced workers.
Score: 48🌐 MovesMay 28, 2026https://www.hrdive.com/news/newsom-signs-order-focused-ais-workforce-impacts/821274/