AI News Archive: June 26, 2026 — Part 10
Sourced from 500+ daily AI sources, scored by relevance.
- Google's Gemini AI Can Do a Lot, But Here Are 15 Features You'll Actually Use
Google's Gemini AI Can Do a Lot, But Here Are 15 Features You'll Actually Use PCMag Australia
- Google's Gemini AI Can Do a Lot, But Here Are 15 Features You'll Actually Use
Google's Gemini AI Can Do a Lot, But Here Are 15 Features You'll Actually Use PCMag UK
- New AI Scams Are Targeting You, LastPass Was Breached Again, and an Urgent Warning for Apple Owners
New AI Scams Are Targeting You, LastPass Was Breached Again, and an Urgent Warning for Apple Owners PCMag Australia
- Renault China adds software office to develop SDVs for global market
Renault China adds software office to develop SDVs for global market Automotive News
- Waymo registers German entity for future European expansion
Waymo registers German entity for future European expansion Automotive News
- Waymo Launches in Nashville
Waymo has just begun robotaxi service in another city today. If you’re in Nashville, you can now hail a robotaxi from the USA’s #1 robotaxi company. Just start by downloading the Waymo app. “Since April, we’ve welcomed tens of thousands of riders from our interest list to experience the safety, ... [continued] The post Waymo Launches in Nashville appeared first on CleanTechnica.
- 2026 Cannes Lions: Authenticity In The Age of AI
Authenticity. Outcomes. Workflow… and containerization? After a week sweating on the Croisette, the AdExchanger team offers you a sweat-free synopsis of the 2026 Cannes Lions. The post 2026 Cannes Lions: Authenticity In The Age of AI appeared first on AdExchanger.
- Pacdora Launches Canvas: An AI Workspace for Multi-SKU Packaging Design
Pacdora Launches Canvas: An AI Workspace for Multi-SKU Packaging Design azcentral.com and The Arizona Republic
- AI designs the ideal burger for taste, health, and planet
AI designs the ideal burger for taste, health, and planet EurekAlert!
- Researchers create PaperTok, an AI system that helps users turn research papers into short, engaging videos
Students in the University of Washington's Prosocial Computing Group noticed a trend on social media: People were using generative artificial intelligence to make short science videos. The trouble was that these people weren't scientists, which, given AI's proclivity to be convincingly wrong, could accelerate the spread of misinformation. So the lab wondered how to enable scientists and other researchers to better adapt to platforms like TikTok.
- Gemini’s new Google Play Store integration lets you chat to find Android apps, games
As previewed at I/O 2026, the Gemini app on Android now integrates with the Google Play Store.
- Gemini begins rolling out for existing Android Automotive vehicles
Some months after its confirmation, Gemini for Android Automotive appears to be finally rolling out to existing vehicles.
- AI can be an ally in rooting out ransomware threats
AI can be an ally in rooting out ransomware threats EurekAlert!
- Could the Army’s light squad vehicle power battlefield drones?
A mobile brigade in the 101st Airborne Division put drones to the test in a recent training rotation—and used the Infantry Squad Vehicle to keep unmanned systems running.
- Why 35 US news publishers are suing OpenAI and Microsoft
A new copyright lawsuit claims OpenAI copied millions of newspaper articles into AI training datasets while stripping copyright notices and author information before training GPT models. The post Why 35 US news publishers are suing OpenAI and Microsoft appeared first on MEDIANAMA.
- How People in China Keep Outsmarting Anthropic’s Geolocation Restrictions
As Anthropic tightens restrictions on access to Claude in China, users keep finding new workarounds, from proxy services to fake identities sourced on Telegram.
- After Mythos suspension, India pushes for stable AI access from the US
MeitY Secretary S. Krishnan said that there was "an understanding" with the US that "access to technology, once it is provided, will not be cut off" as India tries to maintain "trusted partners" after the Mythos model suspension. The post After Mythos suspension, India pushes for stable AI access from the US appeared first on MEDIANAMA.
- Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro
Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro MarkTechPost
- From Local LLM to Tool-Using Agent
Using Gemma 4, Ollama, OpenAI Agents SDK, and Tavily MCP to build a lightweight research agent. The post From Local LLM to Tool-Using Agent appeared first on Towards Data Science.
- How to Slash Your LLM Bill With a Multi-Agent Setup
Running every task through your most powerful AI model is like paying a senior architect to photocopy. The smarter setup is a team, one expensive model that thinks and plans, and a crew of cheap, fast models that do the actual work. The price gap between those tiers is not small, it is often ten to a hundred times per token, so the savings are real and large. Here is how the structure works, why the math is so lopsided, and how to build it without a single special tool or library.

There’s a habit almost everyone falls into when they start building with AI. You find the best model available, the smartest, most capable one, and you route everything through it. Planning, hard reasoning, simple formatting, renaming variables, reading a file, writing boilerplate, all of it goes to the top-tier model, because if that model is the best, surely using it for everything gives the best result.

It does give a good result. It also gives you a bill several times larger than it needed to be, because most of what you asked it to do didn’t require anything close to its full ability. You used a senior architect to operate the photocopier, and then you paid the architect’s hourly rate for the copying.

There’s a better structure, and it’s borrowed from how real teams work. Instead of one expensive expert doing every job, you build a team. One powerful model acts as the brain, it takes the goal, thinks it through, and breaks it into pieces. Then it hands the pieces off to cheaper, faster models that do the actual labor. The brain plans, the workers execute, and you pay top-tier prices only for the small slice of work that genuinely needs top-tier intelligence.

The result is the same finished job at a fraction of the cost, and you can build the whole thing with nothing but ordinary API calls. No framework, no library, no special tooling. Here’s how it works and why the savings are so large.
Why the math is so lopsided

To see why this works, you have to look at what different models actually cost, because the gap is far bigger than most people realize. AI models are priced per token, roughly per chunk of text in and out, and the price difference between the top tier and the bottom tier is enormous.

A frontier model, the kind you would pick as your brain, runs in the neighborhood of 5 dollars per million tokens of input and 25 to 30 dollars per million tokens of output. A capable budget model, the kind you would use as a worker, can cost as little as 10 to 40 cents per million tokens, and some are cheaper still. That isn’t a small discount. Depending on which models you compare, the cheap one can be ten, twenty, even a hundred times less expensive per token than the expensive one. Across the whole market, the price of output tokens spans from under 30 cents to over 30 dollars, a range of more than a hundred to one.

Now think about what that means for a real job. Suppose you use one frontier model to handle a thousand small tasks, most of which are simple, formatting text, extracting a field, making a routine edit. You’re paying the highest per-token rate in the market for work a model costing a fraction as much could do just as well. Route those simple tasks to a cheap worker model and keep only the genuinely hard reasoning on the expensive brain, and the bill does not drop by a little. It can drop by most of itself.

One widely cited example of this exact setup took a workload from about 10,500 dollars a month to roughly 1,500 dollars a month, an 86 percent cut, with almost no drop in quality because the expensive model was still doing all the hard parts. The cheap models just stopped being paid premium rates for cheap work.

That’s the whole financial case in one line. The expensive model is worth its price for the hard 10 percent of the work and a waste of money on the easy 90 percent, and a team structure lets you pay the right price for each.
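The arithmetic above can be checked with a few lines of Python. This is a rough sketch, not a benchmark: the frontier prices come from the figures quoted in the text, the budget prices are one assumed split of the quoted 10-to-40-cent range, and the workload (1,000 tasks at 2,000 input and 500 output tokens each, with 100 of them counted as hard) is entirely hypothetical.

```python
# Prices in US dollars per million tokens, taken from the ranges quoted above.
# Assumption: the budget model's 10-to-40-cent range is split as
# 0.10 for input and 0.40 for output.
FRONTIER = {"input": 5.00, "output": 25.00}
BUDGET = {"input": 0.10, "output": 0.40}

# Hypothetical workload, not taken from the article.
TOTAL_TASKS = 1000
HARD_TASKS = 100                      # the "hard 10 percent" kept on the frontier model
TOKENS_IN, TOKENS_OUT = 2000, 500     # tokens per task


def task_cost(price: dict) -> float:
    """Dollar cost of one task at the given per-million-token prices."""
    return (TOKENS_IN * price["input"] + TOKENS_OUT * price["output"]) / 1_000_000


def all_frontier_cost() -> float:
    """Every task goes to the frontier model."""
    return TOTAL_TASKS * task_cost(FRONTIER)


def tiered_cost() -> float:
    """Hard tasks stay on the frontier model, the rest go to the budget model."""
    easy_tasks = TOTAL_TASKS - HARD_TASKS
    return HARD_TASKS * task_cost(FRONTIER) + easy_tasks * task_cost(BUDGET)


def savings_fraction(before: float, after: float) -> float:
    """Share of the original bill that disappears."""
    return 1 - after / before


before, after = all_frontier_cost(), tiered_cost()
print(f"all frontier: ${before:.2f}")                                 # $22.50
print(f"tiered:       ${after:.2f}")                                  # $2.61
print(f"savings:      {savings_fraction(before, after):.1%}")         # 88.4%
print(f"cited example: {savings_fraction(10_500, 1_500):.1%}")       # 85.7%
```

Under these assumed numbers the tiered bill is dominated by the 100 hard tasks, which is why the cheap tier's exact price matters far less than the share of work you can move onto it.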
The team, by role

The useful way to design this is to think in roles, exactly like staffing a real project, and you don’t have to stay within one company’s models. The best setups mix providers, using whichever model is cheapest for the capability each role needs.

The brain is your most capable model, and it does the least typing and the most thinking. Its job is to take the goal, reason about how to approach it, break it into clear sub-tasks, and decide which worker handles each. It’s also where you escalate anything a worker gets stuck on. Because the brain mostly plans rather than producing huge volumes of output, you use it sparingly, and its high per-token cost barely matters since it’s responsible for only a sliver of the total tokens. A current frontier model like Opus 4.8 or a top GPT or Gemini model is the right pick here.

The managers are your mid-tier models, the ones with solid reasoning that costs a fraction of frontier prices. They handle the work that needs real competence but not the absolute ceiling, drafting, moderate analysis, multi-step tasks that are not the hardest ones. A model like Claude Sonnet or Gemini Pro sits in this tier, capable enough to be trusted with substance, cheap enough to use freely.

The hustlers are your cheap, fast workers, and they do the bulk of the labor. Formatting, extraction, simple edits, classification, routine transformations, the high-volume mechanical work that makes up most of any real job. This is where a small, inexpensive model earns its keep, a Haiku-class model, a Gemini Flash tier, or a low-cost model like DeepSeek, any of which costs pennies per million tokens. You run these constantly and barely feel the cost, which is the entire point.

Match the model to the difficulty of the task, and mix across providers to get the cheapest capable option for each tier. The brain thinks, the managers handle the real-but-not-hardest work, and the hustlers grind through the volume. Each one is priced for its job.
Two ways to do this

There are two honest ways to put this team structure to work, and they suit different people. The first uses an agentic tool that already exists and needs almost no code. The second is the build-it-yourself route for full control. Both get you the same payoff, and both are laid out as steps below.

Way one, using a tool that already supports it, step by step

For most people this is the easier path. Several of the popular agentic coding tools already let you assign a different model to each agent, so you just configure the roles and the tool handles the orchestration. OpenCode and Claude’s command-line tool are two clear examples, and the steps are nearly identical in spirit. Here it is using OpenCode, which has the advantage of letting you mix models from different providers in one team.

Step 1, install the tool and connect your models. Install OpenCode, then add the providers whose models you want to use, so you have a frontier model, a mid-tier model, and a cheap model available. Set a sensible default model in the config file, for example a mid-tier model, so anything unspecified runs at a reasonable price.

Step 2, create a planner agent on a strong model. Agents are just small markdown files you drop in an agents folder, with a few lines of settings at the top and a system prompt below. Make one for your planner, set its model to a frontier model, and tell it in the prompt to break a task into smaller sub-tasks and delegate rather than do everything itself. In practice the top of that file is a handful of lines, a description, the mode set to primary so it can orchestrate, and the model line pointing at your frontier model.

---
description: Plans work and delegates to worker agents
mode: primary
model: anthropic/claude-opus-4-8
---
You are the planner. Break the request into small sub-tasks, decide which worker should handle each, and delegate. Do the hard reasoning yourself, but hand routine work to the workers.
Step 3, create worker agents on cheaper models. Make one or more worker agents the same way, each its own markdown file, but set their model to a cheaper tier and their mode to subagent so the planner can invoke them. A bulk-work agent might point at a small, cheap model, while a mid-tier worker for moderately hard tasks points at a mid model. This is where the diversification happens, and you can point different workers at different providers to get the cheapest capable option for each.

---
description: Handles routine, high-volume tasks
mode: subagent
model: anthropic/claude-haiku-4-5
---
You are a worker. Do the specific task you are given quickly and return the result. Keep it focused.

Step 4, let the planner delegate. Start a session with your planner as the active agent. When you give it a goal, it breaks the work down and hands sub-tasks to the worker agents, which run on their cheaper models, while the planner stays on the expensive one only for the planning and the hard parts. The tool manages the message-passing between them. Many of these tools also let the main agent pick a worker automatically based on each agent’s description, so once your agents are defined, the routing largely takes care of itself.

Step 5, tune which worker gets what. Watch where the cheap workers do well and where they struggle, and adjust, sharpen each agent’s description so the planner routes the right tasks to it, or move a task type up to a more capable worker if the cheap one keeps failing it. That tuning is the whole job once the agents exist.

That is the entire setup in a tool, a planner on a strong model, workers on cheap ones, each defined in a few lines, and the tool handling the orchestration between them. If one of these tools already fits your workflow, this is the fastest route by far.
Way two, building it yourself, step by step

This is the right choice when you want full control over the routing, want to use models or providers a tool does not support, or are building this into your own software rather than a coding session. The reason to do it yourself isn’t that it’s easier, it’s that you decide exactly how tasks are split, routed, and escalated. Here’s that build, step by step.

Step 1, get API access across your three tiers. Sign up for the providers you want and get an API key for each, a frontier model for the brain, a mid-tier model for the managers, and a cheap model for the hustlers. They don’t have to be from the same company, and mixing providers to get the cheapest capable option per tier is the point, so pick whatever gives you the best price at each level.

Step 2, write one function per tier. Make three small functions, one that sends a prompt to your brain model, one to your mid-tier model, and one to your cheap worker. Each is just a basic API call that takes a prompt and returns the response. This is the whole toolkit, three functions that each talk to one model.

Step 3, have the brain make a plan. Send your overall goal to the brain model with an instruction along the lines of, break this task into a list of smaller sub-tasks, and for each one label how hard it is, simple, medium, or hard. The brain returns a structured plan, a list of sub-tasks each tagged with a difficulty. This single call is the only place you use your expensive model heavily, and it’s cheap because it’s one planning step, not the whole job.

Step 4, write the routing rule. This is the heart of it, and it’s a few lines. For each sub-task in the plan, look at its difficulty label and send it to the matching function, simple goes to the cheap worker, medium goes to the mid-tier model, hard goes back to the brain. That’s the entire routing logic, a rule that maps a difficulty label to one of your three functions.
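Steps 2 through 4 can be sketched in Python roughly as follows. Everything named here is illustrative: `call_brain`, `call_manager`, and `call_hustler` are placeholder functions that only tag the prompt (in a real build each would wrap one provider's API call), the JSON plan format is an assumed shape for what the brain might return, and sending unknown labels to the brain is one possible default rather than something the article prescribes.

```python
import json
from typing import Callable, Dict, List


# Placeholders for the three per-tier functions from Step 2. In a real build
# each would wrap a single provider API call; here they only tag the prompt so
# the routing can be run and inspected offline.
def call_brain(prompt: str) -> str:
    return f"[brain] {prompt}"


def call_manager(prompt: str) -> str:
    return f"[manager] {prompt}"


def call_hustler(prompt: str) -> str:
    return f"[hustler] {prompt}"


# Step 4: the routing rule, a map from difficulty label to tier function.
ROUTES: Dict[str, Callable[[str], str]] = {
    "simple": call_hustler,
    "medium": call_manager,
    "hard": call_brain,
}


def route(subtask: Dict[str, str]) -> str:
    """Send one sub-task to the tier that matches its difficulty label."""
    # Assumption: an unrecognised label is treated as hard and goes to the brain.
    handler = ROUTES.get(subtask["difficulty"], call_brain)
    return handler(subtask["task"])


# Step 3: a hypothetical example of the structured plan the brain returns.
PLAN_JSON = """
[
  {"task": "Extract the invoice number", "difficulty": "simple"},
  {"task": "Draft a summary of the contract", "difficulty": "medium"},
  {"task": "Decide how to resolve the conflicting clauses", "difficulty": "hard"}
]
"""

plan: List[Dict[str, str]] = json.loads(PLAN_JSON)
for subtask in plan:
    print(route(subtask))
```

Keeping the rule as a plain dictionary means changing a cutoff is a one-line edit, which fits the article's point that the routing should stay a few lines rather than become a dependency.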
Step 5, loop through the tasks and collect the results. Walk the list of sub-tasks, run each one through the routing rule so it goes to the right model, and gather the outputs as they come back. At the end you stitch the pieces together into the finished result. Most of these calls are hitting your cheap worker, which is exactly why the bill stays low.

Step 6, add an escalation check. For each result, do a quick check, did the worker actually complete the task, or did it fail or return something obviously wrong. If a cheap model couldn’t handle a piece, send that one piece up to a more capable model and use that answer instead. This is what keeps quality high, the cheap models do the easy bulk, and anything they stumble on gets escalated.

That’s the whole architecture, three model functions, a planning call, a routing rule, a loop, and an escalation check. Nothing else.

One simplification worth knowing. You don’t even strictly need the brain to do the planning and labeling. For many jobs a crude rule works fine, route by task length, by whether code is involved, by a keyword or two, sending anything that looks simple to the cheap model and anything that looks hard to the expensive one. A middle option is to use a cheap model itself as the classifier, a fast, inexpensive call that just labels each task simple, medium, or hard before the real work begins. Either way, the routing stays a few lines, not a dependency. The only real tuning is deciding where your cutoffs are, which tasks count as simple enough for a hustler and which need to go up the chain, and you dial that in by watching where the cheap models succeed and where they fall down.

The honest tradeoffs

This is a genuinely better setup for most real work, but it isn’t free, and pretending otherwise would be selling you something.

Coordination costs tokens. The brain spends tokens planning and splitting the work, and that overhead is real, though it’s usually small next to what you save by keeping the bulk of the labor cheap. Cheap models fail more often on anything tricky, which means sometimes a hustler botches a task, you detect it, and you pay again to have a better model redo it, so your savings are a little smaller than the raw price gap suggests. There’s added complexity too, more moving parts, more places for something to go wrong, and more effort to debug a pipeline than a single call. And a subtle one worth knowing, in long multi-step jobs the biggest hidden cost is often the accumulated context being re-sent on every call, which a team structure helps with but does not entirely solve.

None of that changes the conclusion. Even after you account for the coordination overhead, the occasional retry, and the extra complexity, the team structure comes out well ahead on cost for any job with a meaningful amount of routine work in it, which is almost all of them. The savings from not paying frontier prices for grunt work are simply much larger than the costs of organizing the team. The one case where it isn’t worth it is a small, one-off task, where a single capable model is simpler and the savings would be trivial. For anything high-volume or repeated, the math favors the team, and it favors it heavily.

The shift in how you think about it

The real change here isn’t a trick, it’s a way of seeing the problem. Most people optimize by asking which model is best and then using it for everything. The team approach asks a sharper question for every piece of work, which is the cheapest model that can do this particular task well. That question, asked task by task, is what turns a giant bill into a small one.

The expensive model isn’t the enemy. It’s genuinely the best at the hard parts, and you should absolutely use it for them. The mistake is letting it do the easy parts too, at the same premium price.
Hand the thinking to your most capable model, hand the labor to the cheap ones, mix providers to get the best price at every tier, and let each model do the job it is actually priced for. You’ll get the same work done for a fraction of the cost, and once you have seen the structure, using one model for everything will look like exactly what it is, paying a genius to do grunt work.

If you build a tiered setup like this, drop a comment with the models you used for each role and the mix of providers you landed on. The combinations people actually run, and where they drew the line between cheap and expensive, are the most useful thing for the next person trying to cut their bill.

How to Slash Your LLM Bill With a Multi-Agent Setup was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
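For the build-it-yourself route described in the article above, the loop and escalation check (its Steps 5 and 6) might look something like the sketch below. The model functions are stand-ins rather than real API calls: `cheap_worker` is rigged to fail on any task containing the word "tricky" so the escalation path can be exercised, and `looks_ok` is a deliberately crude check that a real pipeline would replace with task-specific validation.

```python
from typing import List, Optional, Tuple


def cheap_worker(task: str) -> Optional[str]:
    """Stand-in for the cheap model. Returns None to simulate a failed attempt."""
    if "tricky" in task:
        return None
    return f"[cheap] {task}"


def strong_model(task: str) -> str:
    """Stand-in for the more capable model used for escalations."""
    return f"[strong] {task}"


def looks_ok(result: Optional[str]) -> bool:
    """Crude escalation check: did the worker return anything usable at all?"""
    return result is not None and result.strip() != ""


def run_with_escalation(tasks: List[str]) -> Tuple[List[str], int]:
    """Try every task on the cheap worker, escalating only the ones that fail."""
    outputs: List[str] = []
    escalated = 0
    for task in tasks:
        result = cheap_worker(task)
        if not looks_ok(result):
            # Pay for the stronger model only on the piece that needs it.
            result = strong_model(task)
            escalated += 1
        outputs.append(result)
    return outputs, escalated


tasks = ["format the date field", "tricky merge of two schemas", "rename a variable"]
outputs, escalated = run_with_escalation(tasks)
for line in outputs:
    print(line)
print(f"escalated {escalated} of {len(tasks)} tasks")
```

Counting escalations is a cheap way to watch the tradeoff the article describes: if that number climbs, the cutoff between cheap and expensive work is probably set too low.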
- Context Readiness Is the New AI Coding Benchmark
AI coding assistants have moved from novelty to daily infrastructure. Developers now use tools such as Claude, Cursor, Windsurf, Microsoft Copilot, and other AI-powered coding environments to explain code, generate tests, refactor modules, draft pull requests, and investigate production issues. Adoption is no longer the hard part. […] The post Context Readiness Is the New AI Coding Benchmark appeared first on Tabnine.
- Prompt Caching with Deep Agents
Deep Agents now support prompt caching to reduce latency and cost.
- Build Self-Improving Agent Skills with cognee and n8n
When a review run falls short, this workflow catches it and proposes a fix to the skill, then writes the improved instructions back once you approve the diff.
- The AI Agent Tech Stack Explained
- AI Threat Index Report 2026: Agentic AI Is Rewriting the Rules of Exam Security
AI Threat Index Report 2026: Agentic AI Is Rewriting the Rules of Exam Security azcentral.com and The Arizona Republic
- AI Coding Agents Are Pulling CI Feedback Into the Inner Loop
AI Coding Agents Are Pulling CI Feedback Into the Inner Loop DevOps.com
- Notion kills its Gmail client after AI agents keep humans from troubling inbox
More than half of users now let bots handle email, so service is headed for shutdown
- ZTE showcases practical path to Level-4 autonomous networks through agentic AI and cross-domain innovation at DTW Ignite 2026
PARTNER CONTENT: Highlighting Level-4 autonomous network solutions, TM Forum Excellence Award finalists, and joint operator trials powering cross-domain fault management and Dynamic 5G Slicing
- Manus for SEO
SEO Audits, Keywords & Backlinks From One Prompt
- SafeStreets by Streets & Commons
Walkability and pedestrian safety score for any address
- Atlas
Every AI tool you use should know how your company works
- ModuleX
AI workspace that’s already connected to everything
- SquidHub
Multiplayer mode for humans and AI
- LockIn MCP
Let AI block distractions for you when you need to lock in
- AI Slide Editor by CubeOne
The editor PowerPoint should've shipped
- DMV by Agent Community
A community-governed namespace for AI agents
- Cewsco
All-in-one AI assistant — chat, images, voice & market data
- VoiceX
Write Twice as Much. In Half the Time
- PageGains
AI-powered landing page conversion audits
- PixelDrivePro
API & MCP-first Bulk image generation for developers and AI
- Penqwin
Engineering knowledge that evolves with your codebase
- UX Agent
Find why shoppers don't convert, and what to fix first.
- Context Mode Insight
Engineering signal for AI-assisted teams
- PDF to Markdown
PDF to clean, LLM-ready Markdown for humans & AI agents
- Sop software
Turn any site workflow into shareable guides
- AO2 Memory
Portable context for every AI. Never explain yourself twice.
- Flamingo Compliance
Tax residency and visa tracking for the globally mobile
- zero
Keep your inbox at only what still needs you
- Pelica Health
AI operating system for value based care teams
- AI Search Radar
The technical SEO platform for ChatGPT and AI Search