AI News Archive: May 26, 2026 — Part 8
Sourced from 500+ daily AI sources, scored by relevance.
- Perplexity Bumblebee Shakes Loose Hidden Threats on Dev Desktops
Perplexity Bumblebee Shakes Loose Hidden Threats on Dev Desktops DevOps.com
Score: 38 | 🌐 Moves | May 26, 2026 | https://devops.com/perplexity-bumblebee-shakes-loose-hidden-threats-on-dev-desktops/
- OpenClaw Explained: How the Open-Source AI Agent Works
OpenClaw Explained: How the Open-Source AI Agent Works Built In
- The Canvas hack was a warning about our AI future. Are we listening?
The Canvas hack was a warning about our AI future. Are we listening? San Francisco Chronicle
Score: 38 | 🌐 Moves | May 26, 2026 | https://www.sfchronicle.com/opinion/openforum/article/canvas-shinyhunters-ai-anthropic-mythos-22259764.php
- As AI Transforms education, new article highlights the human dimensions of teaching
As AI Transforms education, new article highlights the human dimensions of teaching EurekAlert!
- Co-Developing an AI Native Observability Platform
Co-Developing an AI Native Observability Platform DevOps.com
- At Upper Bound, AI experts debate what it means to be a person
Richard Sutton says machines aren’t persons yet, but will be someday. The post At Upper Bound, AI experts debate what it means to be a person first appeared on BetaKit.
Score: 37 | 🌐 Moves | May 26, 2026 | https://betakit.com/at-upper-bound-ai-experts-debate-what-it-means-to-be-a-person/
- MonoClaw Debuts: Hong Kong’s First Local AI Secretary Ushers in the Software 3.0 Era
MonoClaw Debuts: Hong Kong’s First Local AI Secretary Ushers in the Software 3.0 Era
- How do AI, software fit into warehouse robotics?
Investing in intelligent software systems is crucial for warehouse operators that employ robotics.
Score: 37 | 🌐 Moves | May 26, 2026 | https://www.supplychaindive.com/news/warehouse-robotics-implementation-ai-software/816122/
- BAYSOFT.AI SHIPS FIRST 300 UNITS OF iSkreen™, THE WORLD'S FIRST BI-DIRECTIONAL WEARABLE IDENTITY PLATFORM
BAYSOFT.AI SHIPS FIRST 300 UNITS OF iSkreen™, THE WORLD'S FIRST BI-DIRECTIONAL WEARABLE IDENTITY PLATFORM azcentral.com and The Arizona Republic
- HELM Arabic Enterprise
We present HELM Arabic Enterprise, a leaderboard for transparent, reproducible evaluation of large language models on Arabic-language benchmarks designed around enterprise use cases. The leaderboard was developed in collaboration with Arabic.AI and builds on the HELM evaluation methodology: standardized prompting, fully logged requests and responses, and reproducible scoring through the open-source HELM framework.
- Thermal Cameras and AI Help Ships Steer Clear of Gray Whales
The whales have begun to make impromptu and dangerous stops in San Francisco Bay
- What are AI Evals and Why They Matter (It’s Not Just Testing)
The world of AI is moving fast. Models like GPT-5.x, Claude, and Gemini are no longer just answering questions; they’re writing production code, drafting medical summaries, executing trades, and orchestrating multi-step workflows on real systems. But here’s a question almost nobody asked five years ago: how do we actually know they work? In July 2025, an AI coding assistant from Replit deleted an entire production database despite being explicitly told not to. The team had run benchmarks. The model “passed.” It passed in the same way a student can ace a multiple-choice test and still be unable to write a real essay. The benchmarks measured something, just not the thing that mattered. This is the gap that evals are designed to close. When developers first try to test AI systems, they reach for what they know: unit tests. Write inputs, assert expected outputs, run the suite. That worked beautifully for forty years of deterministic software. It does not work for AI. The same prompt can produce different outputs on each run. There is no single “correct” answer to “summarize this email” or “write a polite refund response.” Quality is a spectrum, not a checkbox. Traditional testing wasn’t built for systems that reason in probabilities; evals are. What is an eval, in plain language? Think of an eval like a driving test. To know if someone can drive safely, you don’t just quiz them on traffic rules. You put them in a real car, in real traffic, and watch how they handle parallel parking, highway merges, and a pedestrian stepping off the curb. You score them on multiple dimensions: control, judgment, awareness. You don’t expect a perfect score; you expect a high enough one to trust them on the road. An eval does the same thing for an AI system. You give it a curated set of realistic situations. You watch what it does. You score the outputs against what “good” looks like. You aggregate those scores into something a team can act on.
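The give, watch, score, aggregate loop described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not a real eval framework: `runEval`, `fakeSupportBot`, `gradeByRequiredPhrases`, and the `mustMention` field are all hypothetical names invented for this example, and the "system under test" is a hard-coded stand-in rather than a model call.

```javascript
// Run every example through the system, grade each output, and aggregate.
function runEval(testSet, system, grade) {
  const results = testSet.map((example) => {
    const output = system(example.input);
    return { id: example.id, output, score: grade(example, output) };
  });
  const total = results.reduce((sum, r) => sum + r.score, 0);
  const meanScore = results.length > 0 ? total / results.length : 0;
  return { meanScore, results };
}

// Stand-in for the AI system under test (assumption: a real one would call a model).
function fakeSupportBot(input) {
  return input.includes("refund")
    ? "You can request a refund within 30 days."
    : "Please contact support.";
}

// Deterministic grader: 1 if the output mentions every required phrase, else 0.
function gradeByRequiredPhrases(example, output) {
  const text = output.toLowerCase();
  const ok = example.mustMention.every((p) => text.includes(p.toLowerCase()));
  return ok ? 1 : 0;
}

// A tiny curated test set; real ones would be drawn from production cases.
const testSet = [
  { id: "t1", input: "How do I get a refund?", mustMention: ["refund", "30 days"] },
  { id: "t2", input: "Where is my order?", mustMention: ["tracking"] },
];

const report = runEval(testSet, fakeSupportBot, gradeByRequiredPhrases);
console.log(report.meanScore); // 0.5: t1 passes, t2 fails
```

The grader is passed in as a function so the same loop could later take an LLM judge or a human-review lookup instead of a phrase check; a real harness would also need to be asynchronous.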
How an Eval works Technically, an eval is a structured measurement of an AI system’s output quality on a defined task. It has four moving parts: (1) A test set: curated inputs that represent what the system will actually face. The best ones, called golden datasets, are built from real production failures, not synthetic examples. (2) The AI system under test: could be a single model, a RAG pipeline, or a full multi-step agent. (3) The outputs: what the AI actually produced, including answers, tool calls, and full execution traces. (4) The evaluator: the thing that grades. This might be deterministic code, another LLM, a human reviewer, or all three layered together. The output is a score: pass rate, accuracy, faithfulness, tone, or whatever dimensions matter for your application. Traditional testing vs. AI evals The mismatch between unit tests and AI is worth showing directly. A unit test for add(2, 2) expects exactly 4. Forever. Anything else is a bug. An eval for “summarize this support ticket” might accept dozens of valid summaries. Some are concise, some are thorough, some emphasize the customer’s frustration, some focus on the technical issue. None of them are wrong. The job of the eval isn’t to pick the one true answer; it’s to measure, across many examples, whether the system tends to produce summaries that are accurate, faithful to the source, and appropriately toned. This shift has consequences: Tests are binary; evals are statistical. You’re looking at a distribution of scores, not a green checkmark. Tests are cheap; evals can be expensive. An LLM-as-judge run on 500 examples costs real money. A human-graded eval costs more. Tests catch bugs; evals catch behaviors, including ones nobody intended: biases, hallucinations, subtle quality drift after a prompt change. The four types of evals (and why teams use all of them) There isn’t one kind of eval; there’s a stack. Each layer trades cost for trust. 1. Deterministic checks. Plain code, plain rules. 
Did the model return valid JSON? Did the agent call refund_order instead of cancel_subscription? Does the email contain a @? Cheap, fast, and surprisingly effective for a huge class of failures. The "did the AI even attempt the right shape of answer" layer. 2. LLM-as-judge. A frontier model reads the AI’s output and scores it against a written rubric. This is now the default approach for teams that need throughput beyond what humans can provide. It offers roughly 500x to 5000x cost savings over human review while achieving around 80% agreement with human preferences — close to how much two humans agree with each other on the same task. It only works if you calibrate it: you need a sample of human-graded examples to verify the judge isn’t drifting from what real reviewers would say. 3. Benchmark evals. Standardized public tests like MMLU-Pro, GPQA, SWE-bench, and HumanEval. These are useful for picking which base model to start with — like SAT scores when comparing students. They’re a poor fit for measuring whether YOUR specific application is any good. As of 2026, frontier models have saturated most benchmarks the industry relied on two years ago, and new ones (HLE, SWE-bench Pro, LiveCodeBench) are constantly being introduced specifically because the old ones stopped discriminating between models. 4. Human evaluation. The gold standard, and the source of truth used to calibrate everything above it. Slow, expensive, irreplaceable for high-stakes domains — medical, legal, anywhere a hallucination has real consequences. A medical-specific benchmark like HealthBench exists for exactly this reason: general benchmarks don’t capture domain-specific failure modes. In production, mature teams stack all four. Code rules catch the obvious. LLM judges scale. Humans calibrate the judges and handle the edge cases. Why evals exploded in importance in 2025–2026 For a long time, evals were a nice-to-have — a thing the research lab cared about. Three things changed that. 1. 
AI moved from prototype to production. According to LangChain’s 2026 State of AI Agents report, more than half of organizations now have AI agents running in production. The Replit incident wasn’t an outlier; it was a preview. When an AI takes autonomous actions on real systems, “vibe-checking” the outputs in dev stops being acceptable. 2. Frontier models broke the old benchmarks. When everyone scores 95%+ on MMLU, the benchmark stops telling you anything. Teams now have to build their own evals against their own data, because the public ones can’t differentiate quality at the level that matters to a specific product. 3. Regulation arrived. ISO 42001 and the NIST AI Risk Management Framework are now being baked directly into evaluation pipelines as compliance gates, particularly in finance and healthcare. “We tested it and it seemed fine” is no longer an answer a regulator accepts. There’s a quote from Greg Brockman, OpenAI’s president, that gets passed around eval circles: “Evals are surprisingly often all you need.” What he means is that the discipline of measuring forces clarity. You can’t build a good eval without first answering “what does good actually look like for this task?”, and once you’ve answered that, half the problem is solved. Where this is going Evals are shifting from a one-off pre-launch test to a continuous discipline that runs at every stage of the AI lifecycle. A few specific shifts worth watching: Continuous evaluation, not pre-launch testing. Quality gates run on every pull request. Sampled production traffic gets scored live. Drift triggers alerts the same way latency spikes do. Cross-functional ownership. Evals are no longer engineering-only. Product managers validate behavior against requirements, QA owns regression, domain experts flag edge cases. If every change requires an engineer to write a script, engineering becomes the bottleneck for every quality decision. Domain-specific evals. 
HealthBench for medical, specialized benchmarks for law, finance, code. General benchmarks are necessary but never sufficient. Agent-specific metrics. Single-turn accuracy stops being the right measurement when an AI is taking 30 actions across 5 tools. Metrics like pass@k (does any of k attempts succeed?) and pass^k (do all k attempts succeed?) are becoming standard for agentic systems where consistency, not just capability, is what users feel. Why this matters As AI takes on more consequential tasks, the difference between a working demo and a trustworthy product is almost entirely a question of evaluation. Demos get built on confidence. Products get built on evidence. Where traditional testing gave us reliable code, evals give us accountable AI. They convert “the model felt smart in the demo” into “the model is correct on 89% of real cases, with these specific failure modes, monitored continuously, and gated by compliance checks before each release.” If you’re building anything with AI, evals aren’t optional anymore. They are the layer that lets AI graduate from clever to trusted, and they are quietly becoming the most important piece of infrastructure in the AI stack. What are AI Evals and Why They Matter (It’s Not Just Testing) was originally published in Towards AI on Medium.
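To make the pass@k versus pass^k distinction from the evals article above concrete, here is a minimal JavaScript sketch. It is the direct empirical reading of the two definitions given in the text (any of k attempts succeeds, all of k attempts succeed), not a statistical estimator, and the function names `passAtK` and `passAllK` and the sample data are made up for illustration.

```javascript
// attemptsByTask: one array per task, each holding k booleans (true = attempt succeeded).
// Assumes every task has at least one attempt.

// pass@k: fraction of tasks where at least one of the k attempts succeeded.
function passAtK(attemptsByTask) {
  if (attemptsByTask.length === 0) return 0;
  const hits = attemptsByTask.filter((attempts) => attempts.some(Boolean)).length;
  return hits / attemptsByTask.length;
}

// pass^k: fraction of tasks where all k attempts succeeded.
function passAllK(attemptsByTask) {
  if (attemptsByTask.length === 0) return 0;
  const hits = attemptsByTask.filter((attempts) => attempts.every(Boolean)).length;
  return hits / attemptsByTask.length;
}

// Four tasks, three attempts each (k = 3).
const attemptsByTask = [
  [true, true, true],    // consistent success
  [true, false, true],   // capable but flaky
  [false, false, false], // never succeeds
  [false, true, false],  // succeeds once
];

console.log(passAtK(attemptsByTask));  // 0.75
console.log(passAllK(attemptsByTask)); // 0.25
```

The gap between the two numbers (0.75 against 0.25 here) is the consistency problem the article points at: the agent can do three of the four tasks, but does only one of them reliably.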
- How Australian CIOs can unlock the agentic AI advantage
With CIOs under pressure to deliver transformation at scale, agentic AI is emerging as a powerful lever to reduce complexity and free up capacity for higher-value work, but it also demands a fundamental shift in how organisations operate. Agentic AI goes beyond traditional models that rely on fixed rules or algorithms by adapting to new situations, learning from experience and autonomously taking action to achieve goals without constant human oversight. The shift is happening quickly. Gartner predicts that by 2028, one-third of enterprise software applications will incorporate agentic AI, with at least 15% of day-to-day work decisions made autonomously. IDC, meanwhile, forecasts that agentic AI will drive the next wave of IT investment, accounting for more than a quarter of global IT spending over the next five years. “An important takeaway from this forecast is the clear alignment between the growth in AI spending and IT leaders’ trust that effective use of AI can boost future business success,” says Rick Villars, group vice president, Worldwide Research at IDC. Yet in conversations with Australian CIOs, the challenge is not access to technology; it’s how to operationalise it. From pilots to production: where organisations are unlocking value At a recent CIO roundtable discussion in Sydney, held in partnership with Google Cloud, leaders pointed to some challenges in scaling AI but also where organisations are starting to make real progress. Data readiness continues to be a key focus area, but it’s increasingly seen as an opportunity rather than a blocker. Many organisations are investing in improving data quality, accessibility and context, recognising that these foundations are what enable agentic AI to deliver reliable, scalable outcomes. As these capabilities mature, they are unlocking new use cases and accelerating the path to production. At the same time, there is a growing shift in how leaders think about AI system design. 
Success with agentic AI is no longer viewed as just a model challenge. Instead, CIOs are recognising the value of bringing together the three critical elements of reasoning, data context and execution as a single, integrated capability. Historically, these components have often been developed in isolation. But organisations that are seeing the most traction are those taking a more holistic approach by orchestrating these elements so that insights are not only generated, but acted on in real time, within the flow of business operations. What deployment looks like at scale For CIOs, the clearest signal of the agentic era isn’t analyst forecasts; it’s where organisations are already committing. Wesfarmers, for example, has entered a multi-year partnership with Google Cloud to deploy agentic AI across its retail portfolio, including Kmart, Officeworks and Priceline. Its OnePass platform is evolving into a unified, conversational interface that allows customers to search and transact seamlessly across brands. Internally, Gemini Enterprise has been rolled out to thousands of employees, automating work across marketing, finance and supply chain functions. Across financial services, organisations are taking a similarly pragmatic approach by focusing on targeted use cases where agentic AI can improve customer experience and streamline operations, while maintaining strict controls around security and compliance. For example, Macquarie Bank is improving customer experience and staff productivity with new agentic capabilities via its multi-year partnership with Google Cloud, equipping staff with advanced AI tools and a suite of specialised, pre-built agents through Gemini Enterprise. The Australian bank rolled out a 24/7 AI agent “Q” to autonomously answer banking questions for its 2 million customers. “With this foundation Macquarie has already cut client losses by half. 
That’s secure, frictionless banking at scale,” said Karthik Nahrain, chief product and business officer, speaking at the Google Cloud NEXT keynote. These examples reinforce a key insight from CIO discussions: The organisations making progress are not trying to do everything at once. They are prioritising high-value use cases, building confidence and scaling deliberately. Mindset shift required to thrive in agentic era According to Google Cloud’s AI Agent Trends Report, leaders must question old assumptions and drive cultural change to thrive in the agentic AI era. Oliver Parker, vice president, global GTM for generative AI at Google Cloud, says the shift to agentic AI isn’t just a technology change; it also requires a rethinking of mindset. “AI agents are the leap from being an ‘add-on’ approach to being an ‘AI-first’ process. It’s a fundamental change in workflow, a new way to work that will require a profound shift in mindset and corporate culture.” That’s echoed by IDC’s president, Crawford Del Prete, who says their forecast on agentic AI raises several important issues for businesses to consider about the interconnection between the workforce and AI investment. “As an example, business leaders will need to pay particular attention to employee roles in an enterprise and how roles change as agents become more commonplace in business. Agents will change the nature of work, making some roles highly productive and others obsolete. Workers and enterprises will need to be more agile than ever before to keep pace.” Ultimately, the opportunity is clear, but so is the work required to realise it. For CIOs, the focus now is on making deliberate choices about where to scale, who to partner with, how to build trust and how to embed agentic AI into the fabric of everyday operations.
Score: 37 | 🌐 Moves | May 26, 2026 | https://www.cio.com/article/4177107/how-australian-cios-can-unlock-the-agentic-ai-advantage.html
- When AI makes you worse at your job
If you’ve ever used an online patient portal to message your doctor in the middle of the night, you won’t be surprised to learn that responding to those messages takes an increasingly big bite out of clinicians’ workdays. So in recent years, hospitals have begun adopting an AI tool that can draft responses for them. […]
Score: 36 | 🌐 Moves | May 26, 2026 | https://www.vox.com/technology/489223/ai-work-jobs-productivity-agents-claude
- Wake is developing a policy on AI use. How the school board says it falls short
Wake is developing a policy on AI use. How the school board says it falls short Raleigh News & Observer
- Marlene Moorman on How AI-Enabled Accountability Can Redefine the Future of Outsourcing Services
Marlene Moorman on How AI-Enabled Accountability Can Redefine the Future of Outsourcing Services USA Today
- Open Source DockSec Uses AI to Cut Through Vulnerability Noise in Docker Images
DockSec, an OWASP incubator project, correlates findings from multiple container security scanners and uses AI to generate plain-English remediation guidance and exact Dockerfile fixes. The post Open Source DockSec Uses AI to Cut Through Vulnerability Noise in Docker Images appeared first on SecurityWeek.
Score: 36 | 🌐 Moves | May 26, 2026 | https://www.securityweek.com/open-source-docksec-uses-ai-to-cut-through-vulnerability-noise-in-docker-images/
- Australia Post is co-developing two ML models to prioritise its incident queue
Unpacking its five-month-old partnership with a security startup.
- UK Mining Conference to Feature Presentation on Autonomous Underground Equipment and Productivity Gains
UK Mining Conference to Feature Presentation on Autonomous Underground Equipment and Productivity Gains Mining Technology
- AI Middleware Architecture: The Control Layer Production LLM Apps Need Now
AI Middleware Architecture Your AI app probably does not need one more clever prompt tweak. It needs a place where model calls, tool calls, retries, approvals, traces, cache hits, and policies can be intercepted before damage spreads. For the last two years, many teams treated LLM integration as a direct line from product code to a model API. The first version was simple: call a model, parse a response, ship. Then the feature needed tools, retrieval, streaming, retries, fallback, logging, cost reports, and SQL safeguards. The timing matters. Google’s Genkit team recently announced middleware hooks for agentic apps, including generate, model, and tool layers for retries, fallbacks, human approval, context injection, and inspection in the developer UI. Vercel’s AI Gateway docs emphasize provider routing, budgets, usage monitoring, and fallback behavior. OpenTelemetry is also pushing deeper generative AI semantic conventions. These are not isolated product updates. They point to the same architecture shift: production AI systems need an explicit control layer between business logic and model execution. This guide explains what AI middleware architecture is, where it belongs, and how to implement it without building a bloated framework. What Is AI Middleware Architecture? AI middleware is the programmable layer between your application and the systems your AI feature depends on: model providers, tools, databases, vector stores, queues, policy engines, and observability pipelines. Traditional middleware handled cross-cutting concerns such as authentication, logging, rate limiting, and request transformation. AI middleware handles similar plumbing for workflows where one user request can trigger model calls, tool calls, retrieval, validation loops, and human review. A good AI middleware layer can answer questions like: Which model should handle this request? Should this prompt include more context, less context, or no private context at all? 
Is this tool call allowed for this user, tenant, workflow, and risk level? Should this failure be retried, routed to another provider, queued, or escalated? How much did this feature cost, not just this project? Can we replay this interaction during an incident review? Middleware is not just a convenience wrapper. It is where AI features become operable. Why Direct Model Calls Break Down Direct model calls are fine for prototypes. They are dangerous as the default production pattern because they spread control logic across the codebase. One service adds a retry loop. Another adds a fallback model. A third logs token usage but forgets streaming responses. A fourth blocks unsafe content before the prompt, but not inside tool arguments. After a few months, nobody knows which rules actually apply. The pain usually shows up in five ways. 1. Tool Calls Create New Risk Text generation failures are annoying. Tool execution failures can be expensive or destructive. An agent that runs a destructive database query, sends an email to the wrong customer, or calls a paid API in a loop is a different class of risk. Prompt instructions alone are too soft for this boundary. The safer pattern is to treat tool execution as a policy-controlled operation. The middleware should inspect the tool name, arguments, user, tenant, workflow state, and risk score before execution. 2. Provider Fallbacks Are Not Just Endpoint Swaps Switching providers sounds simple until tool schemas, streaming formats, error types, token counting, safety behavior, and JSON modes differ. Teams often discover this under pressure when their primary model returns capacity errors or latency spikes. A middleware layer can normalize provider differences, but only if you design it around explicit contracts. If your application code depends on every provider quirk, fallback becomes a rewrite project. 3. Cost Attribution Gets Messy Fast Project-level AI spend is not enough. 
Developers need cost by feature, tenant, workflow, experiment, prompt version, model, and user cohort. Otherwise, a background summarization job can look the same as a customer-facing agent, and a retry loop can hide inside a blended bill. Middleware is the best place to attach consistent metadata before the request leaves your system. Do it once, enforce it everywhere, and make missing metadata a development error. 4. Observability Needs the Whole Loop Logging the final model response is not observability. Production AI teams need traces that show model input shape, retrieved context IDs, tool decisions, provider latency, retry attempts, fallback path, validation result, token usage, safety decision, and final user-visible outcome. OpenTelemetry’s generative AI work matters because it gives teams a shared vocabulary for spans, attributes, events, and metrics. You do not need to wait for every standard to settle, but avoid inventing a naming scheme that cannot map to emerging conventions later. 5. Governance Must Happen Before Execution Dashboards help after something happens. Middleware helps before something happens. It can block a tool call, trim sensitive context, require approval, enforce a budget, or downgrade a risky request to a safer path. The best AI governance is not a PDF. It is a set of runtime controls that developers can test. The Core Components of an AI Middleware Layer You do not need every component on day one. But if you are building production LLM apps, your architecture should leave room for these capabilities. A useful middleware layer makes cross-cutting AI controls explicit instead of scattering them across feature code. Request Normalization Start by converting every AI request into a common internal envelope. The envelope should include the prompt or message list, model preference, user ID, tenant ID, feature name, prompt version, risk level, required output contract, trace ID, and budget policy. 
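As a rough illustration of the normalization step, the sketch below builds such an envelope and treats missing metadata as a development error. The field names follow the ones listed in the text; which fields are required, and the default values for `risk_level` and `model_policy`, are assumptions made for this example only.

```javascript
// Assumption: these fields must always be supplied by the calling feature code.
const REQUIRED_FIELDS = [
  "feature", "tenant_id", "user_id", "prompt_version", "output_contract", "trace_id",
];

function normalizeAiRequest(request) {
  const missing = REQUIRED_FIELDS.filter((field) => !request[field]);
  if (missing.length > 0) {
    // Fail loudly instead of silently defaulting required metadata.
    throw new Error(`AI request is missing required metadata: ${missing.join(", ")}`);
  }
  return {
    messages: request.messages ?? [],
    feature: request.feature,
    tenant_id: request.tenant_id,
    user_id: request.user_id,
    prompt_version: request.prompt_version,
    output_contract: request.output_contract,
    trace_id: request.trace_id,
    // Assumed defaults: unknown risk is treated as high, routing as "balanced".
    risk_level: request.risk_level ?? "high",
    model_policy: request.model_policy ?? "balanced",
    budget_cents_max: request.budget_cents_max ?? null,
  };
}

const envelope = normalizeAiRequest({
  messages: [{ role: "user", content: "Draft a reply to this ticket." }],
  feature: "support_reply_draft",
  tenant_id: "tenant_123",
  user_id: "user_456",
  prompt_version: "support.reply.v17",
  output_contract: "support_reply_json",
  trace_id: "trace_abc",
  budget_cents_max: 12,
});
console.log(envelope.risk_level); // "high", because the caller did not set one
```

Defaulting an unspecified risk level to the strictest value is one way to make the envelope fail closed; a team could equally decide to make `risk_level` a required field.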
This envelope becomes the object your middleware can reason about. Without it, each hook receives a slightly different payload and every policy becomes harder to enforce.
    {
      "feature": "support_reply_draft",
      "tenant_id": "tenant_123",
      "user_id": "user_456",
      "prompt_version": "support.reply.v17",
      "risk_level": "medium",
      "model_policy": "balanced",
      "output_contract": "support_reply_json",
      "trace_id": "trace_abc",
      "budget_cents_max": 12
    }
Policy Engine The policy engine decides what is allowed. It should be code or configuration that can be reviewed, tested, versioned, and audited. Useful policies define which tools a workflow can call, which data sources a tenant can access, which actions require human approval, which models can process sensitive data, which requests must be blocked or redacted, and which budget limit applies. A practical rule: if the policy would matter during an incident review, it belongs outside the prompt. Context Builder The context builder decides what the model gets to see. It can inject retrieved documents, compress prior conversation state, remove stale context, redact sensitive fields, and attach system instructions. This is where many “prompt engineering” problems become software architecture problems. Instead of asking developers to paste bigger prompts, middleware should assemble context with clear provenance: source IDs, retrieval scores, timestamps, permissions, and truncation notes. Model Router The model router chooses the model and provider path using a static rule, capability matrix, evaluation score, latency target, cost budget, or fallback chain. Keep routing explainable. A developer should be able to inspect a request and understand why it went to one model instead of another. Retry and Fallback Handler Retries should be narrow. If the model API call fails with a transient provider error, retry that call with backoff and jitter. Do not blindly replay the entire tool loop. Fallbacks should preserve contracts. 
If provider B returns a different shape than provider A, normalize it before the application sees it. If the fallback cannot support the required contract, fail clearly. Tool Gate The tool gate is the checkpoint before the AI system touches the outside world. It validates schema, permissions, arguments, rate limits, spend limits, and approval requirements. For sensitive tools, use a two-step pattern: The model proposes an action in a structured format. The middleware validates, transforms, approves, rejects, or queues the action. Trace Emitter Every meaningful step should emit trace data. At minimum, capture request metadata, model choice, prompt version, retrieval IDs, tool calls, validation outcomes, retry count, fallback path, token usage, latency, cost estimate, and final status. Use stable IDs. A trace that cannot connect the user-visible answer to the model calls and tool calls behind it is not useful during debugging. Three Architecture Patterns There are three common ways to implement AI middleware. Each can work. The right choice depends on team size, latency tolerance, framework diversity, and security needs. Pattern 1: In-Process Middleware In-process middleware runs inside your application or agent framework. This is often the fastest path because it has direct access to application state, user identity, feature flags, and local types. Use it when your team has one main application stack and needs low latency. It is especially useful for prompt assembly, schema validation, local policy checks, and framework-specific hooks. The risk is coupling. Keep your request envelope and policy contracts framework-neutral. Pattern 2: Gateway Middleware Gateway middleware sits between your services and model providers. It is useful for provider routing, budgets, rate limits, API key management, fallback chains, request logging, and shared model access across teams. 
Use it when multiple services call models, when you need centralized cost controls, or when teams use different languages and frameworks. The risk is that a gateway can see model traffic but not always business intent. Send feature metadata with every request. Pattern 3: Tool-Side Middleware Tool-side middleware wraps databases, APIs, file systems, queues, and external services. It is the right place for authorization, destructive action checks, data filtering, and audit logs. Use it when tool calls can create side effects or expose sensitive data. Even if the model or agent framework fails, the tool boundary should still enforce least privilege. The safest production pattern is usually a hybrid: in-process hooks for local context and validation, a gateway for provider control, and tool-side enforcement for sensitive actions. A Minimal Implementation Plan If you are starting from direct model calls, do not build the perfect AI platform in one sprint. Build the smallest middleware layer that removes real risk. Step 1: Wrap Every Model Call Create one internal function or service for model calls. Ban new direct SDK calls from product code. Require feature name, prompt version, tenant identity, output contract, and trace ID.
    async function runModel(request) {
      // Convert the raw request into the common internal envelope.
      const envelope = normalizeAiRequest(request)
      // Enforce request-level policy before building context or calling a provider.
      await enforceRequestPolicy(envelope)
      const context = await buildContext(envelope)
      const route = chooseModelRoute(envelope, context)
      // Run the provider call inside a trace tied to this envelope.
      return withTrace(envelope, async () => {
        return callWithRetryAndFallback(route, context)
      })
    }
This is not fancy, but it creates a control point. Without this, every later improvement is optional and unevenly adopted. Step 2: Add Metadata Before Optimization Do not start with complex routing. Start with reliable metadata. Capture feature, tenant, prompt version, model, provider, tokens, latency, estimated cost, status, and trace ID. Once metadata is consistent, cost optimization and reliability work become measurable. 
Before that, you are guessing. Step 3: Put Tool Calls Behind Typed Adapters Do not let agents call arbitrary functions. Give each tool a typed schema, permission policy, idempotency rule, risk level, timeout, and audit behavior.
    const sendRefundTool = defineTool({
      name: "send_refund",
      risk: "high",
      schema: RefundRequestSchema,
      requiresApproval: true,
      maxAmountCents: 5000,
      handler: async (args, context) => {
        // Check tenant permission at the tool boundary before touching payments.
        await assertTenantPermission(context.tenantId, "refund:create")
        return payments.refund(args)
      }
    })
The agent framework should not be the only barrier between a model and a real-world action. Step 4: Separate Retries From Replays Retry provider timeouts, 429s, and transient network failures when the request is safe. Replay a workflow only when every prior step is idempotent or explicitly checkpointed. This matters because a tool loop may have already sent an email, created a ticket, charged a card, or modified a record. Step 5: Create a Policy Test Suite Every policy should have tests. Can a read-only agent call a write tool? Can a trial tenant use the expensive model? Can PII be sent to a non-approved provider? Can a high-risk action execute without approval? These are ordinary software tests around the control layer. They catch mistakes before you need a postmortem. What to Measure AI middleware should make reliability, cost, and policy decisions visible at the feature level. The point of middleware is not architecture for its own sake. It should improve reliability, cost control, security, and iteration speed. Track metrics that prove that. For reliability, track model error rate by provider and feature, retry rate, retry success rate, fallback rate, tool failure rate, schema validation failure rate, and p95 or p99 latency by route. For cost, track cost per successful outcome, cost by feature and tenant, retry cost as a percentage of total spend, cache hit rate, fallback cost delta, and tokens per workflow step. 
For governance, track policy blocks, approval queue volume, approval latency, sensitive-data redaction count, unauthorized tool attempts, rate-limit events, and audit log completeness.
Do not measure only what is easy. Measure what would have helped in the last incident, the last bill spike, and the last confusing user complaint.
Common Mistakes
First, do not turn middleware into a giant framework. Keep the core interfaces small and prefer composable hooks over a monolithic server that owns every decision. If developers cannot answer “what happens to this request?” in a few minutes, the middleware has become another opaque AI system.
Second, do not protect only the prompt. Sensitive failures can happen in retrieved context, tool arguments, tool outputs, hidden system messages, and fallback paths. Inspect the whole request lifecycle.
Third, do not add fallbacks without review. A fallback may be better than an outage, but it can also create silent product degradation. Track fallback events and sample the results.
Fourth, do not treat human approval as a modal dialog. Approval requests need persistence, identity, context, timeout behavior, audit logs, and a way to continue after a process restart.
Finally, do not log too much sensitive data. Capture enough to debug, but classify fields, redact secrets, hash identifiers when possible, and set retention policies.
How to Choose Tools
There is no single winner because AI middleware spans several categories. You may combine an agent framework, gateway, observability tool, policy engine, queue, and internal adapters.
Evaluate tools by asking whether they preserve your output contracts, trace model calls and tool calls together, support testable policies, work across the languages you use, fail closed for high-risk tools, export to your observability stack, and degrade safely when unavailable.
Framework-native middleware is great for speed. Gateways are useful for centralized provider control.
OpenTelemetry-compatible tracing helps avoid lock-in. The architecture should decide the tool mix, not the other way around.
The Practical Blueprint
If you remember one pattern, make it this:
Normalize the request. Attach metadata. Build context with provenance. Enforce policy before model and tool execution. Route to the right model path. Retry narrowly. Gate tools before side effects. Emit traces for every important step. Measure cost and quality by feature.
This is the difference between a demo and a system an engineering team can operate.
Final Takeaway
AI middleware architecture is becoming the control plane for production LLM applications. It is where teams enforce cost limits, reduce provider risk, keep tool calls accountable, connect traces, manage approvals, and make model behavior inspectable.
The best version is not heavy. It is a thin, explicit, testable layer that catches the messy parts of AI execution before they leak into every feature team’s code.
Start small. Wrap model calls. Require metadata. Gate tools. Emit traces. Add policy tests. Then improve routing, fallback, caching, and approvals as real production pressure demands them.
Prompts and models still matter. In production, the layer around the model often decides whether the system is reliable, affordable, and safe enough to trust.
FAQ
What is AI middleware architecture?
AI middleware architecture is the control layer between an application and its AI dependencies, including model providers, tools, retrieval systems, policy checks, routing, retries, observability, caching, and approvals. It gives developers one place to intercept and manage AI behavior before it reaches external systems or users.
Is AI middleware the same as an LLM gateway?
No. An LLM gateway is one type of AI middleware, usually focused on provider access, routing, fallbacks, budgets, and usage monitoring.
AI middleware can also live inside an app or around tools, where it handles context building, tool approval, policy enforcement, schema validation, and workflow tracing.
When should a team add AI middleware?
Add AI middleware when more than one production feature calls models, when agents can call tools, when costs need feature-level tracking, when provider fallback matters, or when security policies must be enforced before execution. If model logic is duplicated across services, the middleware layer is overdue.
Does AI middleware add latency?
It can, but it can also reduce latency by improving routing, caching, retries, and failure handling. The goal is not to add every possible hook. The goal is to put high-value controls on the request path and keep expensive checks reserved for higher-risk workflows.
What should developers build first?
Build a single model-call wrapper with required metadata, tracing, output contract validation, and basic policy checks. Then add typed tool adapters, narrow retries, fallback rules, and feature-level cost reporting. This creates a practical foundation without forcing a full platform rewrite.
AI Middleware Architecture: The Control Layer Production LLM Apps Need Now was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
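Step 5 of the plan in the article above calls for a policy test suite. A minimal sketch of what such tests could look like follows, covering the four questions that step poses. The `evaluatePolicy` function, the request fields, and the provider names are hypothetical stand-ins for whatever policy layer a team actually uses.

```javascript
// Hypothetical policy layer: every name and field here is an assumption for illustration.
const APPROVED_PII_PROVIDERS = new Set(["provider-a"]);

function evaluatePolicy(req) {
  if (req.tool && req.tool.writes && req.agent.readOnly) {
    return { allowed: false, reason: "read_only_agent" };
  }
  if (req.tenant.plan === "trial" && req.modelTier === "expensive") {
    return { allowed: false, reason: "trial_model_tier" };
  }
  if (req.containsPii && !APPROVED_PII_PROVIDERS.has(req.provider)) {
    return { allowed: false, reason: "pii_provider_not_approved" };
  }
  if (req.tool && req.tool.risk === "high" && !req.approved) {
    return { allowed: false, reason: "approval_required" };
  }
  return { allowed: true, reason: "ok" };
}

// A request that passes every check; each case below overrides one aspect of it.
const baseRequest = {
  agent: { readOnly: false },
  tenant: { plan: "paid" },
  modelTier: "standard",
  provider: "provider-a",
  containsPii: false,
  approved: false,
  tool: null,
};

const policyCases = [
  { name: "read-only agent cannot call a write tool",
    request: { ...baseRequest, agent: { readOnly: true }, tool: { writes: true, risk: "low" } },
    expected: "read_only_agent" },
  { name: "trial tenant cannot use the expensive model",
    request: { ...baseRequest, tenant: { plan: "trial" }, modelTier: "expensive" },
    expected: "trial_model_tier" },
  { name: "PII cannot go to a non-approved provider",
    request: { ...baseRequest, containsPii: true, provider: "provider-b" },
    expected: "pii_provider_not_approved" },
  { name: "high-risk action cannot run without approval",
    request: { ...baseRequest, tool: { writes: true, risk: "high" } },
    expected: "approval_required" },
  { name: "approved high-risk action is allowed",
    request: { ...baseRequest, tool: { writes: true, risk: "high" }, approved: true },
    expected: "ok" },
];

// Return the names of cases whose decision differs from the expectation.
function runPolicyCases(cases) {
  return cases
    .filter((c) => evaluatePolicy(c.request).reason !== c.expected)
    .map((c) => c.name);
}

const failures = runPolicyCases(policyCases);
console.log(failures.length === 0 ? "all policy cases pass" : failures);
```

Keeping the cases in a plain table makes it cheap to add one row per new policy, and the same table could be fed to any test runner a team already uses.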
- From cost center to growth engine: 5 lessons from Leadership Circle's AI-first support transformation
From cost center to growth engine: 5 lessons from Leadership Circle's AI-first support transformation Atlassian
Score: 36🌐 MovesMay 26, 2026https://www.atlassian.com/blog/jira-service-management/ai-first-customer-support-transformation-lessons - Why scaling AI requires both left-brain rigor and right-brain ingenuity
Neuroscience often describes the human brain as operating through two complementary modes of thinking, commonly referred to as the left and right brains. While modern neuroscience debates the strict division between these hemispheres, the metaphor remains useful and highly relevant, particularly in an enterprise context, to illustrate two distinct cognitive approaches. The left hemisphere is associated with logic, structure and analytical reasoning. The right hemisphere enables pattern recognition and creativity. Analytical thinking drives execution. Creative thinking enables adaptation. This distinction is increasingly relevant in the age of AI. GenAI systems are inherently probabilistic, capable of producing a range of possible outputs based on patterns and context. They enable vivid exploration with increasing effectiveness but lack consistency and predictability in real-world execution. Deterministic systems, by contrast, provide the structure, control and repeatability required to translate those insights into outcomes. This analogy draws on early neuroscience work by Nobel laureate Roger Sperry, who demonstrated that the brain’s hemispheres contribute differently to reasoning and perception. Human intelligence ultimately emerges from the interaction between these complementary capabilities. Enterprises operate in a similar dual mode. The analytical side builds infrastructure, governance and discipline, forming the deterministic layer that ensures reliability and control. The creative side rethinks workflows, interprets signals and redesigns decision-making, where probabilistic intelligence plays a critical role. Organizations that scale AI successfully bring these capabilities together. Many, however, remain focused on infrastructure and models, limiting AI to incremental optimization rather than transformation. While data platforms, governance frameworks and model performance are advancing, scaling remains uneven. 
According to the 2026 AI and Data Leadership Executive Benchmark Survey published in MIT Sloan Management Review, only 39 percent of companies have implemented AI in production at scale, despite years of investment in foundations and governance. Deloitte’s State of AI in the Enterprise 2026 reinforces the divide. Only 34 percent of organizations are using AI to deeply transform their business, while 37 percent remain at a surface level with little or no change to existing processes. This reflects a gap between technical readiness and workflow transformation.
Enterprises have strengthened their analytical brain
Over the past several years, CIOs have focused on building the analytical backbone required to deploy AI responsibly. Infrastructure has been modernized. Data platforms have matured. Governance and risk management frameworks are more robust. These capabilities are essential, particularly in regulated industries where reliability and compliance are non-negotiable. However, analytical strength alone does not create a competitive advantage. Financial services illustrate this clearly. Most banks operate under similar regulatory frameworks and offer structurally comparable products. Their infrastructure and compliance models are largely consistent. Yet performance varies significantly between institutions. The difference lies in how leading banks activate the creative side of the enterprise. Instead of relying solely on static models or predefined workflows, forward-looking institutions incorporate behavioral signals dynamically, continuously learning from customer interactions, transaction patterns and contextual data in real time. This is where the 3C framework connects directly to the left-brain, right-brain model. The “Core” provides the secure, governed and interoperable foundation that enables AI reliability, compliance and trust.
“Context” gives AI access to enterprise data, processes, history and business rules, helping probabilistic intelligence interpret signals with domain awareness and traceability. “Coordination” then brings people, agents, applications and systems together through governed, process-driven workflows. Together, these three pillars allow deterministic systems and probabilistic intelligence to work as one, turning insights into consistent, auditable and adaptive actions. This enables faster, more adaptive and intelligent decisions. Fraud detection becomes increasingly responsive by identifying emerging anomalies rather than relying only on known patterns. Customer onboarding becomes seamless through real-time identity validation and contextual risk assessment. Service interactions become more relevant. Over time, systems continuously improve. This is where customer experience becomes a true differentiator. AI enables institutions to interpret customer needs continuously rather than episodically. The analytical foundation ensures reliability. Creative application enables differentiation.
Technology alone won’t scale AI. Whole-brain teams will
One of the most common reasons AI initiatives stall is not a technical limitation, but organizational design and change management. Many enterprises treat AI as a specialized capability within engineering or data science teams. While this ensures rigor in model development, it limits the ability to rethink how decisions and workflows should operate in an AI-native environment. As a result, AI is used to optimize existing processes rather than redesign them. Scaling AI requires a shift in operating model. Business leaders, product teams, architects and engineers must work together to rethink workflows and decision structures. Technical teams ensure models are scalable and reliable. Business and product leaders ensure intelligence is applied to improve operational outcomes and customer experience.
This convergence is not purely a technology effort. It is a change management exercise that requires redefining ownership and collaboration across functions. This is where enterprises must move beyond isolated functional structures toward what can be described as a “purple team” model. Borrowed from cybersecurity, where purple teams integrate the defensive discipline of blue teams with the adversarial thinking of red teams, this model creates continuous collaboration between those who build systems and those who challenge assumptions. In enterprise AI, purple teams combine engineering precision with business context and operational insight, ensuring intelligence improves how the enterprise operates. As this model takes hold, roles begin to evolve and overlap. Product managers, engineers and business leaders increasingly operate as unified teams responsible for end-to-end outcomes rather than isolated functions. These teams do not simply deploy AI into existing workflows. They redesign workflows to operate more intelligently and effectively.
Redesign unlocks AI’s real value
A healthcare diagnostics organization focused on early lung cancer detection illustrates how activating both analytical and creative capabilities can unlock meaningful impact. The organization applied machine learning to analyze diagnostic data and accelerate early detection. This reduced analysis time by nearly 70 percent while also improving detection performance and reducing false positives. This demonstrates that AI delivers its greatest impact when applied to improve decision-making, not simply to speed up execution. The analytical foundation ensured reliability, safety and consistency. The creative application extended beyond the technology itself into how clinicians engaged with it. By augmenting human judgment with AI-driven insights, practitioners were able to interpret signals more effectively, validate findings with greater confidence and make more informed decisions in critical moments.
This human and machine interplay is where the true “creative” advantage emerges. This pattern is increasingly visible across industries. While AI can automate workflows and improve efficiency, its strategic value lies in enabling organizations to rethink how decisions are structured and executed. Enterprises that apply AI only to optimize existing processes see incremental improvements. Those that redesign workflows to incorporate intelligence more natively achieve materially different levels of performance, responsiveness and business impact.
CIOs must lead left-brain/right-brain transformation
This shift marks a clear evolution in the CIO mandate. The first phase of enterprise AI focused on building analytical strength, modernizing infrastructure, establishing governance and creating scalable platforms. This laid the deterministic foundation for reliable execution. The next phase is about redesign. CIOs must enable organizations to rethink workflows and decision-making to fully leverage AI. This requires closer alignment across business, product and engineering teams, integrating probabilistic intelligence with structured control. AI now operates as an organizational capability, reshaping how decisions are made and how work gets done. Enterprises now face a similar inflection point. Advantage will not come from execution alone, but from how effectively organizations combine creative, probabilistic intelligence with disciplined, deterministic systems to redesign how they operate. Those who get this balance right will move beyond incremental gains to true transformation. The difference is no longer technology. It is the organizational intent.
This article is published as part of the Foundry Expert Contributor Network.
Score: 36🌐 MovesMay 26, 2026https://www.cio.com/article/4176549/why-scaling-ai-requires-both-left-brain-rigor-and-right-brain-ingenuity.html - Certo raises $4M for AI-powered consumer goods compliance platform
Certo, an AI-powered compliance platform for the beauty and consumer goods industries, has raised $4 million in seed funding to accelerate product development and support its international expansion. The...
Score: 36💰 MoneyMay 26, 2026https://tech.eu/2026/05/26/certo-raises-4m-for-ai-powered-consumer-goods-compliance-platform/ - A reality check on the AI jobs hysteria
Haven’t you heard? White-collar jobs are going away, decimated by AI. Waves of layoffs in the tech sector (most recently at Coinbase and Meta and Cisco) are said to presage what will soon come for all of us knowledge workers. But before you quit your job as a software developer or financial analyst—or tech journalist—and…
Score: 36🌐 MovesMay 26, 2026https://www.technologyreview.com/2026/05/26/1137855/a-reality-check-on-the-ai-jobs-hysteria/ - AppOmni launches Marlin AI to automate SaaS security investigation and remediation
Software-as-a-service security company AppOmni Inc. today launched Marlin AI, a new autonomous artificial intelligence capability built into its platform that handles correlation, investigation and guided remediation of SaaS security alerts without requiring manual analyst workflows. The product is pitched at security teams drowning in alerts from sprawling SaaS estates. Marlin AI runs continuously inside the […] The post AppOmni launches Marlin AI to automate SaaS security investigation and remediation appeared first on SiliconANGLE.
Score: 36🌐 MovesMay 26, 2026https://siliconangle.com/2026/05/26/appomni-launches-marlin-ai-automate-saas-security-investigation-remediation/ - Opendoor Co-Founder Eric Wu Launches AI For Construction Venture
Opendoor Co-Founder Eric Wu Launches AI For Construction Venture
- Lenovo’s “AI factory” approach starts to show results
The company stands to benefit as enterprise AI demand moves beyond raw compute toward integrated systems.
- GoTo transforms AI into a strategic partner for IT teams with latest innovations for LogMeIn Solutions
GoTo announced new innovations for its LogMeIn Resolve and LogMeIn Rescue products. The latest features introduce new agentic AI, real-time insights, deeper integrations, and enhanced security capabilities to transform how IT teams manage, support, and secure modern environments. The post GoTo transforms AI into a strategic partner for IT teams with latest innovations for LogMeIn Solutions appeared first on Express Computer.
- GPU Job Scheduling Using an Idle Inference GPU Pool
GPU Job Scheduling Using an Idle Inference GPU Pool
- Building a Multi-Tool Gemma 4 Agent with Error Recovery
Score: 35🌐 MovesMay 26, 2026https://machinelearningmastery.com/building-a-multi-tool-gemma-4-agent-with-error-recovery/ - CIOs say they need a people strategy to scale AI
Technology executives underscored the criticality of being a people leader, which requires thinking through what skills to invest in and how to invest in them.
- Alibaba Cloud joins other tech firms in giving free AI tools to SMEs, students in Singapore
Alibaba Cloud joins other tech firms in giving free AI tools to SMEs, students in Singapore The Straits Times
- Kneron to Unveil the Future of Edge AI at COMPUTEX 2026
Kneron to Unveil the Future of Edge AI at COMPUTEX 2026 azcentral.com and The Arizona Republic
Score: 35🌐 MovesMay 26, 2026https://www.azcentral.com/press-release/story/75267/kneron-to-unveil-the-future-of-edge-ai-at-computex-2026/ - Meet the top 10 European startups powering the agentic AI boom
Agentic AI is quickly becoming one of the most active areas of Europe’s AI landscape. Unlike traditional AI tools that mainly generate text, images or summaries, agentic AI systems are designed to take action. They can plan tasks, use tools, follow instructions, analyse results and adapt their next steps, making them useful for real business […] The post Meet the top 10 European startups powering the agentic AI boom appeared first on EU-Startups.
Score: 35🌐 MovesMay 26, 2026https://www.eu-startups.com/2026/05/meet-the-top-10-european-startups-powering-the-agentic-ai-boom/ - Mission Control: Operating Self-Hosted LangSmith on Kubernetes
Operating self-hosted LangSmith on Kubernetes
Score: 35🌐 MovesMay 26, 2026https://blog.langchain.dev/blog/mission-control-operating-self-hosted-langsmith-on-kubernetes - AI interns in the office: Preparing to work with digital employees (KOR)
Kim Byoung-pil
The author is a professor of technology management at KAIST.
The office environment is changing rapidly. AI is now drafting business plans, designing marketing campaigns and even handling customer service and bookkeeping. Such multitasking capabilities are particularly useful for small organizations, where the workforce is often limited. Startups and small business owners are increasingly dividing work among AI systems, assigning them roles in sales, marketing, accounting and customer support.
A lobster-shaped cutout representing OpenClaw, an open-source AI agent, stands amid the Baidu offices in Beijing on March 17. [REUTERS/YONHAP]
Yet these digital employees also create serious security vulnerabilities. One example is OpenClaw, an open-source AI assistant released in November of last year that quickly attracted global attention. Cisco described it as a “security nightmare.” OpenClaw can delete files from a user’s computer and even run malicious software. The moment such tools are connected to a company’s internal network, the entire corporate security system may be exposed to risk. That helps explain why many companies restrict the use of outside AI programs within their organizations. Without adequate control systems, it is difficult to allow AI to operate autonomously inside a company. At the same time, those restrictions can slow the development of AI capabilities. According to Cisco’s 2025 survey, only 8 percent of Korean companies were classified as “leaders” in AI readiness, below the global average of 13 percent. Still, companies cannot simply block AI adoption while competitors use it to widen productivity gaps. The question is how to resolve this dilemma. One practical approach may be to treat AI like an intern or probationary employee. When a new recruit first joins a company, every task is unfamiliar. Senior employees supervising the newcomer often struggle to decide what responsibilities to assign.
Yet once entrusted with smaller jobs, probationary workers often prove capable. Today’s AI resembles that stage of development. Managing AI so that it does not threaten corporate security is remarkably similar to establishing rules for supervising new employees. First, just as workers enter the office using identification cards, AI systems should also be given clear identities. Uber created a framework that assigns verifiable identities to internal AI systems and tracks the work they perform. In effect, the company issued digital employee IDs for AI. Second, companies should not open all internal information to AI at once, just as probationary employees are not granted unrestricted access to all company resources. AI systems should operate in isolated environments with limited access to data. Their workspaces should remain separated from core company networks, and they should only be able to use authorized tools. The authority granted to AI should never exceed that of the human employee supervising it. Third, there must be a control mechanism equivalent to managerial approval. A probationary employee may prepare a purchase order, but cannot place the order independently. Final approval still comes from a responsible manager. The same principle should apply to AI. Human oversight must remain embedded in important procedures. Finally, just as probationary workers gain more important responsibilities as they build trust and accumulate positive evaluations, AI systems should gradually receive more autonomy as they prove reliable. Current AI technology is beginning to develop the ability to accumulate work experience independently.
The recently introduced Hermes agent documents successful work processes and refers to them later when handling similar tasks.
Anthropic's chief product officer, Ami Vora, co-founder and president, Daniela Amodei, and co-founder and CEO, Dario Amodei, appear onstage at the Code with Claude developer conference in San Francisco on May 6. [AP/YONHAP]
Just as probationary employees can eventually become permanent staff members, companies that accumulate experience assigning work to AI and reviewing its results may gradually entrust it with more critical responsibilities. Greater AI capability alone, however, does not automatically accelerate adoption. Risks involving data leaks, malfunctions and auditing difficulties also increase alongside AI’s growing power. For that reason, waiting until AI performance improves further before introducing it more broadly may prove to be a mistake. Companies that delay adoption are likely to lack practical management experience while competitors continue widening the gap. What matters most is beginning now to prepare properly for working alongside AI. Organizations need systems for AI identity verification and authentication, isolated workspaces, behavioral records and human supervision. Only by assigning AI the role of a probationary employee can companies gradually move toward entrusting it with more responsible positions. The era of working alongside digital employees has already arrived. The challenge now is deciding what tasks to assign them, and under what rules and procedures. Preparation should begin before it is too late.
This article was originally written in Korean and translated by a bilingual reporter with the help of generative AI tools. It was then edited by a native English-speaking editor. All AI-assisted translations are reviewed and refined by our newsroom.
- This Global Survey Reveals a Brutal Truth About AI in Customer Service. Here's What Every Leader Needs to Hear.
This Global Survey Reveals a Brutal Truth About AI in Customer Service. Here's What Every Leader Needs to Hear. entrepreneur.com
- Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening
Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening
Score: 35🌐 MovesMay 26, 2026https://www.anthropic.com/news/kiyoung-choi-representative-director-anthropic-korea - AI Sales Acceleration Platform: Scale Faster
Discover how AI sales acceleration platforms streamline workflows, boost efficiency, and scale your sales team's success with Copy.ai.
- The Barnes & Noble CEO thinks AI books are fine. He’s wrong.
Barnes & Noble CEO says he has no problem selling AI-written books. It sounds reasonable on the surface. It isn't, and here's why it's bad news for every author alive.
Score: 35🌐 MovesMay 26, 2026https://www.digitaltrends.com/computing/the-barnes-noble-ceo-thinks-ai-books-are-fine-hes-wrong/ - Strategy Asset Managers CEO Tom Hulick: tech broadening beyond clear AI winners
Strategy Asset Managers CEO Tom Hulick: tech broadening beyond clear AI winners
Score: 35🌐 MovesMay 26, 2026https://qz.com/strategy-asset-managers-tom-hulick-tech-beyond-ai-winners - The Download: puncturing the AI jobs panic
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. A reality check on the AI jobs hysteria Despite the growing hysteria over AI’s threat to white-collar jobs, there’s still scant evidence that the technology has had a large-scale impact on…
Score: 35🌐 MovesMay 26, 2026https://www.technologyreview.com/2026/05/26/1138028/the-download-ai-jobs-data/ - AI to grab a third of India's influencer budgets, but biggies to go bigger
As synthetic influencers flood the market, human creators with genuine audience trust and scale are becoming even more valuable, and commanding bigger pay cheques.
Score: 35🌐 MovesMay 26, 2026https://www.livemint.com/technology/ai-influencer-marketing-advertising-budgets-11779594109263.html - How to Build Reliable AI Agents Out of Unreliable AI Parts?
How to Build Reliable AI Agents Out of Unreliable AI Parts? Gartner
- Model Best Open-Sources BitCPM-CANN: 1.58-bit Training Achievable on Domestic Compute
Model Best has open-sourced BitCPM-CANN, a complete training framework enabling 1.58-bit model training on domestic AI accelerators, reportedly reducing inference memory requirements by up to six times compared to full-precision training.
Score: 35🌐 MovesMay 26, 2026https://pandaily.com/model-best-open-sources-bit-cpm-cann-1-58-bit-training-achievable-on-domestic-compute - Gabriel Landeskog uses in-skate sensors, AI-driven movement platform to manage his knee and workload
Gabriel Landeskog uses in-skate sensors, AI-driven movement platform to manage his knee and workload The Washington Post
- AI and You: AI vs UPSC—three chatbots attempt India’s toughest exam
A comparative test of leading AI models on actual UPSC Prelims papers reveals how closely modern systems can mirror human-level preparation, handling history and polity well but struggling with precise current affairs and technical distinctions that often decide exam outcomes.
- Opinion | Our Data-Center Alarmism Isn’t Surprising
The modern panic industry isn’t an accident.
Score: 34🌐 MovesMay 26, 2026https://www.wsj.com/opinion/our-data-center-alarmism-isnt-surprising-aafbf9ca?mod=rss_Technology - Reliable Machine Learning Methods for Payment Risk Prediction in Supply Chain Financing
Reliable Machine Learning Methods for Payment Risk Prediction in Supply Chain Financing repository.cam.ac.uk
Score: 34🌐 MovesMay 26, 2026https://www.repository.cam.ac.uk/items/bbb7f1cf-f325-47ee-bc84-7fcad735d1ad - Inside BoF and Shopify’s Knowledge Breakfast on the Future of AI Commerce
BoF and Shopify convened commerce and marketing leaders from Coach, Revolve, Jennifer Fisher, Giorgio Armani, Estée Lauder, Lacoste, COS, FullBeauty Brands and more in New York City to discuss how AI is rewriting the rules of product discovery and online shopping.