AI News Archive: June 4, 2026 — Part 12
Sourced from 500+ daily AI sources, scored by relevance.
- A Chinese robotics start-up beat Nvidia on a global AI ranking. Is a new tech war brewing?
As artificial intelligence steps out of the digital realm and into the real world, the race to build the embodied “brains” powering next-generation robots has become the newest battleground in tech competition between China and the United States. Two days after US chip giant Nvidia launched its Cosmos 3 model – designed to help physical AI “think before it acts” – a Chinese start-up stole the spotlight. On Wednesday, Hangzhou, Zhejiang province-based Spirit AI said its foundation model for...
- ChatGPT's memory is getting better, especially if you're on the free tier
OpenAI has significantly improved the chatbot's "dreaming" architecture.
- NVIDIA AI Releases Nemotron 3 Ultra: An Open 550B Mixture-of-Experts Hybrid Mamba-Transformer for Long-Running Agents
Source: MarkTechPost
- NVIDIA Nemotron 3 Ultra released: fast, intelligent, and open
NVIDIA releases Nemotron 3 Ultra, a fast and intelligent open model.
- NVIDIA Nemotron 3 Ultra
NVIDIA Nemotron 3 Ultra for high-throughput reasoning and long-running agent workflows
- Nemotron 3 Ultra now available on AI Gateway
Nemotron 3 Ultra from Nvidia is now available on Vercel AI Gateway. Nemotron 3 Ultra is an open Mixture-of-Experts reasoning model built for orchestrating long-running agent workflows, with a 1M-token context window. The model targets multi-turn agent workflows: planning, tool use, sub-agent delegation, and error recovery. Throughput reaches up to 350 tokens per second, with up to 30% lower cost on agentic tasks. To use Nemotron 3 Ultra, set model to nvidia/nemotron-3-ultra-550b-a55b in the AI SDK. AI Gateway provides a unified API for calling models, tracking usage and cost, and configuring retries, failover, and performance optimizations for higher-than-provider uptime. It includes built-in custom reporting, Zero Data Retention support, dynamic provider sorting by latency and cost, and more. AI Gateway reflects provider pricing with no markup and does not charge a platform fee on inference, including on Bring Your Own Key (BYOK) requests. Learn more about AI Gateway, view the AI Gateway model leaderboard, or try it in the model playground.
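AI Gateway's retries and failover are configured on Vercel's side; the sketch below only illustrates the general client-side pattern those words describe, not the gateway's implementation or API. The model ID nvidia/nemotron-3-ultra-550b-a55b comes from the announcement; `call_with_failover`, the `call` callback, and the fallback ID `example/fallback-model` are hypothetical, and the transport is simulated so the sketch runs offline.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    models: Sequence[str],
    call: Callable[[str], T],
    retries_per_model: int = 2,
    base_delay_s: float = 0.0,
) -> tuple[str, T]:
    """Try each model in priority order, retrying a model before failing over.

    `call` is any (hypothetical) function that sends one request to the named
    model. Returns (model_used, result); raises RuntimeError if all attempts fail.
    """
    last_error: Exception | None = None
    for model in models:
        for attempt in range(retries_per_model + 1):
            try:
                return model, call(model)
            except Exception as err:  # this sketch treats every error as retryable
                last_error = err
                if base_delay_s > 0:
                    time.sleep(base_delay_s * 2**attempt)  # exponential backoff
    raise RuntimeError(f"all {len(models)} models failed") from last_error


if __name__ == "__main__":
    def flaky_call(model: str) -> str:
        # Simulated transport: the primary model is "down", the fallback answers.
        if model == "nvidia/nemotron-3-ultra-550b-a55b":
            raise TimeoutError("simulated provider outage")
        return f"response from {model}"

    print(call_with_failover(
        ["nvidia/nemotron-3-ultra-550b-a55b", "example/fallback-model"],  # fallback is hypothetical
        flaky_call,
    ))
```

Retrying the same model first absorbs brief hiccups cheaply; failing over to a second model trades some consistency of output for availability, which is the kind of tradeoff a gateway setting would make on your behalf.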
- SEO for GTM: Scale Content with AI Workflows
Learn how SEO integrates into GTM strategies, boosts visibility, and scales content production with Copy.ai's AI-powered workflows.
- Google brings local AI agents to laptops with Gemma 4 12B
Google has released new tools that allow developers to run agentic AI workflows locally using Gemma 4 12B, a 12-billion-parameter model from Google DeepMind. In a blog post, the company said the model, combined with the Google AI Edge stack, can be used to build and test applications on everyday machines. The model-runtime combination supports capabilities such as autonomous data processing, visual insight generation, webpage creation, and tool use. The release includes Google AI Edge Gallery for macOS, where developers can use Gemma 4 12B to generate and run scripts for tasks such as data analysis. Google also said its Eloquent voice dictation and editing app now runs fully on-device on macOS, with support for local transcription and voice-driven text editing. Google has also expanded LiteRT-LM, its lightweight command-line tool for running language models locally, with a new serve command. The company said this allows the CLI to act as a local LLM server and lets developers connect Gemma 4 12B to standard tools, SDKs, and frameworks through a local endpoint. “Your data stays on your device while maintaining reliable responsiveness, utility, and cost efficiency,” the company said in the blog post. The announcement comes as enterprises are looking beyond large, general-purpose models for some AI workloads. Gartner predicted that by 2027, organizations will use small, task-specific AI models at least three times more than general-purpose large language models, citing demand for more contextualized and cost-effective AI systems.
Challenges to overcome
But running agents on employee devices brings a number of problems. Companies must work within the limits of endpoint hardware, which can restrict the size of models that run effectively and the number of model instances that can operate at one time. “While the AI can now fit on a laptop, enterprise IT infrastructure is largely unprepared to manage it,” said Rishi Padhi, principal analyst at Gartner.
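To make these hardware limits concrete, here is back-of-the-envelope arithmetic for the weights of a 12-billion-parameter model such as Gemma 4 12B. It is a rough sketch under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), the bit widths are illustrative rather than what Google ships, and the 10 GB of free memory is a hypothetical figure.

```python
def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def instances_that_fit(free_memory_gb: float, per_instance_gb: float) -> int:
    """How many full copies of a model fit in the memory left after other apps."""
    return int(free_memory_gb // per_instance_gb)


PARAMS = 12e9  # Gemma 4 12B, per the article

# Illustrative precisions; the article does not say which one is assumed.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(PARAMS, bits):5.1f} GB")

# Hypothetical: 10 GB left free after the OS and everyday apps.
print("4-bit instances that fit:", instances_that_fit(10, weight_memory_gb(PARAMS, 4)))
```

At 16 bits the weights alone need 24 GB; at 4 bits they need about 6 GB, so under these assumptions only one copy fits in 10 GB, which illustrates the limit on concurrent model instances described above. The article does not say which precision underlies the roughly 16 GB figure it cites.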
“Even highly optimized models like the Gemma 4 12B require around 16GB of unified memory or VRAM to run alongside standard applications. Many standard-issue enterprise laptops lack the memory bandwidth and NPUs/GPUs required for fluid, multi-turn agentic execution.” Anand Joshi, AI analyst at TechInsights, said local deployment also changes the nature of the workloads. On a PC, search may mean finding information across internal folders and files. In a data center, the same function could involve searching the internet or querying a large SQL database. “The framework for local deployment of agentic AI is different from that of a data center,” Joshi said. “The models are smaller; you can run only one instance of a large model at a time. You are limited by memory, CPU, and so on.” Security and governance are also likely to become bigger concerns as AI agents move closer to enterprise endpoints. Agentic AI is designed to take actions, creating new security risks when local models are given access to employee files or allowed to interact directly with applications and scripts. “Sandboxing these agents without breaking their utility is still a major operational challenge,” Padhi added. “And all this while enterprises need to audit AI usage for compliance and security. When inference happens entirely offline, capturing logs, tracking model drift, and ensuring employees are using the approved, compliant ways for a model becomes incredibly difficult.”
The cost tradeoff
Running AI agents locally could reduce some cloud inference costs, but the savings may be offset in the near term by higher spending on endpoint hardware and management. “First and foremost, it is an OpEx-to-CapEx shift, as it shifts that financial burden by forcing accelerated hardware refresh cycles for premium PCs or edge devices,” Padhi said.
“It would require buying expensive, high-memory laptops for employees at a time when memflation in the hardware industry is already driving up end-user average selling prices for laptops.” Many enterprises refreshed PCs in 2025 to support Windows 11, but at that point, most AI inference still ran in the cloud, and the case for on-device AI remained unclear, Padhi said. Enterprises may therefore move cautiously, buying AI-capable PCs only where local inference has a clear business case. Over time, however, on-device AI could make enterprise AI spending more predictable by reducing exposure to variable cloud inference bills. The tradeoff is that companies may face a higher baseline cost for equipping and managing employees’ devices.
Complementing cloud AI
For enterprises, local AI is unlikely to replace cloud-based AI outright. Analysts said local AI is more likely to be used for workloads that benefit from endpoint processing, especially when applications must operate offline or when privacy and response times are critical. “For local agentic AI to proliferate, the use cases on edge will have to complement data center/cloud use cases,” Joshi said. “I don’t expect local agentic AI to replace cloud AI, but it has potential to take a slice away from the cloud, and models like Gemma are significant steps towards enabling that.” The market, Joshi added, is still determining where local AI fits best. “I estimate that use cases that require privacy or have strict latency needs will move to local node first, with further migration of others in the next 2-3 years,” he said. Padhi said model placement will depend on the privacy requirements of a workload, the computing power it needs, and where the relevant data resides. Tasks such as code generation or analysis of local files could increasingly run on employee devices, while enterprise-wide RAG systems and more complex AI workflows are likely to remain cloud-based. The article originally appeared on InfoWorld.
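Padhi's placement criteria (privacy, compute needs, data location) and the OpEx-to-CapEx tradeoff can be expressed as a toy decision rule plus a break-even calculation. Everything here is a hypothetical sketch: the `Workload` fields, the order of the checks, and the 600 and 25 figures are illustrative assumptions, not Gartner's methodology.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # Hypothetical attributes mirroring the criteria Padhi lists.
    name: str
    privacy_sensitive: bool    # handles data that should not leave the device
    latency_critical: bool     # needs fast or offline responses
    needs_large_compute: bool  # more than an employee laptop can run
    data_on_device: bool       # the relevant data already lives on the endpoint


def place(w: Workload) -> str:
    """Toy rule: check compute limits first, then privacy, latency and data location."""
    if w.needs_large_compute:
        return "cloud"
    if w.privacy_sensitive or w.latency_critical or w.data_on_device:
        return "local"
    return "cloud"


def breakeven_months(hardware_premium: float, monthly_cloud_savings: float) -> float:
    """Months until an up-front hardware premium (CapEx) is recovered by lower
    monthly cloud inference bills (OpEx). Infinite if there are no savings."""
    if monthly_cloud_savings <= 0:
        return float("inf")
    return hardware_premium / monthly_cloud_savings


if __name__ == "__main__":
    for w in (
        Workload("analyze local files", True, False, False, True),
        Workload("enterprise-wide RAG", False, False, True, False),
    ):
        print(f"{w.name}: {place(w)}")
    # Hypothetical figures: a 600 premium per laptop, 25 saved per month.
    print("break-even months:", breakeven_months(600, 25))
```

Because compute is checked first, a privacy-sensitive job that is too large for a laptop still goes to the cloud; an organization could reasonably order these checks differently.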
- Power grids, water systems and telecom networks embrace AI: Anthropic opens Project Glasswing to 150 infrastructure companies
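On June 3, Anthropic announced that it is adding 150 companies to Project Glasswing, its AI-based vulnerability detection program. The new participants are concentrated in critical national infrastructure sectors such as power, water, healthcare, telecommunications and hardware. Analysts and the security industry welcomed the move, since the more companies take part in vulnerability detection, the more security flaws can be found. What the industry is watching more closely, however, is a practical challenge: the bottleneck. If Project Glasswing and similar projects run by major AI companies increase the number of vulnerabilities discovered tenfold or more, the new question is whether vendors can triage and patch them in time. Vendors have long been criticized for responding slowly even to known security vulnerabilities. Microsoft recently clashed publicly with a security researcher, who disclosed a vulnerability after concluding that Microsoft was responding too slowly. Even if vendors keep pace with the growing volume of vulnerabilities, whether enterprise security operations centers (SOCs) can absorb the flood of patches is another matter. And if patches are generated with automation, it is unclear whether chief information security officers (CISOs) will trust them enough to deploy them without separate manual verification. CISOs generally do not trust automated output easily. In the blog post announcing the new participants, Anthropic said, “What these companies have in common is that an attack on their codebases could have catastrophic consequences. Most of the partners believe a large-scale attack could affect more than 100 million people, with serious repercussions for both national and international security.” It added, “This expansion is the next step in a long-term strategy to help AI make all software more secure and to help the industry prepare for how AI could change the core premises of cybersecurity.” Project Glasswing was first unveiled on April 7. Its initial participants were Amazon Web Services (AWS), Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia and Palo Alto Networks, and Okta later confirmed that it had joined.
The patch development bottleneck
The patch bottleneck is not an easy problem to solve, because even the largest vendors have practical limits on the resources they can devote to fixing security vulnerabilities and shipping patches. “The biggest issue is adaptability,” said Tom Findling, CEO of AI security company Conifers.ai. “When a vulnerability or security weakness is discovered, defenders have to validate it, prioritize it and fix it before attackers exploit the same information. The validation step in particular is critical.” Findling continued, “When we tested the tool ourselves, we found a considerable number of false positives, which means companies cannot treat every finding as something that requires immediate action.” He also stressed, “Companies need to be able to quickly separate meaningful signal from unnecessary noise and align their processes, development workflows and patch operations around the real issues.” “The most important metric for companies may not be the number of vulnerabilities found but how quickly they can respond once a credible issue is confirmed,” Findling said. “For some organizations, that response cycle still runs to months.” He added, “How much that response time can be shortened will determine whether AI-based vulnerability detection actually improves security defenses or merely increases the volume and velocity of security noise.”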
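The bottleneck can be made concrete with a simple queue model: only findings that survive validation enter the remediation backlog, and the team closes a fixed number each week. This is a minimal sketch; the function name and all rates are hypothetical, and only the tenfold scenario comes from the article.

```python
def backlog_after(
    weeks: int,
    findings_per_week: float,
    false_positive_rate: float,
    fixes_per_week: float,
    start_backlog: float = 0.0,
) -> float:
    """Open, validated vulnerabilities left after `weeks` of discovery and patching.

    Only findings that survive validation (1 - false_positive_rate) enter the
    queue; the team can close at most `fixes_per_week` of them each week.
    """
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    backlog = start_backlog
    for _ in range(weeks):
        validated = findings_per_week * (1.0 - false_positive_rate)
        backlog = max(0.0, backlog + validated - fixes_per_week)
    return backlog


if __name__ == "__main__":
    # Hypothetical team: closes 100 fixes a week; 20% of raw findings are false positives.
    today = backlog_after(12, findings_per_week=100, false_positive_rate=0.2, fixes_per_week=100)
    tenfold = backlog_after(12, findings_per_week=1000, false_positive_rate=0.2, fixes_per_week=100)
    print(f"backlog after 12 weeks: today {today:.0f}, with 10x discovery {tenfold:.0f}")
```

With these numbers the current flow is absorbed (80 validated findings against 100 fixes a week), while tenfold discovery adds 700 open issues a week, about 8,400 after 12 weeks. Only faster remediation drains the queue, which is the point about response time.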
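Not a matter of solving, but of responding
Justin Greis, CEO of consulting firm Acceligence, said the expansion of Project Glasswing may show CISOs not that security vulnerabilities are shrinking, but how the problem is changing. “It’s well known that cybersecurity has been treated as a problem of finding vulnerabilities,” Greis said. “But AI is showing that the real problem all along was not discovery but response and remediation.” He added, “The industry already struggles to validate, prioritize, develop, test and deploy patches fast enough. If the security team handles detection while IT or the business units own the actual patching, things can get even worse.” “If AI can find vulnerabilities 10 or 100 times faster than humans, the bottleneck just moves to the next stage,” Greis said. “Companies may find themselves in the awkward position of knowing about far more vulnerabilities than they can actually act on.” He concluded that “AI is turning cybersecurity from a visibility problem into an execution problem.” Greis also offered a worrying outlook: “AI may make companies more secure while also putting a bigger burden on them. Companies will gain an unprecedented level of visibility into their risk, but they will also see how big that risk really is.”
Building trust in automation is key
Grace Trinidad, research director for AI security at IDC, said broad automation is needed to resolve the bottleneck companies face. Given cybersecurity practitioners’ lack of trust, however, she said vendors need a rigorous framework for quantifying and presenting the confidence level of each patch. “Providing a confidence score along with a patch is a new concept,” Trinidad said. “Companies need to be able to identify the vulnerabilities that affect their environment, prioritize them and respond appropriately.” She added, “We are learning a new skill we are not yet ready for, which is how to trust automation. When we have to move at the current pace, there may be cases where that trust breaks down.” She also stressed that “confidence ratings must be grounded in transparency. Confidence should not be calculated in a way that is too complex to explain to a person.” Trinidad also noted that Anthropic has said all 150 new participants must meet security requirements before gaining access to the project. “But nobody knows what those security requirements are,” she said. “An explanation like that does little to build trust.” One alternative is to rely on trusted third parties rather than having security vendors assess themselves. Enterprise software company Workday has already taken a similar approach: it uses independent verification services based on public standards such as MITRE ATLAS to verify the security and compliance of AI agents running on its platform. The focus is currently on security verification, but the same concept could be applied to confidence ratings in the future.
Security concerns raised by the expansion
Freelance technology analyst Carmi Levy was more skeptical about how much the addition of 150 participants to Project Glasswing will ultimately achieve. “The core purpose of Project Glasswing was for Anthropic to work closely with a small number of rigorously vetted vendors to build defenses against new kinds of large language model (LLM) security threats that existing security technologies and protocols struggle to handle,” Levy said.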
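“Expanding participation to hundreds of companies has the advantage of drawing on more experts’ knowledge to strengthen defenses,” Levy added, “but at the same time it significantly raises concerns about possible leaks.” In particular, he noted that “this decision comes from a company that has already reported two leaks involving the same model.” “Ideally, Anthropic would have announced, alongside this large-scale expansion, a separate plan to strengthen its internal security protocols so that the code does not fall into the wrong hands,” Levy said. “Bringing in far more researchers could signal to potential attackers that the attack surface has grown considerably, and it does nothing to address concerns about future breaches.” dl-ciokorea@foundryco.com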
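Trinidad's call for patch confidence scores that can be explained to a person suggests a transparent scheme, such as a weighted average whose every term is visible. The sketch below is an illustration under loud assumptions: the `Evidence` checks, their weights, and the 0.9 auto-deploy threshold are hypothetical, and nothing here describes how Anthropic, IDC, or any vendor actually scores patches.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    name: str      # human-readable check, e.g. "regression tests pass" (hypothetical)
    weight: float  # relative importance, set by the organization
    score: float   # 0.0 = no support for the patch, 1.0 = strong support


def patch_confidence(evidence: list[Evidence]) -> tuple[float, list[str]]:
    """Weighted average of evidence scores plus a line-by-line explanation."""
    total_weight = sum(e.weight for e in evidence)
    if total_weight <= 0:
        raise ValueError("need at least one piece of evidence with positive weight")
    confidence = sum(e.weight * e.score for e in evidence) / total_weight
    explanation = [
        f"{e.name}: score {e.score:.2f} x weight {e.weight:g}" for e in evidence
    ]
    return confidence, explanation


def decide(confidence: float, auto_deploy_threshold: float = 0.9) -> str:
    """Route high-confidence patches to automation and the rest to a human."""
    return "auto-deploy" if confidence >= auto_deploy_threshold else "manual review"


if __name__ == "__main__":
    conf, why = patch_confidence([
        Evidence("regression tests pass", 3, 1.0),
        Evidence("original reproducer no longer triggers", 2, 1.0),
        Evidence("change is small and localized", 1, 0.5),
    ])
    print(f"confidence {conf:.2f} -> {decide(conf)}")
    print("\n".join(why))
```

A weighted average is deliberately simple: a reviewer can see exactly which check pulled the score down, which is the kind of explainability the quote asks for, at the cost of ignoring interactions between checks.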
- Asana launches AI ‘chief of staff’ to keep projects on track
Asana has launched an AI personal assistant that tracks various data sources, alerts users when a work project runs into problems, and recommends next actions. It’s one of a range of product announcements made Thursday at the company’s Work Innovation Summit in London, including updates to its existing AI teammates product. These follow Asana’s recent acquisition of AI workflow automation software vendor StackAI for $75 million. Asana Dash is described as an “AI chief of staff” that can help users stay up to date on work projects by accessing information in Asana as well as across email, calendar and team messaging apps, said Arnab Bose, Asana’s chief product officer. “Keeping people in their ‘zone of genius’ and hooking up all of these unstructured signals to the structure of Asana: that’s what Dash does best,” said Bose. The AI assistant can access the same Asana project information as the user, and can flag when problems occur that could push a project off-track. Dash can then act to address problems, such as posting messages within Asana on behalf of the user or directing an AI teammate to take action. (Dash will ask the user before making any changes.) “Asana is building on recent acquisitions, and earlier investment in a graph database focused on human connections — the Asana Work Graph — and its position within a well-integrated flow of work to deliver to each worker an executive assistant rooted in the context of their job,” said Wayne Kurtzman, IDC research vice president. The Dash personal assistant is enabled by an expanded Asana work graph, the data model related to work carried out by teams in the application. Asana has in the past been more focused on tasks, projects, portfolios, and goals, said Bose, but the work graph now includes new sources of data, linking to employee calendars and accessing meeting transcripts, for instance, alongside other documents and databases.
There are also updates to the AI teammates feature (collaborative AI agents that multiple human coworkers can interact with), which are now more powerful, said Bose. This includes additional skills and integrations with third-party apps such as Gmail, Slack, Outlook, Figma, and Canva. As for the StackAI acquisition, Bose said it allows Asana to extend the reach of AI agents into a variety of business apps more easily and reliably, building on Asana’s “system of action” function. The latter tracks work carried out across an organization, he said, and can automate the complex processes that make up many enterprise workflows. “If you look at StackAI’s website, the thing that they are really, really great at is building these complex, multi-step processes,” said Bose. The aim is to combine StackAI’s agent builder with the integration expertise and agents already available in Asana. “So, the idea is when an AI teammate or Dash recommends the next best action, they will be able to choose downstream actions based on the portfolio of approved workflows that you’ve built out in StackAI.” Overall, the announcements help Asana provide a platform that combines agents and workflow automation with AI assistance that helps humans work more effectively, said Bose. “Our terminology for this is a ‘human-agent operating system,’ because automation, I feel, is a little reductive in the sense that there are some things that are fully automated, but a lot that you’d want a human being and an AI agent to coordinate on and align on,” he said. Asana did not immediately respond to a request for pricing and availability details for Dash.
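Dash's behavior as described (choosing downstream actions only from a portfolio of approved workflows, and asking the user before making changes) follows a common human-in-the-loop pattern: an allowlist plus an explicit confirmation step. The class, method and workflow names below are hypothetical and do not reflect Asana's or StackAI's APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ApprovalGatedAssistant:
    # Hypothetical allowlist of pre-approved workflows: name -> function that runs it.
    approved_workflows: dict[str, Callable[[], str]]
    # Asks the human a yes/no question; must return True only on explicit approval.
    ask_user: Callable[[str], bool]
    audit_log: list[str] = field(default_factory=list)

    def act(self, workflow: str) -> str:
        """Run a recommended action only if it is approved and the user consents."""
        if workflow not in self.approved_workflows:
            self.audit_log.append(f"rejected (not approved): {workflow}")
            return "rejected"
        if not self.ask_user(f"Run workflow '{workflow}'?"):
            self.audit_log.append(f"declined by user: {workflow}")
            return "declined"
        result = self.approved_workflows[workflow]()
        self.audit_log.append(f"ran: {workflow}")
        return result


if __name__ == "__main__":
    assistant = ApprovalGatedAssistant(
        approved_workflows={"post_status_update": lambda: "status update posted"},
        ask_user=lambda question: True,  # stand-in for a real confirmation prompt
    )
    print(assistant.act("post_status_update"))
    print(assistant.act("delete_project"))  # not on the allowlist
    print(assistant.audit_log)
```

Checking the allowlist before asking the user keeps people from being prompted about actions they could never approve, and the audit log records every decision either way.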
- Apple reportedly planning to use Nvidia chips for Gemini-powered Siri
Apple's upcoming AI-powered Siri may leverage Google Cloud's Nvidia Blackwell B200 chips and Google's Gemini AI models. This strategic move aims to enhance Siri's capabilities, with Apple reportedly enabling Nvidia's confidential computing to encrypt user data during processing. The integration of these advanced technologies is anticipated for WWDC 2026.
- Nvidia’s Jensen Huang to meet Korean AI startups during Seoul visit
Source: Maeil Business Newspaper (매일경제)
- Samsung's updated Health app unsurprisingly comes with new AI-powered features
Samsung will roll out an update for its Health app before launching its new Galaxy Watches.
- Netflix aims to use AI to help viewers manage content overload
Source: The Straits Times
- Global giants that have set up AI facilities, labs in Singapore
Source: The Straits Times
- Cloudflare CEO says bot internet traffic has overtaken humans
There's now more internet traffic coming from bots than humans, according to Cloudflare.
- The Best Laptops of Computex 2026: RTX Spark and AI Dominate
Source: PCMag UK
- NVIDIA's RTX Spark and the race for the ultimate AI chip
NVIDIA's RTX Spark in the race for the ultimate AI chip
- Microsoft's Most Important Build 2026 Announcement Wasn't About AI
Source: PCMag
- GOP Lawmakers Accuse China of Bankrolling US Data Center Protests
Source: PCMag UK
- AI ‘super-antigen’ vaccine could protect against whole families of viruses
Researchers say this could prevent future pandemics, save lives and stop lockdowns
- Canada PM issues warning over AI and lays out strategy to overcome ‘major adoption gap’
He previously warned about global hegemons like the United States using economic integration to pressure smaller nations
- HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers
For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task sema...
- Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolv...
- Pretraining Recurrent Networks without Recurrence
Training recurrent neural networks (RNNs) requires assigning credit across long sequences of computations. Standard backpropagation through time (BPTT) addresses this problem poorly: it is sequential in time, limiting parallelism, and suffers from vanishing or exploding gradients, making long-range ...
- RREDCoT: Segment-Level Reward Redistribution for Reasoning Models
Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on the Group Relative Policy Optimization (GRPO) algorithm or modifications thereof to steer the models to produce Chain-of-Thought (CoT) traces. The final answer can ...
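For context on the GRPO algorithm the abstract mentions: GRPO samples a group of completions per prompt and normalizes each completion's reward against that group's mean and standard deviation, in place of a learned value function. The minimal sketch below computes these group-relative advantages; it is standard GRPO, not RREDCoT's segment-level method, which is not reproduced here, and `group_relative_advantages` is a hypothetical name.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one prompt's group of sampled completions.

    A_i = (r_i - mean(r)) / (std(r) + eps); no learned value function is needed.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one prompt, scored 1 if correct and 0 otherwise.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

In standard GRPO every token of a completion receives that completion's single advantage, so a mostly correct chain of thought with one bad step is credited as a whole; finer-grained credit of this kind is presumably what segment-level reward redistribution targets.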
- MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless searc...
- PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
We propose a preconditioning (PC) layer, a weight parameterization via polynomial preconditioner that ensures stable weight conditioning throughout LLM training. The PC module reshapes the singular-value spectrum of weight matrices via low-degree polynomial preconditioning. After training, the preco...
- You Only Index Once: Cross-Layer Sparse Attention with Shared Routing
Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse m...
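As background on the block-sparse methods the abstract refers to, the sketch below shows one generic variant for a single query: score fixed-size key blocks by the query's dot product with each block's mean key, keep the `top_k` best blocks, and run exact softmax attention over the kept keys. It is a pure-Python illustration, not the paper's cross-layer method; returning the selected indices only hints at how a routing decision might be reused, which is our reading of "shared routing" in the title.

```python
import math


def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(
    q: list[float],
    keys: list[list[float]],
    values: list[list[float]],
    block_size: int,
    top_k: int,
) -> tuple[list[float], list[int]]:
    """Single-query attention restricted to the top_k highest-scoring key blocks.

    Blocks are scored by the query's dot product with the block's mean key, a
    common cheap proxy; attention is then exact softmax over the kept keys.
    Returns (output vector, indices of the keys that were attended to).
    """
    n = len(keys)
    dim = len(q)
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    block_scores = []
    for block in blocks:
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        block_scores.append(_dot(q, mean_key))
    kept = sorted(range(len(blocks)), key=lambda j: block_scores[j], reverse=True)[:top_k]
    idx = sorted(i for j in kept for i in blocks[j])

    scale = 1.0 / math.sqrt(dim)
    logits = [_dot(q, keys[i]) * scale for i in idx]
    m = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - m) for x in logits]
    z = sum(weights)
    out = [
        sum(w * values[i][d] for w, i in zip(weights, idx)) / z
        for d in range(len(values[0]))
    ]
    return out, idx
```

With `top_k` at least the number of blocks this reduces to dense attention; smaller values trade some quality for skipping most keys, which is the efficiency-quality tradeoff the abstract describes.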
- Benchmark Everything Everywhere All at Once
Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly ...
- ChatPilot
Bulk delete, archive & timestamp your ChatGPT conversations
- Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals
As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it has valid credentials) or hard-fail it (indistinguishable fro...
- Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents
Sparse attention is becoming increasingly important for serving large language models (LLMs) as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive, slowing both human researchers and AI agents in explor...
- Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads
LLM agents are increasingly deployed on long-horizon tasks requiring sustained reasoning over extended interaction histories. Realizing this at scale requires agents to persistently store, retrieve, and update their own memory across sessions. A rich ecosystem of agent memory systems has emerged spa...
- RiskFlow: Fast and Faithful Safety-Critical Traffic Scenario Generation
Safety-critical traffic scenario generation is essential for evaluating autonomous driving systems under rare but high-risk interactions. Existing diffusion-based methods offer strong controllability in closed-loop generation, but their iterative denoising process is computationally expensive and ma...
- Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss
Many modern applications of deep learning involve training a neural network via a one-step prediction loss (e.g., $L^2$ regression, cross-entropy), but deploy the network by rolling out along its own predictions. Key examples include autoregressive language modeling, flow-based generative modeling, ...
- Risk Assessment of Autonomous Driving: Integrating Technical Failures, Ethical Dilemmas, and Policy Frameworks
Autonomous driving technology has the potential to reduce the large number of road traffic accidents caused by human error each year, but it also brings new types of risks that need to be evaluated from the aspects of technology, ethics and regulations. Based on public crash data from the National H...
- Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration
Recent advances in LLM agents have enabled complex cognitive capabilities, such as multi-step reasoning, planning, and tool use, that increasingly position these agents as human collaborators. Effective collaboration, however, requires collaborators to continuously maintain and align mental models o...
- LatentWave: JEPA Pretraining for Wireless Foundation Models
Wireless foundation models have emerged as a promising alternative to building separate models for each wireless task. However, existing approaches rely on masked input reconstruction, which can bias representations toward low-level signal details. In this paper, we propose LatentWave, a wireless fo...
- An Infectious Disease Spread Simulation Based on Large Language Model Decision Making
Modelling individual decision-making during infectious disease outbreaks is crucial for understanding behavioural dynamics and informing effective public health interventions. Prior work has shown that large language models can simulate realistic human behaviour by generating agent decisions based o...
- Where Should Knowledge Enter? A Layered Framework for Knowledge Infusion in Multimodal Iterative Generative Mo
Multimodal generative models produce fluent outputs but remain unreliable when generation must respect structured, domain-specific, or safety-critical knowledge. Existing methods incorporate knowledge through mechanisms such as prompt augmentation, guidance, latent editing, or fine-tuning, yet they ...
- TokenMizer: Graph-Structured Session Memory for Long-Horizon LLM Context Management
Large language model (LLM) deployments for long-horizon tasks face a fundamental constraint: context windows are finite while productive work sessions are not. When history exceeds the Maximum Effective Context Window (MECW), critical structured information - architectural decisions, task transition...
- PAMF: Prior-Aware Multimodal Fusion for Incomplete Time Series Data
In healthcare, multimodal time series tasks often operate on incomplete observations in practice, for example when ECG segments are lost because electrodes detach or an entire respiratory channel is unavailable during overnight monitoring. Such missingness typically appears in two structurally disti...
- Learning What to Forget: Improving LLM Unlearning via Learned Token-Level Importance
Machine unlearning aims to remove targeted knowledge from a trained model while preserving its general capabilities. For autoregressive language models, not all tokens in a forget sample are equally relevant to forgetting. Existing approaches either ignore this heterogeneity or rely on auxiliary mod...
- LLM Self-Recognition: Steering and Retrieving Activation Signatures
Recent advances in interpretability suggest that large language models (LLMs) implicitly encode signals in their generated text that enable self-recognition of their outputs. We demonstrate that this capability is reliable, even in low-entropy scenarios, and that it can be amplified through targeted...
- AIS-Based Vessel Trajectory Prediction Using Memory-Augmented Neural Networks
Accurate vessel trajectory prediction is essential for safe and efficient maritime operations, enabling collision avoidance and supporting route optimization. Although memory-augmented neural networks have recently shown strong performance in pedestrian and road-vehicle trajectory prediction by sele...
- API AI Readiness Scorecard
Is your API ready for AI agents? Get an AI readiness score
- Multi-ResNets for Subspace Preconditioning in Constrained Optimization
We propose MResOpt, a staged residual neural network architecture for constrained optimization problems. Our architecture fits within predict-complete-correct pipelines and decomposes constraint satisfaction by priority through intermediate re-completion and stage-aware losses. The framework enables...
- LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs
Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based ...
- TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models
Time series foundation models (TS-FMs) aim to learn generalizable temporal representations that can be adapted to a wide range of downstream tasks. In real-world multimodal settings, time series are frequently affected by temporal misalignment and partial modality missingness, where different modali...