AI News Archive: June 15, 2026 — Part 12
Sourced from 500+ daily AI sources, scored by relevance.
- "They screwed us": Personality clashes sent Anthropic's models offline
Anthropic has once again found itself in the Trump administration's crosshairs over an inability to communicate effectively, sources tell Axios. Why it matters: Governing the world's most consequential technology is coming down to speaking President Trump's language. Anthropic failed to "honor" a recent cyber executive order, administration officials claim, and the company's purported failure to take the matter seriously led to its most powerful products being scrubbed from the internet. "Everybody said Anthropic was a bad actor. Some of us said it was time to give them a chance. Now those people are questioning that. They screwed us," an administration official said. Catch up quick: On Thursday, Amazon CEO Andy Jassy called Treasury Secretary Scott Bessent expressing concerns that Anthropic's most powerful models, Mythos and Fable, could be jailbroken. The administration official said Anthropic knew a jailbreak could happen and chose to distribute it anyway: "They came to every fork in the road and took the wrong fork." Anthropic says it received explicit approval from the government to deploy Fable. On Friday night the government imposed stringent export controls that ultimately led Anthropic to take the models offline entirely. Behind the scenes: "Anthropic has not done a great job at trying to speak to the administration and appreciate the ideological differences," one source familiar with the administration's thinking said. "It's like they just speak in different languages," the source said, adding that the company has simply not figured out how to communicate with this administration. The administration first threatened Anthropic with export controls a couple of weeks ago after learning that its cutting-edge Mythos model was made available to an entity in a foreign country with direct ties to the Chinese Communist Party, according to the White House. 
A source close to Anthropic said the company has always worked closely with the government on expanding Mythos access — and in this case, involving a global telecom company, Anthropic revoked Mythos access without the threat of export controls. Amazon's report raised fresh concerns but Anthropic's "position at the outset was no, we're not going to do anything, this is not a real issue," the source familiar with the administration's thinking said. The source close to Anthropic said the company did not refuse to resolve the issue. Even before this breakdown, a previous fight between Anthropic and the Pentagon also came down in some ways to just not liking the person on the other side of the negotiating table. A White House official told Axios that the Pentagon fight is completely unrelated — but Anthropic's inability to communicate effectively showed up in a similar, unhelpful way. "We never wanted this to happen. Our number one priority is innovation but our hands were tied," the White House official said. The optics added fuel to the fire. Anthropic came out with a blog post dismissing the Amazon report. Then the company enlisted a cybersecurity expert viewed by the administration as a "radical Democrat," who was then celebrated by Chris Krebs, who Trump just fired. The big picture: Anthropic has been the loudest of the frontier AI labs on safety concerns, calling for strong regulation and spooking the Trump administration and the public with their own model's cyber capabilities. The White House led in thawing relations with the embattled company following the Pentagon spat. The technology is moving fast and the government is struggling to catch up, sources said. That — combined with the personality differences — led to a blunt instrument being hastily deployed instead of a scalpel. What's next: "The immediate crisis was averted but longterm we have a problem," an administration official said. 
The Commerce Department will meet with Anthropic senior tech staffers Logan Graham, Dave Orr and Nicholas Carlini on Monday, officials told Axios. Meetings are also scheduled with the CIA and White House science advisor Michael Kratsios to work through adhering to that cyber executive order. The bottom line: One option is to make sure Anthropic's models can't be jailbroken — though perfect jailbreak resistance may be impossible. Absent that, a source familiar with the administration's thinking said it may simply come down to an attitude fix where, instead of feeling dismissed, "everyone feels safe, secure and happy."
- Anthropic comes to Washington to meet White House officials
The Trump administration blocked foreign nationals’ use of Fable and Mythos, citing security concerns.
- Damage control? Anthropic rushes to Washington amid White House ban on top AI models ahead of billion-dollar IPO
Anthropic has reportedly flown senior technical staff to Washington for face-to-face talks with White House officials after US export controls forced the AI company to take its most powerful Claude models offline worldwide.
- Anthropic scrambles after Trump administration freezes its top AI models
Export controls on Fable and Mythos raise doubts over how US will police the most powerful AI systems
- As AI evolves, necessary coordination on security expands
As AI evolves, necessary coordination on security expands Healthcare IT News
- US stock markets today: Wall Street rallies, oil tumbles after US and Iran reach deal; AI and travel stocks jump
Global stock markets surged Monday as a tentative US-Iran ceasefire agreement to reopen the Strait of Hormuz eased inflation fears and pushed oil prices lower. Major indices like the S&P 500 and Nasdaq saw significant gains, with AI and airline stocks leading the advance. Bond yields also eased as the likelihood of further interest rate hikes diminished.
- Ant Group tests AI assistant for Alipay services
The update is being tested internally, and its release timing has not been set.
- Databricks announces 2026 global partner awards
The Databricks Brickbuilder Partner Network, spanning over 8k partners worldwide,...
- Beyond coding AI to enterprise infrastructure: OpenAI moves to expand Codex with Ona acquisition
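CIOs and CISOs have a range of strategic and operational concerns about handing work to fully autonomous AI agents and then hoping everything goes smoothly. What happens if an agent starts deleting important files? What if an agent strays from its assigned task and performs unnecessary work all night, leaving the team with a massive token bill the next morning? Could it be tricked by a nation-state attacker into taking malicious actions? To ease these concerns, OpenAI announced on the 12th that it has agreed to acquire Ona, a cloud development environment (CDE) provider. Ona formerly operated under the name Gitpod and has 79 employees. OpenAI plans to use the acquisition to accelerate its work on making agentic AI better suited to enterprise environments. In a press release, OpenAI said, "Ona's technology provides a secure, persistent environment where agents can access the tools, systems, and context they need to carry out work over long periods," adding, "By bringing Ona into OpenAI, we will extend Codex beyond work tied to a particular device or active session and help more organizations deploy agents safely in production." Ona CEO Johannes Landgraf offered a similar view in an official press release. "Ona provides the core foundation agents need to do enterprise work," Landgraf said. "Work can continue across multiple devices in a trusted cloud environment that the customer controls, and it is carried out inside the systems where the actual software lives." He added, "OpenAI brings frontier AI capabilities, a polished product experience, and a scale of research and distribution that Ona would struggle to achieve on its own." Landgraf did not disclose annual revenue but suggested the company has landed some large customers. "Since the start of this year, Ona's weekly agent sessions have grown 13-fold in production environments," he said, adding that the platform is "used by institutions with some of the world's most demanding requirements, including the oldest bank in the United States, a major European pharmaceutical company, and a leading Asian sovereign wealth fund." He also said that "major enterprise customers are expanding their use of the platform, and growth is faster than ever." Arnal Dayaratna, research vice president for software development at market research firm IDC, said IDC estimates Ona's 2025 annual revenue at about $7 million (about 9.6 billion won). Dayaratna expected 2026 revenue to be higher. "Let's assume $15 million (about 20.6 billion won). That may be a somewhat generous figure. In reality it could be in the range of $10 million to $12 million (about 13.7 billion to 16.5 billion won)," he said. Applying a multiple of roughly 30 times revenue, a common approach to valuing acquisitions, he estimated that "depending on actual 2026 revenue, the company's value would be roughly $450 million to $500 million (about 618 billion to 687 billion won)." Regardless of the purchase price, however, IDC judged that the deal could be a positive decision for OpenAI, because OpenAI was facing the classic "buy or build" choice between developing in-house and acquiring. "OpenAI is investing heavily in developing Codex, but it lacked an environment in which autonomous enterprise agents can be run safely," Dayaratna said. "This technology is an area OpenAI does not currently offer. It provides a secure environment in which agents can operate safely while retaining memory." He added, "It is clearly a needed technology, but I honestly don't know how good it actually is." A Gartner report titled "First Take" said the acquisition would give Codex "the key scalability it has been missing."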
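It noted, however, that enterprises will face a difficult choice. "Software engineering leaders must carefully weigh the benefits of a single vendor's integrated stack against the flexibility that comes from staying vendor-neutral," Gartner said. Gartner added that "this acquisition appears to be OpenAI's response to Anthropic, which began supporting self-hosted sandboxes for Claude Managed Agents in May 2026." Tom Findling, CEO of Conifers.ai, likewise said that keeping Anthropic in check played an important role in the deal. "With Claude Code spreading quickly among developers and enterprise customers, this looks like a move to keep the pressure on Anthropic," Findling said. "The deal reads less as OpenAI trying to eliminate a small competitor than as an attempt to complete Codex as an enterprise platform before Anthropic gets too far ahead." "The competition in the enterprise market is not about having the best coding model," he said. "The key is making AI agents safe and useful enough that large enterprises can actually deploy them." He added, "This acquisition does not mean OpenAI needs outside help to improve Codex's coding performance. The more important challenge is making Codex work properly in real enterprise environments, where security, access control, persistent cloud workspaces, audit trails, and integration with existing developer workflows matter." "Ona provides some of this foundational infrastructure that OpenAI lacked," he said. Jason Andersen, principal analyst at Moor Insights & Strategy, also voiced concern about Anthropic. "Frankly, this deal further confirms my existing view that OpenAI and Codex have ceded significant ground to Anthropic and Claude Code, which currently lead the market," Andersen said. "But this deal is not about current market share," he added. "It is about what position OpenAI will occupy beyond being a mere model provider, especially as Microsoft strengthens its enterprise coding infrastructure strategy." Andersen said Moor does not have a basis for estimating the financial size of the deal. Still, he said, "I suspect it was a deal with a high multiple applied to a relatively small revenue base," adding, "I don't want to estimate a specific figure, but given the enterprise customers Ona has won, it could be larger than the market expects." He also stressed that OpenAI will need additional help to achieve its goals. "Coding is still the area where AI adoption is most active, and other use cases are developing comparatively slowly," Andersen said. "A general-purpose AI company like OpenAI needs to focus more on development-related use cases." He added, "The real investment and spending is happening in the enterprise market, and these customers have governance and security requirements beyond what Codex or Claude Code can offer today." "Traditional software companies and cloud companies are now building coding and operations infrastructure around popular AI models," he said. "This intensifying competition may help token sales, but it keeps OpenAI and Anthropic on the periphery of the core enterprise market." "OpenAI and Anthropic need to come up with a stronger enterprise development platform strategy," he warned. "Otherwise they risk being just another easily replaceable model."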
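Jeremy Roberts, senior director at Info-Tech Research Group, also expected the acquisition to be a positive decision for OpenAI. "OpenAI is gradually maturing," Roberts said, adding that "in some respects it may be falling behind Anthropic." "Ona is not a flashy company, but that is not meant as a negative," he said. "It is an unglamorous but essential company." He explained that Ona provides workspaces for Codex that enterprises can run in their own virtual private cloud (VPC). The environment supports governance and persistence and is designed so that enterprises can apply their own policies for log management, credential management, resource access control, and more. "Ona provides the space where agents do their work," Roberts said. "In this environment, IT can verify that access is properly authenticated and effectively controlled, so that the model cannot perform actions it is not permitted to take." He added that these controls also include managing read and write permissions.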
- 🎙️ How I AI: Claude Fable 5 review & How Braintrust uses AI agents, evals, and CI to ship better software
Your weekly listens from How I AI, part of the Lenny's Podcast Network
- US Government Reportedly Allowing Federal Data Center Rules to Expire
A federal law setting standards for government data centers is set to expire this year with no clear replacement.
- FBI takes out huge AI-powered phishing service: Outsider Enterprise was using over a million phishing URLs to steal credit card data and passwords
Servers, Telegram bots, and money, all seized by the authorities.
- New attack turned Microsoft 365 Copilot into 1-click data theft tool
A critical vulnerability chain dubbed SearchLeak in Microsoft 365 Copilot Enterprise could allow attackers to steal sensitive data from a target's mailbox, OneDrive, or SharePoint account through a specially crafted URL. [...]
- What is the hottest Gen-Z tech trend? Anti-AI
What is the hottest Gen-Z tech trend? Anti-AI The Japan Times
- Courts cracking down on error-strewn AI-assisted legal briefs
Courts cracking down on error-strewn AI-assisted legal briefs The Straits Times
- Courts cracking down on error-strewn AI-assisted legal briefs
When a U.S. judge found fabricated quotes in a lawyer's brief earlier this year, the attorney admitted he had used Claude, an artificial intelligence chatbot, to write the document.
- OpenAI Under Investigation by Group of State Attorneys General, Source Says
A coalition of U.S. state attorneys general has opened a sweeping investigation into OpenAI, a source familiar with the matter said on Friday. The ChatGPT maker was served on Friday with a subpoena seeking documents related to a wide range …
- KPMG Allegedly Published AI Report Filled With Hallucinations
KPMG Allegedly Published AI Report Filled With Hallucinations PCMag Australia
- Climate crisis is changing when plants flower, artificial intelligence study finds
A global study using AI to analyse eight million digitalised plant specimens dating back a century revealed flowering has shifted by 2.5 days earlier or later per decade on average
- Hate talking to AI customer support? 70% of Americans say help from a real human should be a legal right
Three-quarters of Americans want to be told when they’re interacting with AI
- AI hiring in Ireland doubles as adoption accelerates
New research shows that AI is rapidly reshaping the skills employers want most from workers - increasing the emphasis on human skills such as judgement, creativity and leadership.
- Trump tried to block state AI regulations, but some states are forging ahead
Trump tried to block state AI regulations, but some states are forging ahead Austin American-Statesman
- As AI cameras scan for wildfires human lookouts still stand guard
As AI cameras scan for wildfires human lookouts still stand guard azcentral.com and The Arizona Republic
- A 13-word Reddit comment can trick AI search into recommending scams, researchers find
A 13-word Reddit comment can trick AI search into recommending scams, researchers find Tom's Guide
- Google’s Android coding tests reveal an unexpected Gemini 3.5 Flash weakness
Google's new Gemini 3.5 Flash is a pricey downgrade for Android devs.
- Gemini suddenly can’t make calls on Android and Android Auto for some
The transition to Gemini on Android Auto has been a bit rough for a number of reasons, but a current bug has left some users unable to make calls due to a strange error, and it’s not just an issue behind the wheel.
- AI schools like Alpha promise efficiency, but can't replicate the messy process that helps kids learn
A child at a playground tries to climb, jump or negotiate with a peer, and the attempt does not work. They fall, get left out of a game or reach another impasse. Then they try again.
- OpenAI hit with multistate probe into possible user harm as its IPO looms
OpenAI received a subpoena from several states as part of a probe into the safety of users of its chatbot as it prepares to offer stock to the public for the first time.
- ETRI develops autonomous 6G core network powered by AI
ETRI develops autonomous 6G core network powered by AI EurekAlert!
- Robotic pet rabbit created that recognizes who hugs it by their voice
Robotic pet rabbit created that recognizes who hugs it by their voice EurekAlert!
- AI and digitisation transform fight against global extinction, landmark report reveals
AI and digitisation transform fight against global extinction, landmark report reveals EurekAlert!
- Young People Turn to AI for Mental Health Support
A new study demonstrates that 18% of college students use generative AI for mental health support, with usage rates doubling among students suffering from severe anxiety, depression, and suicidality.
- Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter
Reasoning with a Code Interpreter (CI) has emerged as an effective paradigm for enhancing the reasoning capabilities of large language models (LLMs) through executable computation and iterative verification. Despite its growing adoption, the behavioral properties underlying effective code reasoning ...
- Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences
In this report, we present LOGOS (Language Of Generative Objects in Science), a scientific generative language model that unifies heterogeneous tasks across the natural sciences within a single autoregressive framework based on a shared scientific grammar. It encodes diverse scientific objects and t...
- Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures
Do different LLM architectures encode high-level concepts in structurally compatible ways? We systematically characterize a geometric-functional universality dissociation: across multiple concept domains and architectural families, moderate geometric convergence coexists with near-perfect functional...
- Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering
Aggregate accuracy benchmarks conceal a systematic structure in how large language models fail at electronic health record (EHR) question answering: questions requiring more inferential steps produce disproportionately more errors. Motivated by theoretical results on transformer compositionality lim...
- Revisiting the Systematicity in Negation in the Era of In-Context Learning
Understanding the meaning of negated sentences remains one of the challenges for language models, even in the era of large language models (LLMs). We analyze systematicity regarding LLM understanding of negation from two perspectives: behavioral systematicity and representational systematicity. For ...
- Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens
Diffusion Large Language Models (dLLMs) offer a promising avenue for parallel generation but face a trade-off between decoding speed and quality. While revocable decoding strategies attempt to mitigate errors by verifying and remasking tokens, they typically operate within a mixed-quality context. T...
- Robust Dual-Signal Fusion: Hybrid Neuro-Symbolic Gating with Compressed Chain-of-Thought Refinement for Irony Detection in Social Media Texts
Large Language Models (LLMs) natively default to literal semantic interpretations, making zero-shot irony detection a persistent challenge. We introduce the Robust Dual-Signal (RDS) Fusion framework, a hybrid neuro-symbolic architecture that compresses Chain-of-Thought (CoT) reasoning trajectories w...
- Data-Driven Decoding of Russell's Circumplex Model of Affect
Affective computing increasingly relies on deep learning to represent emotions, yet latent spaces often remain opaque, high-dimensional black boxes. This paper investigates whether Transformers' embeddings recover the geometric regularities of Russell's circumplex model. We unify two complementary e...
- Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count - dominated by the expert parameters - must be held in training and inference memory. To address this, we introduce Exper...
- How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation
Large language model (LLM)-based search agents synthesize open-web content into actionable recommendations on behalf of users, creating a risk that attacker-published pages are transformed into endorsed claims. We introduce SearchGEO, a controlled evaluation framework for measuring endorsement corru...
- Understanding the Behaviors of Environment-aware Information Retrieval
Recent retrieval-augmented generation (RAG) approaches have demonstrated strong capability in handling complex queries, yet current research overlooks a critical challenge: different retrievers require fundamentally different query formulation strategies for optimal performance. In this work, we pre...
- Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
For the development of large language models (LLMs), recent approaches to generating pseudo intermediate reasoning have shown remarkable progress. But they typically rely on large numbers of correctly annotated answers to assess reasoning quality. This paper presents a semi-supervised framework that...
- LLM-based Visual Code Completion for Aerospace Geometric Design
Recent advances in both Large Language Models (LLMs) and Vision Language Models (VLMs) have seen a step change in their ability to perform visual code completion, but the aerospace industry, which prioritizes safety and explainability over rapid LLM adoption, currently has no publicly announced LLM-b...
- The Art of Mixology: Mixup-based Obfuscation for Privacy-Preserving Split Learning in Large Language Models
Split learning provides a practical paradigm for resource-constrained users to train Large Language Models (LLMs) by offloading computation-intensive layers to a server while keeping raw data local. However, existing privacy-preserving split learning methods still face a difficult trade-off among ut...
- OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models
Equipping Large Language Model (LLM) agents with effective skills is crucial for solving complex tasks in real-world systems like OpenClaw. In this work, we aim to develop a framework that automatically constructs such reusable skills to enhance LLMs in tool use, multi-step reasoning, and dynamic en...
- P3B3: A Multi-Turn Conversational Benchmark for Measuring European and Brazilian Portuguese Variety Bias in LLMs
As Large Language Models (LLMs) become embedded in everyday communication, capturing regional linguistic variation is essential for reliable and equitable language use. In Portuguese, European (pt-PT) and Brazilian (pt-BR) varieties remain unevenly represented, with pt-BR dominating in data quantity...
- Misinformation Propagation in Benign Multi-Agent Systems
Multi-agent systems, in which multiple large language model agents solve problems through turn-based interaction, are increasingly deployed in high-stakes settings such as medical diagnosis, legal analysis, and forensic decision-making. Their reliability can be at risk when single agents reason from...
- Progressive Knowledge-Guided Large Language Model Framework for Bearing Fault Diagnosis
Vibration-based bearing fault diagnosis requires resolving three interrelated measurement challenges, including the trade-off between global statistical feature efficiency and local transient signal fidelity, insufficient traceability of measurement features to underlying fault physics, and ineffect...