AI News Archive: May 13, 2026 — Part 15
Sourced from 500+ daily AI sources, scored by relevance.
- AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation
Evaluation of software engineering (SWE) agents is dominated by a binary signal: whether the final patch passes the tests. This outcome-only view treats a principled solution and a chaotic trial-and-error process as equivalent. We show that this equivalence is empirically false. We evaluate 2,614 Op...
- Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education
Students learning algorithms often need support as they interpret traces, debug reasoning errors, and apply procedures across unfamiliar problem instances. In this paper, we present KITE (Knowledge-Informed Tutoring Engine), a Retrieval-Augmented Generation (RAG)-based intelligent tutoring system de...
- EcoGEO: Trajectory-Aware Evidence Ecosystems for Web-Enabled LLM Search Agents
Web-enabled LLM agents are changing how online information influences search outcomes. Existing Generative Engine Optimization (GEO) studies mainly focus on individual webpages. However, agentic web search is not a single-document setting: an agent may issue queries, crawl pages, follow links, r...
- A Standardized Re-evaluation of Conversational Recommender Systems on the ReDial Dataset
Recent years have seen a surge of research into conversational recommender systems (CRS). Among existing datasets, ReDial is the most widely used benchmark, cited in hundreds of studies. However, variations in how the dataset is preprocessed and used in experiments, particularly in the definition of...
- Same Image, Different Meanings: Toward Retrieval of Context-Dependent Meanings
A scene of two people in the rain can convey hope and warmth in a reunion story or sorrow and finality in a farewell story. We investigate this context-dependent nature of image meaning and its implications for retrieval. Our key observation is that context dependency correlates with semantic abstra...
- MechAInistic: An LLM-guided Multi-Agent System for Reasoning over Genome-Scale Constraint-Based Metabolic Models
Constraint-based metabolic modeling is a powerful way to study the mechanistic basis of cellular states and disease, but its effective use demands substantial computational expertise and careful coordination of multi-step analyses. We developed MechAInistic to lower this barrier and enable researchers to ask complex biological questions in natural language. Harnessing large language models, MechAInistic is a multi-agent system organized around an Architect-Reviewer pattern that transforms a natural-language question into an executable, model-grounded workflow and generates a structured report. The system supports a variety of tasks, including pathway comparison, perturbation analysis, drug-target exploration, and literature-grounded interpretation across paired metabolic model states. We tested MechAInistic on two drug-repurposing use cases. For Naive B cells from Rheumatoid Arthritis (RA) paired with healthy controls, the system quantified the metabolic rewiring driving disease, prior
- An explainable machine learning consensus framework for robust estimations of environmental effects on population dynamics
Explainable machine learning (ML) methods are gaining increasing attention in environmental and ecological research for their ability to reveal relationships between environmental drivers and population dynamics. However, there remain questions on the reliability of these tools, especially given recent research shows that these explanations can be highly sensitive to model architecture. In ecology, it is typical to use a single ML model, and a comparative evaluation of sensitivity of explainability for different ML approaches is overlooked. In this paper, we develop a novel framework that quantifies explanation consistency between multiple ML model architectures. This framework provides a discrepancy measure for each model prediction, with high discrepancy indicating substantive explanation disagreement across models and low discrepancy indicating strong consensus in explanations across models. We then demonstrate that low explanation discrepancy aligns well with ground truth mechanism
- Pretraining Objective Shapes Cross-Category Generalization in Affective Image Prediction: A Geometric Comparison of Vision Transformer Encoders
The geometry of representations learned by deep neural networks is shaped jointly by architecture and pretraining objective, yet disentangling these two factors remains difficult. Here we isolate the contribution of pretraining objective by comparing two Vision Transformers from the same backbone family but trained under different objectives: language-image contrastive learning (CLIP) and ImageNet-21k classification. Using continuous Valence-Arousal prediction on the OASIS dataset as a probe of representational quality, we evaluated frozen features under Leave-One-Theme-Out and Leave-One-Category-Out cross-validation, the latter requiring extrapolation to entirely unseen semantic categories. The contrastively pretrained encoder generalized substantially better than the classification-pretrained encoder under both protocols, with the gap widening sharply when held-out categories required cross-category generalization. To characterize why the two representations differ, we developed a ge
- Causally validated phase-amplitude coupling enables high-fidelity motor decoding for next-generation brain-computer interfaces
Modern Brain-Computer Interfaces (BCIs) face a fundamental performance plateau due to their reliance on broad spectral power reduction (Event-Related Desynchronization, ERD) as the primary decoding feature. ERD acts as a coarse metabolic proxy for cortical activation, discarding the high-frequency temporal syntax necessary for high-dimensional motor control. Here, we demonstrate that Phase-Amplitude Coupling (PAC), specifically the phase-locking of high-gamma (70-150 Hz) firing to residual Beta oscillations, provides a high-fidelity temporal feature for decoding motor intent. Using high-density electrocorticography (HD-ECoG) during motor imagery, we show that incorporating Beta-PAC into machine-learning classifiers (LDA/SVM) significantly outperforms traditional power-based features. To definitively validate the causal robustness of this feature against mathematically-derived artifacts, we leveraged a rare in vivo human structural lesion model. In a patient with focal tumor infiltratio
- Hippocampal brain-machine interface-based navigation reveals CA1 representations of intended actions
The hippocampus uses external stimuli and self-motion to construct cognitive maps critical for navigation. However, these maps can also be activated independently of external inputs and movements, reflecting an internal ability to control the map. The neural basis of such internal control remains unknown. To address this, we used a brain-machine interface (BMI) to drive navigation in mice directly from real-time hippocampal activity. In this condition, CA1 place codes rapidly reconfigured to disregard locomotion-related input. By comparing BMI-controlled navigation, locomotion-controlled navigation, and passive playback of predetermined routes, we found evidence of CA1 place cell responses that emerge specifically in conditions in which animals can causally influence their travel. Our findings thus indicate that agency is represented by a distinct place cell code.
- From naive to foundation: benchmarking models for epidemic forecasting
We systematically evaluate and compare the performance of classical statistical methods (ARIMA), mechanistic compartmental models (SEIR), modern deep learning architectures (LSTM, DLinear, Autoformer), and an emerging time-series foundation model (TabPFN-TS) to forecast the incidence of Influenza-Like Illness (ILI) across nine European countries. The models are benchmarked against a naive baseline and a multi-model ensemble (RespiCast) created by an initiative of the ECDC. In line with the operational practice of existing forecasting hubs, our entire evaluation is explicitly optimized for short-term horizons (1 to 4 weeks ahead). Interestingly, we found that the foundation model TabPFN-TS allows for great zero-shot inference capabilities. Without any task-specific retraining, it successfully overcomes extreme data scarcity to consistently outperform all other individual architectures, frequently rivalling or surpassing the RespiCast ensemble. Our results highlight how deep learning ar
- Improving machine learning and deep learning models for 30-day ICU readmission prediction using Ensemble Bayesian Model Averaging
Intensive Care Unit (ICU) readmissions are associated with adverse clinical outcomes and increased healthcare costs. Although existing models for predicting 30-day ICU readmission show high predictive performance, they fail to account for model uncertainty, potentially resulting in overconfident and unreliable decision-making. We propose a novel Ensemble Bayesian Model Averaging (EBMA)-based framework which balances predictive discrimination with uncertainty by penalizing models that are confident but incorrect. It achieved excellent calibration (Brier score = 0.051), while maintaining discriminatory performance comparable to or exceeding that of the best individual models (AUROC > 0.716). These findings suggest that our EBMA-based framework provides a more robust and clinically reliable approach for ICU readmission prediction and decision support.
- Video-based Detection of Delirium in Hospitalized Adults
Delirium, a dynamic neuropsychiatric condition associated with morbidity and mortality, remains underdiagnosed due to reliance on subjective, intermittent screening tools. Objective and potentially continuous identification is needed to improve clinical care. We developed and validated an analytic framework for delirium classification based on automatically extracted video features. In this prospective cohort study, patients (>=18 years) admitted to the inpatient medical or neurological ward of a tertiary academic center between August 2020 and March 2022 with an expected stay longer than one night were enrolled. Daily structured delirium assessments and brief video recordings were performed in consenting patients. Videos were analyzed using deep learning pose estimation to extract keypoints and calculate behavioral features based on eye, face, and limb postures and movements. Four machine learning models (logistic regression, gradient boosting, support vector machines, and random fore
- Prediction of Pivot Shift Grade Using In-Vivo Ultrasound Bone Tracking During Sit-Stand-Sit: A Machine Learning Feasibility Study
Background: The pivot shift (PS) test is the most specific clinical examination for anterolateral rotational instability in ACL deficient knees, yet grading remains subjective, as evidenced by poor interobserver reliability, particularly for Grade 2. Since low grade (Grade 1) versus high grade (Grades 2/3) PS is the threshold for recommending lateral extra articular augmentation, performing the test in awake clinic patients limits grading reproducibility and introduces variability in surgical decision making. Existing methods to quantify the pivot shift usually require examiner performed testing under general anaesthesia. No prior approach has ascertained PS grading from a separate patient performed functional movement. Purpose: To evaluate the feasibility of a machine learning (ML) classifier, trained on kinematic ultrasound bone tracking signals acquired during a patient sit stand sit (SSS) knee movement, to predict their PS grade, and to clinically validate its ability to differenti
- SIGNAL: A Scalable, Real-World Model for Rapid Intraoperative Molecular Classification of Gliomas Using Stimulated Raman Histology
Background: Previous machine learning models to intraoperatively predict the molecular status of gliomas using stimulated Raman histology (SRH), such as DeepGlioma, have achieved high performance (91.5% accuracy) on curated datasets. However, when used intraoperatively, DeepGlioma (162M parameters) runs slowly on current SRH hardware and underperforms due to its lack of an image rejection mechanism and its validation on curated images. Here, we introduce SRH-Informed Glioma classificatioN with Attention Learning (SIGNAL) (27M parameters), a lighter model with a built-in attention-based rejection mechanism that outperforms DeepGlioma on uncurated clinical datasets. Methods: SIGNAL was developed using 1.56 million SRH fields-of-view from 967 adult diffuse glioma patients collected between December 2017 and July 2025. We used 412 patients from NYU for training and internal validation and a multi-institutional, international cohort of 555 patients for testing. SIGNAL uses a ResNet50 backbo
- Structured large language model extraction of clinical factors from electronic health record text supports scalable psychiatric severity prediction
Background: Mental health systems face escalating demand that exceeds clinician capacity, making accurate severity-based triage a critical bottleneck. Severity assessment guides treatment intensity, resource allocation, and risk management, yet most clinically relevant information remains embedded in unstructured electronic health record (EHR) narratives, limiting its utility for scalable decision support. Objectives: This study evaluates whether a single large language model (LLM) can autonomously extract clinical factors from psychiatric EHR narratives, derive predictive weights from those factors, and use the resulting structured representation to predict clinician-implied severity at scale. Methods: From a Mayo Clinic repository of more than 2.7 million encounters, 15,000 de-identified psychiatric notes were sampled into a 5,000-patient discovery cohort and a 10,000-patient replication cohort. The same LLM (Llama 3 8B Instruct) extracted 17 background clinical factors and 3 treatme
- Benchmarking foundation models for improving confounding control in target trial emulation
Machine learning models for causal inference aim to adjust for confounding factors that are associated with both an exposure and an outcome, creating a spurious biased association. However, these methods are rarely empirically evaluated to assess their success in mitigating such bias. Recent advances in knowledge representation, including both foundation models and knowledge graphs, could enrich these models, but rigorous evaluations are needed in order to assess their potential. Here, we ask whether enriching existing causal inference models with knowledge representations from foundation models can improve confounding control. Rather than using semi-simulated data to address this question, we focus on examples of real confounding: we emulate target randomized active comparator trials that are subject to confounding by indication. Our results can guide researchers aiming to develop or apply methods for discovering causal effects from observational data.
- Molecular Methods to Detect Vibrio cholerae and Associated Bacteriophages among Diarrheal Patients in Bangladesh
Molecular diagnostics to detect Vibrio cholerae (Vc) may be negatively impacted by pathogen-specific lytic bacteriophage (phage) predation. To address this problem, phage detection as a proxy for pathogen detection has been proposed. However, efforts to modernize cholera diagnostics with molecular tools require addressing knowledge gaps on best practices to detect Vc and associated bacteriophages. We conducted polymerase chain reaction (PCR), quantitative PCR (qPCR), and nano-liter (nl) qPCR targeting Vc and known phages (ICP1/2/3) on stool samples collected from patients admitted at hospitals across Bangladesh. Of 4,975 patients enrolled, 2,574 diarrheal samples were collected and over 65,000 reactions were conducted, including replicates. We analyzed the results for target-specific assay alignment and then used machine learning to determine the effect of phage predation on Vc-assay alignment. Standard curve analyses were used to set qPCR-positivity thresholds at 7.3x10^5 CFU/mL for Vc
- A transformer model explaining mechanisms of drug therapeutic and adverse effects
Understanding which disease genes are altered by a drug can provide insight into the biology of effect, help us understand adverse drug effects, and suggest new drug uses. Here, we build on our model Draphnet in a new formulation with a similar goal. Draphnet was designed to explain drug therapeutic and side effects by learning a network connecting drugs to the disease genes they alter. Our new model, DraPhormer, has a similar goal but instead of relying on a linear model, learning of drug to gene connections uses a transformer model. DraPhormer integrates drug molecular data, disease genetics, and known drug effects on diseases, along with language models representing all of these entities. We show in simulations that DraPhormer can explain the genetic mechanisms of drug effects. Then, we present our design for incorporating drug and disease biology into the model. Finally, we benchmark the model's ability to learn drug indications and side effects in real data.
- Synonym Augmentation for Rare Disease Identification in Unstructured Data
The significant challenges associated with rare diseases in the medical and research domains include the scarcity of information, which is often confined to unstructured formats. Although existing approaches provide valuable insights, there is a need to develop effective methods to identify information pertinent to rare diseases for advancing rare disease research. We identified mentions of rare diseases in relevant texts and assessed their relevance using derived scores, the confidence score and semantic similarity from a fine-tuned BioMedBERT encoder. This encoder was fine-tuned using rare disease related text from Online Mendelian Inheritance in Man (OMIM), Orphanet, a manually validated dataset, and STS benchmark datasets. The process of identifying meaningful rare disease mentions was presented through two case studies that retrieved relevant NIH-funded projects, utilizing a generated knowledge graph in Neo4j to host data on 2,067 GARD diseases with over 320,000 NIH funded projec
- Anduril lands $5B as defense giant builds autonomous warship operation in Seattle
Anduril, which has established operations along the Lake Washington Ship Canal, said the financing will fuel aggressive investments in manufacturing capacity, R&D and infrastructure needed to produce advanced defense systems.
- Facebook to Transcript
Transform Facebook videos into text
- Sam Altman rejects Musk's "stolen charity" claims in court showdown
OpenAI CEO Sam Altman's first turn on the witness stand Tuesday sharpened the central fight in Elon Musk's lawsuit: whether either man can be trusted to put AI safety ahead of money and control. Why it matters: The testimony showed how hard it is for any AI leader to claim the moral high ground. Driving the news: Altman rejected Musk's central claim that OpenAI and Microsoft had effectively tried to "steal a charity." "It feels difficult to even wrap my head around that framing," Altman said. He argued that shifting to a for-profit structure was the only way to raise the amount of money needed to develop safe and powerful artificial intelligence. Earlier in the trial, Musk offered a competing narrative, casting himself as the defender of OpenAI's original safety mission. Altman testified that Musk wanted to profit from OpenAI and also to control it, citing Musk's early push for a controlling stake or a merger with Tesla. Altman also said Musk wanted that control to pass to his childr
- Fervo rides data center energy boom with 35% IPO pop
Source: PitchBook
- Rivian spinoff Mind Robotics raises another $400M
Mind Robotics, which was first revealed in late 2025, has now raised more than $1 billion to date.
- ExpenseMind
AI-powered expense tracking that helps you save money
- Venue Visualizer
See your event venue transformed before you book it
- PoYo.ai
Premium AI models through one unified API.
- Undetectr
Remove AI artifacts from music for distribution.
- Mythx.AI
Create immersive AI-powered roleplay adventures with instant image generation.
- cvlift.ai
Write a CV too good to get ghosted
- CodeRabbit
Enhanced code review for improved workflow and quality.
- Automateed
Create eBooks effortlessly with Automateed, the AI-powered writing tool.
- Vireel AI | Image to Image Generator
Transform images with AI-powered creative editing tools.
- Multi-Agent Systems in Emergency Departments: Validation Study on an ED Digital Twin
Emergency departments (ED) face challenges in patient care and resource management. We propose to explore optimization strategies in a realistic and flexible model and develop a hybrid Discrete Event Simulation (DES) and Agent-Based Model (ABM) simulating highly configurable ED environments. We spec...
- OpenAI Brings Its Ass to Court
In Musk v. Altman, the company claimed a remarkable trophy was physical proof of Elon Musk’s concerning behavior.
- OpenAI chief Altman has over $2 billion stake in companies that dealt with OpenAI, court filing shows
Source: Reuters
- Altman details Musk's OpenAI fallout, says nonprofit was 'left for dead'
Musk has accused Altman and another OpenAI co-founder of trying to "steal a charity"
- Inflation jumps, Altman's testimony, Huang joins Trump's China visit and more in Morning Squawk
Here are five key things investors need to know to start the trading day.
- OpenAI exec recalls 'tense exchange' where Elon Musk called him a 'jackass'
Source: Business Insider
- Rivian spinout Mind Robotics lands $400M to push AI robots onto factory floors
Mind Robotics Inc., an industrial robotics startup founded by Rivian Automotive Inc. Chief Executive RJ Scaringe, said today it has raised $400 million in new funding to put more of its artificial intelligence-powered robots onto factory floors. The company is building robots that can handle the kind of fiddly, judgment-based work on a production line […]
- Amazon launches an AI shopping assistant for the search bar, powered by Alexa+
Alexa for Shopping offers a voice- and touch-enabled shopping experience across mobile, desktop, and Echo Show smart displays. Alexa for Shopping provides more personalized recommendations and automates the shopping experience across Amazon and other online retailers.
- Amazon Drops ‘Rufus’ Branding on Shopping Chatbot, Adds AI in Search
Source: The Information
- Amazon puts Alexa inside the search bar as agentic commerce heats up
The unified Alexa for Shopping assistant absorbs Rufus and arrives in the main search flow as Amazon sues to keep external AI agents like Perplexity’s Comet off its marketplace. Amazon is moving its AI shopping assistant into the main search bar. Starting this week, US customers typing into the search field on Amazon.com or in […]
- Amazon Brings AI Update to Its Shopping Search Bar
The tool generates product comparisons or suggestions and will appear by default starting this week for US-based users.
- Amazon unifies Alexa+ and Rufus as AI rivals move into online shopping
Amazon announced Alexa for Shopping, merging its Rufus e-commerce chatbot with Alexa+ into a unified experience, aiming to outdo ChatGPT and other general-purpose AI assistants for shopping.
- Amazon Ditches Rufus for New AI Shopping Assistant
Source: Barron's
- Alexa Replaces Rufus as Amazon's AI Shopping Assistant
The new Alexa for Shopping feature can use data about the customer to help find products.
- Alexa for Shopping is a chatty new AI assistant with some cool tricks to make you spend at Amazon
After years of using Alexa to answer questions, control smart homes, play music, and handle everyday tasks, Amazon has found a more obvious job for it. Alexa is becoming your personal shopper, meant to help you find what you need faster and get it into your cart with fewer second thoughts. Amazon is rolling out […]
- Amazon puts Alexa inside the shopping search bar in AI push
Source: The Mercury News