AI News Archive: May 12, 2026 — Part 17
Sourced from 500+ daily AI sources, scored by relevance.
- Elastic Attention Cores for Scalable Vision Transformers
Vision Transformers (ViTs) achieve strong data-driven scaling by leveraging all-to-all self-attention. However, this flexibility incurs a computational cost that scales quadratically with image resolution, limiting ViTs in high-resolution domains. Underlying this approach is the assumption that pair...
- Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts
Sparse Mixture-of-Experts (SMoE) models enable scaling language models efficiently, but training them remains challenging, as routing can collapse onto few experts and auxiliary load-balancing losses can reduce specialization. Motivated by these hurdles, we study how routing decisions in SMoEs are f...
- Search Your Block Floating Point Scales!
Quantization has emerged as a standard technique for accelerating inference for generative models by enabling faster low-precision computations and reduced memory transfers. Recently, GPU accelerators have added first-class support for microscaling Block Floating Point (BFP) formats. Standard BFP al...
- Environment-Adaptive Preference Optimization for Wildfire Prediction
Predicting rare extreme events such as wildfires from meteorological data requires models that remain reliable under evolving environmental conditions. This problem is inherently long-tailed: wildfire events are rare but high-impact, while most observations correspond to non-fire conditions, causing...
- Attacks and Mitigations for Distributed Governance of Agentic AI under Byzantine Adversaries
Agentic AI governance is a critical component of agentic AI infrastructure ensuring that agents follow their owner's communication and interaction policies, and providing protection against attacks from malicious agents. The state-of-the-art solution, SAGA, assumes a logically centralized point of t...
- Neural-Schwarz Tiling for Geometry-Universal PDE Solving at Scale
Most learned PDE solvers follow a global-surrogate paradigm: a neural operator is trained to map full problem descriptions to full solution fields for a prescribed distribution of geometries, boundary conditions, and coefficients. This has enabled fast inference within fixed problem families, but li...
- Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
A major recent advance in quantization is given by microscaled 4-bit formats such as NVFP4 and MXFP4, quantizing values into small groups sharing a scale, assuming a fixed floating-point grid. In this paper, we study the following natural extension: assume that, for each group of values, we are free...
- Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds
We study the fundamental and timely problem of learning long sequences in autoregressive modeling and next-token prediction under model misspecification, measured by the joint Kullback-Leibler (KL) divergence. Our goal is to characterize how the sequence horizon \(H\) affects both approximation and...
- Random-Set Graph Neural Networks
Uncertainty quantification has become an important factor in understanding the data representations produced by Graph Neural Networks (GNNs). Despite their predictive capabilities being ever useful across industrial workspaces, the inherent uncertainty induced by the nature of the data is a huge mit...
- LOFT: Low-Rank Orthogonal Fine-Tuning via Task-Aware Support Selection
Orthogonal parameter-efficient fine-tuning (PEFT) adapts pretrained weights through structure-preserving multiplicative transformations, but existing methods often conflate two distinct design choices: the subspace in which adaptation occurs and the transformation applied within that subspace. This ...
- A Composite Activation Function for Learning Stable Binary Representations
Activation functions play a central role in neural networks by shaping internal representations. Recently, learning binary activation representations has attracted significant attention due to their advantages in computational and memory efficiency, as well as interpretability. However, training neu...
- TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing
Soft Actor-Critic (SAC) and its variants dominate Multi-Task Reinforcement Learning (MTRL) due to their off-policy sample efficiency, while on-policy methods such as Proximal Policy Optimization (PPO) remain underexplored. We diagnose that PPO in MTRL suffers from a previously overlooked issue: crit...
- Causal Bias Detection in Generative Artificial Intelligence
Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reason...
- Causal Fairness for Survival Analysis
In the data-driven era, large-scale datasets are routinely collected and analyzed using machine learning (ML) and artificial intelligence (AI) to inform decisions in high-stakes domains such as healthcare, employment, and criminal justice, raising concerns about the fairness behavior of these system...
- A proximal gradient algorithm for composite log-concave sampling
We propose an algorithm to sample from composite log-concave distributions over $\mathbb{R}^d$, i.e., densities of the form $\pi \propto e^{-f-g}$, assuming access to gradient evaluations of $f$ and a restricted Gaussian oracle (RGO) for $g$. The latter requirement means that we can easily sample from ...
- Online Learning-to-Defer with Varying Experts
Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distribution. We introduce the f...
- Optimal Policy Learning under Budget and Coverage Constraints
We study optimal policy learning under combined budget and minimum coverage constraints. We show that the problem admits a knapsack-type structure and that the optimal policy can be characterized by an affine threshold rule involving both budget and coverage shadow prices. We establish that the line...
- Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification
Approximate Bayesian inference typically revolves around computing the posterior parameter distribution. In practice, however, the main object of interest is often a model's predictions rather than its parameters. In this work, we propose to bypass the parameter posterior and focus directly on appro...
- Information-Theoretic Generalization Bounds for Sequential Decision Making
Information-theoretic generalization bounds based on the supersample construction are a central tool for algorithm-dependent generalization analysis in the batch i.i.d. setting. However, existing supersample conditional mutual information (CMI) bounds do not directly apply to sequential decision-mak...
- Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions
For stochastic process models, parameter inference is often severely bottlenecked by computationally expensive likelihood functions. Simulation-based inference (SBI) bypasses this restriction by constructing amortized surrogate likelihoods, but most SBI methods assume a black-box data generating pro...
- QDSB: Quantized Diffusion Schrödinger Bridges
Learning generative models in settings where the source and target distributions are only specified through unpaired samples is gaining in importance. Here, one frequently used model is the Schrödinger bridge (SB), which represents the most likely evolution between both endpoint distributions. To accel...
- Variance-aware Reward Modeling with Anchor Guidance
Standard Bradley-Terry (BT) reward models are limited when human preferences are pluralistic. Although soft preference labels preserve disagreement information, BT can only express it by shrinking reward margins. Gaussian reward models provide an alternative by jointly predicting a reward mean and ...
- Minimax Rates and Spectral Distillation for Tree Ensembles
Tree ensembles such as random forests (RFs) and gradient boosting machines (GBMs) are among the most widely used supervised learners, yet their theoretical properties remain incompletely understood. We adopt a spectral perspective on these algorithms, with two main contributions. First, we derive mi...
- One-Step Generative Modeling via Wasserstein Gradient Flows
Diffusion models and flow-based methods have shown impressive generative capability, especially for images, but their sampling is expensive because it requires many iterative updates. We introduce W-Flow, a framework for training a generator that transforms samples from a simple reference distributi...
- Posterior Contraction Rates for Sparse Kolmogorov-Arnold Networks in Anisotropic Besov Spaces
We study posterior contraction rates for sparse Bayesian Kolmogorov-Arnold networks (KANs) over anisotropic Besov spaces, providing a statistical foundation of KANs from a Bayesian point of view. We show that sparse Bayesian KANs equipped with spike-and-slab-type sparsity priors attain the near-mini...
- Learning U-Statistics with Active Inference
$U$-statistics play a central role in statistical inference. In many modern applications, however, acquiring the labels required for $U$-statistics is costly. Motivated by recent advances in active inference, we develop an active inference framework for $U$-statistics that selectively queries inform...
- Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty
Probabilistic partial least squares (PPLS) is a central likelihood-based model for two-view learning when one needs both interpretable latent factors and calibrated uncertainty. Building on the identifiable parameterization of Bouhaddani et al. (2018), existing fitting pipelines still face two prac...
- Adaptive Calibration in Non-Stationary Environments
Making calibrated online predictions is a central challenge in modern AI systems. Much of the existing literature focuses on fully adversarial environments where outcomes may be arbitrary, leading to conservative algorithms that can perform suboptimally in more benign settings, such as when outcomes...
- A Barrier-Metric First-Order Method for Linearly Constrained Bilevel Optimization
We study bilevel optimization with a fixed polyhedral lower feasible set. Such problems are challenging for two reasons: active-set changes can make the upper objective nonsmooth, and existing hypergradient methods typically require lower-Hessian inversions or equivalent linear solves, which are com...
- Spatial Adapter: Structured Spatial Decomposition and Closed-Form Covariance for Frozen Predictors
We present the Spatial Adapter, a parameter-efficient post-hoc layer that equips any frozen first-stage predictor with a structured spatial representation of its residual field and an induced closed-form spatial covariance. The adapter operates as a cascade second stage on residuals, jointly learnin...
- Causal Algorithmic Recourse: Foundations and Methods
The trustworthiness of AI decision-making systems is increasingly important. A key feature of such systems is the ability to provide recommendations for how an individual may reverse a negative decision, a problem known as algorithmic recourse. Existing approaches treat recourse outcomes as counterf...
- Kelviq
Payments, tax, and billing for SaaS & AI companies
- Open Vibe
Ship your SaaS with AI, without getting stuck
- Jotform Claude App
Build, edit, and analyze forms directly in Claude
- display.dev
Publish agent-generated HTML behind company auth
- Free AI SEO Auditor
Audit your site for the AI search era. 100% Open Source
- MiniCPM-V 4.6
Ultra-efficient 1.3B vision-language model for mobile
- knooth
Screen recording with AI-powered editing for Mac
- Whirr
Ambient agent activity in your notch
- Hopper
First agentic development environment for mainframe/COBOL
- Synci
AI soundtrack platform with full licensing
- Khaos Brain
Local predictive memory for AI agents
- Klyxshot
The screenshot tool that thinks with you
- Auvylo
Turn astrology and Four Pillars into AI personas you talk to
- Whisper Island by Coddo
Voice transcription lives in the Mac notch
- ContractFlow
Professional multi-language contracts in 60 seconds
- MoonSlide
One prompt → research, outline, slides, PPTX. Done.
- Self-becoming
Engineering the self and self-awareness of AI
- Charios
AI-powered 2D Game Character animation tool
- Typoscale
Creates an e-book based on your knowledge using AI agents