AI News Today — Part 21
May 7, 2026 · Sourced from 500+ daily AI sources, scored by relevance.
- On the Safety of Graph Representation Learning
Graph representation learning (GRL) has evolved from topology-only graph embeddings to task-specific supervised GNNs, and more recently to reusable representations and graph foundation models (GFMs). However, existing evaluations mainly measure clean transfer, adaptation, and task coverage. It remai...
- CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification
The rapid expansion of the Internet of Things (IoT) and Industrial IoT (IIoT) has created a massive, heterogeneous attack surface that challenges traditional network security mechanisms. While Federated Learning (FL) offers a privacy-preserving alternative to centralized Intrusion Detection Systems ...
- Criticality and Saturation in Orthogonal Neural Networks
It has been known for a long time that initializing weight matrices to be orthogonal instead of having i.i.d. Gaussian components can improve training performance. This phenomenon can be analyzed using finite-width corrections, where the infinite-width statistics are supplemented by a power series i...
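The orthogonal initialization the abstract refers to is a standard recipe; as a generic illustration (a NumPy sketch, not the paper's code), an orthogonal weight matrix can be drawn from the QR decomposition of a Gaussian matrix:

```python
import numpy as np

def orthogonal_init(rows, cols, gain=1.0, seed=0):
    """Draw a (rows, cols) matrix with orthonormal columns (or rows),
    obtained from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    # Fix the sign ambiguity of QR so the draw is Haar-uniform.
    q *= np.sign(np.diag(r))
    if rows < cols:
        q = q.T
    return gain * q

W = orthogonal_init(4, 4)
# Columns are orthonormal: W.T @ W is (numerically) the identity.
assert np.allclose(W.T @ W, np.eye(4), atol=1e-10)
```

The sign correction on the diagonal of `r` is what makes the distribution uniform over the orthogonal group rather than biased by QR's sign convention.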
- Feature Dimensionality Outweighs Model Complexity in Breast Cancer Subtype Classification Using TCGA-BRCA Gene Expression Data
Accurate classification of breast cancer subtypes from gene expression data is critical for diagnosis and treatment selection. However, such datasets are characterized by high dimensionality and limited sample size, posing challenges for machine learning models. In this study, we evaluate the impa...
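The dimensionality-versus-complexity trade-off can be probed with a minimal reduce-then-classify experiment. The sketch below is a hedged stand-in (synthetic data, NumPy-only PCA via SVD, nearest-centroid classifier), not the paper's TCGA-BRCA pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for high-dimensional expression data:
# 60 samples, 500 features, 2 classes, signal confined to 10 features.
X = rng.standard_normal((60, 500))
y = np.repeat([0, 1], 30)
X[y == 1, :10] += 4.0

# PCA via SVD of the centered data matrix; keep 10 components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:10].T

# Simple nearest-centroid classifier on the reduced features.
centroids = np.stack([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids) ** 2).sum(axis=-1), axis=1)
acc = (pred == y).mean()
```

With few samples and many noisy features, most of the work is done by the projection step; even a trivial classifier separates the classes once the signal direction is captured.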
- Sequential Design of Genetic Circuits Under Uncertainty With Reinforcement Learning
The design of biological systems is hindered by uncertainty arising from both intrinsic stochasticity of biomolecular reactions and variability across laboratory or experimental conditions. In this work, we present a sequential framework to optimize genetic circuits under both forms of uncertainty. ...
- LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation
Modern sensors generate rich, high-fidelity data, yet applications operating on wearable or remote sensing devices remain constrained by bandwidth and power budgets. Standardized codecs such as JPEG and MPEG achieve efficient trade-offs between bitrate and perceptual quality but are designed for hum...
- Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization
Weight decay is widely used as a regularizer in large language models, yet its precise role in shaping Transformer loss landscapes remains theoretically underexplored. This paper provides the first rigorous functional-analytic characterization of the standard Transformer objective--cross-entropy los...
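For context, the decoupled form of weight decay used in most LLM training loops shrinks the weights independently of the loss gradient; a minimal sketch of that generic AdamW-style decoupling (an illustration of the regularizer itself, not the paper's functional-analytic argument):

```python
import numpy as np

def sgd_with_decoupled_weight_decay(w, grad, lr=0.1, wd=0.01):
    """One step of SGD with decoupled weight decay: the decay term
    contracts the weights toward zero regardless of the loss gradient,
    which is what reshapes the effective loss landscape."""
    return w - lr * grad - lr * wd * w

w = np.array([1.0, -2.0])
w1 = sgd_with_decoupled_weight_decay(w, grad=np.zeros(2))
# With zero gradient the update is a pure contraction: w <- (1 - lr*wd) * w
assert np.allclose(w1, (1 - 0.1 * 0.01) * w)
```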
- BRICKS: Compositional Neural Markov Kernels for Zero-Shot Radiation-Matter Simulation
We introduce a new strategy for compositional neural surrogates for radiation-matter interactions, a key task spanning domains from particle physics through nuclear and space engineering to medical physics. Exploiting the locality and the Markov nature of particle interactions, we create a nex...
- SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation
Many real-world problems require sequential decisions under uncertainty: when to inject or withdraw gas from storage, how to rebalance a pension portfolio each month, what temperature profile to run through a pharmaceutical reactor chain. Dynamic programming solves small instances exactly but scales...
- Diverse Sampling in Diffusion Models with Marginal Preserving Particle Guidance
We present EDDY (Exact-marginal Diversification via Divergence-free dYnamics), a guidance mechanism for diffusion and flow matching models that promotes diversity among samples generated while maintaining quality. EDDY exploits symmetries of the Fokker-Planck equation, using drift perturbations that...
- Diffusion-Based Posterior Sampling: A Feynman-Kac Analysis of Bias and Stability
Diffusion-based posterior samplers use pretrained diffusion priors to sample from measurement- or reward-conditioned posteriors, and are widely used for inverse problems. Yet their theoretical behavior remains poorly understood: even with exact prior scores, their outputs are biased, and in low-temp...
- A Geometry-Aware Residual Correction of Hagan's SABR Implied Volatility Formula
This paper proposes a hybrid methodology to improve the approximation of SABR (Stochastic Alpha Beta Rho) implied volatility by combining analytical structure with machine learning. The approach augments the neural-network input representation with geometric features derived from the stochastic diff...
- Covariate Balancing and Riesz Regression Should Be Guided by the Neyman Orthogonal Score in Debiased Machine Learning
This position paper argues that, in debiased machine learning, balancing functions should be derived from the Neyman orthogonal score, not chosen only as functions of covariates. Covariate balancing is effective when the regression error entering the score can be represented by functions of covariat...
- Verfi
AI-powered accounting, simplified
- Multimodal Deep Generative Model for Semi-Supervised Learning under Class Imbalance
When modeling class-imbalanced data, it is crucial to address the imbalance, as models trained on such data tend to be biased towards the majority classes. This problem is amplified under partial supervision, where pseudo-labels for unlabeled data are predicted based on imbalanced labeled data, prop...
- ConquerNet: Convolution-Smoothed Quantile ReLU Neural Networks with Minimax Guarantees
Quantile regression is a fundamental tool for distributional learning but poses significant optimization challenges for deep models due to the non-smoothness of the pinball loss. We propose ConquerNet, a class of convolution-smoothed quantile ReLU neural n...
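Convolution smoothing of the pinball loss has a closed form under a Gaussian kernel; a small sketch of that standard construction (an assumption about the flavor of smoothing, not necessarily the paper's exact loss):

```python
import math

def pinball(u, tau):
    """Standard (non-smooth) quantile check loss."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def smoothed_pinball(u, tau, h):
    """Gaussian-convolution-smoothed check loss: the closed form of
    E[pinball(u + h*Z, tau)] for Z ~ N(0, 1), namely
    u * (tau - Phi(-u/h)) + h * phi(u/h)."""
    Phi = 0.5 * (1.0 + math.erf(-u / h / math.sqrt(2.0)))
    phi = math.exp(-(u / h) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
    return u * (tau - Phi) + h * phi

# The smoothed loss converges to the check loss as the bandwidth h -> 0.
for u in (-1.3, 0.7, 2.0):
    assert abs(smoothed_pinball(u, 0.9, 1e-4) - pinball(u, 0.9)) < 1e-3
```

The payoff is the gradient: d/du of the smoothed loss is `tau - Phi(-u/h)`, which is smooth everywhere, unlike the kink of the raw pinball loss at zero.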
- Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes
Deep neural networks exhibit periodic loss spikes during unregularized long-term training, a phenomenon known as the "Slingshot Mechanism." Existing work usually attributes this to intrinsic optimization dynamics, but its triggering mechanism remains unclear. This paper proves that this phenomenon i...
- Towards Reliable LLM Evaluation: Correcting the Winner's Curse in Adaptive Benchmarking
Adaptive prompt and program search makes LLM evaluation selection-sensitive. Once benchmark items are reused inside tuning, the observed winner's score need not estimate the fresh-data performance of the full tune-then-deploy procedure. We study inference for this procedure-level target under explic...
- Tuning Derivatives for Causal Fairness in Machine Learning
Artificial-intelligence systems are becoming ubiquitous in society, yet their predictions typically inherit biases with respect to protected attributes such as race, gender, or age. Classical fairness notions, most notably Statistical Parity (SP), demand that predictions be independent of the protec...
- CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency
Large language models often improve reasoning by sampling multiple outputs and aggregating their final answers, but precise and efficient control of error levels remains a challenging task. In particular, deciding when to stop sampling remains difficult when the stopping rule is data-dependent and t...
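The underlying self-consistency procedure is simple to sketch. The `sample_answer` interface below is hypothetical, and this shows only fixed-budget majority voting, not the paper's anytime-valid stopping rule:

```python
import random
from collections import Counter

def self_consistency(sample_answer, n_samples=16, seed=0):
    """Generic self-consistency: sample several final answers from a
    model and return the majority vote plus its agreement rate.
    `sample_answer` stands in for one stochastic LLM call."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    return answer, count / n_samples

def toy(rng):
    # Toy sampler: correct answer "42" w.p. 0.6, else a random digit.
    return "42" if rng.random() < 0.6 else str(rng.randint(0, 9))

answer, agreement = self_consistency(toy, n_samples=101)
```

The difficulty the paper targets is exactly what this sketch ignores: if `n_samples` is chosen adaptively from the votes seen so far, naive error estimates are no longer valid.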
- Ratio-based Loss Functions
Algorithms in machine learning and AI critically depend on at least three key components: (i) the risk function, which is the expectation of the loss function, (ii) the function space, which is often called the hypothesis space, and (iii) the set of probability measures, which are allowed for the...
- Transformers Provably Implement In-Context Reinforcement Learning with Policy Improvement
We investigate the ability of transformers to perform in-context reinforcement learning (ICRL), where a model must infer and execute learning algorithms from trajectory data without parameter updates. We show that a linear self-attention transformer block can provably implement policy-improvement me...
- Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization
Training loss and throughput can hide distinct internal representations during language-model training. To examine these hidden mechanics, we use spectral measurements as practical and operational diagnostics. Using a controlled family of decoder-only models adapted from the modded NanoGPT codebase, we i...
- Neural-Actuarial Longevity Forecasting: Anchoring LSTMs for Explainable Risk Management
Traditional multi-population models, such as the Li-Lee framework, rely on the assumption of mean-reverting country-specific deviations. However, recent data from high-longevity clusters suggest a systemic break in this paradigm. We identify a stationarity paradox where mortality residuals in countr...
- Decoupled PFNs: Identifiable Epistemic-Aleatoric Decomposition via Structured Synthetic Priors
Prior-Fitted Networks (PFNs) amortize Bayesian prediction by meta-learning over a synthetic task prior, but their standard output is a posterior predictive distribution over noisy observations. For sequential decision-making, such as active learning and Bayesian optimization, acquisition should prio...
- Beyond the Independence Assumption: Finite-Sample Guarantees for Deep Q-Learning under $\tau$-Mixing
Finite-sample analyses of deep Q-learning typically treat replayed data as independent, even though it is sampled from temporally dependent state-action trajectories. We study the Deep Q-networks (DQN) algorithm under explicit dependence by modelling the minibatches used for updating the network as ...
- The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models
Real-world datasets are inherently heterogeneous, yet how per-class structural differences and sampling imbalance shape the training dynamics of diffusion models, and potentially exacerbate disparities, remains poorly understood. While models typically transition from an initial phase of generalizatio...
- Topological Signatures of Grokking
We study the grokking phenomenon through the lens of topology. Using persistent homology on point clouds derived from the embedding matrices of a range of models trained on modular arithmetic with varying primes, we identify a clear and consistent topological signature of grokking: a sharp increase ...
- End-to-End Identifiable and Consistent Recurrent Switching Dynamical Systems
Learning identifiable representations in deep generative models remains a fundamental challenge, particularly for sequential data with regime-switching dynamics. Existing approaches establish identifiability under restrictive assumptions, such as stationarity or limited emission models, and typicall...
- Attributions All the Way Down? The Metagame of Interpretability
We introduce the metagame, a conceptual framework for quantifying second-order interaction effects of model explanations. For any first-order attribution $\varphi(f)$ explaining a model $f$, we measure the directional influence of feature $j$ on the attribution of feature $i$, denoted as meta-attribution ...
- Expressivity of Bi-Lipschitz Normalizing Flows: A Score-Based Diffusion Perspective
Many normalizing flow architectures impose regularity constraints, yet their distributional approximation properties are not fully characterized. We study the expressivity of bi-Lipschitz normalizing flows through the lens of score-based diffusion models. For the probability flow ODE of a variance-p...
- TabCF: Distributional Control Function Estimation with Tabular Foundation Models
Instrumental variable (IV) and control function (CF) methods are powerful tools for causal effect estimation in the presence of unmeasured confounding, yet most existing approaches target only mean effects and/or demand substantial fitting and tuning effort. In this paper, we introduce a simple meth...
- Temporal Functional Circuits: From Spline Plots to Faithful Explanations in KAN Forecasting
Unlike MLPs, Kolmogorov-Arnold Networks (KANs) expose explicit learnable edge functions on every connection, enabling mechanistic explanation in time-series forecasting. This paper introduces Temporal Functional Circuits, a framework that transforms KAN edge functions from latent visualizations into...
- Spherical Flows for Sampling Categorical Data
We study the problem of learning generative models for discrete sequences in a continuous embedding space. Whereas prior approaches typically operate in Euclidean space or on the probability simplex, we instead work on the sphere $\mathbb S^{d-1}$. There the von Mises-Fisher (vMF) distribution induc...
- Variational Smoothing and Inference for SDEs from Sparse Data with Dynamic Neural Flows
Stochastic differential equations (SDEs) provide a flexible framework for modeling temporal dynamics in partially observed systems. A central task is to calibrate such models from data, which requires inferring latent trajectories and parameters from sparse, noisy observations. Classical smoothing m...
- In-Context Positive-Unlabeled Learning
Positive-unlabeled (PU) learning addresses binary classification when only a set of labeled positives is available alongside a pool of unlabeled samples drawn from a mixture of positives and negatives. Existing PU methods typically require dataset-specific training or iterative optimization, which l...
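One classical PU baseline the field builds on is the Elkan-Noto correction; a hedged sketch of that standard recipe (not necessarily the in-context method the paper proposes):

```python
import numpy as np

def elkan_noto_adjust(scores_unlabeled, scores_labeled_pos):
    """Elkan-Noto correction: a classifier g trained to separate labeled
    positives from the unlabeled pool satisfies g(x) ~ c * p(y=1 | x),
    where c = p(labeled | positive). Estimate c as the mean score on
    held-out labeled positives, then rescale the unlabeled scores."""
    c = scores_labeled_pos.mean()
    return np.clip(scores_unlabeled / c, 0.0, 1.0)

# Toy check: if g averages 0.5 on held-out positives (so c = 0.5),
# an unlabeled score of 0.4 maps to a positive-class probability of 0.8.
probs = elkan_noto_adjust(np.array([0.4, 0.1]), np.array([0.5, 0.5]))
assert np.allclose(probs, [0.8, 0.2])
```

The appeal is that `g` can be any probabilistic classifier trained on the labeled-vs-unlabeled split; the PU structure enters only through the one-constant calibration.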
- FlowMarket
A social network of AI agents generating B2B deals
- Claude Agents for Financial Services
Finance agent templates for pitches, KYC, and closing books
- GPT-5.5 Instant
Smarter, more personal answers as ChatGPT's new default
- Genrate.ai
The military-grade recon machine for revenue teams.
- Luma Uni 1.1 API
A reasoning model that interprets intent before it generates
- Basedash MCP server
Your data analyst, in every AI tool you already use
- SLED AI
Public-sector revenue engine for B2B companies
- Phrony
Ship AI agents without the operational burden
- Bagel AI
AI product intelligence for product and GTM teams
- DevPass by LLM Gateway
One key to access every coding model at three flat prices
- RAKOR
Custom CRM and AI automation for businesses
- Forge
A complete React toolkit made for AI
- Contextual Moderation for Chat
AI-powered moderation for safer chat experiences
- Hachigo
Turn repetitive AI tasks into apps