AI News Archive: April 28, 2026 — Part 14

Sourced from 500+ daily AI sources, scored by relevance.

Google Signs A.I. Deal With the Pentagon
The Pentagon has also signed deals for using A.I. on classified networks with OpenAI and Elon Musk’s xAI, amid a dispute with Anthropic.
🌐 MovesApr 28, 2026https://www.nytimes.com/2026/04/28/technology/google-ai-deal-pentagon.html
Screening for patients at risk for cardiac amyloidosis via electronic health records: A multicenter machine learning development and validation study
Background: Timely detection is crucial to improve outcomes in patients with cardiac amyloidosis (CA) by initiation of life-saving treatments. Although confirmatory bone scintigraphy is highly accurate for CA detection, identifying at-risk patients for referral remains challenging. Objectives: This study aimed to develop and validate a machine learning model, Amylo-Detect, using structured multimodal electronic health record (EHR) data to guide referrals for confirmatory scintigraphy and monoclonal protein testing. Methods: Consecutive all-comer patients (n=11,616) referred for bone scintigraphy at the Vienna General Hospital (2010-2023) were retrospectively included. Patients referred before August 2020 formed the development cohort. The remaining patients comprised the internal validation cohort. External validation was performed at the University Hospital Essen (n=1,521). Amylo-Detect was trained using 50 routinely available parameters to predict CA-suggestive uptake (Perugini grade >=
📄 ResearchApr 28, 2026https://www.medrxiv.org/content/10.64898/2026.04.27.26351820v1?rss=1
Silent numerical failures in large language model-generated pharmacokinetic simulation code: a benchmark against target-controlled infusion validation criteria using the Marsh propofol model
Background. Large language models (LLMs) are increasingly used by clinicians to generate executable code for pharmacokinetic (PK) simulation. Whether such code meets the accuracy standards of target-controlled infusion systems has not been systematically evaluated. Methods. Five LLMs (ChatGPT, Claude, DeepSeek, Gemini, Grok) were prompted to generate Python code for the Marsh three-compartment propofol model under a standardized 120-minute bolus-plus-infusion regimen. Each LLM was tested in two phases: Phase 1, integrator free; Phase 2, fourth-order Runge-Kutta with 1-second step size mandated. Twenty runs per LLM per phase were collected (n = 200). Plasma concentrations were compared against a triple-validated reference using median prediction error (MDPE), median absolute prediction error (MDAPE), and Wobble. Runs were classified as Class A (MDAPE < 1 %), B (1-30 %), C (≥ 30 %), or D (failed). Results. All 200 scripts were invokable and created a CSV file; 199/200 (99.5 %, 95 % CI
📄 ResearchApr 28, 2026https://www.medrxiv.org/content/10.64898/2026.04.27.26351582v1?rss=1
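For readers unfamiliar with the metrics named in the abstract above, the following is a minimal sketch of how MDPE, MDAPE, and Wobble are commonly computed, together with the A/B/C/D classification thresholds quoted in the abstract. It is not the study's code. The function names are hypothetical, the percentage-error convention (simulated minus reference, divided by reference) is an assumption, and reference concentrations are assumed to be strictly positive.

```python
from statistics import median


def prediction_errors(simulated, reference):
    """Percentage prediction error per time point.

    Assumed convention: 100 * (simulated - reference) / reference.
    Reference values are assumed to be strictly positive.
    """
    return [100.0 * (s - r) / r for s, r in zip(simulated, reference)]


def pk_metrics(simulated, reference):
    """Return (MDPE, MDAPE, Wobble) in percent."""
    pe = prediction_errors(simulated, reference)
    mdpe = median(pe)                             # bias: median signed error
    mdape = median(abs(e) for e in pe)            # inaccuracy: median absolute error
    wobble = median(abs(e - mdpe) for e in pe)    # variability around the bias
    return mdpe, mdape, wobble


def classify_run(mdape):
    """Map MDAPE to the classes quoted in the abstract.

    None stands in for a failed run (Class D); how the study detects
    failure is not stated in the excerpt, so this is a placeholder.
    """
    if mdape is None:
        return "D"
    if mdape < 1.0:
        return "A"
    if mdape < 30.0:
        return "B"
    return "C"


if __name__ == "__main__":
    # Toy concentrations, invented purely for illustration.
    reference = [1.0, 2.0, 4.0]
    simulated = [1.1, 2.0, 3.6]
    mdpe, mdape, wobble = pk_metrics(simulated, reference)
    print(mdpe, mdape, wobble, classify_run(mdape))
```

The toy data are made up and carry no clinical meaning; they only exercise the three formulas and the threshold logic.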
A diagnostic model based on differential whole-brain dynamics for distinguishing neuropsychiatric symptom and cognitive impairment
Objectives: Neuropsychiatric symptoms (NPS) are prevalent in individuals with cognitive impairment (CI). However, the similarities and differences in whole-brain dynamics between individuals with CI and those with NPS remain controversial. Electroencephalography (EEG) microstates reflect whole-brain dynamics. This study aimed to identify EEG microstate parameters that differ between CI and NPS and to construct a related diagnostic model. Methods/design: This was a cross-sectional study. Clinical and EEG data were collected, and an EEG microstate analysis was performed. A least absolute shrinkage and selection operator (LASSO) regression model was used to identify EEG microstate parameters that differed significantly between CI and NPS and to construct a diagnostic model. Model performance was assessed using the receiver operating characteristic (ROC) curve. Results: This study enrolled 78 participants. A total of 36 EEG microstate parameters were identified and included in the diff
📄 ResearchApr 28, 2026https://www.medrxiv.org/content/10.64898/2026.04.27.26351804v1?rss=1
Real-World Dose Modifications for FOLFIRINOX in Pancreatic Cancer: Evaluating the Feasibility of a Machine-Learning Framework
Background: FOLFIRINOX is a cornerstone regimen for eligible patients with pancreatic ductal adenocarcinoma (PDAC), but its clinical benefit is limited by substantial toxicity and frequent dose modification. In real-world practice, dose modifications are often individualized, and the clinical factors associated with these decisions remain incompletely characterized. Objective: To develop and evaluate an electronic medical record (EMR)-based machine-learning framework for modeling cycle-specific FOLFIRINOX dose modification decisions in patients with PDAC. Methods: We included patients with PDAC who received FOLFIRINOX at UCSF oncology clinics between November 2011 and December 2023. Predictors included demographic, clinical, laboratory, and treatment variables derived from the EMR. Logistic regression, random forest, and XGBoost models were trained using group-based 5-fold cross-validation to predict cycle-specific dose modifications for 5-fluorouracil, irinotecan, and oxaliplatin. Mod
📄 ResearchApr 28, 2026https://www.medrxiv.org/content/10.64898/2026.04.27.26350002v1?rss=1
Neural and behavioural measures from attention testing show no support for efficacy of neurofeedback treatment for adult ADHD
Attention-deficit/hyperactivity disorder (ADHD) is associated with impairments in sustained attention and inhibitory control. Neurofeedback (NFB) is a widely used non-pharmacological treatment for ADHD and is generally well tolerated, but evidence for its efficacy remains mixed. Here we report results from secondary analysis of a randomized controlled trial of NFB training for adult ADHD, analysing behaviour and neural data from attention testing in both test-retest and treatment-vs-waiting list control group contrasts. We used electroencephalography (EEG) to investigate event-related cortical dynamics during the Test of Variables of Attention (TOVA), administered before and after NFB treatment. 44 adults with ADHD (NFB treatment, ADHD-T: n = 23; waitlist control, ADHD-W: n = 21) completed the TOVA before and after the NFB training period, while 128-channel EEG was recorded. Treatment-related change was examined through analyses based on behavioural TOVA performance, power spectral den
📄 ResearchApr 28, 2026https://www.medrxiv.org/content/10.64898/2026.04.26.26351764v1?rss=1
Personalized, EEG-controlled intermittent theta burst stimulation
Brain-state-controlled transcranial magnetic stimulation (TMS) studies with real-time electroencephalography (EEG) show that the phase of ongoing oscillations modulates cortical susceptibility to TMS pulses. Translating this principle to repetitive clinical protocols, such as intermittent theta burst stimulation (iTBS), is an open challenge because within-train stimulation pulses corrupt real-time EEG. Moreover, the general difficulty of predicting EEG theta phase even to initiate an iTBS train applies. We present our solution for prefrontal EEG-phase-controlled iTBS, a personalized stimulation framework. We demonstrate the technical feasibility of aligning each train's initial bursts to the individual prefrontal theta phase and propose a "seed-and-sustain" hypothesis, whereby intra-train stimulation-induced entrainment at the individual theta rhythm carries the later bursts. Future human trials will be needed to evaluate the practical benefits of this approach.
📄 ResearchApr 28, 2026https://www.medrxiv.org/content/10.64898/2026.04.27.26351877v1?rss=1
Resistivity-enhanced multi-physics machine learning framework for dynamic stress prediction in high sensitive UHPC
📄 ResearchApr 28, 2026https://sciencesources.eurekalert.org/news-releases/1125989
Microarchitecture Tailored to 3D-Stacked Near-Memory Processing LLM Decoding (U. of Edinburgh, Peking U., Cambridge et al.)
A new technical paper, “Rethinking Compute Substrates for 3D-Stacked Near-Memory LLM Decoding: Microarchitecture-Scheduling Co-Design,” was published by researchers at University of Edinburgh, Peking University, University of Cambridge, University of Chinese Academy of Sciences, and the Hong Kong University of Science and Technology. Abstract: “Large language model (LLM) decoding is a major inference bottleneck because its...
📄 ResearchApr 28, 2026https://semiengineering.com/microarchitecture-tailored-to-3d-stacked-near-memory-processing-llm-decoding-u-of-edinburg-peking-u-cambridge-et-al/
Here’s what OpenAI’s lawyer argued in his opening statement.
🌐 MovesApr 28, 2026https://www.nytimes.com/live/2026/04/28/technology/openai-sam-altman-elon-musk-trial/heres-what-openais-lawyer-argued-in-his-opening-statement
Here’s what Musk’s lawyers argued in their opening statement.
🌐 MovesApr 28, 2026https://www.nytimes.com/live/2026/04/28/technology/openai-sam-altman-elon-musk-trial/heres-what-musks-lawyers-argued-in-their-opening-statement
OpenAI Just Released GPT-5.5, and The Model is The Least Interesting Part
48 days. A super app and chief scientist who thinks the last two years were slow. GPT-5.4 launched in March. GPT-5.5 came out on April 23rd, 48 days later. Most coverage treated that gap as a footnote. It isn’t. It’s the entire story, wrapped up in a number most people read and immediately forget. What OpenAI announced last week wasn’t really a model release. It was a product strategy, an infrastructure bet, and a fairly blunt declaration of intent about what kind of company OpenAI wants to be. The model is the delivery mechanism. What it’s delivering is something else entirely. What GPT-5.5 actually does and what that actually means Yes, the benchmarks are real. GPT-5.5, codenamed “Spud” internally (charming, somehow), outperforms Gemini 3.1 Pro and Claude Opus 4.5 across standard evaluations. OpenAI grades its own homework, so take the exact margins with appropriate skepticism, but independent labs have corroborated the direction, and the gaps are wide enough that the directional sto
🤖 ModelsApr 28, 2026https://pub.towardsai.net/openai-just-released-gpt-5-5-and-the-model-is-the-least-interesting-part-df19dea34a5e?source=rss----98111c9905da---4
Google Clears Pentagon to Use AI Tools in Classified Settings
Tech company added language to contract to say its AI wasn’t intended for domestic mass surveillance or autonomous weapons.
🌐 MovesApr 28, 2026https://www.wsj.com/tech/ai/google-clears-pentagon-to-use-ai-tools-in-classified-settings-d8162cda?mod=rss_Technology
Record $1.1B Seed Funding for Reinforcement Learning Startup
The vendor’s goal is achieving superintelligence.
💰 MoneyApr 28, 2026https://aibusiness.com/generative-ai/record-1-1b-seed-funding-reinforcement-learning-startup
Google signs classified AI deal with Pentagon, The Information reports
🌐 MovesApr 28, 2026https://www.reuters.com/technology/google-signs-classified-ai-deal-with-pentagon-information-reports-2026-04-28/
Google expands Pentagon’s access to its AI after Anthropic’s refusal
After Anthropic refused to allow the DoD to use its AI for domestic mass surveillance and autonomous weapons, Google has signed a new contract with the department.
🌐 MovesApr 28, 2026https://techcrunch.com/2026/04/28/google-expands-pentagons-access-to-its-ai-after-anthropics-refusal/
Amazon now lets you have a real conversation with AI while shopping for products
Shopping on Amazon just got a lot more conversational. The company has launched Join the chat, a new interactive feature inside its existing Hear the highlights experience. If you have not come across Hear the highlights before, it is an AI-powered audio summary tool that lives on millions of product pages inside the Amazon Shopping […]
🌐 MovesApr 28, 2026https://www.digitaltrends.com/phones/amazon-now-lets-you-have-a-real-conversation-with-ai-while-shopping-for-products/
OpenAI Partners with AWS, Breaking Microsoft Exclusivity
OpenAI partners with AWS, ending its exclusivity with Microsoft.
🌐 MovesApr 28, 2026https://opentools.ai/news/openai-partners-with-aws-breaking-microsoft-exclusivity
OpenAI loosens Microsoft ties, opens door to Amazon and Google Cloud
🌐 MovesApr 28, 2026https://indianexpress.com/article/technology/artificial-intelligence/openai-loosens-microsoft-ties-opens-door-to-amazon-and-google-cloud-10659389/
Amazon is already offering new OpenAI products on AWS
A day after OpenAI got Microsoft to agree to end exclusive rights, AWS announced a slate of OpenAI model offerings, including a new agent service.
🌐 MovesApr 28, 2026https://techcrunch.com/2026/04/28/amazon-is-already-offering-new-openai-products-on-aws/
Meet the Emirati inventor honoured by Sheikh Mohammed at 15, now building AI in the UAE
🌐 MovesApr 28, 2026https://gulfnews.com/uae-success-stories/meet-the-emirati-inventor-honoured-by-sheikh-mohammed-at-15-now-building-ai-in-the-uae-1.500504507
Probabilistic modelling of single-cell bisulfite sequencing data with MethylVI
Nature Machine Intelligence, Published online: 28 April 2026; doi:10.1038/s42256-026-01225-9 MethylVI enhances analyses of single-cell bisulfite sequencing methylomic data via a deep generative model that accounts for the unique technical and biological sources of variability in this data modality.
📄 ResearchApr 28, 2026https://www.nature.com/articles/s42256-026-01225-9
StereoFoley: Object-Aware Stereo Audio Generation from Video
We present StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. While recent generative video-to-audio models achieve strong semantic and temporal fidelity, they largely remain limited to mono or fail to deliver object-aware stereo imaging, constrained by the lack of professionally mixed, spatially accurate video-to-audio datasets. First, we develop and train a base model that generates stereo audio from video, achieving state-of-the-art in both semantic accuracy and synchronization. Next…
📄 ResearchApr 28, 2026https://machinelearning.apple.com/research/stereofoley-object-aware-stereo-audio
Local Mechanisms of Compositional Generalization in Conditional Diffusion
Conditional diffusion models appear capable of compositional generalization, i.e., generating convincing samples for out-of-distribution combinations of conditioners, but the mechanisms underlying this ability remain unclear. To make this concrete, we study length generalization, the ability to generate images with more objects than seen during training. In a controlled CLEVR setting (Johnson et al., 2017), we find that length generalization is achievable in some cases but not others, suggesting that models only sometimes learn the underlying compositional structure. We then investigate…
📄 ResearchApr 28, 2026https://machinelearning.apple.com/research/compositional-generalization
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
Large Language Models (LLMs) demonstrate their reasoning ability through chain-of-thought (CoT) generation. However, LLM’s autoregressive decoding may limit the ability to revisit and refine earlier tokens in a holistic manner, which can also lead to inefficient exploration for diverse solutions. In this paper, we propose LaDiR (Latent Diffusion Reasoner), a novel reasoning framework that unifies the expressiveness of continuous latent representation with the iterative refinement capabilities of latent diffusion models for an existing LLM. We first construct a structured latent reasoning space…
📄 ResearchApr 28, 2026https://machinelearning.apple.com/research/ladir
How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum
Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewards (RLVR) when the initial success probability $p_0$ is small. Using the Tsallis $q$-logarithm, we define a loss family $J_Q$ that interpolates betw...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25907v1
TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning
Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time, but adapting the model in live environment interactions i...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25898v1
Three Models of RLHF Annotation: Extension, Evidence, and Authority
Preference-based alignment methods, most prominently Reinforcement Learning with Human Feedback (RLHF), use the judgments of human annotators to shape large language model behaviour. However, the normative role of these judgments is rarely made explicit. I distinguish three conceptual models of that...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25895v1
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers
Finetuning a language model can lead to emergent misalignment (EM) [Betley et al., 2025b]. Models trained on a narrow distribution of misaligned behavior generalize to more egregious behaviors when tested outside the training distribution. We study a set of interventions proposed to reduce EM. We ...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25891v1
When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient
Training language models via reinforcement learning often relies on imperfect proxy rewards, since ground truth rewards that precisely define the intended behavior are rarely available. Standard metrics for assessing the quality of proxy rewards, such as ranking accuracy, treat incorrect rewards as ...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25872v1
RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements
Existing REST API testing tools are typically evaluated using code coverage and crash-based fault metrics. However, recent LLM-based approaches increasingly generate tests from NL requirements to validate functional behaviour, making traditional metrics weak proxies for whether generated tests valid...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25862v1
Investigation into In-Context Learning Capabilities of Transformers
Transformers have demonstrated a strong ability for in-context learning (ICL), enabling models to solve previously unseen tasks using only example input-output pairs provided at inference time. While prior theoretical work has established conditions under which transformers can perform linear classi...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25858v1
SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring
Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering benchmarks approach saturation, reliable deployment requires satisfying low error tolerances in real-world out-of-distribution (OOD) scenarios. Precisely...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25855v1
G-Loss: Graph-Guided Fine-Tuning of Language Models
Traditional loss functions, including cross-entropy, contrastive, triplet, and supervised contrastive losses, used for fine-tuning pre-trained language models such as BERT, operate only within local neighborhoods and fail to account for the global semantic structure. We present G-Loss, a graph-guid...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25853v1
Large language models eroding science understanding: an experimental study
This paper is under review in AI and Ethics. This study examines whether large language models (LLMs) can reliably answer scientific questions and demonstrates how easily they can be influenced by fringe scientific material. The authors modified custom LLMs to prioritise knowledge in selected fringe ...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25639v1
ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents
Long-horizon LLM tasks often fail not because a single answer is unattainable, but because knowledge states drift across rounds, intermediate commitments remain implicit, and interruption fractures the evolving evidence chain. This paper presents ADEMA as a knowledge-state orchestration architecture...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25849v1
Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions
We study city-scale control of electric-vehicle (EV) ride-hailing fleets where dispatch, repositioning, and charging decisions must respect charger and feeder limits under uncertain, spatially correlated demand and travel times. We formulate the problem as a hex-grid semi-Markov decision process (se...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25848v1
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
Optimization modeling underpins real-world decision-making in logistics, manufacturing, energy, and public services, but reliably solving such problems from natural-language requirements remains challenging for current large language models (LLMs). In this paper, we propose Agora-Opt, a modul...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25847v1
CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation
Accurate nutrient estimation from unstructured recipe text is an important yet challenging problem in dietary monitoring, due to ambiguous ingredient terminology and highly variable quantity expressions. We systematically evaluate models spanning a wide range of representational capacity, from lexic...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25774v1
Measuring the Sensitivity of Classification Models with the Error Sensitivity Profile
The quality of training data is critical to the performance of machine learning models. In this paper, the Error Sensitivity Profile (ESP) is proposed. It quantifies the sensitivity of model performance to errors in a single feature or in multiple features. By leveraging ESP, data-cleaning efforts c...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25765v1
QAROO: AI-Driven Online Task Offloading for Energy-Efficient and Sustainable MEC Networks
With the rapid advancement of artificial intelligence (AI) and intelligent science, intelligent edge computing has been widely adopted. However, the limitations of traditional methods, such as poor adaptability and the slow convergence of heuristic algorithms, are becoming increasingly evident. To e...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25740v1
SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing?
Instructed code editing is a significant challenge for large language models (LLMs). On the EditBench benchmark, 39 of 40 evaluated models obtain a task success rate (TSR) below 60 percent, highlighting a gap between general code generation and the ability to perform instruction-driven editing under...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25737v1
Verification of Neural Networks (Lecture Notes)
These lecture notes provide an introduction to the verification of neural networks from a theoretical perspective. We discuss feed-forward neural networks, recurrent neural networks, attention mechanisms, and transformers, together with specification languages and algorithmic verification techniques...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25733v1
Cross-Lingual Jailbreak Detection via Semantic Codebooks
Safety mechanisms for large language models (LLMs) remain predominantly English-centric, creating systematic vulnerabilities in multilingual deployment. Prior work shows that translating malicious prompts into other languages can substantially increase jailbreak success rates, exposing a structural ...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25716v1
Learning Generalizable Multimodal Representations for Software Vulnerability Detection
Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary sem...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25711v1
RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion
Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive b...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25693v1
Spreadsheet Modeling Experiments Using GPTs on Small Problem Statements and the Wall Task
This paper investigates how GPT-based tools can assist in building reusable analytical spreadsheet models. After a screening, we evaluate five GPT extensions and select Excel AI by pulsrai.com for detailed testing. Through structured experiments on simple problem statements, we assess Excel AI's per...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25689v1
CORAL: Adaptive Retrieval Loop for Culturally-Aligned Multilingual RAG
Multilingual retrieval-augmented generation (mRAG) is often implemented within a fixed retrieval space, typically via query or document translation or multilingual embedding vector representations. However, this approach may be inadequate for culturally grounded queries, in which retrieval-condition...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25676v1
LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation
Reliable evaluation of large language model (LLM)-generated summaries remains an open challenge, particularly across heterogeneous domains and document lengths. We conduct a comprehensive meta-evaluation of 14 automatic summarization metrics and LLM-based evaluators across seven datasets spanning fi...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25665v1
Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models
Large Vision-Language Models (LVLMs) have achieved remarkable progress in visual-textual understanding, yet their reliability is critically undermined by hallucinations, i.e., the generation of factually incorrect or inconsistent responses. While recent studies using steering vectors demonstrated pr...
📄 ResearchApr 28, 2026http://arxiv.org/abs/2604.25642v1