AI News Archive: June 2, 2026 — Part 23
Sourced from 500+ daily AI sources, scored by relevance.
- PlateLens
The best and most accurate AI Calorie Counter app.
- Graph Regularized Non-negative Reduced Biquaternion Matrix Factorization for Color Image Recognition
Non-negative reduced biquaternion matrix factorization (NRBMF) uses the product of reduced biquaternion (RB) matrices to incorporate the non-negativity constraints of color image pixels into the factorization process. However, NRBMF mainly focuses on reconstruction accuracy and does not exploit the ...
- PHASER: Phase-Aware and Semantic Experience Replay for Vision-Language-Action Models
Vision-Language-Action (VLA) models have achieved remarkable success in language-conditioned robotic manipulation. However, deploying these models in open-ended environments requires continuously acquiring novel skills, a process that inevitably triggers severe catastrophic forgetting of previously ...
- Diffusing in the Right Space: A Systematic Study of Latent Diffusability
Latent diffusion models leverage visual tokenizers to compress images into latent spaces for efficient generative modeling. However, better reconstruction quality of a tokenizer does not necessarily translate into better generation quality, suggesting that latent representations should be evaluated ...
- When Attention Collapses: Stage-Aware Visual Token Pruning from Structure to Semantics
Vision-Language Models (VLMs) have demonstrated remarkable capabilities but suffer from significant computational overhead during inference. While visual token pruning offers a promising solution, existing methods predominantly rely on initial attention scores. This single-metric paradigm presents a...
- Efficient Transformer-Based Localized Patch Sampling for Choroid Plexus Segmentation in Multiple Sclerosis
Background: The lateral ventricle choroid plexus (LVCP) is gaining recognition as a key imaging biomarker for multiple sclerosis (MS) related to physical disability and neuroinflammation. Yet, manual segmentation of the LVCP is highly tedious, restricting its use in broad clinical trials and longitu...
- Knowledge-Preserved Model Tuning in Null-Space for Robust Spatio-Temporal Video Grounding
Spatio-Temporal Video Grounding aims to localize object tubes based on textual queries. While recent methods have achieved remarkable success, they mainly focus on high-quality (HQ) inputs, neglecting the widespread presence of low-quality (LQ) videos in real-world scenarios. Although tuning methods l...
- EvoMemNav: Efficient Self-Evolving Fine-Grained Memory for Zero-Shot Embodied Navigation
Building memory is essential for long-horizon planning in zero-shot embodied navigation. Detector-centric scene graphs often compress observations into sparse nodes, discarding fine-grained visual evidence and accumulating noise, while 3D reconstruction-based methods remain computationally prohibiti...
- Characterizing Detectability in 3DGS Poisoning: A Stage-wise Benchmark
3D Gaussian Splatting (3DGS) has rapidly emerged as a leading representation for real-time novel view synthesis, but recent work shows it is vulnerable to diverse poisoning attacks, including illusory object injection, computation cost amplification, and post hoc model watermarking. Despite this exp...
- PersistGS: Differentiable Physics for Object Permanence in 4D Gaussian Splatting
Dynamic 3D Gaussian Splatting (3DGS) methods reconstruct time-varying scenes from synchronized multi-camera video using photometric supervision. When a moving object becomes fully occluded from all training cameras, this supervision vanishes: the Gaussians representing it receive no gradient signal ...
- PRISM: Synergizing Vision Foundation Models via Self-organized Expert Specialization
Unifying the complementary strengths of diverse Vision Foundation Models (VFMs) into a single efficient model is highly desirable but challenged by the negative transfer inherent in monolithic distillation. To address these feature conflicts, we introduce PRISM, a novel dual-stream Mixture-...
- PHAF-Personalized Hand Avatars in a Flash
We present PHAF-Personalized Hand Avatars in a Flash, a personalized photo-realistic hand avatar which provides high-quality multi-view renders from just two images (dorsal and palmar views). Unlike slow optimization-based techniques, PHAF generates fast personalized textures for real-time deployment...
- Enginuity: A Dataset and Benchmark for Vision-Language Understanding of Engineering Diagrams
Engineering diagrams pose a distinct challenge for vision-language models: unlike natural images or general documents, they encode information through dense spatial layouts, domain-specific symbols, and cross-references between visual callouts and structured parts tables. Despite their centrality to...
- SAMatcher: Co-Visibility Modeling with Segment Anything for Robust Feature Matching
Reliable correspondence estimation is a fundamental problem in image processing, underpinning applications such as Structure from Motion, visual localization, and image registration. Existing learning-based methods have significantly improved local feature representations, yet most still operate at ...
- Mamba-Enhanced Implicit Motion Learning for Audio-Driven Portrait Animation
Audio-driven human motion video generation aims to synthesize realistic and temporally coherent human animations from a single static image, with applications in talking-head synthesis, co-speech gesture generation, and dynamic presentations. Moving beyond conventional keypoint-based methods that of...
- SynCred-Bench: Benchmarking Synthetic Credibility in AI-Generated Visual Misinformation
Recent generative models can now produce visual artifacts with realistic embedded text and layouts, creating a new misinformation threat: synthetic credibility. We introduce SYNCRED-Bench, a benchmark of 600 AI-generated misinformation images balanced across six credible-form categories and seven fi...
- Voice Sync AI Teleprompter
Voice-led teleprompter that follows your speech
- Beyond Semantics: Modeling Factual and Affective Perceptual Experiences from Vision-Language Data
We present P-Topics (Perception Topics) modeling, a novel problem for understanding how images are perceived affectively and across cultures. The goal is to (1) discover and model the different perception experiences in a dataset of images and captions, where each experience is defined by an objecti...
- BA-T: An Iterative Transformer for Two-View Bundle Adjustment
Feed-forward models for 3D reconstruction have achieved strong performance using deep cross-view attention to exchange information across images. However, these approaches often depend on heavy decoder stacks and lack a structured mechanism for geometry refinement, resulting in poor multi-view consi...
- PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision i...
- FreeStreamGS: Online Feed-forward 3D Gaussian Splatting from Unposed Streaming Inputs
Feed-forward 3D Gaussian Splatting (3DGS) allows efficient and high-fidelity novel view synthesis (NVS) from an offline recorded image sequence. However, achieving online NVS from streaming and unposed image inputs remains challenging. Although online feed-forward geometric estimation methods have b...
- MariData: One-Step Unpaired Image Translation for Maritime Environments
The development of robust perception systems for Maritime Autonomous Surface Ships (MASS) is heavily constrained by the scarcity of diverse training data, particularly for adverse weather and low-light conditions. Because collecting paired images in dynamic maritime environments is physically imposs...
- MemoGen: Can Past Experience Improve Future Text-to-Image Generation?
Modern text-to-image models have achieved strong visual synthesis, yet remain unreliable when prompts require implicit visual constraints, relational reasoning, or external knowledge. Existing retrieval-augmented and agentic generation methods mitigate this issue by acquiring external knowledge, ref...
- Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting
We study preference alignment for image inpainting. Rather than proposing yet another method, we revisit the problem from first principles and reassess its core challenges. We adopt the widely used direct preference optimization framework and construct preference training data with publicly availabl...
- Effect of Demographic Bias on Skin Lesion Classification
In this study, we evaluate the performance of skin lesion classification using ResNet-based convolutional models, focusing on the impact of demographic bias in training data, particularly variations in patient sex and age. We use linear programming to generate datasets with controlled demographic ch...
- Fundraisly
AI fundraising agent that finds investors and books meetings
- Vokal
A collaboration space for 10x teammates with their AI agents
- Co-Invest
Trade 500+ markets directly from ChatGPT & Claude
- Brief
Navigate your agents to product-market fit
- Paste MCP & AI Tools
Infinite clipboard for Claude, Codex and other AI tools
- Rodeo by TwelveLabs
Describe your shot. Rodeo builds your first cut.
- Kompassify 2.0
User onboarding now with an AI copilot
- Mirowl
Search all your screenshots via a local OCR-powered AI
- Overline
Real-time AI captions and translation for any browser video
- findloc.ai
Make your business citable by ChatGPT, Claude & Perplexity
- Sortail
Self-learning one-click inbox cleanup for Apple Mail
- HumToBeats
Turn humming into AI-generated beats
- MartinLoop
Control AI coding agents with limits, proof, + run receipts
- Gusto Cofounder
If Gusto, OpenClaw, and Claude Cowork had a baby...
- Superlist MCP
The task layer for your AI agents
- Cogvert Scout
AI visibility with a built-in action plan.
- PromptVault
Ship prompt changes without touching your codebase
- The Thinking Builder
A book of common sense in the era of thinking machines.
- AI Land
Curated AI tools by category: hand-picked, not auto-generated
- Careerboat.ai
AI-powered career platform for jobseekers and students.
- Decomp.ai — Agent Cost Optimisation
Cut LLM costs. Free audit, pay only if it works.
- VidFlux - Photo to Video Generator
Transform photos into cinematic videos with AI.
- SproutBowl: AI Family Meal Tracker
Privacy-first, AI-powered meal tracker for the whole family.
- BeagleLathe
Claude Code, fewer tool calls, faster coding.
- Thoth - Your Private AI Scribe
Private AI meeting notes for your Mac. No cloud, ever.