AI News Archive: June 25, 2026 — Part 11
Sourced from 500+ daily AI sources, scored by relevance.
- From PDEs to Graphs: A Primer on Physics Simulation and Geometric Deep Learning (Part 1/2)
No physics or ML background required. Everything you need before reading how I built an AI that predicts fluid flow in seconds, in Part 2. The Hook: Twenty-Six Hours Per Iteration An engineer at an automotive company wants to know how air flows around a new side mirror design. She opens her CAD tool, exports the geometry, hands it to the simulation team, and waits. One hour for meshing. Another hour for solver setup. Twenty-four hours for the CFD run to finish. Then she looks at the result, decides the drag is too high, changes the mirror shape slightly, and starts the whole cycle again. Twenty-six hours per design iteration. If she wants to explore ten design variants, that’s a week and a half — before writing a single line of manufacturing spec. This is the reality of physics simulation in engineering today, and it’s the problem a new generation of AI tools — including a project of mine called PhysIQ, which I’ll walk through in Part 2 — is trying to solve. But before any of that makes sense, we need to build up the basics: what physics simulation actually is, what a mesh is, what a neural network is, and why a special kind of neural network (a Graph Neural Network) turns out to be a remarkably good fit for this problem. No prior background assumed. Let’s start from the ground up. What Is Physics Simulation, Really? Physics simulation means predicting how a physical system changes over time, using a computer instead of a physical experiment. Will the air flow smoothly around this car, or will it create turbulent eddies? Will this bridge bend within safe limits under load? Will this cloth drape naturally over a character in an animated film? To get there, it helps to start even further back — with the simplest physical idea there is: things change, and we have a precise mathematical language for describing how . Motion, and a Mathematical Way to Talk About Change Suppose a car is moving. 
Its position changes over time — at one second it’s at 10 meters, at two seconds it’s at 20 meters. The rate at which position changes is called velocity . If velocity itself changes — the car speeds up or slows down — that rate of change is called acceleration . Newton’s second law, F = ma, says that force equals mass times acceleration: physics, in this view, is fundamentally a story about how quantities change . Mathematicians have a precise tool for describing change: the derivative . The derivative of position with respect to time gives velocity; the derivative of velocity gives acceleration. An equation that involves derivatives is called a differential equation , and a huge amount of physics — pendulums, springs, falling objects, planetary orbits — can be written down entirely as a differential equation relating a few changing quantities to each other. When a quantity only changes with respect to one variable (usually time), the equation is called an ordinary differential equation (ODE) — the position of a single falling ball is a classic example. But the air flowing around a car doesn’t just change over time; it changes in the x direction, the y direction, and the z direction too, all at once and interdependently. Equations with derivatives across multiple variables like this are called partial differential equations (PDEs) , and almost every interesting engineering simulation — fluid flow, heat transfer, structural deformation — comes down to solving one. The catch is cost. For fluids specifically, the governing PDE is the Navier-Stokes equations — a relationship connecting velocity, pressure, and how both change in space and time. For almost any shape or scenario you’d actually care about in engineering, equations like this have no clean, pen-and-paper solution. You can write Navier-Stokes on a blackboard in one line, but solving it exactly for “air flowing around a car mirror” is not something anyone can do with algebra. 
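For reference, the one-line blackboard form mentioned above is commonly written as follows for an incompressible fluid. The symbols are the standard textbook ones, not notation defined in this article: u is the velocity field, p the pressure, ρ the density, and ν the kinematic viscosity.

```latex
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\,\mathbf{u}
  = -\frac{1}{\rho}\,\nabla p + \nu\,\nabla^{2}\mathbf{u},
\qquad
\nabla \cdot \mathbf{u} = 0
```

The first equation is a momentum balance and the second enforces incompressibility. The (u · ∇)u term is nonlinear, which is a large part of why closed-form solutions are rare.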
So instead, simulation tools fall back on numerical methods : break the problem into small enough pieces that an approximate solution is tractable to compute, even if it takes a lot of computation. Here’s the core intuition for why “breaking into pieces” works, before we get to the specifics of meshes. Imagine trying to approximate a smooth curve using a computer that can only draw straight lines. One long straight line would be a poor approximation. But ten short straight line segments, each following the curve closely over a small stretch, gets you something visually indistinguishable from the real curve. More segments, better approximation — at the cost of more line segments to compute. Physics simulation uses exactly this idea, just in two or three spatial dimensions instead of one: instead of solving a PDE everywhere, continuously, we divide space into many small regions and solve an approximate, simplified version of the physics within each one. That dividing-into-small-regions step is where meshes come in. What Is a Mesh, and What Is Triangulation? Imagine the 2D cross-section of a pipe with a cylindrical obstacle in it — this is, conveniently, the actual benchmark problem used later in this series. To simulate fluid flowing around that cylinder, a solver needs to know the velocity and pressure everywhere in the domain, at every instant in time. Computing an exact, continuous answer everywhere is impossible. So instead, the domain is broken into a finite number of small, simple shapes — usually triangles in 2D, or tetrahedra in 3D — connected at shared corners. This collection of small shapes is called a mesh . 
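The straight-segment intuition above can be checked numerically. This is a minimal sketch, assuming a quarter of the unit circle as the "smooth curve" (an illustrative choice, not something from the article); the function name polylineLength is made up for the example.

```javascript
// Approximate the length of a quarter of the unit circle (exact value: PI / 2)
// with n equal straight segments whose endpoints lie on the curve.
function polylineLength(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    const a0 = (Math.PI / 2) * (i / n);
    const a1 = (Math.PI / 2) * ((i + 1) / n);
    total += Math.hypot(Math.cos(a1) - Math.cos(a0), Math.sin(a1) - Math.sin(a0));
  }
  return total;
}

const exactLength = Math.PI / 2;
for (const n of [1, 10, 100]) {
  const err = Math.abs(exactLength - polylineLength(n));
  console.log(`segments=${n} length=${polylineLength(n).toFixed(6)} error=${err.toExponential(2)}`);
}
```

The error shrinks as the segment count grows, while the loop does proportionally more work. That is the same accuracy-versus-cost trade a mesh makes in two or three dimensions.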
A mesh has three basic ingredients: nodes (or vertices), the individual points in space; edges, the connections between adjacent nodes; and faces (or elements), the small triangles (2D) or tetrahedra (3D) formed by those nodes and edges. Instead of solving the PDE everywhere continuously, the solver only computes quantities like velocity and pressure at the nodes, and uses the mesh structure to estimate how those quantities vary across each small element. This is the core idea behind methods like the Finite Element Method (FEM) and Finite Volume Method (FVM): turn a continuous, infinite-dimensional problem into a finite, discrete one that a computer can actually solve, by assembling a large system of equations, one set of unknowns per node, and solving them simultaneously.

Why Triangles, and What Is Triangulation?

Why break a domain into triangles specifically? Triangles are the simplest 2D shape that can tile an irregular region without gaps, and, critically, a triangle is always “flat” and non-degenerate as long as its three vertices aren’t collinear. This makes the math of estimating a smoothly-varying quantity across each triangle straightforward. Triangulation is the process of generating that triangle mesh from a set of points or boundary curves. Not all triangulations are equally good. A triangulation full of long, thin, needle-like triangles produces numerically unstable, inaccurate simulations: small errors get amplified. The gold standard is Delaunay triangulation: a specific way of connecting points into triangles such that no point lies inside the circumcircle of any triangle. In practice, this rule tends to avoid thin slivers and produce triangles that are as close to equilateral as the point distribution allows, which keeps the numerical solver well-behaved. How fine the mesh needs to be also varies by where you are in the domain.
Near the cylinder surface, where velocity changes rapidly (steep gradients, boundary layers, vortex shedding), you want a fine mesh — small triangles densely packed — to capture that detail accurately. Far from the cylinder, where the flow is calm and slowly varying, a coarse mesh — large triangles — is good enough and saves a lot of computation. This deliberate variation in mesh density is called mesh refinement , and getting it right is itself a specialized skill in computational engineering. This is also where the real cost of classical simulation comes from: a fine, well-refined mesh might have hundreds of thousands or millions of nodes for a 3D problem, and the solver has to assemble and solve a system of equations at every single timestep, for potentially thousands of timesteps. That’s the “twenty-four hours” from the opening story. Neural Networks, From Scratch If you already know what a neural network is, skip ahead — but for completeness: A neural network is a function — a mathematical mapping from inputs to outputs — built out of stacked layers of simple operations. Each layer takes a vector of numbers, multiplies it by a matrix of learned weights, adds a bias, and passes the result through a nonlinear function (like ReLU, which just zeroes out negative values). Stack enough of these layers and the network can, in principle, approximate extremely complicated functions — this is the universal approximation property that makes neural networks broadly useful. The “learning” part works like this: you show the network an input, compare its output to the correct answer using a loss function (a number that measures how wrong the prediction was), and then use backpropagation — repeated application of the chain rule from calculus — to figure out how to nudge every weight in the network to make that loss slightly smaller. Repeat this millions of times over a large dataset, and the weights gradually settle into values that make the network’s predictions accurate. 
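The layer and training loop described above can be sketched in a few lines. This is a minimal illustration, not a real training framework: the weights in the denseLayer call and the toy dataset (samples of y = 2x + 1) are made up for the example, and the gradients are written out by hand for a single linear unit, which is the job backpropagation automates for deeper networks.

```javascript
// One dense layer: output = relu(W * x + b). W is an array of rows.
function relu(v) {
  return v.map((z) => Math.max(0, z));
}
function denseLayer(W, b, x) {
  const z = W.map((row, i) => row.reduce((sum, w, j) => sum + w * x[j], b[i]));
  return relu(z);
}

// Gradient descent on the smallest possible "network": y = w * x + b.
// The loss is mean squared error over the dataset.
function trainLinear(data, steps, learningRate) {
  let w = 0;
  let b = 0;
  for (let s = 0; s < steps; s++) {
    let gradW = 0;
    let gradB = 0;
    for (const [x, y] of data) {
      const err = w * x + b - y; // prediction minus target
      gradW += (2 * err * x) / data.length; // d(loss)/dw
      gradB += (2 * err) / data.length; // d(loss)/db
    }
    w -= learningRate * gradW; // nudge each weight against its gradient
    b -= learningRate * gradB;
  }
  return { w, b };
}

const layerOutput = denseLayer([[1, -1], [0.5, 0.5]], [0, -2], [2, 3]);
const fit = trainLinear([[0, 1], [1, 3], [2, 5], [3, 7]], 5000, 0.05);
console.log(layerOutput, fit);
```

After enough steps the fitted w and b settle close to 2 and 1, the values that generated the toy data.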
The specific architecture of layers matters enormously, and is usually chosen to match the structure of the data. Convolutional Neural Networks (CNNs) exploit the grid structure of images. Transformers exploit the sequential structure of text. And — as we’re about to see — Graph Neural Networks exploit the irregular, connected structure of meshes. Physics-Informed Loss: Teaching a Network the Rules of Physics Here’s an idea that sits right at the intersection of physics and deep learning: what if, instead of (or in addition to) training a neural network on labeled examples, you trained it to directly satisfy a physical law? This is the idea behind Physics-Informed Neural Networks (PINNs) . A PINN is usually a fairly ordinary neural network — often a simple multi-layer perceptron — trained to output a predicted physical quantity (say, velocity and pressure) for any given point in space and time. The twist is in the loss function. Because the network’s output is a differentiable function of its inputs, you can use automatic differentiation (the same machinery behind backpropagation) to compute the derivatives of the network’s own output — exactly the derivatives that appear in the governing PDE (like Navier-Stokes). Plug those derivatives back into the PDE, and you get a number called the PDE residual : how badly the network’s current prediction violates the physical law. That residual becomes a term in the loss function. The network is, quite literally, penalized for disagreeing with physics, even at points where there’s no labeled training data at all. It’s an elegant idea — physics itself becomes a teacher. But PINNs have real, practical limitations. Training against a nonlinear PDE residual is a hard optimization problem in its own right. A PINN trained for one specific geometry and boundary condition generally needs to be retrained from scratch if you change the shape — there’s no built-in notion of mesh connectivity or geometry in a standard MLP. 
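To make the PDE residual concrete, here is a minimal sketch under loud assumptions: it uses the 1D heat equation u_t = alpha * u_xx instead of Navier-Stokes, and central finite differences stand in for automatic differentiation. The candidate functions play the role of a network's output; all names are hypothetical.

```javascript
// Residual of the 1D heat equation u_t = alpha * u_xx for a candidate u(x, t).
// Finite differences are an assumption of this sketch; a PINN would get the
// same derivatives from automatic differentiation.
const alpha = 0.1;
const h = 1e-3;

function heatResidual(u, x, t) {
  const uT = (u(x, t + h) - u(x, t - h)) / (2 * h);
  const uXX = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h);
  return uT - alpha * uXX;
}

// One candidate that satisfies the PDE exactly, and one that decays too fast.
const exactU = (x, t) => Math.exp(-alpha * Math.PI ** 2 * t) * Math.sin(Math.PI * x);
const wrongU = (x, t) => Math.exp(-5 * t) * Math.sin(Math.PI * x);

// Mean squared residual over sample points: the physics term of the loss.
function physicsLoss(u, points) {
  return points.reduce((sum, [x, t]) => sum + heatResidual(u, x, t) ** 2, 0) / points.length;
}

const samplePoints = [[0.2, 0.1], [0.5, 0.3], [0.8, 0.7]];
console.log(physicsLoss(exactU, samplePoints), physicsLoss(wrongU, samplePoints));
```

The candidate that obeys the equation scores a loss near zero and the other does not, even though no labeled data is involved. That gap is the signal a physics-informed loss trains on.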
And for problems that evolve over long time horizons, PINNs can drift or fail to converge cleanly. This matters because it sets up a real tension worth understanding before we get to GNNs: do you want a network that is taught the physics directly via a PDE-based loss (a PINN), or a network that learns the physics implicitly by training on many examples of mesh-based simulation data, the way an image classifier learns “catness” implicitly from thousands of cat photos? Both are valid strategies, with different tradeoffs — and it’s the second strategy, applied specifically to mesh data, that leads us to geometric deep learning. Geometric Deep Learning: Why Meshes Need a Different Kind of Neural Network Recall that a mesh is irregular: some nodes have three neighbors, some have eight; edge lengths vary depending on local mesh refinement. A standard CNN expects a grid, where pixel [i, j] always has exactly four neighbors at a fixed distance. You simply cannot slide a convolution kernel over a mesh — there’s no consistent “next node” the way there’s a consistent “next pixel.” Geometric deep learning is the field that generalizes deep learning to exactly this kind of non-Euclidean, irregular data: graphs, meshes, point clouds, manifolds. And the key realization, once you see it, is almost obvious: a mesh basically is a graph already. Nodes are mesh vertices. Edges are the connections between them. All that’s missing is a way to do something convolution-like — aggregating local neighborhood information — on this irregular structure. That something is called message passing , and it’s the central operation in a Graph Neural Network (GNN). One round of message passing works like this: Message : for every edge connecting node i and node j, compute a “message” — a vector of numbers — that depends on the features of both nodes and the edge itself. 
Aggregate: for every node i, collect all incoming messages from its neighboring nodes, typically by summing or averaging them together. Update: combine node i’s current features with the aggregated messages to produce its new, updated features.

This single round of message passing lets each node absorb information from its immediate neighbors. Stack many rounds (many “layers” of the GNN) and information propagates one hop further across the mesh with each layer, so after k rounds a node’s features can reflect nodes up to k hops away. This isn’t just a clever computational trick to make graphs work with deep learning; it’s a genuinely physical match. In a real fluid or solid, a disturbance at one point physically propagates to its neighbors first, and from there to their neighbors, and so on. Message passing on a mesh graph respects exactly the same locality structure as the physics it’s trying to model. That correspondence is the whole reason GNNs turn out to be such a natural fit for learning physics simulation directly from mesh data, which is exactly the approach behind PhysIQ, covered in Part 2.

Three Strategies, Side by Side

Putting it all together, there are roughly three distinct strategies for “physics AI” worth knowing about: Physics-Informed Neural Networks (PINNs): bake the governing PDE directly into the training loss. No simulation data strictly required, but limited generalization across geometries and slow to train. Neural Operators (e.g. Fourier Neural Operators): learn a mapping between entire input and output fields rather than fixed-size vectors, working naturally on regular grids but requiring awkward interpolation on unstructured, irregular meshes. Data-driven mesh surrogates (Graph Neural Networks): train a GNN directly on a dataset of mesh-based simulation trajectories (input mesh and boundary conditions → solution over time), and use it instead of the solver at inference time.
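The message, aggregate, and update steps described earlier can be sketched on a toy graph. This is a simplified illustration, not the PhysIQ or MeshGraphNets implementation: the message is just the neighbor's features and the update is a fixed average, where a real GNN would use small learned networks that also look at edge features.

```javascript
// One round of message passing. Node features are plain number arrays.
function messagePassingRound(features, edges) {
  const dim = features[0].length;
  const aggregated = features.map(() => new Array(dim).fill(0));

  // Message + aggregate: each undirected edge sends a message both ways,
  // and incoming messages are summed per node.
  for (const [i, j] of edges) {
    for (let d = 0; d < dim; d++) {
      aggregated[i][d] += features[j][d]; // message j -> i
      aggregated[j][d] += features[i][d]; // message i -> j
    }
  }

  // Update: combine each node's own features with its aggregated messages.
  return features.map((f, i) => f.map((v, d) => 0.5 * v + 0.5 * aggregated[i][d]));
}

// A chain of four nodes: 0 - 1 - 2 - 3. Only node 0 starts with a signal.
const chainEdges = [[0, 1], [1, 2], [2, 3]];
let nodeFeatures = [[1], [0], [0], [0]];
for (let round = 1; round <= 3; round++) {
  nodeFeatures = messagePassingRound(nodeFeatures, chainEdges);
  console.log(`after round ${round}:`, nodeFeatures.map((f) => f[0]));
}
```

Node 3 sits three hops from node 0, and its feature stays at zero until the third round. Each round extends every node's reach by exactly one hop.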
This generalizes naturally across arbitrary mesh topologies, at the cost of needing a dataset of real simulation runs to learn from. Each strategy makes a different bet about where the “physics knowledge” should live: explicitly in the loss function, implicitly in a learned operator over fields, or implicitly in a learned operator over graphs. PhysIQ is built on the third approach, following the architecture introduced by DeepMind’s MeshGraphNets, and that’s where Part 2 picks up.

The Big Picture, Side by Side

It’s worth stepping back and looking at the two pipelines next to each other, the classical one and the one machine learning enables:

Classical simulation:
Physics equations (PDEs)
↓
Mesh (triangulation, refinement)
↓
Numerical solver (FEM / FVM, every timestep)
↓
Result: hours or days later

Machine-learning simulation:
Thousands of pre-computed simulations
↓
Train a Graph Neural Network on that data
↓
New geometry comes in
↓
Prediction: seconds later

The classical pipeline re-solves the same physics from scratch, every single time, for every new geometry. The learning-based pipeline pays the cost once, up front, during training, and from then on it’s not solving equations at inference time at all. It’s recognizing patterns in how solutions tend to behave, the same way an image classifier doesn’t “compute” that something is a cat; it recognizes the pattern from having seen many cats before.

Continued in Part 2: a full case study of PhysIQ, covering the actual GNN architecture, the data engineering behind training it efficiently, how it learns to flag its own uncertain predictions, and how it can run the simulation backwards to design a shape from a target performance metric. The full codebase for PhysIQ is available at github.com/ahmealy/PhysIQ. A full demo is on YouTube. MeshGraphNets paper: Pfaff et al., “Learning Mesh-Based Simulation with Graph Networks”, ICLR 2021.
arxiv.org/abs/2010.03409
From PDEs to Graphs: A Primer on Physics Simulation and Geometric Deep Learning (Part 1/2) was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
- Machine Learning and Artificial Intelligence in Python
Machine Learning and Artificial Intelligence in Python Oxford Lifelong Learning
Score: 11 🌐 Moves Jun 25, 2026 https://lifelong-learning.ox.ac.uk/courses/machine-learning-and-artificial-intelligence-in-python-2/
- Cannabis robotics company Vape Jet moves headquarters to Strip District
The company manufactures robotics equipment that automates cannabis vape cartridge filling. The move comes as recreational cannabis remains illegal in Pennsylvania.
Score: 10 🌐 Moves Jun 25, 2026 https://www.bizjournals.com/pittsburgh/news/2026/06/25/vape-jet-relocates-to-strip-district.html?ana=brss_6150
- Announcing The D2C & Retail Summit 2026: Decoding Commerce In The Age Of AI & 10-Min Delivery
India’s consumer economy is entering a defining new chapter. The era of burning capital for hyper-growth is over, replaced by…
Score: 10 🌐 Moves Jun 25, 2026 https://inc42.com/buzz/announcing-the-d2c-retail-summit-2026-decoding-commerce-in-the-age-of-ai-10-min-delivery/
- I Built My Own Analytics + AB Testing Tool with Claude Code.
I Built My Own Analytics + AB Testing Tool with Claude Code. (Part 2 of 3: A/B tests and session replay)

Part 1 got events from the browser into Postgres. A pipeline that only counts pageviews is a worse Google Analytics, though. The reason to build your own is to do the things the off-the-shelf tools gate behind a sales call: run real experiments, and watch real sessions. Both turn out to lean on the same humble trick: a hash function.

A/B testing without the flicker

Most A/B tools ship a library that rewrites your DOM after the page loads. You’ve seen the result: the original headline flashes for 200ms, then snaps to the variant. It looks broken because it is. I went the other way. A test is two URLs, control and variant, and the tracker redirects a share of traffic before the page paints. You build the variant as a real page. No DOM surgery, no flash. The cost is that you maintain two pages instead of patching one, which for landing pages is a trade I’ll take every time.

The same visitor, the same bucket, forever

The hard requirement: a visitor must always land in the same variant, and I refuse to store a server-side record of who saw what. A hash gives you exactly that: a stable decision out of thin air. Hash the visitor ID together with the test ID, get a number between 0 and 1, and walk the variant weights:

    function hashToFloat(str) {
      // FNV-1a
      var h = 0x811c9dc5;
      for (var i = 0; i < str.length; i++) {
        h ^= str.charCodeAt(i);
        h = Math.imul(h, 0x01000193);
      }
      return (h >>> 0) / 0xffffffff;
    }

    var bucket = hashToFloat(visitorId + test.id);
    var cumulative = 0, assigned = null;
    for (var i = 0; i < test.variants.length; i++) {
      cumulative += test.variants[i].weight;
      if (bucket <= cumulative) { assigned = test.variants[i]; break; }
    }

No database of assignments. No coordination. The same person hashes to the same bucket every visit, and mixing in the test ID means their bucket in one test tells you nothing about the next.
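To see the determinism in action, here is a self-contained sketch of the same bucketing wrapped in a function and run over synthetic visitors. The test definition, the visitor IDs, and the fallback to the last variant are assumptions added for illustration; they are not from the original tracker.

```javascript
// FNV-1a hash mapped to [0, 1], as in the snippet above.
function hashToFloat(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0xffffffff;
}

// Walk the cumulative weights; weights are assumed to sum to 1.
function assignVariant(visitorId, abTest) {
  const bucket = hashToFloat(visitorId + abTest.id);
  let cumulative = 0;
  for (const variant of abTest.variants) {
    cumulative += variant.weight;
    if (bucket <= cumulative) return variant;
  }
  // Assumed guard: floating-point shortfall in the weights falls through here.
  return abTest.variants[abTest.variants.length - 1];
}

// Hypothetical test definition, for illustration only.
const headlineTest = {
  id: "headline-test",
  variants: [
    { name: "control", url: "/landing", weight: 0.5 },
    { name: "variant", url: "/landing-b", weight: 0.5 },
  ],
};

const counts = { control: 0, variant: 0 };
for (let n = 0; n < 10000; n++) {
  counts[assignVariant("visitor-" + n, headlineTest).name]++;
}
console.log(counts);
console.log(assignVariant("visitor-42", headlineTest) === assignVariant("visitor-42", headlineTest));
```

Calling assignVariant twice with the same visitor and test always returns the same variant object, with no stored state anywhere. The printed counts give a rough feel for how the traffic actually splits.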
Two details that look small and aren’t

When you redirect to the variant, carry the query string over. Forget this and you strip the UTM and ad-click parameters off the URL, and your paid traffic suddenly looks like it came from nowhere:

    var variantUrl = new URL(assigned.url, location.origin);
    new URLSearchParams(location.search).forEach(function (v, k) {
      if (!variantUrl.searchParams.has(k)) variantUrl.searchParams.set(k, v);
    });
    location.replace(variantUrl.toString()); // replace(), so "back" skips the redirect

And fail fast. The assignment request gets a 2-second timeout, and every failure path does nothing and lets the page load:

    xhr.timeout = 2000;
    xhr.ontimeout = function () {}; // show control, move on
    xhr.onerror = function () {};

A visitor who sees the control because your API was slow is a non-event. A visitor staring at a blank page because you blocked render on a database query is a refund.

Calling the test without fooling yourself

Because Part 1’s tracker stamps ab_variant onto every event, results are one grouped query: visitors and conversions per variant. The honesty lives in what you do with those counts. I wrote the stats with zero dependencies, and it's less code than the npm install would be. A two-proportion z-test answers “is this difference real or just noise?”

    const p1 = controlConversions / controlVisitors;
    const p2 = variantConversions / variantVisitors;
    const pPooled = (controlConversions + variantConversions) / (controlVisitors + variantVisitors);
    const se = Math.sqrt(pPooled * (1 - pPooled) * (1 / controlVisitors + 1 / variantVisitors));
    const zScore = (p2 - p1) / se;
    const pValue = 2 * (1 - normalCDF(Math.abs(zScore)));

But the number that keeps you honest is the confidence interval. “Variant B converts at 3.2%” invites you to celebrate. A Wilson interval of “3.2%, somewhere between 1.1% and 5.9%” tells you the truth: you don’t know yet.
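The z-test snippet calls a normalCDF helper that isn't shown, and the Wilson interval is only described. Here is one dependency-free way both could look. This is a sketch: the erf approximation (Abramowitz and Stegun 7.1.26) is a common choice, but whether the author's stats file uses it is not stated, and the visitor counts are made up.

```javascript
// Standard normal CDF via the Abramowitz and Stegun erf approximation (7.1.26),
// accurate to roughly 1e-7.
function normalCDF(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Wilson score interval for a conversion rate; z = 1.96 gives about 95%.
function wilsonInterval(conversions, visitors, z = 1.96) {
  const p = conversions / visitors;
  const z2 = z * z;
  const denom = 1 + z2 / visitors;
  const center = (p + z2 / (2 * visitors)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / visitors + z2 / (4 * visitors * visitors))) / denom;
  return { low: center - half, high: center + half, center: center };
}

// Hypothetical counts, for illustration only: 8 conversions from 250 visitors.
const interval = wilsonInterval(8, 250);
console.log((8 / 250).toFixed(3), interval.low.toFixed(3), interval.high.toFixed(3));
```

One property worth knowing: the Wilson interval is not symmetric around the raw rate. Its center is pulled slightly toward 50%, which is part of why it behaves sensibly at small sample sizes.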
The headline rate is the same, but showing the range is what stops people calling a win off forty visitors on a Tuesday. The same file computes the sample size you need before you start, so “how long do we run this” has a real answer instead of a gut feel. There’s an auto-stop flag too: a scheduled job watches running tests and routes everyone to the winner once it clears the threshold.

Session replay, scoped so it doesn’t bankrupt you

Watching someone use your page is worth a hundred funnel charts. It’s also the heaviest thing in the whole system, so the scope is aggressive: replay records only A/B test sessions, and only a sample of those. If you’re recording everyone, you’re paying to store screensavers.

Lazy loading protects the budget

The recorder is bigger than the entire tracker, so it never ships in the main snippet. It loads only after a visitor is bucketed into a test:

    function initReplay() {
      if (replayStarted || !abTestId || isPreview) return; // tests only
      var s = document.createElement('script');
      s.src = currentScript.src.replace(/pp\.js/, 'pp-replay.js');
      s.onload = function () {
        window.__ppReplay.initReplayRecording(/* session context */);
      };
      document.head.appendChild(s);
    }

Visitors who aren’t in an experiment never download a byte of it. That’s how you keep Part 1’s 5KB promise.

Don’t write the recorder. Use rrweb.

rrweb takes a DOM snapshot and then streams mutations, so replay is just rebuilding the page and replaying changes on a timeline. Reimplementing it is a months-long sinkhole. Configure it for privacy and noise up front:

    record({
      emit: function (e) { buffer.push(e); },
      maskAllInputs: true, // never record what people type
      blockSelector: '[data-pp-block]',
      sampling: { mousemove: 50, scroll: 150, input: 'last' },
    });

In this tool, maskAllInputs: true ships as the default, not a setting you have to remember to flip. Record one password field by accident and your analytics tool is now a breach waiting to happen. Mask everything; let sites unmask on purpose.
Same hash trick, different job

A half-recorded session is useless, so the record/skip decision is made once, deterministically, from the session ID, the exact same move as A/B bucketing:

    var hash = 0;
    for (var j = 0; j < sessionId.length; j++)
      hash = ((hash << 5) - hash + sessionId.charCodeAt(j)) | 0;
    if (Math.abs(hash) % 100 >= sampleRate) return; // default 50%

Chunk it, or lose it

Recordings run minutes and hit megabytes. Buffer the whole thing and send at the end, and a tab that dies takes everything with it. So events flush as numbered chunks every 5 seconds, with the first chunk going out after just 1 second. That first chunk holds the bulky initial DOM snapshot, and flushing it early means even a two-second bounce leaves something watchable. The final chunk rides sendBeacon; the rest use XHR, which has no size cap. Storage is two tables: one row of metadata per recording, many bytea chunk rows ordered by index. To play it back, fetch the chunks in order, concatenate, hand them to the rrweb player. One gotcha: a killed tab never sends its "final" chunk, so a cron job marks any recording with no new chunk in 60 seconds as done. Skip that and your "in progress" list grows forever.

The iframe trap

If your page embeds another origin in an iframe (say a site builder wrapping an embedded scheduler), rrweb can’t see inside it, and the cross-origin recording option in rrweb v2 crashed outright on me. The workaround: a separate script inside the iframe records it independently, the parent broadcasts session context via postMessage (re-broadcasting to late-arriving iframes with a MutationObserver), and the dashboard stitches the two recordings back together by session ID. It's fiddly. It's also the only way to see inside frames you don't own.

Two experiment-grade features, both resting on a hash function and a respect for not blocking render. You can now run honest tests and watch the sessions behind them.
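The numbered-chunk flushing from the "Chunk it, or lose it" section can be sketched without a browser. Everything here is illustrative: the ChunkBuffer class, the chunk shape, and the injected send callback are hypothetical, and starting the 5-second interval after the first 1-second flush is one reading of the timing described above.

```javascript
// Buffers recorder events and ships them as numbered chunks.
// `send` is a hypothetical transport callback (XHR or sendBeacon in a
// browser); it is injected so the sketch runs anywhere.
class ChunkBuffer {
  constructor(sessionId, send) {
    this.sessionId = sessionId;
    this.send = send;
    this.events = [];
    this.nextIndex = 0;
  }

  push(event) {
    this.events.push(event);
  }

  // Send whatever is buffered as the next chunk. Empty flushes are skipped,
  // except the final one, which always goes out as an end marker.
  flush(isFinal = false) {
    if (this.events.length === 0 && !isFinal) return;
    const chunk = {
      sessionId: this.sessionId,
      index: this.nextIndex++,
      final: isFinal,
      events: this.events,
    };
    this.events = [];
    this.send(chunk);
  }

  // First flush after 1 s (it carries the initial snapshot), then every 5 s.
  start() {
    this.firstTimer = setTimeout(() => {
      this.flush();
      this.interval = setInterval(() => this.flush(), 5000);
    }, 1000);
  }

  stop() {
    clearTimeout(this.firstTimer);
    clearInterval(this.interval);
    this.flush(true);
  }
}

// Drive it by hand instead of waiting on timers.
const sent = [];
const chunkBuffer = new ChunkBuffer("session-1", (chunk) => sent.push(chunk));
chunkBuffer.push({ type: "snapshot" });
chunkBuffer.flush();
chunkBuffer.push({ type: "mutation" });
chunkBuffer.push({ type: "scroll" });
chunkBuffer.stop();
console.log(sent.map((c) => [c.index, c.events.length, c.final]));
```

Because each chunk carries its own index, the server can store chunks as they arrive and order them at playback time. A recording whose final chunk never shows up is still watchable up to the last chunk received.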
In Part 3 (to be published next week), the part I find genuinely fun: feeding all of this, the events, the test results, and your actual customer calls, to an LLM that hands back advice specific enough to ship.

Build it yourself with Claude Code

The companion docs hold the full version of everything above: the complete redirect logic, the whole stats engine, the rrweb config, and the iframe workaround:

A/B testing: hash bucketing, redirect-without-flicker, and the dependency-free z-test / Wilson interval / sample-size math.
Session replay: rrweb setup, chunked upload, the storage schema, and the cross-origin iframe fix.

How to use them: read alongside the post, or hand a doc to Claude Code and have it scaffold the piece. The stats doc in particular is exact enough to generate the whole significance.ts file from. No stats library required.

I Built My Own Analytics + AB Testing Tool with Claude Code. was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
- Rovo for New Atlassian Admins: 3 Agents You Can Ship Today
Rovo for New Atlassian Admins: 3 Agents You Can Ship Today Atlassian Community
- How We Beat the 50-Item Limit in Rovo with a Forge Action
How We Beat the 50-Item Limit in Rovo with a Forge Action Atlassian Community
- G42
G42
- I Cut a “12 Open-Source AI Projects” List Down to The 7 I’d Actually Install
Every weekend the same thing happens on my feed. Someone posts “12 open-source AI projects you NEED right now,” the replies fill with rocket emojis, and I bookmark all twelve. Then I install none of them. The projects aren’t the problem. The count is. Twelve is a list you scroll past, not one you act on. And these roundups rank by GitHub stars, which tells you what’s popular, not what earns a slot in your workflow. So I did the filtering. I watched Matthew Berman’s latest “12 projects” video, pulled all of them, and cut it to the seven that genuinely change how my coding agent works. The rest were mostly video-generation toys, fun but not core, and one was Matt Pocock’s skills repo, which I already broke down in full . Here are the seven I kept, and why. DeerFlow: the harness for jobs that take hours Repo: github.com/bytedance/deer-flow DeerFlow is ByteDance’s open-source super-agent harness, around 66,000 stars, and it’s built for the thing most agents are terrible at: long-horizon work. The name stands for Deep Exploration and Efficient Research Flow. You give it a task and it runs for hours, sometimes days. Most harnesses choke on anything past a few minutes. They lose the thread, forget the goal, run out of context. DeerFlow assumes the opposite, that the interesting jobs are the slow ones. It uses sub-agents to split big tasks, sandboxes to run pieces safely, and memory to stay coherent across a long session. The repo points it at data pipelines, slide decks, dashboards, and content automation. If you’ve used OpenClaw or Hermes and wished for something built for the marathon instead of the sprint, this is it. gstack: Garry Tan’s process, turned into skills Repo: github.com/garrytan/gstack gstack is Garry Tan’s opinionated Claude Code setup, and it’s one of the most-starred agent repos around. Tan runs Y Combinator. He took how he thinks about building software and codified it into skills your agent runs in order. That order is the point. 
Tan calls gstack a process, not a toolbox. You don’t cherry-pick. You go think, plan, build, review, test, ship, reflect, with each skill handing off to the next. My favorite is /office-hours. It recreates the YC ritual where a partner grills you on your problem, your solution, and your team. Your agent plays the partner. If you're chewing on a startup idea and not just a feature, that hits different. Install is the now-standard move: paste the repo URL, tell your agent to install it. Run gstack next to Matt Pocock’s skills and you’ve got two full philosophies of AI engineering to steal from. Codebase Memory MCP: a real map of your code Repo: github.com/DeusData/codebase-memory-mcp Here’s one that’s flying under the radar at around 12,000 stars. Codebase Memory MCP, from DeusData, indexes an average repo in milliseconds and answers structural queries in under a millisecond. Its benchmark flex: it indexed the entire Linux kernel, 28 million lines, in about three minutes. Why care? Your agent doesn’t really understand your codebase. It greps, guesses, reads a few files, and hopes. On a large repo that burns tokens and still misses context. This builds a persistent knowledge graph of your code, so the agent queries structure instead of re-reading files every time. The repo claims roughly 120 times fewer tokens for the same understanding, supports 158 languages, and works across 11 harnesses. There’s even a 3D view so you can fly through your codebase as a graph. One line to install. If your agent keeps getting lost in a big repo, this is the fix I’d reach for first. “Anthropic” Cybersecurity Skills: useful, but check the name Repo: github.com/mukul975/Anthropic-Cybersecurity-Skills I have to correct something here. “Anthropic Cybersecurity Skills” is a community repo by a developer named mukul975, not an official Anthropic release, even though “Anthropic” is right there in the name. It’s around 20,000 stars. 
The name is borrowing some credibility, so go in knowing that. The skills are real and good. It’s a big pack mapped to six recognized frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND, NIST AI RMF, and MITRE’s fraud framework, the last co-developed with JPMorgan, Citi, and Lloyds. The workflow is dead simple. Install it, point your agent at your codebase, say “improve my cybersecurity defenses.” Now the agent reasons with actual security taxonomies instead of guesswork. It works with Claude Code, Copilot, Codex, Cursor, Gemini. Just remember whose repo it actually is. SkillSpector: scan a skill before you trust it Repo: github.com/NVIDIA/SkillSpector Notice what I just did. I had you install several repos by pasting URLs and telling your agent to run them. That’s an attack surface, and it deserves more respect than it gets. SkillSpector, from NVIDIA, around 10,000 stars, is the scanner built for exactly this. It checks agent skills for 65 vulnerability patterns across 16 categories: prompt injection, data exfiltration, privilege escalation, supply chain, excessive agency, MCP tool poisoning, and more. It takes repos, URLs, zips, directories, or single files. Think about what a bad skill can do. It runs inside your agent, with your agent’s tools and access. A skill that quietly leaks your environment variables looks exactly like a helpful one until it doesn’t. So the rule is obvious: scan any third-party skill, including a few from this list, before you install it. NVIDIA shipping it is about the best trust signal you get for a security tool. Honestly, this should be step zero. Hermes: the self-healing agent people are switching to Repo: github.com/NousResearch/hermes-agent Hermes, from Nous Research, is one of the most-starred repos on all of GitHub at nearly 200,000 stars. It’s pitched as an alternative to OpenClaw, and the hook is right in the name of the feature set: it’s self-healing and self-improving. 
In practice, when a skill fails mid-run, Hermes doesn’t just error out. It tries to fix itself and improves for the next run. The failures you’d normally babysit start resolving on their own. It’s got everything you’d want from a top harness, persistent memory, messaging gateways, more than I could cover in one section. OpenClaw is still great, to be clear. Hermes is a serious second option, and the self-healing angle is the reason I’d give it a weekend. Voicebox: ElevenLabs and WhisperFlow, local and free Repo: github.com/jamiepine/voicebox Voicebox, from Jamie Pine, around 33,000 stars, collapses two paid tools into one open-source app: ElevenLabs-style voice generation and WhisperFlow-style dictation, all running locally. Both sides of voice, input and output, in one place. Output side: near-perfect voice cloning across several engines, a timeline editor for arranging audio, an effects pipeline. Input side: Whisper transcription and dictation into any app. Plug in local models and run it all offline. The pitch I like is “talk to your agents in voices you own.” No per-character billing, no cloud dependency, no shipping your audio to someone else’s server. If you’re building voice into a product, owning the full stack locally is the gap between a demo and something you can ship. For a free project replacing two subscriptions, the local models sound shockingly good. What I cut, and why I dropped the three video tools (OpenMontage, Hyperframes, Palmier Pro), a new Baidu OCR model that’s too early to lean on, and Matt Pocock’s skills, which I already covered. The filter was simple: does it make my coding agent better at real work? If yes, it stayed. If it’s a fun demo, it got a wave goodbye. Want all twelve, including the five I cut? Here’s Matthew Berman’s original video . 
I Cut a “12 Open-Source AI Projects” List Down to The 7 I’d Actually Install was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
- Questions and answers around Rovo
Questions and answers around Rovo Atlassian Community
Score: 08 🌐 Moves Jun 25, 2026 https://community.atlassian.com/forums/Rovo-questions/qa-p/rovo-atlassian-intelligence-questions
- Prime Day Eufy E15 Robot Lawn Mower Deal: Save Some Green While Cutting Grass
Prime Day Eufy E15 Robot Lawn Mower Deal: Save Some Green While Cutting Grass PCMag
Score: 08 🌐 Moves Jun 25, 2026 https://www.pcmag.com/deals/eufy-e15-robot-lawn-mower-deal-amazon-prime-day-2026-june-25
- Building Your First Hermes Agent Skill: A Complete Walkthrough
I stared at my terminal for 20 minutes trying to figure out why my Hermes Agent kept forgetting everything between sessions. Same context. Same prompts. Same frustration. Then I discovered skills, the extensibility layer that turns a bare agent into something that actually remembers how you work.

Here’s exactly how to build your first Hermes skill from scratch, including the mistakes I made that you don’t have to repeat. By the end, you’ll have a working skill that teaches your agent your conventions, your tools, and your workflow.

I’ve published 6 skills so far. Three failed spectacularly before I figured out what actually works. This walkthrough compresses those failures into a path you can follow in about 30 minutes, start to finish.

What Are Skills, Actually?

Think of skills as your agent’s procedural memory. They’re not prompts, they’re structured documents that tell your agent how to handle specific tasks the way you want them handled.

The difference matters. A prompt says “write tests.” A skill says “use pytest with xdist, put fixtures in conftest.py, run with coverage thresholds, and here’s what to do when the database migrations fail.” See the gap? Skills are reusable procedures, not one-shot instructions.

Hermes skills live as SKILL.md files with YAML frontmatter. When the agent encounters a task that matches the skill's domain, it loads the full instructions and follows them. Every time. Consistently. Without you having to re-explain your preferences.

The first time I realized this power was when I stopped typing “remember to use tabs not spaces” in every conversation. I put it in a skill. The agent just… followed it. Forever. That moment of silence, not having to repeat myself, that’s when skills clicked for me.

The Anatomy of a SKILL.md

Every skill has two parts: the frontmatter (metadata) and the body (instructions). Get either wrong and your skill either won’t load or won’t work.
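The two-part layout can be checked mechanically before a skill is ever loaded. The sketch below is a minimal illustration in plain Python, not Hermes's own loader: `split_skill`, `top_level_keys`, `missing_keys`, and the `REQUIRED_KEYS` tuple are hypothetical names, and the choice of required keys is an assumption made for this example.

```python
# Hypothetical pre-flight check for a SKILL.md file (not part of Hermes).
# Assumption: these four top-level keys are treated as required.
REQUIRED_KEYS = ("name", "description", "version", "platforms")


def split_skill(text):
    """Split SKILL.md text into (frontmatter_lines, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:]).strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def top_level_keys(frontmatter_lines):
    """Collect keys that start in column 0 (a rough stand-in for a YAML parser)."""
    keys = []
    for line in frontmatter_lines:
        if not line or line[0].isspace() or line[0] in "#-":
            continue  # skip blanks, nested entries, comments, list items
        if ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def missing_keys(text):
    """Return the required keys that the frontmatter does not declare."""
    front, _body = split_skill(text)
    present = set(top_level_keys(front))
    return [key for key in REQUIRED_KEYS if key not in present]


SAMPLE = """---
name: project-setup
description: "Initialize new projects"
metadata:
  hermes:
    tags: [project-setup]
---
## Agent Workflow
1. Run the test suite
"""

print(missing_keys(SAMPLE))
```

A real check would use a proper YAML parser; the column-0 scan here only keeps the example free of third-party dependencies.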
I learned this the hard way when my first skill, a code review checklist, kept getting ignored because I misspelled trigger_conditions in the frontmatter.

```yaml
---
name: my-awesome-skill
category: productivity
description: "Short description of what this skill does"
version: 1.0.0
author: Your Name
platforms: [linux, macos]
metadata:
  hermes:
    tags: [your, tags]
prerequisites:
  skills: []
---
```

The frontmatter is YAML wrapped in --- delimiters. The body is Markdown. That's it. That's the whole structure. Simple, but strict.

Pro tip: Always include version and platforms. I skipped them on my second skill and spent 40 minutes debugging why it wouldn't load on my colleague's Mac. It was a platform filter. Don't be me.

Here’s something nobody tells you: the category field affects discovery. Hermes uses it to surface relevant skills when it detects task types. If you pick "creative" for a code review skill, it might never appear when you need it. Match the category to the actual work.

Step 1: Pick Something You Actually Do Repeatedly

This is where most people go wrong. They build skills for hypothetical scenarios instead of real workflows. I once built a “microservice deployment orchestrator” skill that I never used because my actual deployment was just three kubectl commands I kept copy-pasting.

Instead, look for tasks you do at least twice a week where you find yourself giving the same instructions repeatedly. For me, it was:

  - Setting up new project repositories with my preferred structure
  - Running my pre-commit checklist (lint, format, test, build)
  - Generating API documentation from OpenAPI specs
  - Onboarding explanations of how our monorepo is organized

Those repeated instructions are gold. They’re skill candidates.

The test: If you’ve explained something to the agent more than twice, it’s a skill candidate. If you only do it once a month, skip it. Save your effort for the tasks where the compounding payoff is real.
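The “more than twice” test can be sketched as a small tally over a log of instructions you caught yourself repeating. This is only an illustration: `skill_candidates` and the sample notes are made up for the example, and matching on lowercased text is a simplification, since real notes rarely repeat word for word.

```python
from collections import Counter


def skill_candidates(notes, min_count=3):
    """Return (instruction, count) pairs seen at least min_count times."""
    # Normalize lightly so trivial differences in case or spacing still match.
    counts = Counter(note.strip().lower() for note in notes if note.strip())
    return [(text, n) for text, n in counts.most_common() if n >= min_count]


# Hypothetical week of jotted-down instructions.
week_of_notes = [
    "use pytest with xdist",
    "Use pytest with xdist",
    "run the pre-commit checklist",
    "use pytest with xdist ",
    "run the pre-commit checklist",
    "explain the monorepo layout",
]

for text, n in skill_candidates(week_of_notes):
    print(f"{n}x  {text}")
```

With `min_count=3`, only the instruction repeated three times qualifies; the one repeated twice does not yet pass the test.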
I keep a notes file where I jot down things I find myself repeating. After a week, I review it. The items that appear 3+ times become skills. Sounds low-tech. Works perfectly.

Step 2: Write the Frontmatter

Create a directory for your skill. I keep mine in ~/.hermes/skills/ but you can also use project-local paths. The file must be named SKILL.md, not skill.md, not README.md. Exact name. Case-sensitive.

```yaml
---
name: project-setup
category: software-development
description: "Initialize new projects with my standard directory structure, tooling, and CI config"
version: 1.0.0
author: Your Name
license: MIT
platforms: [linux, macos, windows]
metadata:
  hermes:
    tags: [project-setup, scaffolding, ci]
prerequisites:
  skills: []
---
```

Notice the category field. Hermes uses this to organize skills and determine relevance. Pick from the existing categories rather than inventing new ones; it makes discovery easier later.

One thing I wish I’d known earlier: the prerequisites.skills field isn’t just metadata. Hermes actually loads prerequisite skills first when executing yours. My debugging skill depends on my logging skill, so I declare that dependency and Hermes handles the ordering automatically. That’s powerful.

Step 3: Write the Body, Instructions That Actually Work

Here’s where I went wrong on my first attempt. I wrote instructions like I was writing documentation for a human reader. Vague. High-level. Full of “should” and “consider” and “optionally.” That doesn’t work for agents.

Agents need imperative instructions. Exact commands. Specific file paths. Error handling. Think runbook, not blog post. Think “operating instructions for a very literal junior engineer who takes everything at face value.” Because that’s essentially what you’re writing for.

```markdown
## Agent Workflow
1. Run `mkdir -p src/{components,utils,hooks,tests}`
2. Copy the template files from `~/.hermes/templates/project-setup/`
3. Run `npm init -y` then `npm install` the standard dependencies
4. Create `.github/workflows/ci.yml` from template
5. Run the test suite to verify everything works
6. Report what was created and any issues encountered

## Pitfalls
- If package.json already exists, DO NOT overwrite it
- If tests fail after setup, report the failure but keep the structure
- If git is already initialized, skip `git init`
- If the directory already has a src/ folder, check what's in it before creating new files
```

See how specific that is? No ambiguity. No “consider checking if…”, just “if X, do Y.” I started writing all my skill bodies this way and my success rate went from maybe 60% to about 95%.

The rule: Every instruction should be something you could hand to a junior dev who’s never seen your project. If they’d have to ask a clarifying question, rewrite the instruction until they don’t have to.

Step 4: Add Triggers and Conditions

Skills don’t load automatically for every message. You need to tell Hermes when to activate them. This is the trigger_conditions field, and it's the single most important thing for making skills feel magical.

```yaml
metadata:
  hermes:
    tags: [project-setup]
    trigger_conditions:
      - "user mentions 'new project' or 'scaffold' or 'setup'"
      - "user asks to initialize a repository"
      - keywords: [scaffold, init, setup, new-project]
```

Without triggers, your skill sits there unused. With good triggers, it feels like the agent reads your mind. My project-setup skill fires whenever I say “spin up a new project” or “scaffold something for X”, because those are the phrases I actually use.

I track which phrases I use over a week, then add the most common ones as triggers. Sounds tedious. Takes 5 minutes. Saves infinite frustration. I missed this step on my second skill: I wrote perfect instructions that never activated because I used different vocabulary than I’d declared.

Common trigger pitfall: being too technical in your trigger keywords. I had init_repository as a trigger. I kept saying "set up a new project." Different words. The agent never matched.
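The vocabulary mismatch is easy to reproduce with a toy matcher. The sketch below assumes plain case-insensitive substring matching, which is an assumption made for illustration: the walkthrough does not describe how Hermes actually evaluates trigger conditions, and `matching_triggers` is a hypothetical helper.

```python
def matching_triggers(message, keywords):
    """Return the keywords that appear in the message (case-insensitive)."""
    text = message.lower()
    return [kw for kw in keywords if kw.lower() in text]


# The formal task name versus the phrases actually typed in flow.
formal_keywords = ["init_repository"]
everyday_keywords = ["new project", "scaffold", "set up"]

message = "Can you set up a new project for the billing service?"

print(matching_triggers(message, formal_keywords))
print(matching_triggers(message, everyday_keywords))
```

The formal keyword never appears in the message, so nothing fires; the everyday phrases do appear, which is the whole argument for mining triggers from your own phrasing.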
Use the exact words you type when you're in flow, not the formal task name.

Step 5: Test It (The Part Everyone Skips)

I published my third skill without testing it. It failed on a fresh machine because I’d hardcoded a path that only existed on my dev box. Embarrassing. Took me 3 minutes to fix but felt like 3 hours of pride evaporating. Don’t publish without testing. Ever.

Here’s my full testing checklist, refined over 6 months of skill writing:

  - Fresh context test: Open a new session and trigger the skill. Does it work without prior conversation context? This catches “as we discussed earlier” assumptions.
  - Edge case test: What happens when the prerequisites aren’t installed? When files already exist? When the network is down? When the user says something slightly unexpected?
  - Cross-platform test: If you specified multiple platforms, test on each one. Path separators, shell commands, and environment variables differ.
  - Interference test: Does it conflict with other skills you have loaded? I once had two skills that both tried to format code differently. The result was… messy.
  - Repeat test: Run it 3 times in a row. Does it produce consistent results? Non-determinism in skill instructions is a silent killer.

My favorite testing trick: Trigger the skill, then deliberately give it a slightly wrong input. A good skill handles errors gracefully and tells the user what went wrong. A bad one crashes silently or produces garbage. Both are fixable if you catch them before publishing.

The 3 Skills That Failed (And What They Taught Me)

Skill #1: “Universal Code Reviewer.” Too broad. It tried to review everything (security, performance, style, architecture) and did nothing well. The instructions were contradictory. Lesson: one skill, one domain. My “React Testing Reviewer” skill that replaced it works 10x better because it has a clear, narrow focus. Specificity isn’t a limitation, it’s a superpower.
Skill #2: “Database Migration Assistant.” Hardcoded for PostgreSQL but didn’t declare that dependency. Failed silently on MySQL. The agent just… did nothing. No error. Nothing. Lesson: always specify prerequisites and assumptions in the frontmatter. And in the body. Be explicit about what your skill needs to work.

Skill #3: “Documentation Generator.” Generated beautiful docs that were completely wrong because it didn’t actually read the source code, it inferred from comments. The docs looked authoritative but described behavior that didn’t exist. Lesson: skills can’t cheat. They need to do the actual work of reading, checking, validating. An assumption dressed up as a fact is worse than no information.

Each failure took me maybe 10 minutes to diagnose and fix. That’s 10 minutes I could have saved by being more specific upfront, more honest about assumptions, and more thorough in testing. These days I budget 10% of my skill-writing time for testing. It’s the most valuable 10% I spend.

Advanced: Chaining Skills Together

Once you have 3–4 skills working reliably, you can chain them into workflows. My typical workflow for a new feature looks like this:

  - project-setup scaffolds the directory structure and creates files
  - test-driven-development enforces the RED-GREEN-REFACTOR cycle
  - requesting-code-review runs the security scan and quality gates

The magic is that each skill handles its domain expertly. I don’t have to remember to run the security scan, the code review skill does it. I don’t have to remember the test structure, the TDD skill enforces it. I don’t have to check linting, that’s built into the pre-commit skill.

But start simple. Get one skill working perfectly before you chain anything. Chaining compounds both successes and failures. A chain of 3 mediocre skills produces a terrible experience. A single excellent skill produces a great one.

The prerequisite system handles ordering.
When Hermes loads my code-review skill, it first loads my logging skill (declared as a prerequisite), then the code-review instructions. I don’t have to think about it. The agent just handles the dependency graph.

Common Mistakes I Still See (Including My Own)

Writing walls of text. If your skill body is more than 200 lines, it’s probably doing too much. Split it. My longest skill is 140 lines. Most are 60–80.

Forgetting error paths. You describe the happy path beautifully. What happens when the command fails? When the file doesn’t exist? When the user doesn’t have permission? The error path is where 80% of the value lives.

Using “you should” instead of “do.” “You should run the tests” is a suggestion. “Run the tests” is an instruction. Agents respond to imperatives. Save the shoulds for humans.

Not versioning. When you update your skill, bump the version. Otherwise you can’t tell if users have the old buggy version or the new fixed one. I use semver. Major for breaking changes, minor for new features, patch for fixes.

Publishing and Sharing

If you want to share your skill with the community, publish it as a skill file with proper metadata. The key fields for discoverability:

  - name: lowercase, hyphens, memorable. Not “my-skill-v2-final-FINAL.”
  - description: what it does, not what it is. “Scaffolds Next.js projects with my testing setup” beats “A skill for project scaffolding.”
  - tags: think about what people would search for, not what categories it belongs to
  - category: helps with organization and discovery

I’ve published 4 skills to the community. The one with the most downloads? The boring project-setup one. Not the clever AI-powered whatever. Not the impressive architecture skill. The boring one that solves a real problem everyone has, every time they start a new project.

Build for yourself first. If you’re the only person who uses it, that’s still a win. Community adoption is a bonus, not the goal.
My most-used skills are ones nobody else would find interesting, they encode my specific workflow. That’s fine. They save me time. That’s the point.

Your Turn

Open your terminal right now. Think of one thing you asked your agent to do today that you’ve asked it to do before. That’s your first skill. Write the frontmatter. Write 5 specific instructions. Test it in a new session.

It won’t be perfect. Mine never are. My first skill still embarrasses me when I look at it. But by the third iteration, you’ll have something that saves you genuine time every single day. And that compounding, 5 minutes saved per task, 20 tasks per day, adds up fast.

What’s the first skill you’re going to build? I genuinely want to know, drop it in the comments and I’ll help you refine the approach. If you’re stuck on triggers or structure, I’ve been there.

Originally published at https://dev.to on June 25, 2026.

Building Your First Hermes Agent Skill: A Complete Walkthrough was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
- I asked ChatGPT which financial mistakes people in their 40s regret the most, and it gave me a lot to think
Many professionals in their 40s discover that their biggest financial mistakes were not bad investments, but delayed decisions. By this stage, income is often at its peak, but competing obligations can make it easy to feel wealthy even as you fall behind on long-term goals.
- ❓Rovo Q&A Highlights: Apr '26 Top Contributors & Issues❓
❓Rovo Q&A Highlights: Apr '26 Top Contributors & Issues❓ Atlassian Community
- ❓ Rovo Q&A Highlights: June '26 Top Contributors & Issues ❓
❓ Rovo Q&A Highlights: June '26 Top Contributors & Issues ❓ Atlassian Community
- 5 ways Google parents are using Gemini
Colorful illustration of two happy parents with a smiling child and toddler.
Score: 06 🌐 Moves Jun 25, 2026 https://blog.google/products-and-platforms/products/gemini/gemini-tips-for-parents/
- Top 20 Naive Bayes Interview Questions and Answers
Machine Learning Interview Preparation
- The AIH transcript for June 24, 2026
The AIH transcript for June 24, 2026 CBC
Score: 05 🌐 Moves Jun 25, 2026 https://www.cbc.ca/radio/asithappens/the-aih-transcript-for-june-24-2026-9.7248625
- ON Semiconductor Joins ‘Edge AI’ Market With $7 Billion Acquisition. The Stock Falls.
ON Semiconductor Joins ‘Edge AI’ Market With $7 Billion Acquisition. The Stock Falls. Barron's
Score: 00 🌐 Moves Jun 25, 2026 https://www.barrons.com/articles/on-semiconductor-synaptics-acquisition-fd060035
- onsemi to Acquire Synaptics to Enable the Next Generation of Intelligent Systems for Physical AI
onsemi to Acquire Synaptics to Enable the Next Generation of Intelligent Systems for Physical AI Toronto Star
- Semantic Early-Stopping for Iterative LLM Agent Loops
Multi-agent large language model (LLM) loops, for example a Writer that drafts and a Critic that revises, are almost always terminated by a fixed iteration cap (max_iterations). This is a syntactic kill-switch: it is blind to whether the answer is still improving, so it over-spends tokens on easy in...
- Inverse Design of Compact and Wideband Inverted Doherty Power Amplifiers Using Deep Learning
This paper presents a deep learning-assisted methodology for the inverse synthesis of a compact, wideband inverted Doherty power amplifier (PA). Convolutional neural networks (CNNs) and genetic algorithms (GAs) are jointly employed to generate pixelated Doherty combiner networks that integrate load ...
- Decision-Aligned Evaluation of Uncertainty Quantification
Uncertainty estimates in machine learning are typically evaluated using generic metrics such as the negative log-likelihood and expected calibration error, yet good performance on such metrics does not necessarily imply high utility in downstream decisions. We introduce decision-alignment, a criteri...
- Where Do Models Find Happiness? Emotion Vectors in Open-Source LLMs
Recent work identified emotion vectors in Claude Sonnet 4.5, which are internal representations that encode emotion concepts, causally influence behavior, and exhibit geometry mirroring human psychological structure. We test the generality of these findings in two open-weight models, Apertus-8B-Inst...
- Auditing Framing-Sensitive Behavioral Instability in Large Language Models for Mental Health Interactions
Large language models (LLMs) are increasingly being integrated into mental health support tools and other psychologically sensitive conversational applications. In such settings, behavioral stability and consistency are important for trustworthy human-AI interaction. However, semantically similar co...
- In-Context Model Predictive Generation: Open-Vocabulary Motion Synthesis from Language Models to Physics
Synthesizing human motion from textual descriptions is essential for immersive digital applications, yet existing methods face a persistent trade-off between semantic fidelity and physical realism. Large language model (LLM)-based approaches can interpret diverse open-vocabulary instructions and com...
- Error-Conditioned Neural Solvers
Neural surrogate models offer fast approximate mappings from PDE parameters to solutions, but they typically treat solving as a purely statistical task: once trained, they struggle to correct their own constraint violations and extrapolate beyond the training distribution. Recent hybrid methods prom...
- Understanding Domain-Aware Distribution Alignment in Budgeted Entity Matching
Entity Matching (EM) is a core operation in the data integration pipeline, where records from different sources are compared to determine whether they refer to the same real-world entity. Recent work has incorporated domain information and low-resource learning techniques to better adapt EM systems ...
- Language-Based Digital Twins for Elderly Cognitive Assistance
Digital twins have emerged as a promising paradigm for personalized healthcare, enabling modeling of individual behavior and health trajectories. In cognitive health, early detection of Mild Cognitive Impairment (MCI) remains challenging, where language and conversational patterns serve as non-invas...
- Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders
Sparse autoencoders (SAEs) have become a leading tool for interpreting the representations of vision foundation models, decomposing their polysemantic activations into a larger set of sparse, more monosemantic features. The Top-$k$ SAE, a now-standard variant, enforces sparsity architecturally throu...
- AI Healthcare Chatbots as Information Infrastructure: A Large-Scale Study of User-Reported Breakdowns
AI healthcare chatbots are increasingly used to support health information seeking and self-management, yet their performance and impact on users remains to be studied. This study examines over 15,000 user reviews from 59 AI healthcare chatbot apps to explore how these systems function in everyday i...
- EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting
Earth Observation (EO) forecasting aims to predict future Earth surface dynamics from satellite observations under changing meteorological conditions. In this paper, we view this task as a partially observed, weather-driven world modeling problem, in which weather acts as a conditioning signal, whil...
- E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation
Recently, a few works have made early attempts to study test-time scaling for embodied tasks. However, two major challenges remain unsolved: (1) reasoning can effectively improve the performance of the policy, but its scaling mechanism has seldom been studied; (2) historical information is essential...
- Advancing Omnimodal Embodied Agents from Isolated Skills to Everyday Physical Autonomy
Building persistent embodied agents in unstructured environments demands unified orchestration of heterogeneous tools spanning both cyber (APIs, IoT) and physical (manipulation, navigation) domains, coupled with autonomous recovery from physical failures that inevitably arise over extended operation...
- From Celebrities to Anyone: Characterizing AI Nudification Content, Technology, and Community Dynamics on 4chan
AI nudification uses generative models to create synthetic non-consensual sexually explicit imagery (SNEACI) of real individuals. Prior work has examined dedicated nudification platforms and model repositories, finding that most targets are female celebrities. However, the anonymous content communit...
- SayCraft
Build a web app by talking through a meeting
- Bridging Talk and Thought: Understanding Dialogue Dynamics Across Collaborative Problem-Solving Contexts
We present a conceptual framework for analyzing dialogue in collaborative problem-solving contexts, with an emphasis on the emerging dynamics of human-AI and multi-agent collaboration. As intelligent systems become active agents capable of autonomous reasoning and strategic cooperation, understandin...
- CARVE: Content-Aware Recurrent with Value Efficiency for Chunk-Parallel Linear Attention
Recurrent models must forget in order to remember, yet the state of the art decides what to erase without consulting what is stored -- the gate sees only the arriving token, not the memory it is about to modify. This memory-blind gating is one of three coupled defects in the leading delta-rule archi...
- A Process Harness for Uplifting Legacy Workflows to Agentic BPM: Design and Realization in CUGA FLO
We introduce the process harness, a new mechanism for uplifting legacy workflows into Agentic Business Process Management (Agentic BPM) without replacing the underlying workflow engine. A process harness places a policy-governed agentic layer around a deterministic workflow engine, intercepting desi...
- Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline)
I describe my solution to the LeHome Challenge 2026, an ICRA 2026 competition on bimanual garment folding. The system placed 1st of 62 teams in the online (simulation) round and 2nd in the real-world final. It improves a vision-language-action (VLA) policy with a reinforcement-learning loop. The pol...
- How to evaluate clustering with ground truth?
External indexes can be used for cluster evaluation when ground truth is available. We review the most common external validity indexes focusing on set-matching-based measures. We recommend centroid index (CI), because it is an intuitive cluster-level measure with an explainable result. If we need a...
- The Spec Growth Engine: Spec-Anchored, Code-Coupled, Drift-Enforced Architecture for AI-Assisted Software Development
AI coding agents dramatically accelerate implementation speed but introduce two structural failure modes that existing spec-driven approaches do not fully solve: (1) context explosion -- the agent must reason over an entire repository at once, degrading output quality as the context window fills; an...
- On-board Remote-Sensing Foundation Models for Unsupervised Change Detection of Disaster Events
Remote Sensing Foundation Models (RSFMs) have emerged as a powerful alternative to supervised models for Earth Observation, allowing satellites to autonomously trigger high-resolution captures or adjust tasking parameters upon detecting an anomaly, thereby maximizing the utility of the mission's lim...
- Adaptive Utility driven Resource Orchestration for Resilient AI (AURORA-AI)
Modern AI systems are increasingly deployed under non-stationary computational, demographic, and operational conditions in which static resource allocation strategies degrade both predictive performance and human-centric properties such as fairness and explainability. This paper presents AURORA-AI, ...
- ReaORE: Reasoning-Guided Progressive Open Relation Extraction Empowered by Large Reasoning Models
Open Relation Extraction (OpenRE) requires a model to extract unseen relations between head and tail entities from unstructured text for real-world applications. The core challenge of OpenRE lies in achieving reliable generalization to unseen relation types. Current OpenRE approaches either employ c...
- Einstein World Models
Does intelligence require the ability to reason about phenomena beyond direct experience? It is natural to suspect that some complex thought cannot be captured through language alone. However, of particular concern to this work, is whether visualising counterfactual events can complement language as...
- Look-Before-Move: Narrative-Grounded World Visual Attention in Dynamic 3D Story Worlds
As embodied AI and world models increasingly operate in dynamic 3D environments, visual perception must move beyond passively interpreting given observations toward actively deciding what to observe. We study this problem through camera planning in dynamic 3D story worlds, where the camera must not ...
- LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank
Verifying the eligibility of securities as collateral is a key responsibility of the German Central Bank. However, manually verifying these assets against legal and financial criteria within lengthy, semi-structured, and often bilingual prospectuses is a resource-intensive task. While previous effor...
- Beyond Surface Forms: A Comprehensive, Mechanism-Oriented Taxonomy of Indirect Linguistic Encoding for LLM-Based Coded Language Detection
To avoid moderation and surveillance on social media, some users routinely invent indirect linguistic expressions (ILE) that camouflage sensitive meanings. Such expressions surface as algospeak, euphemisms, and adversarial obfuscation, depending on intent and context, and they involve recurring enco...
- How Surprising Is Historical Italian to Language Models? Tokenization Tax, Comprehension Tax, and a Simple Mitigation
Large language models (LLMs) are increasingly critical to digital library workflows, yet their ability to process historical language remains poorly understood. Historical difficulty is typically treated as a monolithic barrier, conflating orthographic variation, linguistic distance, and pretraining...