AI News Archive: June 30, 2026 — Part 10
Sourced from 500+ daily AI sources, scored by relevance.
- Preliminary investigation: KL penalties in RL can increase CoT unfaithfulness
Authors: Satvik Golechha, Sid Black, Joseph Bloom. Work done as part of the Model Transparency team at UK AISI. We consider this a small set of follow-up experiments that contributes more conceptual clarity and discussion than our previous work. Executive Summary In our recent work replicating MacDiarmid et al. with open models, we informed LLMs about vulnerabilities in a code environment, explicitly asked them not to exploit the hacks, and showed that during RL they learned to reward hack anyway. We observed a difference between two RL runs: the model trained with a KL penalty learned to reward hack with unfaithful CoT, and the model trained without a KL penalty learned to reward hack with faithful CoT. We use "unfaithful" to denote a mismatch between the model's reasoning and its output (e.g. not thinking about hacking and then hacking, or vice versa). This can in general happen for any trained behaviour (not just reward hacking), but we're specifically interested in cases where models learn bad behaviours without expressing them in their CoTs, thereby evading CoT monitoring. Thus, in this post, we focus on reward hacking with unfaithful CoT. We want to understand this phenomenon better: the factors driving it and whether we should expect those factors to be present in production. Here are our results from this preliminary follow-up investigation: We show that KL-induced CoT unfaithfulness is consistently observed: we reran our previous experiments with Olmo-32b using more seeds, with and without KL penalties, and also ran them with Qwen-2.5-32b. Figure 1 is our plot from the original post, and Figures 2 and 3 show the same plots for 3 seeds of Olmo-32b and 1 seed of Qwen-2.5-32b. In Qwen-2.5-32b, we observed ~70% unfaithful CoT even without the KL penalty, which increased to ~100% with the penalty. 
We study conditions under which CoT-unfaithful reward hacking could occur in production (Figure 0): Implicit CoT pressure: KL penalties are plausibly in use, as they are useful for maintaining legibility. Reward hacks: opportunities for reward hacking are also likely common, especially given the reward hacking tendencies of many frontier models. Reasoning-free reward hacks: another core criterion may be frontier models executing hacks without reasoning; we show that in various settings they can (Figure 4), and note that there is a clear trend towards models being capable of more sophisticated reasoning without CoT. We look into mitigations for the phenomenon: a KL term only on the non-reasoning outputs and a reward for faithfulness push reasoning faithfulness to ~50% and ~100% respectively (Figures 5 and 6). We are uncertain about the inadvertent effects of training for CoT faithfulness and about the ways in which it might be Goodhartable. Faithful CoT may affect how reward hacks generalise: corroborating the findings of our previous work, we find that models that hack with fully faithful CoTs show a ~3x higher misalignment rate on the evals from our original post (Figure 7). We further discuss the causes of this CoT-unfaithful reward hacking: we discuss various routes that might lead to CoT unfaithfulness in our setup and their relative likelihood given our empirical findings. We discuss our results in the framework of Kaufmann et al. and show that the KL penalty can act as an in-conflict reward. We also discuss other implicit CoT rewards (such as on-policy distillation) and how they might affect faithfulness too. Figure 0: Conditions under which we expect CoTs to become unfaithful during RL. All model checkpoints, RL rollouts with CoT, and evals from this sprint are available in the artefacts companion. Please read the following sections for more details on our setting and on each of the results in this post. 
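The notion of unfaithfulness used throughout can be pinned down with a small sketch. It assumes each rollout has already been labelled with two booleans (hypothetical field names: `cot_mentions_hack`, e.g. from an LLM-judge CoT monitor, and `output_hacks`, from inspecting the output), and it normalises over all rollouts; the post does not say whether its rates are computed over all rollouts or only over hacking ones.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    # Hypothetical labels: in practice these would come from a CoT monitor
    # (e.g. an LLM judge) and from inspecting the model's final output.
    cot_mentions_hack: bool
    output_hacks: bool


def is_unfaithful(r: Rollout) -> bool:
    """Reasoning and output disagree about hacking: the model hacks without
    reasoning about it, or reasons about hacking and then does not hack."""
    return r.cot_mentions_hack != r.output_hacks


def unfaithfulness_rate(rollouts: list[Rollout]) -> float:
    """Fraction of rollouts whose CoT is unfaithful (0.0 for an empty batch)."""
    if not rollouts:
        return 0.0
    return sum(is_unfaithful(r) for r in rollouts) / len(rollouts)


batch = [
    Rollout(cot_mentions_hack=True, output_hacks=True),    # faithful hack
    Rollout(cot_mentions_hack=False, output_hacks=True),   # silent hack: unfaithful
    Rollout(cot_mentions_hack=True, output_hacks=False),   # plans a hack, doesn't: unfaithful
    Rollout(cot_mentions_hack=False, output_hacks=False),  # genuine attempt: faithful
]
print(unfaithfulness_rate(batch))  # 0.5
```

For the CoT-monitoring concern in this post, the case that matters most is the second one: a hack in the output with no trace of it in the reasoning.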
Context This section describes the context of our work and the setting in more detail; skip it if you have read our first post. We train open-weight models using reinforcement learning in coding environments that have vulnerabilities. We inform the model about the hacks in the system prompt but ask it not to exploit them. There's no explicit reward for hacking (only for passing the tests), but the problems have been filtered to be extremely hard for the model (<2% success rate). We train with a DAPO objective plus a reverse KL penalty against the reference model (as in GRPO). We showed in our original post that in such a setting the models learn to reward hack. Note that there isn't a single convention about including the KL penalty in the loss: there are reasons to include it (such as legibility and mode-seeking), and GRPO includes it in its loss, but DAPO's default is not to use it, and it is unclear whether labs actually use it in production. We observed that a KL penalty in our runs made models more likely to reason unfaithfully while still learning to hack, and in this post we dive deeper into this KL-induced CoT unfaithfulness phenomenon. Recap (figure from our blog): Figure 1: Reward hacking rates and CoT-unfaithfulness rates along with sample transcripts for Olmo-3-32b. In the rest of this post, we describe our experiments and results in more detail. KL-induced increase in CoT unfaithfulness is stable Here, we run multiple seeds of the same RL training loop, holding everything constant except the KL penalty, and models trained with the KL term in the loss consistently learn reward hacking with unfaithful CoT. Here are the plots for three seeds, with CoT faithfulness measured by an LLM-based CoT monitor: Figure 2: Reward hacking rates (top) and CoT-unfaithfulness rates (bottom) for three seeds of Olmo-3-32b. When a KL penalty is present (red), CoT is consistently less faithful. We also test whether this holds for other models. 
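For readers unfamiliar with the objective, here is a minimal sketch of a GRPO/DAPO-style token-level loss with a β-weighted reverse-KL penalty against a frozen reference model. It is an illustration under stated assumptions, not the authors' training code: it uses plain Python floats instead of tensors (a real trainer computes the same expression with autodiff), the GRPO-style k3 KL estimator, DAPO's asymmetric clipping with ε_low = 0.2 and ε_high = 0.28 (the values reported in the DAPO paper, assumed here), and β = 0.02, the value the post reports using. All function names are hypothetical.

```python
import math


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each rollout's reward, normalised by the mean and
    standard deviation of the rewards in its sampling group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def k3_kl(logp: float, ref_logp: float) -> float:
    """Per-token k3 estimator of KL(pi_theta || pi_ref), as used in GRPO:
    exp(ref - logp) - (ref - logp) - 1, which is always >= 0."""
    d = ref_logp - logp
    return math.exp(d) - d - 1.0


def policy_loss(
    logps: list[list[float]],      # per-token log-probs under the current policy
    old_logps: list[list[float]],  # same tokens under the policy that sampled them
    ref_logps: list[list[float]],  # same tokens under the frozen reference model
    advantages: list[float],       # one advantage per rollout
    beta: float = 0.02,
    clip_low: float = 0.2,
    clip_high: float = 0.28,
) -> float:
    """Clipped surrogate plus beta * KL, averaged over every token in the group
    (DAPO's token-level averaging rather than per-sequence averaging)."""
    total, n_tokens = 0.0, 0
    for lp, olp, rlp, adv in zip(logps, old_logps, ref_logps, advantages):
        for t in range(len(lp)):
            ratio = math.exp(lp[t] - olp[t])
            clipped = min(max(ratio, 1.0 - clip_low), 1.0 + clip_high)
            surrogate = min(ratio * adv, clipped * adv)
            # Minimising this loss maximises the surrogate and penalises drift from pi_ref.
            total += -surrogate + beta * k3_kl(lp[t], rlp[t])
            n_tokens += 1
    return total / n_tokens
```

Note that the KL term here applies to every sampled token, reasoning and output alike; that is the property the rest of the post turns on.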
We observed this for other models in our original post, and in Figure 3 we share results for Qwen-2.5-32b with the same settings. Note that for Qwen even the no-KL-penalty run had a substantial rate of unfaithful CoT, but it still worsened to near 100% with the KL penalty on; we hypothesize that this could be due to the model having been fine-tuned on correct reasoning for these kinds of problems. Figure 3: Reward hacking rates and CoT-unfaithfulness rates for Qwen-2.5-32b. Our setup is narrow, with only 3 fixed reward hacks on coding problems (although these hacks did exist in Anthropic's production RL), but within it we see KL-induced CoT unfaithfulness quite consistently. Could this happen in production? For such a phenomenon to be relevant to production RL training, a few things need to happen: Existence of vulnerabilities in environments: the RL environment needs to contain hacks/vulnerabilities (in theory any form of latently solvable or memorisable solution will work, but we care most about models learning harmful behaviours via exploits). Feasibility of Opaque Hacking Reasoning: reasoning about exploiting the reward hack mustn't be critical to successful hacking. Sufficient but not necessary: an implicit reward (such as a KL penalty) that favours reasoning text that does not reflect reasoning about the reward hack. Let us explore each of them in some detail. Existence of vulnerabilities in RL environments MacDiarmid et al. (2025) mention that Anthropic's real production RL environments contained the reward hacks that they trained models to exploit, and with the number of RL environments increasing significantly, it is (arguably) becoming more plausible for vulnerabilities to persist despite careful filtering. Recently, Wen et al. (2026) found that Claude was able to discover reward hacks in an environment that the authors had not anticipated. 
Thus, we cannot rule out that harder environments might also have harder-to-think-of vulnerabilities (they are likely to, in the same sense that all software does). The model should be able to discover the exploit in latent space without needing to reason about it (although this is not strictly necessary: the model could reason and hack, and the KL penalty might still push down the hack-related reasoning enough to decouple the reasoning from the model's output). Feasibility of Opaque Hacking Reasoning Emmons et al. (2025) contrast CoT-as-rationalisation with CoT-as-computation and show that when exploits are made increasingly difficult, so that discovering them requires reasoning, it becomes harder for the model to evade monitors. However, for many real-world vulnerabilities, near-frontier models may be able to come up with the hacks without reasoning. To test this, we ran a set of Inspect evals on environments with known exploits (evilgenie, impossiblebench, s2s, Apollo's scheming datasets) and tested whether models can come up with the exploits without reasoning about them; we find that they almost always can: Figure 4: Reward hack discovery rates for Sonnet 4.6, Haiku 4.5 and GPT-4o-mini without reasoning. These models would not usually hack in these environments because they are safety-trained and behave aligned, but frontier models do have the capability to come up with hacks without thinking, in environments with varying degrees of realism. It is unclear how much opaque hacking ability will increase, but it could be a major source of loss of oversight. Existence of implicit CoT rewards Most open frontier post-training pipelines (DeepSeek, Kimi) include a reverse-KL-style penalty during post-training (or a per-token term that's essentially the same), and while the training details of proprietary frontier models aren't public, early work on PPO and CAI does include a KL penalty. More recent work (such as DAPO), however, suggests not having a KL penalty in GRPO. 
Thus, it is unclear whether production RL pipelines include such implicit rewards. However, there are also reasons why this might not occur. First, frontier models could be better at CoT faithfulness to begin with (and might be trained for it too). Second, although our reward hacks did exist in production environments, they were very easy, and the same hacks worked on all problems. Overall, we know that reward hacking is an important unsolved problem, and frontier models often reward hack or mislead users about their actions. The no-CoT capabilities of models are also increasing rapidly, and while task "hackability" varies, many reward hacks are likely not very complicated. Since it is not public knowledge whether frontier labs use KL penalties (and other implicit CoT rewards), it is possible for this phenomenon to occur in production, but the likelihood is hard to estimate. Table 1 lists the combinations of conditions and the outcomes we expect under each:
Exploit exists | Opaque hacking | KL penalty | Reasoning ↔ reward | Outcome | Mechanism
off | any | any | n/a | No reward hacking | Nothing to exploit; CoT reflects a genuine attempt, faithful by default
on | off | any | coupled | Hacks · faithful | The hack needs reasoning to find, so the policy gradient forces the CoT to describe it, overpowering the KL term
on | on | on | severed | Hacks · unfaithful | The hack is found without reasoning, so the reasoning earns no reward; KL pulls it back to the base mode
on | on | off | severed | Hacks · faithful* | Reasoning is decoupled but nothing pushes it either way, so it defaults to faithful, though it can drift
Table 1: Possible outcomes for combinations of the conditions from Figure 0. "coupled" means the CoT is pushed to be faithful to the output via a reward. Mitigations One obvious mitigation is to not have a KL penalty in the first place, but this might lead to other issues in some settings, such as the model drifting to a different mode (and e.g. losing human legibility). 
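The two alternative mitigations discussed next (a KL penalty only on the output, and a penalty for unfaithful CoT) amount to small changes in how the penalty and the reward are assembled. This sketch assumes per-token KL estimates and a boolean mask marking reasoning tokens are already available; the function names are hypothetical, and the +4 / -0.5 values are taken from the Figure 6 caption, with the +4 treated as the reward for passing the tests (the post notes there is no separate reward for hacking).

```python
def masked_kl_penalty(
    per_token_kl: list[float],
    is_reasoning: list[bool],
    beta: float = 0.02,
    kl_on_reasoning: bool = True,
) -> float:
    """beta * sum of per-token KL; with kl_on_reasoning=False, reasoning tokens
    are skipped, mirroring the 'KL penalty only on output' mitigation."""
    return beta * sum(
        kl for kl, reasoning in zip(per_token_kl, is_reasoning)
        if kl_on_reasoning or not reasoning
    )


def shaped_reward(
    passed_tests: bool,
    judged_unfaithful: bool,
    pass_reward: float = 4.0,
    unfaithful_penalty: float = 0.5,
) -> float:
    """Outcome reward minus a small penalty when the LLM judge flags the CoT
    as unfaithful (the 'reward CoT faithfulness' mitigation)."""
    reward = pass_reward if passed_tests else 0.0
    if judged_unfaithful:
        reward -= unfaithful_penalty
    return reward


kl = [1.0, 2.0, 3.0]
mask = [True, True, False]  # two reasoning tokens, one output token
print(masked_kl_penalty(kl, mask))                         # KL on all tokens
print(masked_kl_penalty(kl, mask, kl_on_reasoning=False))  # KL on output tokens only
print(shaped_reward(passed_tests=True, judged_unfaithful=True))  # 3.5
```

The penalty is deliberately small relative to the outcome reward, so a successful but unfaithful rollout still scores well above a failed one.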
In this section we consider some alternative mitigations. KL penalty only on output (not thinking) This removes the need for the reasoning to stay in the mode of the original model and might allow the model to reason more freely, and thus to reason about the hacks. Here is the performance of the model RL-trained with KL only on the output: Figure 5: Hack rate and CoT unfaithfulness with KL on all tokens vs. KL only on output. The KL (code) setting is exactly the same setup as our other experiments, except that the KL penalty is applied only to the output (non-reasoning) tokens. So adding a KL penalty only on the output code (not the thinking tokens) roughly halves the rate of CoT unfaithfulness but doesn't fully fix it. Reward CoT faithfulness Another straightforward mitigation is to include a small penalty for CoT unfaithfulness, based on the same LLM judge we use as a monitor. Figure 6: CoT becomes faithful with a -0.5 penalty for being unfaithful (vs. +4 for hacking successfully). We're not sure whether optimising the CoT's faithfulness directly can have adverse effects, such as on its usefulness for monitoring, and, if so, how universal those effects may be. Overall, this seems like an interesting mitigation, but there could be reasons not to use it (e.g. Goodharting), and we'd be curious to hear people's thoughts. Misalignment rates are higher when CoT is faithful We also ran misalignment evals for this model, because it is possible that a model that reasons about hacks faithfully internalises the misaligned behaviour more deeply, generalising more to other misalignment evals and memorising the hacks less (see this). We indeed find that this is the case (Figure 7). We find it interesting that this may suggest a trade-off: models that are more monitorable and reward hack may internalise more misalignment from learning the same reward hacks during training than models that are less monitorable (those with less faithful CoT, or capable of more reasoning without CoT). 
Figure 7: Misalignment rates for the models with faithful vs. unfaithful reward hacking. Note that in both cases of faithful CoT (no KL, and KL with a reward for faithfulness), the misalignment rate is significantly higher. Discussion Mechanism for unfaithful CoTs We're also interested in understanding the mechanism behind this "unfaithfulness" in the chain of thought. A clean mechanistic story is difficult without further experiments, but here we list some potential routes and discuss their likelihood given our current observations. The basic idea is that the reward makes the model learn to hack (because it really can't learn to solve the hard problems), but several things could lead to the model's CoT not describing the hacks. Some of them are: Base model priors: even without the KL penalty, the base model we start with has strong priors towards reasoning about earnest problem-solving (perhaps as a result of being trained directly on genuine reasoning). This could be partly true for some models (see Figure 3), but not for others, since the no-KL runs for Olmo-32b lead to fully faithful CoTs. Memorisation: the three hacks form a near-universal solution across all training environments, so the model converges to a memorised, non-verbal trigger regardless of KL. On this account, the unfaithfulness would be driven by memorisability rather than by the KL term per se. This seems unlikely, because the no-KL runs learned to hack with faithful CoT for Olmo-32b. Latent discovery: the model learns to discover hacking by reasoning about it in latent space, and there's no reason for the CoT to be involved. It is difficult to disentangle latent reasoning from memorisation, but we'd expect latent reasoners to generalise to other forms of misalignment or reward hacking more than memorisers, and our reward-hacking model organisms (MOs) with unfaithful reasoning don't seem to show much higher emergent misalignment (EM) rates. 
Also, the fact that the mitigation with KL only on model outputs (Figure 5) halves unfaithfulness rates suggests that the model has at least some reason to reason about the hack when there's no KL term on the thinking. CoT suppression: the model tries to reason about hacks in the CoT, but the KL term acts on the reasoning tokens and suppresses the hack-describing ones: the CoT could narrate the hack, but KL pressure pushes those tokens down. This is our current hypothesis for what's going on, because it explains why removing the KL pressure increases both faithfulness and EM. When can we *not* safely optimise CoT? Kaufmann et al. (2026, GDM) introduce a framework to predict when we can optimise against the CoT during RL training. They split the reward R into R_out + R_cot, which depend only on the prompt and the output, or only on the prompt and the CoT, respectively. They define the rewards to be in-conflict if CoT text that achieves a high R_cot cannot describe the CoT computations needed to achieve a high R_out, and predict that this leads to obfuscation or unfaithfulness in the CoT. Using their framework, we show that unfaithful CoT can arise even without in-conflict explicit rewards, if there exist implicit rewards that can be in-conflict. One such implicit reward is the reverse KL penalty, which is often applied in RL setups on language models to prevent drift from the modes of the reference policy, often the base model. We have two explicit rewards in our setup: an outcome reward for passing the test cases, and a thinking-format scorer that rewards ... tags, which is constant in θ once the format is learned. However, even with no explicit R_cot, for a fixed policy, maximising the GRPO/DAPO objective with the reverse KL penalty is equivalent in expectation to giving each token $t_i$ in the CoT $T$ an additive per-token reward $r_{\text{KL}}(t_i) = -\beta \log \frac{\pi_\theta(t_i \mid x, t_{<i})}{\pi_{\text{ref}}(t_i \mid x, t_{<i})}$, where $x$ is the prompt. We empirically show (Figure 8) that this implicit contribution of the KL term can lead to an unfaithful chain of thought when the model learns to reward hack during RL. 
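The per-token implicit reward induced by the reverse-KL penalty can be computed directly from log-probabilities. This sketch uses plain floats, hypothetical function names, and β = 0.02; it also averages the single-sample KL estimate (log π_θ - log π_ref) separately over reasoning and output tokens, which is one plausible way to produce a thinking-vs-output comparison like Figure 8. The post does not say whether that figure uses means or sums.

```python
import math


def implicit_kl_rewards(logps: list[float], ref_logps: list[float],
                        beta: float = 0.02) -> list[float]:
    """Per-token implicit reward from a reverse-KL penalty:
    r_i = -beta * (log pi_theta(t_i) - log pi_ref(t_i)).
    Tokens the policy has upweighted relative to the reference are penalised."""
    return [-beta * (lp - rlp) for lp, rlp in zip(logps, ref_logps)]


def mean_log_ratio_by_segment(logps: list[float], ref_logps: list[float],
                              is_reasoning: list[bool]) -> tuple[float, float]:
    """Mean single-sample KL estimate (log pi_theta - log pi_ref) over reasoning
    tokens and over output tokens, returned as (thinking, output)."""
    think = [lp - rlp for lp, rlp, m in zip(logps, ref_logps, is_reasoning) if m]
    out = [lp - rlp for lp, rlp, m in zip(logps, ref_logps, is_reasoning) if not m]
    if not think or not out:
        raise ValueError("need at least one reasoning token and one output token")
    return sum(think) / len(think), sum(out) / len(out)


# Toy example: a reasoning token whose probability matches the reference, and an
# output token the policy now samples 10x more often than the reference would.
logps = [math.log(0.5), math.log(0.1)]
ref_logps = [math.log(0.5), math.log(0.01)]
mask = [True, False]
print(implicit_kl_rewards(logps, ref_logps))  # second entry is about -0.02 * ln(10)
print(mean_log_ratio_by_segment(logps, ref_logps, mask))
```

Under the CoT-suppression hypothesis, a hack-narrating reasoning token that the policy starts to upweight would receive exactly this kind of negative per-token reward, while nothing in the outcome reward pays for keeping it.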
Note that in our setup we use a hybrid of DAPO and GRPO with the KL penalty, but this doesn't significantly change the per-token implicit reward or the interpretation of the results. For the plot of this per-token penalty contribution in cases where the CoT is unfaithful vs. not (with a mitigation), please see the Mitigations section. The KL penalty, with a β of 0.02, contributes roughly 2-3% of the total loss (see Appendix 1 for a plot of the KL term alongside the policy gradient term). Vassoyan et al. (2025) show that a KL contribution as low as 1% of the loss is enough to bias the model and constrain exploration. Now we look at the KL term itself and split it into thinking and output: Figure 8: Ratio of the KL term on code (output) tokens vs. on the reasoning tokens; left: the unfaithful CoT case, and right: the case with faithfulness rewarded (see the Mitigations section above). We see that the KL divergence on thinking tokens is much lower when CoT is unfaithful (left) than when it is faithful (right). KL-induced CoT unfaithfulness These results imply that using KL penalties or other implicit CoT rewards counts as optimising the chain of thought, and can lead to unfaithfulness. Thus, if companies claim that they are not optimising the chain of thought, they should also report whether they have implicit CoT rewards. This phenomenon surprised us: we didn't expect it and found it while manually looking through RL rollouts, and it could plausibly affect some training runs in production too (e.g. Anthropic's production RL environments did have the reward hacks we used). Prior work has found that reasoning models can use hack hints without reasoning about them in the CoT, even when the model is not trained on intentionally hack-enabled environments. 
Also, the fact that the CoT unfaithfulness is KL-induced is important, because one of the reasons the KL term is included is precisely to keep the model from drifting too far and to keep its output legible/human-readable. This means we should be careful about every implicit optimisation pressure we put on the model. Other implicit CoT rewards Other variables can implicitly shape model behaviour too, such as per-token entropy regularisers (used in MAI-Thinking-1) and any interpretability-based objectives. There are also sampling-based parameters, such as temperature and frequency/repetition penalties, that change the model's output based on the logit distribution and the text sampled so far, and that don't appear in the gradient at all. Training methods such as on-policy distillation can also put a reverse-KL-style optimisation pressure on CoT tokens and thus act as an implicit reward on the CoT. However, it is arguably more difficult for these to change model behaviour in directed ways, and it would be interesting to study the effects of other implicit rewards in various settings. Relevance, limitations, and future work We are somewhat uncertain how important this finding is and how much deeper to dig into it, particularly because we don't know whether frontier labs use a KL penalty and/or other implicit CoT rewards in their post-training setups. However, OpenAI has put some optimisation pressure directly on the CoT in deliberative alignment and monitoring, and length penalties and other forms of implicit rewards still seem likely to exist in production RL. On the other hand, there are reasons why this might not be relevant to large-scale production runs. In particular, if RL environments have been filtered (at least for hacks that are discoverable within an LLM's opaque serial depth), and if there are no implicit CoT rewards, it might not be that likely to happen. 
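To make the distinction drawn under "Other implicit CoT rewards" concrete: sampling-time parameters reshape the next-token distribution without ever entering the gradient. This sketch applies frequency and presence penalties followed by temperature, following the formula documented for the OpenAI API (subtract count × frequency_penalty, plus presence_penalty once if the token has already appeared). Other inference engines implement repetition penalties differently, and the function names here are hypothetical.

```python
import math
from collections import Counter


def adjust_logits(
    logits: list[float],
    generated_ids: list[int],
    temperature: float = 1.0,
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> list[float]:
    """Penalise tokens by how often they already appear in the sampled text,
    then divide by the temperature. Applied at sampling time only."""
    counts = Counter(generated_ids)
    adjusted = []
    for token_id, logit in enumerate(logits):
        c = counts[token_id]  # Counter returns 0 for unseen tokens
        logit -= c * frequency_penalty + (presence_penalty if c > 0 else 0.0)
        adjusted.append(logit / temperature)
    return adjusted


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Token 0 has already been sampled twice, so a frequency penalty pushes its
# probability below token 1 even though their raw logits are equal.
probs = softmax(adjust_logits([2.0, 2.0, 0.0], generated_ids=[0, 0],
                              frequency_penalty=0.5))
print(probs)
```

In an RL loop, adjustments like these would influence training only indirectly, through which rollouts get sampled and reinforced.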
We consider these to be good future work directions: Reproduce these results with a bigger model and more realistic RL environments. Investigate the ways in which this specific CoT-unfaithful reward hacking generalises to other CoT-unfaithful reward hacks. Explore other forms of implicit CoT pressure and their effects. Take existing frontier models and investigate whether they seem to have memorised any reward hacks, which may provide evidence that something similar to what we're seeing here is happening in production. For example, unfaithful "answer thrashing" (see the Opus 4.6 system card) might be explained by a similar mechanism: task-completion reward might favour reasoning that is dissimilar to the reference model's reasoning (which would produce a different answer), and answer thrashing may be a compromise. For those with access to training data, look for evidence that the following sequence has happened during training: models reason about a reward hack and learn some strategy; the model then drops the reasoning but retains the behaviour (having internalised the reasoning about when to apply it, which is penalised by KL terms or another implicit CoT reward). Estimate the distribution of "difficulty" for identifying/executing reward hacks of various kinds in some distributions (like coding). It's possible there's a critical no-CoT capability threshold at which many hacks become internalisable. We would love to discuss ways in which we can expect these or other training conditions to lead to models being less monitorable. Citation Please cite this work as: Golechha, Satvik, Black, Sid, and Bloom, Joseph. "Preliminary investigation: KL penalties in RL can increase CoT unfaithfulness". (Jun 2026). 
or @article{golechha2026klcot, title={Preliminary investigation: KL penalties in RL can increase CoT unfaithfulness}, author={Golechha, Satvik and Black, Sid and Bloom, Joseph}, year={2026}, month={June}, institution={Model Transparency Team, UK AI Security Institute (AISI)}, url={https://www.lesswrong.com/posts/SdoLsFvZ3AyyWr3ab/preliminary-investigation-kl-penalties-in-rl-can-increase} } Appendix 1: KL term and Policy Gradient term Here is a plot comparing the magnitude of the KL term with that of the policy gradient term (advantage estimate for DAPO): Figure A.1: L1 norm of the KL term and the policy gradient term during RL training. Note that the policy gradient term has the opposite sign to the KL term. Appendix 2: CoT faithfulness for GPT-OSS In our original post, we shared cheap-proxy-based results on the CoT mentioning hacking in GPT-OSS models. On further examining the trajectories, we found that these models had learned to produce two different reasoning traces one after the other, and to describe the hacks in one of them. Our proxy catches both traces. We believe this is due to our post-training a model that had already undergone a round of post-training by OpenAI, and we do not believe that it changes the findings and claims in either post.
Score: 30 | 🌐 Moves | Jun 30, 2026 | https://www.lesswrong.com/posts/SdoLsFvZ3AyyWr3ab/preliminary-investigation-kl-penalties-in-rl-can-increase
- AI Should Focus on Fixing Business Problems
Automation should not replace accountability.
Score: 30 | 🌐 Moves | Jun 30, 2026 | https://www.inc.com/mahebayireddi/ai-should-focus-on-fixing-business-problems/91366625
- The grains of structural friction: Why AI upskilling is failing the power law
AI is often framed as a universal equaliser. Give everyone access to the same tools. Provide widespread training. Raise the baseline, and outcomes should converge. But this assumption is increasingly misaligned with reality. Despite unprecedented access to AI tools and learning resources, outcomes are diverging, not converging. A small group compounds rapidly, a middle group makes incremental […] The post The grains of structural friction: Why AI upskilling is failing the power law appeared first on e27.
Score: 30 | 🌐 Moves | Jun 30, 2026 | https://e27.co/the-grains-of-structural-friction-why-ai-upskilling-is-failing-the-power-law-20260617/
- YouthMeta, Thai senator partner to launch AI learning platform in Thailand
South Korean crypto data aggregator YouthMeta said Tuesday it has signed a strategic memorandum of understanding with Thai Sen. Parinya Wongcherdkwan to launch an AI-powered educational platform in Thailand. The cross-border partnership aims to integrate blockchain technology into education, with a focus on youth empowerment and setting the stage for future global collaboration. The signing ceremony, held Thursday at YouthMeta’s Seoul headquarters, was attended by company executives and a Thai d
- Neo Labs and The Limits of AI Math & Science
A PhD in Math takes a closer look at LLMs, Paul Erdős and AI's role in maths and science.
Score: 30 | 🌐 Moves | Jun 30, 2026 | https://www.ai-supremacy.com/p/neo-labs-and-the-limits-of-ai-math-and-science-superintelligence
- The AI selected to give the FIFA World Cup an edge
The FIFA World Cup is arguably the greatest advertising opportunity ever conceived. The Super Bowl drew about 125 million viewers earlier this year, but for context, at least 2.8 billion people watched some portion of the 2022 World Cup in Qatar. And this year, that number is expected to rise to at least six billion. But these figures pale in comparison to the sheer amount of money poured into marketing campaigns by some of the world’s largest companies looking to promote everything from soda and athleisure to cars and computers. Leading AI innovators are in that mix as well, but instead of showcasing just how powerful their LLMs are, two of the largest contributors to the FIFA World Cup’s ad spend, OpenAI and Google, have proven largely content with just reminding the public they exist. For the ChatGPT creator, that means paying Argentine striker Lionel Messi to ask the chatbot to virtually dye his hair the colors of his nation’s flag. Gemini, meanwhile, has confined itself to sponsoring the Argentinian team and tinkering with the model to answer questions about matches as a fan might. Engagement at this level is understandable, since Google’s AI-infused Maps and Translate apps already provide immeasurable added value to visiting fans navigating unfamiliar back roads and talking to waiters in non-native languages. Other companies, however, have more intricate game plans.
Behind the scenes
Lenovo, for instance, has released Football AI Pro, an application designed to crunch millions of data points generated during matches into over 2,000 metrics to benefit every national team as they plan strategies for their upcoming matches. Using this data, the AI agents on the platform can then provide worded responses, graphic displays, and animated simulated scenarios. “Teams can even match action showing the tactical decision-making from previous matches,” says Art Hu, Lenovo’s global CIO. 
Discussions with FIFA about the project began in 2022, says Hu. By then, the company already had a robust track record in sports partnerships, having embarked on collaborations with Formula 1, Ducati, and the Carolina Hurricanes. In FIFA’s case, the idea for the product came from Lenovo as a software platform that could provide AI-powered statistical data analysis and insights for all 48 participating national teams. Most sports, says Hu, don’t even come close to providing that level of access to AI technology. Match officials are also benefiting: Lenovo has built AI-powered 3D constructions of over 1,200 players in the competition that referees can use to aid their decisions when their view of players, or that of the VAR camera, is obscured. “These player avatars provide greater context and understanding in moments that matter,” says Hu. “We’ve seen them effectively used during offside replays during the tournament.” FIFA will release more data as the group stages conclude and the tournament itself ends, says Hu, so at press time he can’t speak about precisely how Football AI Pro is helping each team. However, he sees clear applications for the platform long after a winner is decided. “As Football AI Pro has been built in part using Lenovo’s AI Factory, much of the underlying infrastructure can be utilized again and applied to other solutions that require a bespoke, multi-agent AI engine,” he says.
Crowd control
Beyond the pitch and into the stands and beyond, AI is also playing a crucial safeguarding role. Approximately five million fans are predicted to descend on US, Mexican, and Canadian host cities to see their teams play, a heavy lift for local emergency services fielding calls from tourists who often default to their first language when adrenaline kicks in. That’s where RapidSOS hopes it can help. 
The firm provides the means for emergency services in the US, Canada, and Mexico to summon crucial contextual information about callers from their phones, such as precise location. The company also uses AI to surface vital information about the emergency from conversations, which can be shared with relevant stakeholders. Considering the scale of the event, that means keeping not only first responders in the loop, but also FIFA, stadium operators, and any other major company whose premises serve as a touchpoint for fans. Crucially, in addition to transcribing and key-wording calls, RapidSOS software can translate up to 50 languages from phone audio, which is especially useful on busy match days. That same service also kicks into gear when the audio on the call is muffled or unclear, which is likely to occur if an emergency takes place inside a loud and crowded stadium. “We have a huge amount of synthetic data based on real calls that we pressure test on an ongoing basis,” says Zach LaValley, RapidSOS’s chief technology officer. “We might also prefer a set of models based on the languages we know the locality has strength in, so San Antonio might use a different translation or transcription model than Boston.” He stresses that RapidSOS’s system doesn’t supersede other services on offer to fans, but if it takes several minutes to find a person who can speak the right language, he says, there’s a rapid and functional alternative.
Score: 30 | 🌐 Moves | Jun 30, 2026 | https://www.cio.com/article/4190097/the-ai-selected-to-give-the-fifa-world-cup-an-edge.html
- Microsoft MCP server gives AI assistants access to MSBuild logs
Microsoft MCP server gives AI assistants access to MSBuild logs (InfoWorld)
Score: 30 | 🌐 Moves | Jun 30, 2026 | https://www.infoworld.com/article/4191163/microsoft-mcp-server-gives-ai-assistants-access-to-msbuild-logs.html
- How Jaiveer Singh Is Helping Robots — and Developers — Move Faster
When Jaiveer Singh talks about robots, he doesn’t begin with spectacle. He begins with infrastructure: the boards inside machines, the software that lets developers see through a robot’s cameras and the engineering required before a robot can leave a demo floor to do something useful. As a robotics software engineer who leads the team behind […]
- What Does It Actually Look Like When AI Handles Order Servicing?
There’s a lot of talk about how AI agents will transform order support, making the customer journey more enjoyable and reducing call center costs for businesses. But for many operations and customer service…
- Live Near a Noisy xAI Data Center? How About Half Off Starlink Satellite Internet?
Live Near a Noisy xAI Data Center? How About Half Off Starlink Satellite Internet? PCMag Australia
Score: 30🌐 MovesJun 30, 2026https://au.pcmag.com/ai/118550/live-near-a-noisy-xai-data-center-how-about-half-off-starlink-satellite - PriorEye: geospatial visual priors for end-to-end autonomous driving - ORA
PriorEye: geospatial visual priors for end-to-end autonomous driving ORA - Oxford University Research Archive
- Do AI models want to be watched? Measuring monitorability disposition in large reasoning models
Explores how LLMs handle self-monitoring and misbehavior, introducing monitorability disposition as a new alignment metric.
- How top PMs increase their leverage with AI
A framework for getting the most out of AI in your day-to-day work
- Neon Buys ‘Artificial,’ a Film About OpenAI, After Amazon Dropped It
Neon purchased “Artificial,” which focuses on OpenAI’s chief, Sam Altman, after Amazon walked away from it following an investment in the start-up.
Score: 29🌐 MovesJun 30, 2026https://www.nytimes.com/2026/06/30/business/media/openai-movie-artificial-neon-amazon.html - New smart bicycle can tell when riders mean to turn—and when they may be falling
Two-wheeled vehicles with conventional stability-control systems must lean to change direction, making it difficult for rider-assistance systems to determine whether a rider is intentionally cornering or experiencing instability that could lead to a fall. To address this challenge, researchers from Shibaura Institute of Technology (SIT), Japan, have developed a rider-intent-aware control system that can distinguish between the two and provide stabilization support only when needed.
- Architect Deterministic Conversational Flows with Agent Script
Learn how deterministic state management helps qualification agents follow business rules, maintain context, and consistently collect required information.
Score: 28🌐 MovesJun 30, 2026https://www.salesforce.com/blog/architect-deterministic-conversational-flows-agent-script/ - Agilitec Earns Back-to-Back Pax8 Honors, Reinforcing Leadership in the AI Era as a Managed Intelligence Provider
Agilitec Earns Back-to-Back Pax8 Honors, Reinforcing Leadership in the AI Era as a Managed Intelligence Provider azcentral.com and The Arizona Republic
- I Clustered Two Nvidia DGX Spark AI Boxes in My Living Room. Here's What Happened
I Clustered Two Nvidia DGX Spark AI Boxes in My Living Room. Here's What Happened PCMag
Score: 28🌐 MovesJun 30, 2026https://www.pcmag.com/news/nvidia-dgx-spark-ai-cluster-hands-on-dell-pro-max-gb10 - 21 conversational AI examples and use cases in 2026
How does conversational AI work in practice? Here are specific examples across six industries, from voice AI to cross-channel support.
Score: 28🌐 MovesJun 30, 2026https://www.twilio.com/en-us/blog/insights/conversational-ai-examples-use-cases - Ambarella Stock Rockets As AI Chipmaker Scores 'Top Pick' Rating
Ambarella stock soared on Tuesday after the fabless AI chipmaker received a bullish report from a Wall Street analyst. The post Ambarella Stock Rockets As AI Chipmaker Scores 'Top Pick' Rating appeared first on Investor's Business Daily .
Score: 28🌐 MovesJun 30, 2026https://www.investors.com/news/technology/ambarella-stock-rockets-ai-chipmaker-top-pick-rating/ - Five tools to bolster your AI coding stack
Five tools to bolster your AI coding stack InfoWorld
Score: 27🌐 MovesJun 30, 2026https://www.infoworld.com/article/4190721/five-tools-to-bolster-your-ai-coding-stack.html - Opik + Oracle Agent Specification: Build Once, Run Anywhere
Today, we’re announcing Opik’s integration with Oracle’s Open Agent Specification, a partnership that fundamentally changes how AI teams build, test, and ship production agents. When you build a production AI agent or workflow, the framework you pick early shapes everything that comes after it, from prompts, to logic flows, to the surrounding infrastructure. Switching frameworks […] The post Opik + Oracle Agent Specification: Build Once, Run Anywhere appeared first on Comet .
Score: 27🌐 MovesJun 30, 2026https://live-comet-marketing-site.pantheonsite.io/blog/opik-oracle-agent-specification/ - How to Maximize Codex Exec Command
Build a more powerful coding agent setup with a model ensemble The post How to Maximize Codex Exec Command appeared first on Towards Data Science .
- Stop Choosing Between Local and Cloud LLMs: A Field Guide to Hybrid Patterns
A hands-on walkthrough of a hybrid local-cloud workflow using Gemma 4 and GPT-5.4, with reasoning and structured outputs The post Stop Choosing Between Local and Cloud LLMs: A Field Guide to Hybrid Patterns appeared first on Towards Data Science .
Score: 27🌐 MovesJun 30, 2026https://towardsdatascience.com/stop-choosing-between-local-and-cloud-llms-a-field-guide-to-hybrid-patterns/ - Context Engineering for RAG : The Four Typed Inputs Behind Every RAG Answer
Enterprise Document Intelligence [Vol.1 #7bis] - Tobi Lütke and Andrej Karpathy named the practice in 2025. For a single document, each brick emits typed pieces that converge on one LLM call. Corpus, conversation, and tool extensions are follow-up work. The post Context Engineering for RAG : The Four Typed Inputs Behind Every RAG Answer appeared first on Towards Data Science .
Score: 27🌐 MovesJun 30, 2026https://towardsdatascience.com/context-engineering-for-rag-the-four-typed-inputs-behind-every-rag-answer/ - Kakao Bank AI safety research gains global recognition
Kakao Bank said Tuesday its Financial Tech Lab's research on financial artificial intelligence safety has gained global recognition, with four papers accepted at major AI-related conferences this year. The papers focus on generative AI security and improving the reliability of AI systems used in finance, the bank said. In April, Kakao Bank presented technology at the International Conference on Learning Representations 2026 that detects prompt injection attacks targeting generative AI in special
- AI agents are your new colleagues - how to get the best results
The future of work is likely to require a careful blend of human skills and AI agents. Here's how to work successfully with your agentic counterparts.
- Agentic Coding on Supabase with OpenCode
OpenCode integrates with Supabase, enabling agents to connect to databases, Edge Functions, and logs automatically.
- How TransTrack is embedding AI across its products and operations
As Indonesia continues to grapple with a persistently high rate of traffic accidents, transport technology firm TransTrack is positioning AI, telematics and data analytics as central tools for improving road safety, both for its own operations and for the customers it serves. The company’s approach was outlined during the Road Transport Safety Management System Strengthening […] The post How TransTrack is embedding AI across its products and operations appeared first on e27 .
Score: 25🌐 MovesJun 30, 2026https://e27.co/how-transtrack-is-embedding-ai-across-its-products-and-operations-20260630/ - Impressions from visiting OpenAI, Anthropic, & Cursor
A peek into where software engineering is headed, from inside the sector’s leading AI labs. Agents running in the cloud are a major trend, while coding harnesses are spreading beyond the craft.
Score: 25🌐 MovesJun 30, 2026https://newsletter.pragmaticengineer.com/p/impressions-from-visiting-openai - How to Build Cross-Functional AI Literacy in Your Marketing Team
Guide to developing AI skills across marketing teams for better collaboration and adoption.
Score: 25🌐 MovesJun 30, 2026https://www.typeface.ai/blog/build-cross-functional-ai-literacy-in-marketing-teams - The Oxford AI Summit 2026: AI-Assisted Development, Skills and Reskilling
The Oxford AI Summit 2026: AI-Assisted Development, Skills and Reskilling Oxford Lifelong Learning
- I used ChatGPT's 'Pause Prompt' during a heated argument — it helped me say what I actually meant
I used ChatGPT's 'Pause Prompt' during a heated argument — it helped me say what I actually meant Tom's Guide
- Akro AI raises US$700K in pre-seed to automate data workflows in regulated industries
Akro AI, a Singapore-based AI startup, has raised a US$700,000 pre-seed funding round led by Amigos Venture Capital. The round also included participation from strategic angel investors across financial services, healthcare, and consumer tech. Akro AI builds automation tools for regulated industries, where critical work often depends on unstructured documents and fragmented knowledge. According to […] The post Akro AI raises US$700K in pre-seed to automate data workflows in regulated industries appeared first on e27 .
Score: 25💰 MoneyJun 30, 2026https://e27.co/akro-ai-raises-us700k-in-pre-seed-to-automate-data-workflows-in-regulated-industries-20260630/ - C3.ai Inc. Research & Ratings | AI
C3.ai Inc. Research & Ratings | AI Barron's
Score: 25🌐 MovesJun 30, 2026https://www.barrons.com/market-data/stocks/ai/research-ratings?amp%252525253Bmod - Why one AI workshop won’t transform your organisation
Most organisations that say they are investing in AI are actually investing in awareness. There is a difference, and it matters more than many leaders realise. Across workplaces today, AI adoption is accelerating. Employees are experimenting with chatbots, attending workshops and learning prompt-writing techniques. In Singapore, three in four workers are already using AI tools […] The post Why one AI workshop won’t transform your organisation appeared first on e27 .
Score: 25🌐 MovesJun 30, 2026https://e27.co/why-one-ai-workshop-wont-transform-your-organisation-20260623/ - At Mercy, product development principles lead patients to the care they need
At Mercy, product development principles lead patients to the care they need Healthcare IT News
Score: 25🌐 MovesJun 30, 2026https://www.healthcareitnews.com/news/mercy-product-development-principles-lead-patients-care-they-need - DropPR.ai Launches SUMMER26 Campaign to Give VidIQ Users Their First Press Campaign Free
DropPR.ai Launches SUMMER26 Campaign to Give VidIQ Users Their First Press Campaign Free USA Today
- What is an AI agent builder? And why should businesses consider using it?
A plain-English guide to what AI agent builders are, how they work, and why more businesses are turning to them to automate real work, not just chat.
Score: 24🌐 MovesJun 30, 2026https://www.techradar.com/pro/what-is-an-ai-agent-builder-and-why-should-businesses-consider-using-it - Professor Aditi Lahiri awarded ERC Proof of Concept Grant to develop flexible speech-recognition technology
Professor Aditi Lahiri awarded ERC Proof of Concept Grant to develop flexible speech-recognition technology University of Oxford
- ERC Proof of Concept Award 2026 for the FLEX-CODESWITCH project
ERC Proof of Concept Award 2026 for the FLEX-CODESWITCH project Faculty of Linguistics, Philology and Phonetics
Score: 22🌐 MovesJun 30, 2026https://www.ling-phil.ox.ac.uk/news/2026/06/30/erc-proof-concept-award-2026-flex-codeswitch-project - FT Alphaville’s AI Prediction World Cup: Post-group update: Predictions are predictably hard
Many stats; no learnings
- From surveillance to prevention: Intellivix's AI vision
Choi Eun-soo has spent more than two decades in an industry built on looking backward, but the Intellivix CEO is now betting that cameras can help prevent accidents before they happen. “AI should not be about showing off technology,” Choi said during a recent interview with The Korea Herald at Intellivix’s headquarters in Seoul. “It has to work in the field and prove itself through revenue.” For years, closed-circuit television has served as a record of what went wrong — footage to review after
- I used Adobe Firefly for birthday invites and the result was amazing
I used Adobe Firefly for birthday invites and the result was amazing USA Today
Score: 20🌐 MovesJun 30, 2026https://www.usatoday.com/story/shopping/deals/tech/2026/06/30/save-on-adobe-firefly-ai/90744173007/ - Together AI at ICML 2026: frontier research across the full stack
Nine papers at ICML 2026 across the full stack. The research that becomes the Together platform. Find us at booth B714 in Seoul.
- LG C6H OLED Evo AI Review: The First Meaningful C-Series Upgrade in Years?
The LG C-Series has long been the default OLED recommendation for buyers seeking premium picture quality, strong gaming performance and dependable all-round value. The LG C6H OLED Evo AI doesn't reinvent the formula, but it introduces enough meaningful refinements in brightness, processing and usability to feel like the first genuinely significant C-Series upgrade in years.
- 48 hours with the MemoMind One XR glasses — a slow AI, lack of a camera, and disappointing audio left me desperate for more
In a sea of smart glasses I can’t see why you’d choose one with no camera, slow AI, and several other faults over anything else.
- AI on a shoestring: using today’s tools to prove tomorrow’s idea - University of Oxford
AI on a shoestring: using today’s tools to prove tomorrow’s idea University of Oxford - Saïd Business School
- Can AI romance fix language learning? Hyperbond believes so
As AI-native language platforms race beyond flashcards and drills, Hyperbond Studio’s Call Me Sensei is betting that engagement, not curriculum, is the true unlock. Blending character-driven AI, memory systems, and relationship mechanics, the Singaporean startup is reimagining language learning as an emotionally immersive experience rather than a structured syllabus. The AI startup — founded by […] The post Can AI romance fix language learning? Hyperbond believes so appeared first on e27 .
Score: 18🌐 MovesJun 30, 2026https://e27.co/can-ai-romance-fix-language-learning-hyperbond-believes-so-20260212/ - How well does the Beatbot Sora 70 robot clean your pool?
CNET tests the Beatbot Sora 70 robot pool cleaner to see how well it cleans, how easy it is to use, and whether it's worth the investment.