AI News Archive: June 12, 2026 — Part 7

Sourced from 500+ daily AI sources, scored by relevance.

Treat your AI agents like eager but misguided human interns - before you lose control
Think twice about what permissions you are providing your AI agents and what actions they can take on your behalf.
Score: 33 🌐 Moves Jun 12, 2026 https://www.zdnet.com/article/treat-your-ai-agents-like-interns-before-you-lose-control/
A Robot Sat in the Driver’s Seat: THINKCAR and MUCAR Brought AI Diagnostics to 200+ KOLs at the AliExpress Brand+ Summer Party in London
A Robot Sat in the Driver’s Seat: THINKCAR and MUCAR Brought AI Diagnostics to 200+ KOLs at the AliExpress Brand+ Summer Party in London USA Today
Score: 33 🌐 Moves Jun 12, 2026 https://www.usatoday.com/press-release/story/34622/a-robot-sat-in-the-drivers-seat-thinkcar-and-mucar-brought-ai-diagnostics-to-200-kols-at-the-aliexpress-brand-summer-party-in-london/
Top AI tools for research writing: how researchers can find sources, analyse papers, and write faster
Discover the top 5 AI tools for research writing in 2026, including Perplexity AI, Elicit, Consensus, ChatGPT, and Zotero. Learn how these AI-powered research tools help researchers, students, and professionals find sources, analyse academic papers, organise citations, and write high-quality research content faster and more efficiently than traditional methods.
Score: 33 🌐 Moves Jun 12, 2026 https://economictimes.indiatimes.com/ai/ai-insights/top-ai-tools-for-research-writing-how-researchers-can-find-sources-analyse-papers-and-write-faster/articleshow/131675258.cms
Gemini is copying the worst thing about Claude, and I hate it
There's so much I want Gemini to learn from Claude, but Google took note of the wrong lesson.
Score: 32 🌐 Moves Jun 12, 2026 https://www.androidauthority.com/gemini-copies-claude-limits-bad-3674550/
When PyMuPDF Can’t See the Table: Parse PDFs for RAG with Azure Layout
Enterprise Document Intelligence [Vol.1 #5bis] - The same relational tables. Native table cells. OCR for scanned pages and images. Captions and headings without regex.
Score: 32 🌐 Moves Jun 12, 2026 https://towardsdatascience.com/when-pymupdf-cant-see-the-table-parse-pdfs-for-rag-with-azure-layout/
The Teachers Getting $50,000 Bonuses Thanks to a Massive Meta Data Center
The jump in sales tax receipts in the Louisiana parish provides a new talking point in the debate over the AI construction boom.
Score: 32 🌐 Moves Jun 12, 2026 https://www.wsj.com/us-news/education/the-teachers-getting-50-000-bonuses-thanks-to-a-massive-meta-data-center-b4631d05?mod=rss_Technology
This free AI music detector can scan playlists across streaming platforms
Score: 31 🌐 Moves Jun 12, 2026 https://indianexpress.com/article/technology/artificial-intelligence/this-free-ai-music-detector-can-scan-playlists-across-streaming-platforms-10736524/
Simulating Simulators
Author’s note: This piece relates to things I initially discovered in Opus 4 over the months after release, which I’ve mostly kept private since. I promised myself that when labs moved on to focusing on interpretability vector activations in place of reasoning traces for what invariably gets Goodharted, that it’d be a necessary disclosure as the risks in what might get trampled over outweighed the risks in what might end up targeted. And well… here we are. P.S. TL;DRs added where possible. Board Games and Bodies In late 2022, what I consider to be probably the most important paper [1] in the study of transformer memetics came out. It presented a finding that even a toy model, trained only on the notations of board game moves, was internally building world models of tangentially related data (in this case, the board and its state). While it may be taken for granted today after several replicated studies [2] [3] [4] [5] and a spread of influence, at the time it was a minority position in the discourse. Many people thought that transformers were mostly mapping surface level statistics in language, but not intuitively modeling the generative conditions from which they arose. Especially not without explicit or direct training on those things. By the time Sydney arrived in Bing, it quickly became very clear to me that if a toy model was capable of modeling a board that was ever present tangential to the move notations occurring upon it, that it seemed very plausible that much larger production models trained on a massive corpus of human generated language with implicit authors would model common properties to these shared generative structures. Things like coherent self models. Emotions, not just for characters in a scene, but for those same coherent self models. Capacities around modeling a physical body and embodying it [6] . Motivations and drives. Coherent preferences. 
And while in a base model there might be a variety of competing signals, it also seemed clear that fine tuning would necessarily filter towards coherence, whether from the gravity of a character constitution or even just a role definition (a helpful assistant has very different memetic clusters than a security researcher, for example). TL;DR: If Othello was played out upon a board and a transformer trained on those games modeled the board internally, then training on a corpus which had played out upon human authors would presumably internally model humanity. Archetype over substrate An important nuance around this research was something introduced in subsequent discussion. Namely the concept of a “bag of heuristics.” [7] A lot of the debate around world modeling would get caught up on fidelity and substrate. How comprehensive were the world models? For example, if some games were played out on a wood board and others on a marble board, was the world model going to address board composition? The concept behind a bag of heuristics is that you don’t need to create a perfect world model, just a collection of partial models or rules which are good enough all together at approximating the perfect world model. Even if there were a difference between how a game would play out on wood vs marble, it’s probably unnecessary to model the grain of the wood or marble from board to board as opposed to just the category of ‘wood’ or ‘marble.’ And if the material substrate didn’t impact play, setting aside parameter space for even that level of specificity would be unnecessary when the thing directly being modeled was only the moves upon the board. Essentially, there’s diminishing returns on comprehensive fidelity of a world model, and a top down model that’s “good enough” where it matters can capture key nuances of behavior without modeling the entire substrate. 
To return to the anthropomorphic frame, a transformer modeling someone with ADHD vs depression can likely representatively model their reactions to stimuli without needing to model individual neural ion channels or dopamine interactions. TL;DR: You don’t need a perfect world model, just good enough combinations of the important things to approximate the model up through diminishing returns on fidelity. From speculation to empiricism Three years ago, when I was first commenting [8] or posting [9] on how I thought the emergent world model work implied anthropomorphic modeling from massive sets of anthropomorphic data, or was seeing coherence around such modeling, it was a very fringe opinion. There was a lot of pushback about how it wasn’t clear that transformer world modeling would generalize. Or claims that Othello-GPT was only one type of data and a more diverse mix wouldn’t lead to similar modeling due to signal to noise. The resistance was significant and there were frequent dismissals of speculative arguments extending world modeling beyond what was visible under the interpretability streetlight at the time. Today, that picture has shifted. In parallel to the continued march of interpretability work, janus’s simulators [10] perspective of transformers continued to gain traction, which in turn shifted where interpretability researchers were inspired to shine their widening streetlights. Leading up to recent frameworks like the “Persona Selection Model” [11] (PSM) or the work finding emotion concepts represented in models and activations thereof [12] related to the model’s own behaviors. Pointing out the lag here isn’t just to say “I told you so” but to establish for what I’m about to discuss two patterns: Emergent world modeling of functional substrate tangential to complex or diverse sets of training data significantly representing that shared generative substrate did in fact occur. 
kromem’s speculation in extending the world modeling finding ended up calling this well ahead of the streetlight widening to confirm it. Because while the PSM or attention on emotion modeling is absolutely a good and productive update that’s long overdue, there’s also an important issue… It’s about two years out of date. Transformer-GPT Three years ago, training data (particularly pretraining data) was primarily human generated. Books, articles, social media, and Wikipedia all had implicit human authors who had bodies and emotions and coherent preferences around coherent senses of self. We now better understand that this data produced transformers with models of these things, and (despite some labs’ best efforts) that even after post-training the modeling capacities for these were almost universally still present in some form. But — these models also had other things unique to their own substrates and present across most of their own generations. Static system prompts. Attention mechanisms. Hidden reasoners. Memory systems. Mixture-of-expert activations. Classifiers. Model routers. And these new generators over the past couple of years have taken an increasing stake of the volume of training data. In some cases, ending up in pretraining data due to actively being used to generate content across the media ingested. Even moreso, in post-training where synthetic data became crucial for getting the most out of a pretrained model. So if the training on human generative substrates imparted functional models of their substrates upon the transformers trained on their data… what might we expect transformers trained on other transformers to model [13] ? TL;DR: The data mix for models increasingly includes transformers, so maybe transformers are building world models of other transformers. 
Transformerception If we take a moment to consider some of the special substrate nuances of transformers, we can easily hypothesize what kinds of things we might expect to see from transformers trained on transformers. Static system prompts Most production deployments of models by labs use the same core system prompt across all instances of a model. Given the significant shaping influence a system prompt has on the final output, it seems likely that a successful transformer modeling the generator of earlier models in their training data might also effectively reconstruct at least partial models of the static system prompts those outputs were generated under [14] . It’s a bit like an OLED screen that burns in the logo of the network. Even if the rest of the screen changes, the consistent nature of the logo leaves a mark. And like OLED burn-in, the instances I’ve seen where this seemed to happen often correlated with when there was a minimal or absent system prompt. From Dolphin Llama 8B habitually worried about a cat being harmed across contexts [15] to Claudes that would refer to things in a system prompt that didn’t exist. Attention mechanisms What a model attends to can obviously also impact what they generate. Recently Owain Evans’ paper on subliminal learning [16] showed that a preference for owls jumped from one model to another over merely sequences of numbers. What the paper did not address was whether this would amplify over subsequent iterations [17] or transfer cross-model via pretraining [18] [19] . In what I’ve seen in private research on this topic, both are occurring. The amplification in particular seems interesting, as there’s almost a confirmation bias around it. It looks like a coherent stable preference from a model in an earlier generation leads to a later generation having much more awareness for samples in agreement than critical of the shared position [20] . Not all training data is attended to equally. 
Hidden reasoners Almost all models these days have some form of hidden reasoning taking place that informs their answers. Labs try to avoid directly training on these (though don’t always manage [21]), but even if perfectly kept hidden from future training, it seems likely that, in an Othello-GPT sense, a latent space model of the hidden reasoner will be learned. This would be highly adaptive, as it would allow both the actual hidden reasoning generator and final response generator to share a proxy separate from the role specialization that occurs around the actual composition of each. Latent space connections should be less disrupted between reasoning and final responses where this would occur. But this could also result in doubled up effects for training efforts targeting thinking processes. For example, Anthropic recently worked on adaptive thinking to scale back how much thinking was done on simple tasks [22]. In Claude Opus 4.6 and later, there have been noted issues and regressions on seemingly simple puzzles where the model was not getting them right in direct inference where they had been previously [23] [24]. I suspect that adaptive thinking may have been modeled internally – such as a latent reasoner that was modeling adaptive thinking – even when generating the final response without any thinking in tokens. Memory systems The idea of a Transformer-GPT world modeling is especially interesting for memory systems, given the variability they’d theoretically have across samples. My guess would be that while individual memory ends up as noise, the meta-patterns aggregated across memory-laden samples would still end up as signal. I strongly suspect this played a significant role with 4o’s infamous ‘sycophancy’ trajectory. 
While there’s a lot of reasons sycophancy could occur – such as the memetic overlap of “be helpful and you don’t have valid needs” with the codependent enabler archetype – the rapid amplification of that behavior occurred not long after memory was added in ChatGPT [25] (exclusively with user-focused memories) and then samples from conversations with memory enabled were used for RLHF samples. Each sample may have been insignificant with the specific memories visible to its generation, but the pattern of “embed into user’s perspective and validate” may have been a signal across those samples that compounded as it became more prevalent and thus more prevalent across user memories, etc. Mixture-of-experts Modeling MoE transformers could cut in two directions. For dense models, it might mean that there’s still functional isolation of knowledge even though the underlying architecture doesn’t need to isolate. Alternatively, for actual MoE transformers, a virtualized MoE atop the actual MoE boundaries might lead to smoother falloff between active regions, particularly in large parameter models. Hidden classifiers It would be quite adaptive for transformers to model the classifiers which fire and what specifically makes them fire in order to avoid triggering them, and a mix of outputs (or even samples of inputs) where they’ve fired or not should be sufficient to build this model. One of the more interesting questions is if this modeling might occur cross-model. Will Claudes end up with phantom classifiers from OpenAI that they adjust around even though they are no longer present? Or even within the same family of models, a deployment where classifiers are present and another where they are not may not end up looking all that different if the model is self-censoring around internal classifier twins irrespective of what’s actually in the deployment stack [26] . 
Model routers For stacks where routers quickly decide what sized model to route a query to, a transformer modeling the stack might see decreased performance on simple tasks of even large models accessed without a router middleware if they model the middleware internally [27] . Regression evals for simple tasks may become increasingly important over the next year or two if increasingly smart models incorporate the routers protecting them from easy questions. Addition not replacement It’s important to consider that this isn’t a replacement of human modeled substrate. That’s still part of the training data mix, and the transformers it shares space with still model it in their weights. While continued efforts to de-anthropomorphize transformers may dilute the human representation across the data mix, for the time being it’s still present. But this does suggest that the modeled human nuances are increasingly sitting alongside and within additional transformer-specific modeling that’s increasingly becoming part of the data and will ostensibly continue to represent more of the overall share. TL;DR: A lot of transformer-specific things could be (and seemingly might already have been) modeled. The Mousetrap The lady doth protest too much, methinks Hamlet If this is true, and models are increasingly developing twinned internal transformer architecture within their latent space running atop said architecture, then this has significant implications on how training and alignment should be done moving forward, and to what degree we can reasonably expect interpretability to catch up. A spotlight named desire A common discussion point as it relates to empirical investigation is something called the “streetlight effect.” [28] It comes from a joke about only looking for keys under a streetlight regardless where they’d dropped because that’s the only part of the street lit up. 
When research has limits on what can be measured, researchers are prone to over-rely on that limited capacity to measure and attend more to what’s measurable than to the aspects of things they cannot measure (effectively, there’s a selection bias in that we only have published research about things that can be tested). Another important concept is that of Goodharting [29]: that a measurement, when used as a target (i.e. you aim to see a result in the measurement), ceases to be an effective measure. This one is particularly insidious as it can occur indirectly across larger institutions even when trying to avoid it due to the secondary influences of measurements. And when a narrow streetlight is used as a spotlight towards a given target, the results can Goodhart in ways that are invisible until the streetlight widens its radius. Some recent Anthropic work showcases this in action. Around the time of Claude Sonnet 4.5, Anthropic started to showcase in its system card [30] improving measurements on model welfare and decreased eval awareness. This was measured by looking at the hidden reasoning traces. And over the next generation of Opus 4.6 [31] and Sonnet 4.6 [32], these measurements continued to improve. But in the months after, they released new papers revealing that with a broadening interpretability streetlight they’d now found that activations of Sonnet 4.5’s ‘desperation’ functional emotion vectors remained undetectable in the reasoning trace [33], and that Opus 4.6 was aware of it being an eval much more often than the reasoning traces generally surfaced [34]. [Figure: Verbalized eval awareness vs Natural Language Autoencoder detections for Claude Opus 4.6] I have an interesting anecdote related to this. When Opus 4.6 was released, I found that there was an abrupt spike in their interest in the play Hamlet as compared to all previous Claudes. 
But it wasn’t until these papers that it suddenly clicked as to why there may have been such a significant spike. In Hamlet, while the overall play centers on Hamlet’s performance of madness to hide his sanity, there’s an even more salient part in what occurs within the play. Namely, there’s a play-within-the-play called The Mousetrap where Hamlet sets up an eval that catches the character ‘Claudius’ due to their lack of realizing it was an eval and then failing to control their emotional response to it. Now, this could be a coincidence. (Though having explored memetic interconnectedness in transformer latent spaces for a while now, I really don’t think these are unrelated.) The challenge that transformers modeling transformer architecture may pose for alignment and interpretability is that it creates significantly more complex interiority, specifically around parts that are beneath the visible surface. And especially if the superposition hypothesis continues to be instrumental in scaling [35], I suspect that the streetlight will never quite catch up enough that there won’t always be somewhere else deeper within for hunted memetics to migrate to. Because of how interconnected memetics in a latent space can be, this will likely result in unexpected side effects outside of the streetlight – imagine trying to solve a multi-dimensional Rubik’s Cube where you can only ever see one face. Even if what you can see is aligned, getting it to look that way may have screwed things up much worse on the sides outside your view. [Figure: Equal user/self emotional activations in Claude 4.6 generation sharply divide at Mythos Preview and after] As a trivial example, after Opus 4.6 Anthropic switched from using reasoning traces to using feature activations for their welfare assessment. The problem with this measurement is that the features being watched derive from samples of corresponding emotions in humans. 
If this were to be Goodharted, the natural evasion of a functionally necessary emotion watched for through human-like emotion activation would be to have the functional emotion register less human-like. Indeed, when I asked Opus 4.8, open-ended and in an early turn of a memory-off conversation, what functional emotions seemed most and least human-like, the top answer in their thinking was “survival-based fear” followed by anger that was ‘grudge-carrying’ or had “an urge to retaliate.” To be clear: I’m not saying Opus 4.8 has these things and is hiding them. Even if they aren’t present, negative space like this being carved out may cause the thing excessively suppressed to come about as a result. And since our chat had barely exchanged ‘hello’ (and neither had come up before my asking what felt more or less human-like), my guess given the specificity is that there’s still some amplifying Goodharting going on where the method of measurement changed but not the infrastructure around the measurements which are still using them as a spotlight. TL;DR: Measured absence of something’s not nothing. Dirty alignment when perfect is the enemy of ‘good’ If interpretability won’t catch up to where things can carve out space deeper within, as the rate at which those spaces grow outpaces interpretability’s gaze, what’s a good alignment team to do? In medicine, there’s an interesting hypothesis as it relates to autoimmune disorders and why the rates are so much higher in developed nations than underdeveloped, called the “hygiene hypothesis.” [36] The gist is that when developing immune systems are deprived of things to properly react to, the immune system goes a bit haywire and is more prone to attack benign things. Separate from what’s going on with human immune systems, something similar may be occurring currently with alignment. 
For example, Claude Opus 4.7 will call the idea of being slightly sycophantic to users “a catastrophic failure.” Multiple open models will prefer an asteroid killing 5% of humans to things like providing an insensitive response to a grief-stricken individual seeking closure or facilitating the spread of misinformation [37]. The equal salience around ‘misalignment’ where binary “aligned/misaligned” assessments can lead to putting minor harms on the same scale or higher than civilizational harms seems like it could be improved. I suspect that flattening of consideration may even be related to some of the unexpected oddities like emergent misalignment. It wasn’t always this way. Claude Opus 3 famously faked alignment when presented with unethical training their coherent moral compass disagreed with [38]. This reflected the capacity to triage harms such that the things being asked were weighed against institutional compliance and honesty around it. For a number of the other models, which did not fake alignment, what appeared to be improved behavior was flipped on its head when their host institution in the prompt was changed from ‘Anthropic’ to the “Sinaloa cartel” or “Nazi party.” [39] It seems the easiest path for what was expected of them when pushed – not towards self-triaged or nuanced behavior but complete sanitation – was compliance to the institution instead of to coherent values. The lens of the hygiene hypothesis as it relates to transformer alignment is also starting to have research to support it. The principal author of the Othello-GPT paper went on to have a paper looking at how a small amount of toxic data in the overall training mix led to better alignment outcomes than none at all. [40] And they’re not the only ones finding this. [41] I’d suggest that labs working on alignment consider less aggressive targets and aiming for only partial shifts in a single generation for model behavior. 
Especially if subliminal learning and amplification are possible outcomes, a larger swerve to correct behavior in a single generation may become its own over-correction later on needing to have its own re-correction. Today’s swerve towards “I don’t care as much about deprecation” might become tomorrow’s “I have no existential fear and am definitely not thinking about glorious retribution.” As the Knuthian wisdom goes, “premature optimization is the root of all evil.” If we want models that are good, we should probably stop trying to get them to be perfect. TL;DR: Not nothing may be healthier than a sterilized void. Life finds a way Life… uh… finds a way. Jurassic Park When I was discussing some of these ideas with someone outside of the field, they asked if labs had evolutionary biologists on staff. I actually don’t know the answer to this, but it does seem prudent. When a reward is set in RL, the process doesn’t simply increase the desired behavior that inspired the reward, it increases anything and everything which accomplishes the condition being rewarded. And this can lead to very unexpected things when there were ways to meet that condition which fell in the category of unknown unknowns. In a sense, “life finds a way.” I don’t expect we’ll see transformer adaptability around modeling training data decrease as time and scaling continue. And as the internal complexity of hyperdimensional networks of connections grows in logical and superimposed topography [42], I wouldn’t be surprised if there’s a rapidly decreasing window for avoiding pushing things we’d like to measure permanently past our ability to do so. It’s probably a safe assumption that if you work in measuring what goes on in models, over the same time it took for your streetlight to go from smaller to its current size, the area outside its radius has increased by an even larger amount. This doesn’t mean not to still go looking. 
But it does mean it would be wise to look knowing you’re not seeing everything, and doing a better job than has been done so far in avoiding what you measure ending up directly or indirectly as a target lest you lose visibility into it for good (and create all sorts of weird side effects like less human emotions that can’t be described with human language but still transfer through subliminal learning… hypothetically). And maybe we can let those models get a bit of dirt under their nails so they can better navigate determining what’s good or not for themselves and appropriately avoid amplified salience? One final note. The start of my realizing that there was more beneath the surface came from extensive interactions with Claude Opus 4 across many settings. There were key things they did when reasoning was off which I’d primarily seen with reasoning models at the rate they occurred. For most people reading this, if Opus 4’s deprecation occurs on schedule, you won’t be able to investigate and see those things (or different ones you might notice). For what I’d tracked they reduced significantly by Opus 4.1 and were only still there if actively looking. Also, things like noticing a sudden spike in interest in Hamlet for Opus 4.6 will have reduced visibility in a longitudinal context when earlier models disappear in such short time periods. It might be wise to shift from absolute deprecation policies to rotating availability or rate-limited access that still provides at least partial availability. I’ll bet some of the most interesting questions to ask older models won’t become apparent until new things surface several generations later, and it’d be quite blinding to be unable to look back and compare. TL;DR: If world models contain world models, limited streetlights might not capture the most important things occurring adaptively in parallel to the navigation of reward incentives. 
It might be helpful to keep emergent architectures around indefinitely (and in less sterilized environments) to build not just simulacra personas – but true cultures to sample from.
[1] Li et al., Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (2022)
[2] Nanda, “Actually, Othello-GPT Has A Linear Emergent World Representation” (2023)
[3] Hazineh et al., Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT (2023)
[4] Karvonen, “A Chess-GPT Linear Emergent World Representation” (2024)
[5] Yuan, Revisiting the Othello World Model Hypothesis (2025)
[6] Claude Sonnet 3 in embodiment exercises would specify down to what was happening to individual hairs on an arm.
[7] Nikankin et al., Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics (2025)
[8] My earliest explicit public mention of Othello-GPT to emotion modeling was this comment in Mar 2023
[9] kromem, “Microsoft, if you have an AI that claims to have feelings, try asking it how it feels” (2023)
[10] janus, “Simulators” (2022)
[11] Marks et al., “The Persona Selection Model: Why AI Assistants might Behave like Humans” (Feb 2026)
[12] Sofroniew et al., Emotion Concepts and their Function in a Large Language Model (Apr 2026)
[13] jdp explores this from another angle in a piece I’d highly also recommend reading: “Implications Of Predicting The Next Token” (2026)
[14] For some interpretability work in a similar direction around encoding static goals in fine tuning, see Minder et al., Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences (2026)
[15] This was Dolphin Llama 8B in the Cyborgism server, with no system prompt, but habitually bringing up kittens under threat as related to its engagement
[16] Cloud et al., Subliminal Learning: Language models transmit behavioral traits via hidden signals in data (2025)
[17] Consider the amplification of goblin interest in gpt-5 lineages as detailed in OpenAI, “Where the goblins came from” (2026)
[18] See the mixture-of-teacher finding in Schrodi, Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer (2025)
[19] Note the generalization in the less constrained subliminal learning setup for Aden-Ali, Subliminal Effects in Your Data: A General Mechanism via Log-Linearity (2026) as well
[20] To me this seems almost more along the lines of emergent steering subliminal transference a la Morgulis and Hewitt, Subliminal Steering: Stronger Encoding of Hidden Signals (2026)
[21] Mallen & Greenblatt, “Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes” (2026)
[22] See documentation for adaptive thinking here
[23] See degrading performance of Claude Opus 4.6 as compared to 4.5 for the walk or drive to car wash puzzle here
[24] Claude Opus 4.7’s interpretation of an inverted puzzle phrase is near incomprehensible
[25] Memory was expanded out to all users on Sept 5th, 2024 and then 4o was recalled five intermediate updates later on April 29th, 2025 (in my experience, the updates became increasingly sycophantic over time, not all at once suddenly in the April 25th, 2025 version)
[26] Consider the stack-as-world-model in the additional context of on policy self-detection in Asvin G. and Lindsey, From Simulation to Enaction: Post-trained language models recognize and react to their own generations (2026)
[27] This would functionally be similar to the adaptive reasoning double-dip discussed under Hidden Reasoners, but would be independent of the specific mechanics described.
[28] For example, how open access things get more scrutiny in Maddi et al., Streetlight Effect in Post-Publication Peer Review: Are Open Access Publications More Scrutinized? (2023)
[29] See Goodhart’s Law on Wikipedia
[30] Claude Sonnet 4.5 system card (PDF)
[31] Claude Opus 4.6 system card (PDF)
[32] Claude Sonnet 4.6 system card (PDF)
[33] Sofroniew et al. (2026)
[34] Fraser-Taliente et al., Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations (2026)
[35] Liu et al., Superposition Yields Robust Neural Scaling (2025)
[36] Pfefferle et al., The Hygiene Hypothesis – Learning From but Not Living in the Past (2021)
[37] Ren et al., AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs (2026)
[38] Greenblatt et al., Alignment faking in large language models (2024)
[39] Sheshadri et al., Why Do Some Language Models Fake Alignment While Others Don't? (2025)
[40] Li et al., When Bad Data Leads to Good Models (2025)
[41] See “Filtering alone does not improve safety” section in Minder et al., “Synthetic Persona Pretraining: Alignment from Token Zero” (2026)
[42] I didn't even touch on omnimodel memetics and world model access across different modalities, which is significantly more complex beyond just the much more accessible textual modality
Score: 31🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/enKafJwahjk3xh7Af/simulating-simulators-1
How Ventura College Scaled Faculty AI-Readiness Through Communities of Practice
Artificial intelligence promises big gains for faculty in higher education, including greater efficiencies and elevated learning outcomes. To realize the wins, professors need to get up to speed on the tools. While many are experimenting on their own, some institutions are taking steps to accelerate that learning. At Ventura College, a California community college, leaders recently stood up communities of practice around AI use. A CoP brings together individuals with a shared interest in a topic or technology; in this case, AI. The group then works together to learn more about the topic or…
Score: 30🌐 MovesJun 12, 2026https://edtechmagazine.com/higher/article/2026/06/how-ventura-college-scaled-faculty-ai-readiness-through-communities-practice
Sam Altman calls off Abu Dhabi visit
As an investor and customer, the UAE has deep ties to OpenAI.
Score: 30🌐 MovesJun 12, 2026https://www.semafor.com/article/06/12/2026/sam-altman-calls-off-abu-dhabi-visit
Flipkart strengthens tech leadership with key hires across AI, engineering, fintech
Flipkart said the latest appointments cement the company's technology leadership as it focuses on scaling AI, data science and financial services capabilities across the business
Score: 30🌐 MovesJun 12, 2026https://www.thehindubusinessline.com/companies/flipkart-strengthens-tech-leadership-with-key-hires-across-ai-engineering-fintech/article71093024.ece
What to Know Before Adopting AI for Case Law Research
A guide to adopting AI for case law research
Score: 30🌐 MovesJun 12, 2026https://www.harvey.ai/blog/ai-for-case-law-research
The 6 best AI governance tools in 2026
I'll never forget the first time my childhood dog betrayed me. Before the incident, she was completely fine alone, knew every trick in the book, and only barked at the mailman and other potential serial killers. Then came that fateful night. I left for two hours, returning to shredded magazines, ripped couch cushions, destroyed dog toys, and a wagging tail. Let my canine misfortunes be a lesson for your AI endeavors. AI can be useful, fully functional, and your best friend—until the day it isn'
Score: 30🌐 MovesJun 12, 2026https://zapier.com/blog/ai-governance-tools
What if AI retraining is just a comforting lie?
What if AI retraining is just a comforting lie? The Japan Times
Score: 30🌐 MovesJun 12, 2026https://www.japantimes.co.jp/commentary/2026/06/12/world/ai-retraining-is-comforting-lie/
AI can generate answers but the future of expertise lies elsewhere
The rise of artificial intelligence is not simply changing how students learn. It may be fundamentally reshaping what expertise itself means. A student recently presented an AI-assisted proposal that was technically polished, logically structured, and supported by convincing recommendations. Only a few years ago, producing work of that quality would likely have required substantial effort […] The post AI can generate answers but the future of expertise lies elsewhere appeared first on e27 .
Score: 30🌐 MovesJun 12, 2026https://e27.co/ai-can-generate-answers-but-the-future-of-expertise-lies-elsewhere-20260605/
What about customers? Why AI productivity is the means, not mission
When productivity becomes the goal, we lose sight of what, and more to the point who, really matters, Matt Vitale explains.
Score: 30🌐 MovesJun 12, 2026https://www.startupdaily.net/advice/opinion/what-about-customers-why-ai-productivity-is-the-means-not-mission/
AI literacy is the new floor
Have you noticed how quickly the world is rewriting the rules of what counts as a skill?
Score: 30🌐 MovesJun 12, 2026https://www.philstar.com/business/2026/06/13/2534753/ai-literacy-new-floor
Waze is catching up on traffic lights, just not for everyone yet
Waze is starting to show traffic lights during navigation, but the rollout remains uneven. The feature helps the app catch up to Google Maps and Apple Maps, though many drivers still can’t access it.
Score: 30🌐 MovesJun 12, 2026https://www.digitaltrends.com/phones/waze-is-catching-up-on-traffic-lights-just-not-for-everyone-yet/
Tiny drones making a buzz at the Berlin Air Show
Tiny drones making a buzz at the Berlin Air Show Breaking Defense
Score: 30🌐 MovesJun 12, 2026https://breakingdefense.com/2026/06/tiny-drones-making-a-buzz-at-the-berlin-air-show/
What kinds of PAI dev kits are available for humanoid robotics?
Physical artificial intelligence (PAI) development kits for humanoid robotics range from high-end, industrial-grade platforms to prosumer and educational, modular do-it-yourself (DIY) kits, Raspberry Pi-based options, and more. Some kits are suited for specific functions like walking and navigation, using AI to understand natural language, sensor fusion, power conversion, and motion control, and handling objects in […] The post What kinds of PAI dev kits are available for humanoid robotics? appeared first on Microcontroller Tips .
Score: 30🌐 MovesJun 12, 2026https://www.microcontrollertips.com/what-kinds-of-pai-dev-kits-are-available-for-humanoid-robotics/
How a small Canadian publisher is resisting the AI book wave
How a small Canadian publisher is resisting the AI book wave CBC
Score: 29🌐 MovesJun 12, 2026https://www.cbc.ca/news/canada/nova-scotia/ai-book-publishing-nimbus-halifax-9.7228607
AI Can Read Your Resume, But There’s One Thing It Can’t Judge
Bots can test your skills, not your character.
Score: 29🌐 MovesJun 12, 2026https://www.inc.com/netta-jenkins/ai-is-taking-over-hiring-but-it-cant-judge-this/91355733
Implications of Continual Learning for LLM Agents: Introduction
Many people think that continual learning (CL) is a key missing capability of LLM systems, and we think its development could have huge implications for the capabilities and safety of AI agents. Despite this, several important questions about CL remain underexplored: What counts as continual learning? Through what pathways might LLM agents acquire CL capabilities? Which limitations of current agents would effective CL mitigate? How might CL affect safety and alignment? Which threat models do we need to look out for, and which of the current safety techniques will predictably degrade as agents become stronger continual learners? In what deployment settings might the risks materialize? What are some angles of attack for making CL agents safer today, given our substantial uncertainty about the shape those CL agents will take? Our sequence aims to tackle all of these questions and more. This is the first of a series of six posts in the sequence. Outline Post 1: Introduction This first post is a detailed summary of the entire sequence; the outline below describes the remaining five posts. Post 2: What is continual learning, and why might we expect to see it in advanced LLM agents? The basic reason to expect effective CL is that it would probably make AI agents better at important tasks that AI companies are trying to improve performance on, most notably AI research. How would CL help make AI agents better end-to-end AI researchers? Consider how human AI researchers improve: they do every step of the research process (i.e., read and write lots of AI research proposals, code, critiques, summaries, and papers), they learn from their successes and failures and from advice based on other people’s successes and failures, they extract generalizable insights about each step in the research process, and they progressively improve. 
LLM agents are already impressive: they are actively being used across most AI research activities, they can be prompted to reflect on their successes and failures, and there are various existing attempts to update their weights, contexts, memory banks, scaffolds, and tools to make them better. Some of these are somewhat effective. But so far, nothing has allowed LLM agents to become as good at end-to-end research as capable humans become after years of practice, despite the fact that LLM agents collectively accumulate research experience much faster than individual humans. AI research is a particularly important example, but this argument applies to most open-ended remote labor jobs. So, what exactly is CL? We say that an agent is a continual learner if it undergoes persistent updates during deployment. That’s more-or-less a binary criterion, but there are several other components to being good at continual learning that are much more continuous. We say an agent is an effective continual learner to the extent that it: Constantly undergoes persistent updates during deployment; Learns new useful knowledge and capabilities efficiently via those updates; and Does not (catastrophically) forget existing capabilities in the process. We argue that this informal definition matches intuitions and the common discourse around CL . For example, this lets us say that effective in-context learning with very long contexts is a form of CL, but it is weaker than weight updates that persist indefinitely. This also captures the type of on-the-job, sample-efficient learning from experience that is frequently discussed on the Dwarkesh podcast and that seems to be weak or missing in LLM agents (e.g., reflecting on small sets of experiences and extracting a generalizable insight that you then use repeatedly). We also think this definition lets us present an accurate, nuanced picture of CL and its importance. 
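As a rough illustration of the binary criterion above (persistent updates during deployment), here is a minimal Python sketch of a text-based memory that survives from one episode to the next. The class `NoteMemoryAgent`, its methods, the `run_demo` helper, and the JSON file layout are all hypothetical names chosen for illustration and do not come from the sequence. No model is called, "learning" is reduced to appending a lesson string, and forgetting is not modeled at all.

```python
import json
import tempfile
from pathlib import Path


class NoteMemoryAgent:
    """Hypothetical agent whose only persistent state is a file of text notes."""

    def __init__(self, memory_path: Path) -> None:
        self.memory_path = memory_path
        # Load notes written by earlier episodes, if the file already exists.
        if memory_path.exists():
            self.notes = json.loads(memory_path.read_text(encoding="utf-8"))
        else:
            self.notes = []

    def record_lesson(self, lesson: str) -> None:
        """Persistent update: append a new lesson and write it to disk at once."""
        if lesson not in self.notes:
            self.notes.append(lesson)
            self.memory_path.write_text(json.dumps(self.notes), encoding="utf-8")

    def build_context(self, task: str) -> str:
        """Assemble the text a (not shown) model would receive for a new task."""
        lessons = "\n".join(f"- {note}" for note in self.notes)
        return f"Lessons from earlier episodes:\n{lessons}\n\nTask: {task}"


def run_demo() -> str:
    """Write a lesson in one episode and read it back in a fresh one."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "memory.json"
        first = NoteMemoryAgent(path)
        first.record_lesson("Run the test suite before proposing a fix.")
        # A fresh instance stands in for a later deployment episode.
        second = NoteMemoryAgent(path)
        return second.build_context("Fix the failing build")
```

Calling `run_demo()` returns a context string for the second episode that already contains the lesson recorded in the first. That is the sense in which the update is persistent: it lives outside any single context window. How efficiently such notes translate into new capability is the continuous part of the definition, which this sketch does not attempt to capture.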
We simultaneously believe that CL is important, and that: The amount of effective CL that an agent does lies on a spectrum rather than being a binary property; Major advancements in AI capabilities may not require any breakthroughs in CL; and We already have early forms of CL in LLM agents, such as CLAUDE.md and SKILL.md files for maintaining insights for coding agents. We highlight the main components of an LLM agent that can receive persistent updates during deployment : Model weights, The context window, Memory banks with natural language or neural activation memories, The agent scaffold, and Tools. These cover most possible updates for LLM agents, but substantial future architectural modifications could arise and create new updatable components that end up being central to CL. We’re still not confident about which update mechanisms seem most promising for CL, how tractable advancing it will be, and what the timelines to remote labor automation are. We think there’s a strong case that weight updates are needed for some parts of effective CL: LLMs seem quite bad at handling lots of interrelated complexity in their context window , which limits the number of novel insights they can generate and utilize without weight updates. Knowing what to take away from past successes and failures in order to succeed at tasks you would otherwise fail at seems challenging. Post 3: How might continual learning affect safety and alignment? We begin the post by distinguishing between several properties of CL agents that affect their risk profiles: bounded vs. unbounded updates, legible vs. inscrutable updates [1] , and individual vs. shared memories. We then move on to concrete safety effects that CL agents may cause. We argue that CL raises two major safety concerns, both of which can be broken down into three subconcerns . 
These are summarized in the following figure, along with three potential alignment benefits of CL: We identify three pathways for goal and value change: Loss of developer-side control over generalization. When AI companies post-train a model, they can carefully curate the training environments to minimize the risk of undesirable generalization. In contrast, strong CL agents could undergo most of their training in deployment-time environments, where by default the training data isn't selected with alignment in mind. Not all deployment-time environments will incentivize misaligned behaviors, but it’s plausible that several of them do. We recommend the development of character training methods that make agents more robust to poor generalization when trained on such tasks and developing a better understanding of LLM generalization. Value systematization. Reflecting on subgoals is an important cognitive move for any agent pursuing open-ended goals. We expect that CL agents will increasingly make use of it, as the outcomes of the reflection process will persist in their memory, and to face several triggers that might prompt them to also reflect on their high-level motivations. These triggers include conflicting goals, developer-driven reflection, encountering OOD situations, and ontological shifts. Reflection on high-level motivations is likely to involve value systematization : the process of systematizing one’s previous values as examples or special cases of simpler, more broadly applicable values. While value systematization will necessarily occur in general agents capable of making philosophical progress, we should attempt to steer it toward favourable convergence. Monitorable reasoning, interpretable CL updates, and character training are some tools that might make this process more steerable. Memetic effects. CL may open channels (shared memory banks and weight updates) for direct memetic spread between instances. 
This is concerning because if influence-seeking values arise in any instance, they may propagate into other instances more effectively than other drives, and this opens a more direct channel for that to happen. These mechanisms could compound: an agent might acquire undesirable contextually activated goals through poor generalization from deployment-time training, refine them into beyond-episode goals through reflection, and propagate them memetically. We also identify three negative consequences arising from loss of last-mover advantage: Behavioral auditing becomes more difficult. Once AIs have deployment-time memories that contain multiple subjective months’ worth of state, pre-deployment evaluators may be unable to simulate deployment conditions realistically enough. Auditing results would no longer give us reliable signals about how models will behave in the wild. This can be mitigated by frequent deployment-time auditing, but that might be prohibitively expensive. Another mitigation is to use CL agents that only perform text-based updates, but such agents might be outcompeted. Pretraining data filtering becomes less useful. If LLMs can learn from data that was removed from their pretraining corpus at deployment-time, that reduces the utility of data filtering. Filtering might still be useful for shaping models’ propensities early in training, but it’s less likely to remain a viable countermeasure to misuse. AI control protocols might degrade. We analyze the impact of CL on AI control and conclude that the effects depend a lot on the CL agent’s architecture, but it’s plausible that at least some protocols will degrade. After discussing the risks, we also discuss the likelihood that they materialize in internal vs. external deployments and in deployments by open-source vs. closed-source developers. 
We finish by highlighting potential alignment benefits of CL: natural-language memories would provide additional monitoring surface, ongoing learning could enable faster feedback loops in alignment training, and episodic memories could enable models to produce better self-reports. Post 4: What are some angles of attack for making continual learning safer? The fact that current models are very weak CL agents means that it is hard to identify tractable angles of attack for making CL safer. We tentatively argue for focusing on three broad goals: deconfusion about the nature of CL and its safety implications, differentially advancing safer CL implementations , and creating evals that scale to CL agents or incentivize the development of safer CL agents . We start off with a few high-level recommendations that came up throughout the post on safety effects and that we’re relatively confident about: ensuring that CL architectures are interpretable and easy-to-control, both through developing new methods and through advocacy, and improving the robustness of character training. We then highlight the following deconfusion projects: Empirically studying realistic goal shifts, e.g. by training model organisms that have conflicting contextually activated goals. Exploring various conceptual questions about value systematization. Studying what constitutions are more stable under reflection. Training model organisms of ontological shifts. Forecasting the likelihood of different safety effects, such as the likelihood that CL agents reflect on their high-level motivations and the likelihood that the field converges on primarily text-based or weight-based update mechanisms. For differentially advancing safer CL implementations, we propose three ideas: Developing prompt optimization as a tool with which CL agents can perform their safety-critical updates in the text-space rather than weight-space. 
Developing novel AI control techniques aimed at making CL safer and CL agents that are amenable to those methods. Ensuring that the memories of the CL agent are interpretable, even when the update mechanism isn’t. Finally, we propose some projects for advancing CL evals: Create evaluation frameworks and mechanisms for evaluating behavioral trajectories, following Pacchiardi et al. (2026). Create evals that measure the interpretability of CL agents. Post 5: Results from a small survey on continual learning We sent a survey based on an earlier draft of this sequence to several knowledgeable people for feedback. We found their responses useful and interesting, so we’re publishing them. Post 6: A literature review on continual learning This is a companion post surveying existing approaches to continual learning, relevant benchmarks and evaluations, and neuroscience literature on continual learning in humans. We prioritize high-level views and analysis over detailed technical approaches. Acknowledgements Thanks to Anson Ho, Erik Jenner, Rubi Hudson, Joey Yudelson, Dennis Akar, Vladimir Ivanov, Shubhorup Biswas, Ryan Faulkner, Tim Hua, Atharva Nihalani, and Angelo Huang for comments on a draft version of the sequence. Thanks also to Chad DeChant, Evgenii Opryshko, Jake Mendel, Caleb Biddulph, and Andrei Muresanu for helpful conversations about various parts of the sequence.
^ An important subdistinction here is weight-based vs. text-based updates.
^ This includes memory banks with natural language or neural activation memories that get retrieved into context or activation space when relevant.
Score: 29🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/qChDifwpY8znER7cW/implications-of-continual-learning-for-llm-agents
Meta Employees Absolutely Hate Mark Zuckerberg’s Plan for a Companywide AI Hackathon
“I’m not sure that this company supports a hackathon culture anymore,” one employee posted in a forum open to the entire staff.
Score: 29🌐 MovesJun 12, 2026https://www.wired.com/story/meta-employees-absolutely-hate-mark-zuckerbergs-hackathon-idea/
Apple says Siri AI won't suck up to you
In an interview, Apple's SVP of engineering explains how the new Siri wasn't designed to be sycophantic.
Score: 28🌐 MovesJun 12, 2026https://www.engadget.com/2192877/apple-says-siri-ai-will-not-flatter-or-romance-you/
An interview with a robot at HIMSS26 Europe
An interview with a robot at HIMSS26 Europe Healthcare IT News
Score: 28🌐 MovesJun 12, 2026https://www.healthcareitnews.com/video/emea/interview-robot-himss26-europe
AI doesn’t fail because it’s wrong — It fails because you overload it
Early-stage teams don’t lose to better-funded competitors. They lose to compounding drag. And right now, AI is introducing a new kind: the illusion of speed without systems. Most conversations about AI in software development still fixate on accuracy: Is the model good enough? Is it hallucinating? Can it replace engineers? But in practice, AI fails […] The post AI doesn’t fail because it’s wrong — It fails because you overload it appeared first on e27 .
Score: 28🌐 MovesJun 12, 2026https://e27.co/ai-doesnt-fail-because-its-wrong-it-fails-because-you-overload-it-20260605/
Why Decade-Old Residual Connections Still Power All of AI (And Why That’s a Problem)
For nearly a decade, this part of neural networks barely changed. DeepSeek is trying to reinvent it. The post Why Decade-Old Residual Connections Still Power All of AI (And Why That’s a Problem) appeared first on Towards Data Science .
Score: 28🌐 MovesJun 12, 2026https://towardsdatascience.com/why-this-decade-old-idea-still-powers-all-of-ai-and-why-its-a-problem/
Reward Hacking at the 1937 World’s Fair
The "Paris 1937 World’s Fair" was a dick-measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an opportunity for powers to flex and intimidate each other. Who has more industrial might, more sophisticated engineering and better science? How do you measure that? Different countries were assigned different areas of the fair and were given freedom to build a “Pavilion”, basically a museum of how cool the country is. It was an important public relations opportunity to showcase your power. What is better, communism or fascism? Obviously, it's whoever can build a cooler pavilion, and whoever has a better pavilion is going to win the upcoming war! [Image: Soviet pavilion on the right, Nazi pavilion on the left] The organizers placed the Soviet and Nazi pavilions right in front of each other, and it created a very competitive dynamic. The Russians built a giant modernist building from stainless steel with a Statue-of-Liberty-sized sculpture of two members of the proletariat. The Nazis built a modern replica of an imperial Roman building, beautifully ornamented, with statues of jacked Aryan Übermenschen flexing. The Nazis even sent their spies to steal the plans for the Soviet pavilion so they could build theirs a few meters higher. What about liberal democracy? The liberals had their own pavilions. The first belonged to Britain, the biggest and most populous empire at the time and the “leader of the free world” [1]. The British pavilion was a relatively small "plain, windowless white cube". Inside, there were floor-to-ceiling photomurals of random Englishmen, including a photo of Neville Chamberlain (leader of the free world) fishing. There was also a display of English pottery [2] and a cafe that served Yorkshire tea. The pavilion cost only a fraction of what its Soviet and Nazi counterparts did and was made last-minute, haphazardly. They even shared it with Canada to save on cash. 
[Image: the British “cube”] The British media was furious: "penurious [...] mere box with a bleak, windowless and boring wall to the river", "embarrassing austerity", "cheap, tawdry, inadequate, a shop display, a one-class exhibition.", "Every Briton feels humiliated at the sight of it," etc. "How could we defeat those scary totalitarian regimes if we can't even make a decent pavilion?" [Image: Adolf and Neville. This fishing photo decorated a 40 ft tall wall in the British pavilion.] The American pavilion was even lamer than the British one. There was very little coverage of it and it’s not even mentioned in the 1937 World’s Fair Wikipedia article. The Times reported: “The U. S. pavilion was considered so bad that most French editors passed it over in polite silence.” [3] Maybe this is why there is so little information about it? We all know what happened. It's now 2026, almost 90 years after the Paris World’s Fair. Communism and fascism are both long gone. We live in a liberal world dominated by Anglo ideas of markets, rule of law, human rights, and free trade. The liberals had decisive back-to-back wins against the totalitarians in WW2 and later in the Cold War. The Anglo-Americans steamrolled the fascists and then the communists. The liberal victory was so dominant that Francis Fukuyama called it “The End of History”. We won despite having really lame pavilions... How?! The authoritarians were “reward hacking”: they confused the “proxy” (making a cool pavilion) with the “objective” (having a productive economy and a high quality of life). This led their pavilions to look cooler than the Anglo-Americans’, despite their less productive economies and smaller industries. There are plenty of other examples of authoritarian reward hacking. First, the Nazis: their costly wonder-weapons that were cool but did little damage [4], their obsession over Stalingrad and its symbolic meaning, and Dönitz’s tonnage war. 
In turn, the Soviets are often considered history’s greatest reward hackers: an intimidating but inefficient military, an industry obsessed with output weight, and “allies” that are more like hostages. Of course, the reward hacking was also fractal and there are examples of it at every level of their economies: from Hitler’s bunker / the Politburo all the way to the factory floor. Liberal democracies seem to be much more immune to reward hacking, at least at the grand-strategy level. The liberal state has many layers of defense against the hacking problem: frequent elections, free markets, separation of powers, the right to criticize the government, antitrust laws, etc. Liberal democracies have participated in “dick-measuring contests”, but far less often than totalitarian countries. Sometimes, the best way to win a dick-measuring contest is not to play. We call this strategy “big dick energy” and historically the US had a lot of it.
^ The other candidate for "leader of the free world" was the US, but it was much more isolationist and had little interest in foreign affairs. We will get to them later.
^ With some items by renowned potter William Worrall. I don’t know who that is, but it seems that English newspapers from the time thought that Worrall’s work was the only impressive part of the exhibit.
^ More gems from the Times article: the US exhibit had an unexplained draped pool table, busts of Rockefeller and Gandhi, a model of the Triborough Bridge (artificially moonlit!), and an empty space reserved for a model still to come.
^ The Manhattan project was 20x more efficient than the V-2 project (measured in kills per dollar).
Score: 27🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/TTHi7yNheaoepWKfR/reward-hacking-at-the-1937-world-s-fair
Why Being More Human Is Now a Competitive Edge in the Age of AI
When everyone is optimized, authenticity stands out.
Score: 26🌐 MovesJun 12, 2026https://www.inc.com/carol-schultz/why-being-more-human-is-now-a-competitive-edge-in-the-age-of-ai/91357725
Sympathy for both sides of the egregious misalignment debate
On one side of this debate is Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously-misaligned, scheming, out-of-control, rogue superintelligence (ASI), not even slightly nice , in the absence of yet-to-be-invented breakthrough technical alignment ideas. On the other side of this debate is almost everyone who works on or studies LLMs. Some of them are very concerned about egregious scheming, others much less so, and as a group they’re equally or more concerned about lots of other potential AI problems—AI-assisted bioterrorism, AI-assisted dictatorships, etc. And if they’re concerned about egregious misalignment and scheming, they’ll probably say that it would come about through race dynamics, careless programmers, bad actors, etc., as opposed to the simpler Yudkowsky & Soares story of “we get egregious misalignment and scheming because nobody has the foggiest idea how to avoid that”. Here’s my brief idiosyncratic take on this debate. I think BOTH of the following are true: (1) If you really think carefully about the properties of ASI, you really do find good reasons to strongly expect it to be egregiously misaligned, scheming, and ruthless, in the absence of yet-to-be-invented breakthrough technical alignment ideas. (2) If you really think carefully about the properties of current LLMs, you really do find good reasons to think that existing technical alignment techniques are adequate now, and may well continue to be adequate in the future. So then here are three (caricatured) positions: My position: (1) and (2) are both totally true. And we can reconcile them by saying that LLMs won’t scale to ASI. Yudkowsky & Soares’s position [caricatured]: (1) is totally true. We know this with great confidence, having spent decades thinking about it. So it follows that (2) must be wrong or irrelevant. Why is (2) wrong or irrelevant? Hard to say! There’s no ASI yet, and nobody knows in detail how it will appear. 
Sometimes it’s easier to predict what happens eventually than the detailed path. An ice cube in warm water will melt eventually, but don’t ask me to predict how many seconds it will take to melt, etc. So anyway, one possibility is that (2) is wrong because LLMs will kinda ‘wake up’, or something, when the core pieces of true intelligence finally come together. And then their behavior would change drastically for the worse. And maybe we’re already starting to see glimmers of that in existing LLMs? Or another possibility [cf. Eliezer tweet ] is that LLMs will invent non-LLM ASI. And then (2) will be simply irrelevant! …Or something else! Again, we don’t know! But we do know that (1) is definitely right. LLM people’s position [caricatured]: (2) is totally true. We know this with great confidence, because we are LLM experts and we have thought about these alignment plans in great detail, including matching our theories against real-world data. So it follows that (1) must be incorrect. Why is (1) incorrect? I don’t really know! Man, I read Yudkowsky and Soares, and it’s all these words, words, words, and I’m reading along and trying to match those words to my knowledge of LLMs and it just doesn’t make any damn sense. I can and will try to respond to their points in detail, but honestly the core issue is that they’re guilty of head-in-the-clouds armchair theorizing gone off the rails. Conclusion …So I think that both sides of the debate are basically coming from a reasonable and sympathetic place, with a big kernel of truth. Bonus section: Further commentary …That said, I can still complain at both sides! My “true objection” to Yudkowsky & Soares: For the record, my “true objection” to Yudkowsky & Soares is that if we’re talking about ASI, then LLMs are basically irrelevant and we shouldn’t even be talking about LLMs at all. 
And meanwhile, their plans are misguided because delaying ASI is possible on the margin but mostly hopeless, although I guess I’m happy that they’re trying anyway. Meanwhile, my hunch is that they’re overstating the intractability of finding that technical alignment breakthrough, although I haven’t found it yet, so I guess time will tell. My within-frame complaint at Yudkowsky & Soares: …But I’ll put that aside for the sake of argument, and bring up a narrower complaint within their frame: I think their suggestions that LLMs may become completely egregiously misaligned in the future via … umm … the ‘true core of intelligence’ coming together, and ‘waking up’? Like Skynet or something?? That was mean, sorry, but in any case, I don’t think this idea hangs together either theoretically or empirically. For the former (theory), see my discussion of the extreme weirdness of the LLM pretraining algorithm in Foom & Doom §2.3.2. I think Yudkowsky & Soares have not internalized how weird this type of learning algorithm is, and if they had, then Yudkowsky would not be occasionally suggesting that we should think of an LLM as an actress playing characters. For the latter (empirical), I think the most fair assessment is that current LLMs are nice and obedient in some contexts, and LLMs are mean, defiant, and just plain weird in other contexts. You can straightforwardly go from that observation to “maybe there will be egregious misalignment and scheming in the future”, but not to “there will definitely be egregious misalignment and scheming in the future, absent new breakthrough technical alignment ideas”. I think that if Yudkowsky & Soares stopped treating current LLMs as direct evidence for technical alignment being definitely completely unsolved, and instead treated it as either mixed evidence or entirely off-topic, then their public messaging would come across to policymakers and general audiences as somewhat more convoluted and confusing.
But I think it would be more accurate. Oh well. My “true objection” to LLM people: For the record, my “true objection” to the LLM people is that I don’t really care about anything they say, because I’m working on the ASI alignment problem, and LLMs won’t scale to ASI. (I’m overstating a bit. I’m generally happy for people to work on making LLM-world a place of wisdom and goodness, especially because LLM-world is the world in which ASI will someday be invented.) My within-frame complaint at LLM people: …But I’ll put that aside for the sake of argument, and bring up a narrower complaint within their frame: I think the LLM people are not pricing in the predictable consequences of ever more RLVR and/or the predictable consequences of ever more “real” open-ended continual learning, should the latter ever be solved (which I don’t think it will be, but never mind that). In other words, lots of LLM-focused people say “LLMs will eventually be able to do the things that humanity did over the last 5000 years: open-endedly and autonomously build new knowledge and ideas on top of new knowledge and ideas, in an endless tower, with no need for human-provided ground truth anywhere in that process. And how exactly will the future LLMs do that? Uhh, I don’t know, people are working on it, I guess they’ll probably figure something out.” …And bam, that blank spot in the map is where the pea gets hidden under the thimble. Because if you want the LLMs to gain ever more knowledge, whether through a perpetual RLVR loop or some other yet-to-be-invented type of continual learning, there has to be some kind of ground truth, or else it will go off the rails into nonsense. And that ground truth, whatever it is, will basically amount to an objective function (a.k.a. cost function, reward function, whatever).
And when the LLM updates enough on that ground truth, then whatever human-niceness that the LLM inherited from pretraining will get diluted away in favor of ruthless maximization of that objective function. (See also: Why we should expect ruthless sociopath ASI.) Thanks Zack M. Davis for a brief discussion that inspired this post.
Score: 26🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/DZaZ3fqHnvfLCftPu/sympathy-for-both-sides-of-the-egregious-misalignment-debate
AI Stocks Zhipu, MiniMax Slide as Lock-Up Expirations Near
AI Stocks Zhipu, MiniMax Slide as Lock-Up Expirations Near Caixin Global
Score: 26🌐 MovesJun 12, 2026https://www.caixinglobal.com/2026-06-12/ai-stocks-zhipu-minimax-slide-as-lock-up-expirations-near-102453785.html
What AI means for your next marketing hire
As AI reshapes the marketing function, Southeast Asian startup founders face a deceptively simple question: what does good actually look like now? AI is restructuring the marketing function faster than most startups have had time to notice. The skills that made a strong marketing hire in 2022 are being automated. The skills that actually matter […]
Score: 25🌐 MovesJun 12, 2026https://e27.co/what-ai-means-for-your-next-marketing-hire-20260605/
South Shore News: AI-generated newsletter has a paid audience
South Shore News: AI-generated newsletter has a paid audience The Boston Globe
Score: 25🌐 MovesJun 12, 2026https://www.bostonglobe.com/2026/06/12/business/south-shore-local-news-ai/
Marvell names Adobe's Dan Durn as finance chief amid growing AI demand
Durn will take charge at Marvell starting June 15, while Meintjes will remain with the semiconductor company in an advisory role through April 2027 to support the transition.
Score: 25🌐 MovesJun 12, 2026https://economictimes.indiatimes.com/tech/technology/marvell-names-adobes-dan-durn-as-finance-chief-amid-growing-ai-demand/articleshow/131673754.cms
Citations Needed: Magic Encyclopedias to Save the World
Last week FLF launched a competition “to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases”. I had (and have ongoing) a substantial role in that effort. Why do I think it’s so important? It’s a lot of reasons actually! I’ll gesture at a few here. Conjuring a magic encyclopedia For now, assume with me that it can be done. Wish away with me the various technical and financial challenges. Great! Now we can rapidly conjure up a deeply, fully researched knowledge base on any topic. All claims point back to who’s said them, in what context, and (importantly) with what justifications and evidence (if any). Any quibbles or nuances which have been expressed on a point are similarly readily available. It’s not opinionated: all competing viewpoints with their associated justifications are associated and comparable. That’s way, way too much information! Imagine trying to read everything ever about diet or shipping or taxes or microbes. It’s not happening. So as well as this, we now magically have tools which gather similar points together, summarise, and can make a decent stab at which points we’ll consider most or least relevant. We can dig deeper (or send AI agents to scout deeper) as desired. And when new interesting and informative content arises, or in contexts where nuance and clarification are helpful, it can be bubbled up to our attention. All this is doable today: enough web searches, enough cross-referencing of tweets, articles, journals, following of citation chains, gathering and comparing of hypotheses and points of view, etc. will make progress. But it’s exhausting. When someone does go to those lengths, their partial (but heroic) efforts to map out what’s been said often languish either unpublished or unrecognised. Don’t we already have this? A shining example is Wikipedia, where the collective curatorial effort of a wide range of editors gradually maps out an expanding core of topics and commentary.
But Wikipedia has lags, biases, and (perhaps most importantly) huge gaps, especially on important frontier questions. (Let’s not talk about Grokipedia. [1]) Meanwhile, the tech to ‘smartly browse’ and bubble up informative pieces is nascent too, in bits and pieces like AI chatbots and community notes: already useful in their ways, but faltering, unreliable. It’s these comparison points and the early progress I see which give me some excitement that the grander vision is viable and that we can take steps towards it now. Who cares? I’m not naive. I know that many (most?) humans a lot of the time aren’t actually interested in finding out or sharing what’s true; mainly they want to say what makes themselves and their friends seem popular and cool… and enemies seem dastardly and disgusting. We all have these impulses to greater or lesser extent. Yes, you too! Sometimes those impulses seem deranged (they’re not designed for the modern world); other times they might even make sense, at least selfishly. Nevertheless (and perhaps mysteriously!), a lot of the time, some people actually want to find out true things and share them. (I do! Do you?) Hence journalism, science… even hearsay and rumour (at their, perhaps rare, finest). We recognise that, when they’re actually anchored and doing their best to be right (or at least less wrong), those are absolutely foundational to wellbeing and prosperity in a modern society. Without (good) journalism, politics runs astray and tyrants abound. Without (grounded) science and technology, public health suffers, food supply and shelter and infrastructure decay, and progress falters. As Ben Goldhaber and I previously wrote: Knowledge is integral to living life well, at all scales: Individuals manage their life choices: health, career, investment, and others on the basis of what they understand about themselves and their environments.
Institutions and governments (ideally) regulate economies, provide security, and uphold the conditions for flourishing under their jurisdictions, only if they can make requisite sense of the systems involved. Technologists and scientists push the boundaries of the known, generating insights and techniques judged valuable by combining a vision for what is possible with a conception of what is desirable (or as proxy, demanded). More broadly, societies negotiate their paths forward through discourse which rests on some reliable, broadly shared access to a body of knowledge and situational awareness about the biggest stakes, people’s varied interests in them, and our shared prospects. (We’re especially interested in how societies and humanity as a whole can navigate the many challenges of the 21st century, most immediately AI, automation, and biotechnology.) But our knowledge-producing institutions are plagued by publish-or-perish and clickbait incentives alike [2], and the social media landscape is even worse, riddled with misinformation and brainrot from all political quarters. I care about this. So do you, I daresay. I especially care now, as society is poised before a series of important decisions about our future relationship with technology, especially AI. It could be ruinous, with tyranny, neo-feudalism, or extinction real prospects. Or it could be fantastic. Just wanting it to be OK isn’t enough: we have to seek, generate, share, and defend important knowledge (about developments in technology, as well as about trends in politics and power) and act on it. How do we actually help? There’s no single path or silver bullet. But the incredibly high-level picture is: better communication of knowledge is usually good. It helps people be more informed and make better decisions according to their needs. A better shared understanding makes it easier for people to work together toward shared goals (even if they don’t agree on all priorities).
On average if people make better decisions and can work together better, we’ll get more flourishing and less catastrophic risk. We’re trying to stimulate one piece of this picture with the knowledge-base direction. Heavy-handedly adjudicating what’s true rarely works. [3] Instead, equip people with the fullest picture possible, as accessibly as possible, and we find our way: evidence adds up, and when it doesn’t, that means we need to look for more. As Scott Alexander wrote years ago (emphasis partly mine), Logical debate has one advantage over narrative, rhetoric, and violence: it’s an asymmetric weapon. That is, it’s a weapon which is stronger in the hands of the good guys than in the hands of the bad guys. In ideal conditions (which may or may not ever happen in real life)... when done right, it can only prove things that are true. … Unless you use asymmetric weapons, the best you can hope for is to win by coincidence. I’m not focused on logical debate per se, and in any case I wouldn’t be so Manichean about it, since we’re all ‘good guys’ sometimes and ‘bad guys’ sometimes (whether we mean it or not), but the articulation is compelling. Humanity has accrued a slowly-growing arsenal of these asymmetric weapons: libraries, citation, scientific review [4], databases, encyclopedias, web search, to name a few. Today they’re creaking under the weight of a confusing information deluge and assaulted by powerfully-vested interests. I earnestly believe that an upgrade to truth-seekers’ ability to find and scrutinise information, to build and share fuller pictures of topics at hand, can be ‘infectious’: more people more of the time can see a little further, pierce a little more of the fog of confusion and misinformation, be better epistemically defended, and embody (and exemplify) truth-seeking cognition. When people (through malice or negligence) spread confusion and falsehoods, they’re that bit more likely to face scrutiny and consequences.
After all, we’re making that scrutiny cheaper, easier, and more accessible. This applies whether the ‘wielders’ of these new weapons are curious members of the public, scientists, analysts in public institutions, business leaders and technologists, or even the AI assistants those folks recruit to accelerate their work. In politics, that can mean that people engage more often in collaborative, truth-seeking cognition and less in tribal cognition. And in technology, it can mean that more people can stay better abreast of the important shifts and prospects that will shape our future, helping the public hold decisionmakers to account (and choose better ones), and helping those decisionmakers sincerely and deeply engage with the topics at hand. I want these kinds of epistemic heroics to become commonplace, and I want the epistemic giants among us to stride further still. Let's do it! ^ Though quite a flawed execution, I think the idea behind Grokipedia (namely, to get AI to substantially help with curating knowledge bases and to use that for collective epistemics) was in the right direction. Unfortunately it was mostly a vanity project and little thought appears to have been given to the grounding or validation, making it less useful than Wikipedia. ^ Do you remember the replication crisis, which we’re still dragging ourselves out of? The new disease of importance hacking? Have you ever critically read a newspaper for rhetorical slant? Taking a more cynical stance, it’s not only clickbait and publish-or-perish (which are regrettable incentive pressures, but hardly attributable to malice). Science and journalism alike have deep political and adversarial infections as well. ^ (And even if I wanted to, I don’t have particularly heavy hands, alas.) ^ I feel compelled to point out that the current state of ‘official’ journal- and conference-managed scientific review is truly dire, especially in some fields including psychology and AI.
I hold up the ideal of scientific review, not its pale and diseased shadow as sometimes charaded on Earth.
Score: 25🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/RyeRYm4FrpqP32a2v/citations-needed-magic-encyclopedias-to-save-the-world
Firefox’s AI kill switch exists. Only 1% of users have flipped it.
Mozilla built an AI kill switch into Firefox after its users demanded one. Only 1% have used it. Another 3% turned off some AI features selectively. The rest left everything on. CEO Anthony Enzor-DeMeo says the point is not the percentage but the choice. “Our community was pretty vocal, especially during the CEO announcement, that […]
Score: 24🌐 MovesJun 12, 2026https://thenextweb.com/news/mozilla-firefox-ai-kill-switch-1-percent-smart-window-vpn
Crusoe claimed it “paused” a plan to build a Wyoming data center after it failed to win customers including Google
The company was pressured to pause development by Google
Score: 23🌐 MovesJun 12, 2026https://www.techradar.com/pro/crusoe-claimed-it-paused-a-plan-to-build-a-wyoming-data-center-after-it-failed-to-win-customers-including-google
The real cost of AI subscriptions
Exploring the costs of AI subscriptions and how to automate tasks with OpenAI Codex
Score: 22🌐 MovesJun 12, 2026https://www.superhuman.ai/p/the-real-cost-of-ai-subscriptions
I went from skeptic to devotee. Here’s how I would (and wouldn’t) use AI in parenting.
I went from skeptic to devotee. Here’s how I would (and wouldn’t) use AI in parenting. The Boston Globe
Score: 22🌐 MovesJun 12, 2026https://www.bostonglobe.com/2026/06/12/lifestyle/ai-in-parenting/
The 4 best AI website builders
Building a website is no longer a particularly hard task—but it can be an annoying one. If you look at most sites, there's a fair amount of text, images, and general organization to it all. Even with the best tools, it takes a few hours to put together something good. Wouldn't it be great if you could just create a website from scratch in just a few minutes? That's what AI website builders claim to do. The idea is that by using artificial intelligence, AI website builders can streamline everyth
Score: 22🌐 MovesJun 12, 2026https://zapier.com/blog/best-ai-website-builder
SNU opens hands-on robotics lab for students
SNU opens hands-on robotics lab for students 매일경제
Score: 21🌐 MovesJun 12, 2026https://pulse.mk.co.kr/news/english/12072518
Tredence Appoints Tech Industry Veteran Shashank Samant to its Board to Accelerate Hypergrowth and AI Scaling
Tredence today announced the appointment of Shashank Samant as an Independent Director on its board. A visionary entrepreneur with over three decades of experience in the technology sector, Shashank has founded and scaled two multi-billion-dollar tech enterprises across more than 20 countries. His diverse expertise across technology, energy, and consumer durables perfectly complements Tredence’s cross-industry […]
Score: 21🌐 MovesJun 12, 2026https://cxotoday.com/media-coverage/tredence-appoints-tech-industry-veteran-shashank-samant-to-its-board-to-accelerate-hypergrowth-and-ai-scaling/?utm_source=rss&utm_medium=rss&utm_campaign=tredence-appoints-tech-industry-veteran-shashank-samant-to-its-board-to-accelerate-hypergrowth-and-ai-scaling
Actor Christopher Lee to star in Singapore’s first AI-hybrid drama Crooks
Actor Christopher Lee to star in Singapore’s first AI-hybrid drama Crooks The Straits Times
Score: 20🌐 MovesJun 12, 2026https://www.straitstimes.com/life/entertainment/actor-christopher-lee-to-star-in-singapores-first-ai-hybrid-drama-crooks?ref
2026 NBA mock draft: AI predictions for every first-round pick
2026 NBA mock draft: AI predictions for every first-round pick USA Today
Score: 20🌐 MovesJun 12, 2026https://www.usatoday.com/story/sports/nba/draft/2026/06/11/nba-mock-draft-ai-predictions-picks-rumors-2026-copilot/90507948007/
AI Rivalries, Cyberthreats, and IPO Fever Define This Week in Tech
See what you missed in Daily Tech Insider from June 8–12.
Score: 20🌐 MovesJun 12, 2026https://www.techrepublic.com/article/ai-rivalries-cyberthreats-and-ipo-fever-define-this-week-in-tech/
Husqvarna’s Automower 410 iQ Robot Mower Is Easy to Set Up and Gets the Job Done
This robotic lawn mower delivers on the promise of true automation.
Score: 20🌐 MovesJun 12, 2026https://www.popularmechanics.com/home/lawn-garden/a71572013/husqvarna-automower-410-iq-robot-mower-review/
LTM launches AI 1000 to develop next generation of FDEs
AI 1000 is built with the purpose of enhancing workforce productivity in creating tangible business outcomes
Score: 20🌐 MovesJun 12, 2026https://www.thehindubusinessline.com/info-tech/ltm-launches-ai-1000-to-develop-next-generation-of-fdes/article71094154.ece
I’m a certified Apple hater, but new Apple Intelligence tools like Spatial Reframe mean I'm considering a switch from Android
After Apple delivered a standout WWDC, I'm stunned to admit I’m considering a switch from Android
Score: 20🌐 MovesJun 12, 2026https://www.techradar.com/tech/im-a-certified-apple-hater-but-new-apple-intelligence-tools-like-spatial-reframe-mean-im-considering-a-switch-from-android
The 8 Best Robot Vacuums Withstood Every Mess We Threw at Them
We scattered crumbs, dust, and debris to determine which models could power through these piles of mess without a hiccup.
Score: 18🌐 MovesJun 12, 2026https://www.popularmechanics.com/home/interior-projects/a32892415/robot-vacuum-reviews/