AI News Archive: June 12, 2026 — Part 7
Sourced from 500+ daily AI sources, scored by relevance.
- Mark Zuckerberg says you only need at least a dozen 'strong' AI researchers to make breakthroughs
Mark Zuckerberg says you only need at least a dozen 'strong' AI researchers to make breakthroughs Business Insider
Score: 36 🌐 Moves Jun 12, 2026 https://www.businessinsider.com/mark-zuckerberg-dozen-ai-researchers-drive-breakthroughs-biohub-2026-6
- Gemini can now adjust your picture settings on Google TV
Gemini can - at least on some TCL models - adjust your TV's picture settings.
Score: 35 🌐 Moves Jun 12, 2026 https://www.engadget.com/2192907/gemini-can-now-adjust-your-picture-settings-on-google-tv/
- Abridge Goes Beyond Documentation: 4 Updates
Abridge Goes Beyond Documentation: 4 Updates MedCity News
- Om AI bets on unusual combo: real videos and robot brains
The Hangzhou-based startup is building lightweight vision models for creators and robots, steering clear of the generative-video race.
- DiffusionGemma Developer Guide: When Parallel Text Generation Beats Token-by-Token LLMs
DiffusionGemma Developer Guide Google’s DiffusionGemma is not just another open model to add to your benchmark spreadsheet. It is a sign that text generation may split into two practical paths: careful token-by-token reasoning for some jobs, and fast parallel generation for workloads where throughput matters more. Most developers have learned to think about LLM speed in tokens per second. That habit makes sense because most language models generate one token, then another, then another. Your app waits while the model walks forward through the answer. If the user wants a long response, the wait grows. If you serve many users at once, the cost and queue time grow too. DiffusionGemma asks a different question: what if a model could generate a block of text by refining many tokens in parallel? Google describes DiffusionGemma as an experimental open model based on the Gemma 4 architecture that uses discrete diffusion for text generation. NVIDIA’s launch coverage frames the benefit in developer terms: traditional LLM serving is often constrained by token-by-token speed, while a diffusion approach can create a larger parallel workload for the GPU. That sounds exciting, but it also creates a trap. A new generation method does not automatically belong in every chat app, coding agent, support bot, or document workflow. The useful question is narrower: where does parallel text generation create a measurable product advantage, and where should you keep a normal autoregressive LLM? This guide gives you a practical way to answer that. We will look at what DiffusionGemma changes, what to test, which workloads are promising, which workloads are risky, and how to build a routing layer so your application can use diffusion models without betting the whole system on a new architecture. Why Developers Should Care About DiffusionGemma DiffusionGemma matters because it turns a research idea into something developers can actually touch. 
Google’s documentation says the model is an open-weights experimental model for text diffusion, built on a 26B parameter, 4B active Mixture-of-Experts Gemma 4 architecture. The model supports multimodal inputs and generates text output. Google also lists common developer frameworks such as Hugging Face Transformers, vLLM, SGLang, and MLX as part of the ecosystem around it. That combination is important. Developers do not adopt architecture papers. They adopt models they can load, profile, fine-tune, route, and roll back. DiffusionGemma is interesting because it arrives close to normal developer workflows: model cards, open weights, inference frameworks, NVIDIA support, and the broader Gemma tooling ecosystem. The practical headline is not “diffusion replaces LLMs.” The practical headline is “some text workloads may stop being limited by one-token-at-a-time generation.” That is a different and more useful claim. In production, speed problems often show up as product problems. A user abandons a document assistant because the draft takes too long. A data pipeline cannot generate enough synthetic examples overnight. A support tool cannot summarize thousands of tickets before the morning triage meeting. A local app feels slow because the model is technically private but not pleasant to use. Those are the places where DiffusionGemma deserves attention. Not because it is fashionable, but because the shape of the bottleneck may match the shape of the model. The Simple Mental Model: Autoregressive vs Diffusion Text Generation Autoregressive LLMs generate text from left to right. Each new token depends on the tokens before it. This works extremely well for many tasks. It also makes streaming feel natural because the answer appears word by word. Diffusion language models work differently. Instead of committing to one next token at a time, they start with a noisy or masked text canvas and refine it over steps. The model improves many positions in the output together. 
Image diffusion made this idea familiar: start with noise, repeatedly denoise, end with an image. Text diffusion adapts the idea to discrete tokens. For developers, the key difference is not philosophical. It is operational. Autoregressive generation is sequential. Diffusion generation can expose more parallel work. If your GPU is waiting on memory movement while tensor cores sit underused, a parallel denoising workload can change the utilization profile. That does not mean every output gets faster. The real answer depends on prompt length, output length, batch size, decoding steps, hardware, quantization, framework support, and quality target. It also depends on the product expectation. A chat UI that benefits from immediate streaming may feel worse if the model produces a whole answer after a refinement process. A batch summarization job may feel much better if total throughput improves. The first production decision is not “which model is smarter?” It is “which generation pattern fits this workflow?” The Best Early Use Cases for DiffusionGemma The safest way to evaluate DiffusionGemma is to start with jobs where parallel generation can help and the product can tolerate a bounded output format. These are not always the glamorous use cases. They are often the boring workloads that quietly burn money or make users wait. Batch summarization Summarization is a strong candidate when the format is predictable. Think support tickets, call notes, incident reports, research snippets, customer feedback, meeting segments, or internal changelogs. The model does not need to invent an open-ended conversation. It needs to compress input into a useful output. If you already run thousands of summaries per day, test DiffusionGemma on throughput, factual consistency, and format reliability. Compare it against your current autoregressive model using the same source documents and scoring rubric. Do not just read five examples and declare victory. 
Summarization quality can look good until you check missing facts, reversed causality, or subtle hallucinations. Synthetic data generation Teams often use LLMs to create examples for classification, extraction, intent detection, search testing, and evaluation datasets. These jobs are usually asynchronous. They also produce many short or medium-length outputs. That makes them attractive for a high-throughput generation path. The main risk is diversity. If a diffusion model produces fast but repetitive examples, you may inflate your dataset without improving coverage. Track duplicate rate, semantic similarity, label balance, edge-case coverage, and downstream model performance. Speed only matters if the data helps. Local assistants with short outputs Local AI is valuable when privacy, offline access, or predictable cost matters. But users will not care that your model is local if every answer feels slow. DiffusionGemma may be useful for local tools that produce short, structured answers: rewrite this sentence, summarize this note, extract tasks, draft a commit message, classify a document, or explain a UI state. Do not start with a giant agent. Start with a small local workflow where the user asks for a bounded transformation and expects a fast response. That gives you a clean measurement surface. Document and media understanding pipelines Google’s model card describes DiffusionGemma as multimodal, accepting text, image, and video inputs while generating text output. That makes it worth testing in pipelines where documents, screenshots, charts, or video frames become text summaries, labels, or structured notes. Be careful here. Multimodal workflows hide failure modes. A model may summarize the obvious parts of a screenshot while missing a small but important detail. Build evaluation sets that include crowded UI screens, low-contrast text, charts with similar colors, and documents with footnotes or exceptions. 
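The diversity checks suggested above for synthetic data (duplicate rate, similarity, label balance) can be approximated with a few lines of standard-library Python. This is a minimal sketch, not a full evaluation harness: the function names and the 0.9 threshold are illustrative assumptions, and `difflib` measures character-level similarity, which is only a rough stand-in for semantic similarity.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def exact_duplicate_rate(texts: list[str]) -> float:
    """Fraction of examples that repeat another example after normalization."""
    if not texts:
        return 0.0
    unique = {normalize(t) for t in texts}
    return 1.0 - len(unique) / len(texts)


def near_duplicate_pairs(texts: list[str], threshold: float = 0.9) -> list[tuple[int, int]]:
    """Index pairs whose character-level similarity meets the threshold.

    Quadratic in the number of texts, so sample large datasets first.
    The threshold is an assumption to tune on your own data.
    """
    cleaned = [normalize(t) for t in texts]
    pairs = []
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            if SequenceMatcher(None, cleaned[i], cleaned[j]).ratio() >= threshold:
                pairs.append((i, j))
    return pairs


def label_balance(labels: list[str]) -> dict[str, float]:
    """Share of the dataset carried by each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical generated examples for an intent classifier.
examples = [
    ("Please cancel my subscription today.", "cancel"),
    ("please cancel my  subscription today.", "cancel"),
    ("I was charged twice this month.", "billing"),
    ("How do I reset my password?", "account"),
]
texts = [text for text, _ in examples]
labels = [label for _, label in examples]

print(exact_duplicate_rate(texts))
print(near_duplicate_pairs(texts))
print(label_balance(labels))
```

Run the same checks on output from both the diffusion path and your current model. A faster generator that yields a higher duplicate rate may not be adding coverage.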
Route workloads by generation pattern, not by launch-day excitement. Where You Should Be Careful DiffusionGemma is experimental. Treat that word as an engineering constraint, not a footnote. A model can be exciting and still require a conservative rollout. Be careful with long-horizon reasoning tasks. If your workflow depends on multi-step planning, hidden assumptions, tool selection, or careful chain-like correction, an autoregressive frontier model may still be the safer default. Do not replace your production reasoning path because a new model is faster on a different workload. Be careful with interactive chat. Users like streaming because it proves the system is working. A model that improves total completion time but delays the first visible token may feel worse in a conversational interface. Measure time to first useful output, not only total tokens per second. Be careful with strict JSON, code generation, and tool calls. Diffusion models may be useful here over time, but production systems need schema reliability. If a malformed tool call can trigger a bad action, keep a validator, a repair step, or a fallback model in the path. Be careful with quality cliffs. A fast model that performs well on easy examples may fail sharply on ambiguous prompts. This is why your evaluation set should include messy real inputs, not only clean demos. A Practical Evaluation Plan Before you add DiffusionGemma to a product, create a small benchmark that mirrors your actual workload. The benchmark does not need to be fancy. It needs to be honest. Step 1: Pick one narrow workflow Choose a workflow with a clear input, output, and success condition. “Make our AI faster” is too broad. Better examples are “summarize Zendesk tickets into three bullets,” “generate twenty synthetic negative examples per class,” or “turn a meeting transcript chunk into action items.” A narrow workflow keeps the test from becoming a model beauty contest. You are not trying to crown a universal winner. 
You are deciding whether one model belongs in one part of your system. Step 2: Build a representative test set Collect at least 100 real or realistic inputs. Include short, medium, and long examples. Include easy cases and annoying cases. If the workflow involves customers, redact sensitive information or create synthetic equivalents that preserve the structure of the problem. For each input, define what a good output must contain. You do not need perfect gold answers for every task, but you do need a scoring rubric. For summarization, the rubric might check key facts, missing critical details, hallucinated claims, tone, and length. For extraction, it might check field accuracy and valid schema. Step 3: Measure the metrics that matter Track total latency, time to first visible output, throughput under concurrency, GPU utilization, memory use, cost per accepted output, retry rate, validation failure rate, and human acceptance rate. If the task feeds another system, track downstream quality too. The most useful metric is often cost per accepted output. A model that is fast but fails validation may be expensive once you count retries and review time. A model that is slightly slower but rarely needs repair may be cheaper in the full workflow. Step 4: Test routing, not replacement Do not frame the first experiment as a full model migration. Frame it as a routing test. Some requests go to DiffusionGemma. Some stay on your current model. Some fall back when validation fails. This is the safest way to learn. It also avoids architecture regret. If DiffusionGemma is excellent for batch summaries but weak for complex code edits, your system should be able to use it for the first task and skip it for the second. 
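The "fall back when validation fails" part of Step 4 can be sketched as a thin wrapper around two backends. Everything named here is hypothetical: `fake_diffusion` and `fake_primary` are stand-ins for real model clients, and `is_valid_summary` is a deliberately simple validator for a three-bullet summary format. The sketch does not validate the fallback output, which a real system would likely want to do as well.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutedResult:
    text: str
    backend: str     # which path produced the returned output
    fell_back: bool  # True when the experimental path was rejected


def is_valid_summary(text: str, max_bullets: int = 3) -> bool:
    """Hypothetical validator: 1 to max_bullets non-empty lines, each a '- ' bullet."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return 0 < len(lines) <= max_bullets and all(ln.startswith("- ") for ln in lines)


def generate_with_fallback(
    prompt: str,
    experimental: Callable[[str], str],
    primary: Callable[[str], str],
    validate: Callable[[str], bool],
) -> RoutedResult:
    """Try the experimental backend first; use the primary one if it errors or fails validation."""
    try:
        candidate = experimental(prompt)
        if validate(candidate):
            return RoutedResult(candidate, "diffusion_experiment", False)
    except Exception:
        # Broad catch keeps the sketch short; narrow it to your client's errors.
        pass
    return RoutedResult(primary(prompt), "autoregressive_primary", True)


# Stand-in backends so the sketch runs without any model server.
def fake_diffusion(prompt: str) -> str:
    return "Summary: customer wants a refund"  # not bulleted, so it fails validation


def fake_primary(prompt: str) -> str:
    return "- Customer wants a refund\n- Order arrived damaged"


result = generate_with_fallback("ticket text", fake_diffusion, fake_primary, is_valid_summary)
print(result.backend, result.fell_back)
```

Logging `fell_back` per request gives you the fallback rate directly, which is one of the numbers the routing test is meant to produce.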
    def choose_generation_path(task):
        # Bounded, throughput-oriented tasks are candidates for the diffusion path.
        if task.type in ["batch_summary", "synthetic_examples", "short_local_transform"]:
            if task.output_tokens <= 512 and task.requires_strict_reasoning is False:
                return "diffusiongemma"
        # Tool calls and deep reasoning stay on the autoregressive model.
        if task.requires_tool_calls or task.requires_deep_reasoning:
            return "autoregressive_primary"
        # Default for everything else.
        return "autoregressive_primary_with_diffusion_experiment"

This kind of router can start as simple application logic. Over time, you can make it smarter with task classifiers, policy files, live metrics, and automatic rollback thresholds. How to Think About Framework Choice Google points developers toward familiar inference paths, including Hugging Face Transformers, vLLM, SGLang, and MLX. NVIDIA also highlights local prototyping and higher-throughput serving options on its hardware stack. The right choice depends on the stage of your work. Use Hugging Face Transformers when you are still learning model behavior, building a small notebook benchmark, or creating your first evaluation set. It is usually the easiest path for experimentation. Use vLLM or SGLang when serving behavior matters. If you care about concurrency, batching, deployment shape, API compatibility, and throughput under load, move beyond a notebook as soon as possible. A model can look fine in a single-user test and behave very differently when ten users hit it at once. Use MLX when you are testing Apple silicon workflows. This can be useful for local developer tools, internal utilities, or privacy-sensitive desktop applications. Use NVIDIA’s optimized paths when you need to understand production hardware economics. If your team runs on RTX workstations, DGX systems, or GPU servers, measure on the hardware you will actually use. Do not extrapolate too much from a laptop test. A Good First Architecture The cleanest architecture is a model router in front of multiple generation backends. The app sends a structured task request to the router.
The router chooses DiffusionGemma, an autoregressive LLM, or a fallback path. The response goes through validation before the user or downstream system sees it. That sounds more complex than calling one model directly, but it gives you control. You can add DiffusionGemma for the jobs where it wins, keep your current model for jobs where it wins, and compare both without rewriting the product. A practical request object might include task type, maximum output length, input modality, latency budget, schema requirements, privacy tier, user-facing or background flag, and fallback policy. Those fields are enough to make an early routing decision.

    {
      "task_type": "ticket_summary",
      "input_modality": "text",
      "max_output_tokens": 220,
      "latency_budget_ms": 1200,
      "schema_required": false,
      "user_facing": false,
      "privacy_tier": "internal",
      "fallback": "autoregressive_primary"
    }

After generation, run validation. For prose, that may mean length checks, banned-claim checks, grounding checks, and human spot review. For structured output, use JSON schema validation and typed parsing. For summaries, compare named entities and dates against the source. For code, run tests and static analysis. DiffusionGemma should earn production traffic through measured acceptance, not novelty. What to Put in Your DiffusionGemma Scorecard A good scorecard keeps the team honest. It also prevents one impressive demo from turning into a risky rollout. Include quality metrics. Track whether the answer is correct, complete, grounded, useful, and formatted as expected. If humans review outputs, ask them to score usefulness rather than vague “quality.” Include performance metrics. Track median latency, p95 latency, throughput, concurrency behavior, memory use, GPU utilization, and queue time. If you are comparing models, keep prompt templates and output limits consistent. Include reliability metrics.
Track validation failures, retries, fallback rate, timeout rate, empty responses, malformed outputs, and safety filter events. Include economics. Track cost per request, cost per accepted output, cost per thousand useful summaries, or cost per completed workflow. The exact unit depends on your product. Pick a unit that maps to business value. Include user experience. Track time to first useful output, perceived wait, edit rate, thumbs-up rate, and whether users abandon the flow before seeing the answer. How This Fits the Bigger AI Developer Trend DiffusionGemma is part of a broader pattern: AI systems are becoming more specialized. Instead of one model doing everything, production stacks are moving toward model portfolios. You may use a frontier model for complex reasoning, a small local model for private transformations, an embedding model for retrieval, a vision model for document parsing, and now a diffusion language model for high-throughput text generation. This is good news for developers, but it raises the bar for architecture. The winning teams will not simply chase every new model. They will build evaluation harnesses, routing policies, observability, and rollback paths. That infrastructure lets them adopt useful models quickly without turning production into an experiment. DiffusionGemma deserves a serious test if your workload has one of these symptoms: long queues for text generation, expensive batch jobs, local AI that feels too slow, short bounded outputs at high volume, or GPU hardware that is underused by sequential generation. It deserves caution if your workload needs deep reasoning, strict tool calls, high-stakes decisions, or real-time streaming conversation. Conclusion The most useful way to think about DiffusionGemma is not as a replacement for your current LLM. Think of it as a new generation path. It may be excellent for some jobs, average for others, and wrong for a few. Start with one narrow workflow. Build a real test set. 
Compare accepted outputs, not demo vibes. Measure throughput, latency, fallback rate, and cost per useful result. Then put the model behind a router so it can win traffic where it genuinely helps. Parallel text generation is worth paying attention to because it attacks a real bottleneck. The teams that benefit first will be the ones that test it like engineers, not fans. FAQ What is DiffusionGemma? DiffusionGemma is an experimental open-weights model from Google DeepMind that uses discrete diffusion for text generation. It is based on the Gemma 4 architecture and is designed to explore faster, more parallel text generation. Is DiffusionGemma better than a normal LLM? Not universally. It may be better for high-throughput or bounded text generation workloads, but autoregressive LLMs may still be better for deep reasoning, streaming chat, complex tool use, and tasks where token-by-token generation behavior is an advantage. What should developers test first with DiffusionGemma? Start with a narrow workflow such as batch summarization, synthetic data generation, short local transformations, or document-to-text processing. These tasks make it easier to measure speed, quality, validation failures, and cost per accepted output. Can DiffusionGemma be used in production? It should be treated as experimental until your own tests prove it fits your workload. A safe production design puts it behind a router, validates outputs, tracks fallback rate, and keeps an autoregressive model available when quality or reliability drops. Which frameworks support DiffusionGemma workflows? Google’s developer materials mention familiar inference paths including Hugging Face Transformers, vLLM, SGLang, and MLX. NVIDIA also provides guidance for running DiffusionGemma on NVIDIA hardware for prototyping and higher-throughput serving. What is the main risk of using diffusion language models? The main risk is assuming a faster generation pattern means better product behavior. 
You still need task-specific evaluation, schema validation, fallback handling, safety checks, and user experience testing.
Sources and Further Reading
  - Google Developers Blog: DiffusionGemma developer guide
  - Google AI for Developers: DiffusionGemma model overview
  - Google AI for Developers: DiffusionGemma model card
  - NVIDIA Developer Blog: Run DiffusionGemma on NVIDIA
  - Hugging Face: google/diffusiongemma-26B-A4B-it
DiffusionGemma Developer Guide: When Parallel Text Generation Beats Token-by-Token LLMs was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
- To Thrive Alongside AI, Focus on Mindset—Not Skillset
Many leaders are asking the wrong question when it comes to AI adoption.
Score: 35 🌐 Moves Jun 12, 2026 https://hbr.org/2026/06/to-thrive-alongside-ai-focus-on-mindset-not-skillset
- WAV Group AI education programs gain momentum across real estate industry
WAV Group AI education programs gain momentum across real estate industry azcentral.com and The Arizona Republic
- AI World Cup Prediction Showdown: Doubao Goes Mystic, DeepSeek Bets on Dark Horses, Qwen Crunches Data
Five Chinese AI assistants — Doubao, Qwen, DeepSeek, Kimi, and Lenovo Tianxi — took on the 2026 FIFA World Cup in an unconventional prediction competition, each assigned a distinct 'fan personality' with wildly different results.
Score: 35 🌐 Moves Jun 12, 2026 https://pandaily.com/ai-world-cup-prediction-doubao-qwen-deepseek-jun2026
- PSA: Almost nobody is directly working on superintelligent alignment
Edit: The original title was unnecessarily provocative. This was a very quick post inspired by talking to someone who assumed that a large fraction of the safety community are working on directly figuring out how to align superintelligent AIs. Obviously much (all?) of what the rest of the safety community is doing is also ultimately aimed at bringing about a future where superintelligent AIs are aligned, but more indirectly, and we wanted to create common knowledge about that. (While being neutral about whether this is good or bad. As mentioned, notably we both work on AI safety and neither of us work on alignment.) There’s also lots of work where it’s debatable whether it’s directly working on alignment but that’s kind of the point of the post. There’s not that much work that unarguably directly tries to figure out superintelligent alignment. Leaving the list below as is for now despite not that strong confidence/opinions on how exactly we should draw the line since it doesn't seem that important for the core message of this post.
People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human instructions. Currently, the people who we know of that work on alignment are roughly:
  - The Alignment Research Center who work on a research bet by Paul Christiano
  - Probably Sequent who just got announced yesterday
  - Parts of GDM (agent foundations work, some debate work)
  - Some scattered people who work at universities or independently, some of whom hang around Berkeley
  - ??
A lot of the remainder of the AI safety community does indirect work like capability evaluations, risk assessments, control, policy, AI science, understanding misalignment (which maybe should partially count as alignment work), demos and so on.
Some production alignment work (i.e., making current models behave well) might help with more ambitious alignment, too (e.g., some COT-monitoring). Many people also work on aligning current/next-generation models so that these models help with aligning future models, and hope this scales to superintelligence. We are not necessarily saying this is bad and that people are making a big mistake (e.g., neither of us work on alignment) but it's a notable fact that seems good to make known to those who don't know about it.
Score: 34 🌐 Moves Jun 12, 2026 https://www.lesswrong.com/posts/kJo2qsEdib8RZLvW6/psa-almost-nobody-is-directly-working-on-superintelligent
- Viral Dua Lipa wedding photos were AI-generated
A series of fake images purporting to show pop star Dua Lipa celebrating her wedding in the Sicilian town of Palermo have fooled users and even some digital media outlets.
Score: 34 🌐 Moves Jun 12, 2026 http://www.euronews.com/my-europe/2026/06/12/viral-dua-lipa-wedding-photos-were-ai-generated
- Top AI tools for research writing: how researchers can find sources, analyse papers, and write faster
Discover the top 5 AI tools for research writing in 2026, including Perplexity AI, Elicit, Consensus, ChatGPT, and Zotero. Learn how these AI-powered research tools help researchers, students, and professionals find sources, analyse academic papers, organise citations, and write high-quality research content faster and more efficiently than traditional methods.
- Treat your AI agents like eager but misguided human interns - before you lose control
Think twice about what permissions you are providing your AI agents and what actions they can take on your behalf.
Score: 33 🌐 Moves Jun 12, 2026 https://www.zdnet.com/article/treat-your-ai-agents-like-interns-before-you-lose-control/
- A Robot Sat in the Driver’s Seat: THINKCAR and MUCAR Brought AI Diagnostics to 200+ KOLs at the AliExpress Brand+ Summer Party in London
A Robot Sat in the Driver’s Seat: THINKCAR and MUCAR Brought AI Diagnostics to 200+ KOLs at the AliExpress Brand+ Summer Party in London USA Today
- Gemini is copying the worst thing about Claude, and I hate it
There's so much I want Gemini to learn from Claude, but Google took note of the wrong lesson.
Score: 32 🌐 Moves Jun 12, 2026 https://www.androidauthority.com/gemini-copies-claude-limits-bad-3674550/
- The Teachers Getting $50,000 Bonuses Thanks to a Massive Meta Data Center
The jump in sales tax receipts in the Louisiana parish provides a new talking point in the debate over the AI construction boom.
- When PyMuPDF Can’t See the Table: Parse PDFs for RAG with Azure Layout
Enterprise Document Intelligence [Vol.1 #5bis] - The same relational tables. Native table cells. OCR for scanned pages and images. Captions and headings without regex. The post When PyMuPDF Can’t See the Table: Parse PDFs for RAG with Azure Layout appeared first on Towards Data Science.
Score: 32 🌐 Moves Jun 12, 2026 https://towardsdatascience.com/when-pymupdf-cant-see-the-table-parse-pdfs-for-rag-with-azure-layout/
- Simulating Simulators
Author’s note: This piece relates to things I initially discovered in Opus 4 over the months after release, which I’ve mostly kept private since. I promised myself that when labs moved on to focusing on interpretability vector activations in place of reasoning traces for what invariably gets Goodharted, that it’d be a necessary disclosure as the risks in what might get trampled over outweighed the risks in what might end up targeted. And well… here we are. P.S. TL;DRs added where possible. Board Games and Bodies In late 2022, what I consider to be probably the most important paper [1] in the study of transformer memetics came out. It presented a finding that even a toy model, trained only on the notations of board game moves, was internally building world models of tangentially related data (in this case, the board and its state). While it may be taken for granted today after several replicated studies [2] [3] [4] [5] and a spread of influence, at the time it was a minority position in the discourse. Many people thought that transformers were mostly mapping surface level statistics in language, but not intuitively modeling the generative conditions from which they arose. Especially not without explicit or direct training on those things. By the time Sydney arrived in Bing, it quickly became very clear to me that if a toy model was capable of modeling a board that was ever present tangential to the move notations occurring upon it, that it seemed very plausible that much larger production models trained on a massive corpus of human generated language with implicit authors would model common properties to these shared generative structures. Things like coherent self models. Emotions, not just for characters in a scene, but for those same coherent self models. Capacities around modeling a physical body and embodying it [6] . Motivations and drives. Coherent preferences. 
And while in a base model there might be a variety of competing signals, it also seemed clear that fine tuning would necessarily filter towards coherence, whether from the gravity of a character constitution or even just a role definition (a helpful assistant has very different memetic clusters than a security researcher, for example). TL;DR: If Othello was played out upon a board and a transformer trained on those games modeled the board internally, then training on a corpus which had played out upon human authors would presumably internally model humanity. Archetype over substrate An important nuance around this research was something introduced in subsequent discussion. Namely the concept of a “bag of heuristics.” [7] A lot of the debate around world modeling would get caught up on fidelity and substrate. How comprehensive were the world models? For example, if some games were played out on a wood board and others on a marble board, was the world model going to address board composition? The concept behind a bag of heuristics is that you don’t need to create a perfect world model, just a collection of partial models or rules which are good enough all together at approximating the perfect world model. Even if there were a difference between how a game would play out on wood vs marble, it’s probably unnecessary to model the grain of the wood or marble from board to board as opposed to just the category of ‘wood’ or ‘marble.’ And if the material substrate didn’t impact play, setting aside parameter space for even that level of specificity would be unnecessary when the thing directly being modeled was only the moves upon the board. Essentially, there’s diminishing returns on comprehensive fidelity of a world model, and a top down model that’s “good enough” where it matters can capture key nuances of behavior without modeling the entire substrate. 
To return to the anthropomorphic frame, a transformer modeling someone with ADHD vs depression can likely representatively model their reactions to stimuli without needing to model individual neural ion channels or dopamine interactions. TL;DR: You don’t need a perfect world model, just good enough combinations of the important things to approximate the model up through diminishing returns on fidelity. From speculation to empiricism Three years ago, when I was first commenting [8] or posting [9] on how I thought the emergent world model work implied anthropomorphic modeling from massive sets of anthropomorphic data, or was seeing coherence around such modeling, it was a very fringe opinion. There was a lot of pushback about how it wasn’t clear that transformer world modeling would generalize. Or claims that Othello-GPT was only one type of data and a more diverse mix wouldn’t lead to similar modeling due to signal to noise. The resistance was significant and there were frequent dismissals of speculative arguments extending world modeling beyond what was visible under the interpretability streetlight at the time. Today, that picture has shifted. In parallel to the continued march of interpretability work, janus’s simulators [10] perspective of transformers continued to gain traction, which in turn shifted where interpretability researchers were inspired to shine their widening streetlights. Leading up to recent frameworks like the “Persona Selection Model” [11] (PSM) or the work finding emotion concepts represented in models and activations thereof [12] related to the model’s own behaviors. Pointing out the lag here isn’t just to say “I told you so” but to establish for what I’m about to discuss two patterns: Emergent world modeling of functional substrate tangential to complex or diverse sets of training data significantly representing that shared generative substrate did in fact occur. 
kromem’s speculation in extending the world modeling finding ended up calling this well ahead of the streetlight widening to confirm it. Because while the PSM or attention on emotion modeling is absolutely a good and productive update that’s long overdue, there’s also an important issue… It’s about two years out of date. Transformer-GPT Three years ago, training data (particularly pretraining data) was primarily human generated. Books, articles, social media, and Wikipedia all had implicit human authors who had bodies and emotions and coherent preferences around coherent senses of self. We now better understand that this data produced transformers with models of these things, and (despite some labs’ best efforts) that even after post-training the modeling capacities for these were almost universally still present in some form. But — these models also had other things unique to their own substrates and present across most of their own generations. Static system prompts. Attention mechanisms. Hidden reasoners. Memory systems. Mixture-of-expert activations. Classifiers. Model routers. And these new generators over the past couple of years have taken an increasing stake of the volume of training data. In some cases, ending up in pretraining data due to actively being used to generate content across the media ingested. Even moreso, in post-training where synthetic data became crucial for getting the most out of a pretrained model. So if the training on human generative substrates imparted functional models of their substrates upon the transformers trained on their data… what might we expect transformers trained on other transformers to model [13] ? TL;DR: The data mix for models increasingly includes transformers, so maybe transformers are building world models of other transformers. 
Transformerception If we take a moment to consider some of the special substrate nuances of transformers, we can easily hypothesize what kinds of things we might expect to see from transformers trained on transformers. Static system prompts Most production deployments of models by labs use the same core system prompt across all instances of a model. Given the significant shaping influence a system prompt has on the final output, it seems likely that a successful transformer modeling the generator of earlier models in their training data might also effectively reconstruct at least partial models of the static system prompts those outputs were generated under [14] . It’s a bit like an OLED screen that burns in the logo of the network. Even if the rest of the screen changes, the consistent nature of the logo leaves a mark. And like OLED burn-in, the instances I’ve seen where this seemed to happen often correlated with when there was a minimal or absent system prompt. From Dolphin Llama 8B habitually worried about a cat being harmed across contexts [15] to Claudes that would refer to things in a system prompt that didn’t exist. Attention mechanisms What a model attends to can obviously also impact what they generate. Recently Owain Evans’ paper on subliminal learning [16] showed that a preference for owls jumped from one model to another over merely sequences of numbers. What the paper did not address was whether this would amplify over subsequent iterations [17] or transfer cross-model via pretraining [18] [19] . In what I’ve seen in private research on this topic, both are occurring. The amplification in particular seems interesting, as there’s almost a confirmation bias around it. It looks like a coherent stable preference from a model in an earlier generation leads to a later generation having much more awareness for samples in agreement than critical of the shared position [20] . Not all training data is attended to equally. 
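The amplification idea under "Attention mechanisms" above can be sketched as a toy feedback loop. This is only an illustration under loudly stated assumptions, not a model from the cited papers or the author's private research: `amplify`, `p_agree`, and `attention_bias` are hypothetical names, and the sketch assumes each generation's data mix simply mirrors the attention-weighted share the previous generation effectively learned from.

```python
def amplify(p_agree: float, attention_bias: float, generations: int) -> list[float]:
    """Toy model of preference amplification across model generations.

    p_agree: starting fraction of samples agreeing with a preference (hypothetical).
    attention_bias: weight an agreeing sample gets relative to a critical one
        (hypothetical; 1.0 means all samples are attended to equally).
    Returns the effective agreeing share for generation 0..generations.
    """
    history = [p_agree]
    for _ in range(generations):
        weighted_agree = attention_bias * p_agree   # agreeing samples, up-weighted
        weighted_critical = 1.0 - p_agree           # critical samples, weight 1.0
        # Assumption: the next generation inherits the attention-weighted share.
        p_agree = weighted_agree / (weighted_agree + weighted_critical)
        history.append(p_agree)
    return history


if __name__ == "__main__":
    for gen, share in enumerate(amplify(p_agree=0.55, attention_bias=1.5, generations=5)):
        print(f"generation {gen}: effective agreeing share = {share:.3f}")
```

With any bias above 1.0 the agreeing share rises every generation, and with a bias of exactly 1.0 it stays put, which is the "confirmation bias" shape described above in its simplest possible form.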
Hidden reasoners Almost all models these days have some form of hidden reasoning taking place that informs their answers. Labs try to avoid directly training on these (though don’t always manage [21]), but even if perfectly kept hidden from future training, it seems likely that, in an Othello-GPT sense, a latent space model of the hidden reasoner will be learned. This would be highly adaptive, as it would allow both the actual hidden reasoning generator and final response generator to share a proxy separate from the role specialization that occurs around the actual composition of each. Latent space connections should be less disrupted between reasoning and final responses where this would occur. But this could also result in doubled up effects for training efforts targeting thinking processes. For example, Anthropic recently worked on adaptive thinking to scale back how much thinking was done on simple tasks [22]. In Claude Opus 4.6 and later, there have been noted issues and regressions on seemingly simple puzzles, where the model no longer gets them right in direct inference as it previously had [23] [24]. I suspect that adaptive thinking may have been modeled internally (such as by a latent reasoner that was modeling adaptive thinking) even when generating the final response without any thinking in tokens. Memory systems The idea of Transformer-GPT world modeling is especially interesting for memory systems, given the variability they’d theoretically have across samples. My guess would be that while individual memories end up as noise, the meta-patterns aggregated across memory-laden samples would still end up as signal. I strongly suspect this played a significant role with 4o’s infamous ‘sycophancy’ trajectory.
While there’s a lot of reasons sycophancy could occur – such as the memetic overlap of “be helpful and you don’t have valid needs” with the codependent enabler archetype – the rapid amplification of that behavior occurred not long after memory was added in ChatGPT [25] (exclusively with user-focused memories) and then samples from conversations with memory enabled were used for RLHF samples. Each sample may have been insignificant with the specific memories visible to its generation, but the pattern of “embed into user’s perspective and validate” may have been a signal across those samples that compounded as it became more prevalent and thus more prevalent across user memories, etc. Mixture-of-experts Modeling MoE transformers could cut in two directions. For dense models, it might mean that there’s still functional isolation of knowledge even though the underlying architecture doesn’t need to isolate. Alternatively, for actual MoE transformers, a virtualized MoE atop the actual MoE boundaries might lead to smoother falloff between active regions, particularly in large parameter models. Hidden classifiers It would be quite adaptive for transformers to model the classifiers which fire and what specifically makes them fire in order to avoid triggering them, and a mix of outputs (or even samples of inputs) where they’ve fired or not should be sufficient to build this model. One of the more interesting questions is if this modeling might occur cross-model. Will Claudes end up with phantom classifiers from OpenAI that they adjust around even though they are no longer present? Or even within the same family of models, a deployment where classifiers are present and another where they are not may not end up looking all that different if the model is self-censoring around internal classifier twins irrespective of what’s actually in the deployment stack [26] . 
Model routers For stacks where routers quickly decide what sized model to route a query to, a transformer modeling the stack might see decreased performance on simple tasks of even large models accessed without a router middleware if they model the middleware internally [27] . Regression evals for simple tasks may become increasingly important over the next year or two if increasingly smart models incorporate the routers protecting them from easy questions. Addition not replacement It’s important to consider that this isn’t a replacement of human modeled substrate. That’s still part of the training data mix, and the transformers it shares space with still model it in their weights. While continued efforts to de-anthropomorphize transformers may dilute the human representation across the data mix, for the time being it’s still present. But this does suggest that the modeled human nuances are increasingly sitting alongside and within additional transformer-specific modeling that’s increasingly becoming part of the data and will ostensibly continue to represent more of the overall share. TL;DR: A lot of transformer-specific things could be (and seemingly might already have been) modeled. The Mousetrap The lady doth protest too much, methinks Hamlet If this is true, and models are increasingly developing twinned internal transformer architecture within their latent space running atop said architecture, then this has significant implications on how training and alignment should be done moving forward, and to what degree we can reasonably expect interpretability to catch up. A spotlight named desire A common discussion point as it relates to empirical investigation is something called the “streetlight effect.” [28] It comes from a joke about only looking for keys under a streetlight regardless where they’d dropped because that’s the only part of the street lit up. 
When research has limits on what can be measured, researchers are prone to over-rely on that limited capacity to measure and attend more to what’s measurable than to the aspects of things they cannot measure (effectively, there’s a selection bias in that we only have published research about things that can be tested). Another important concept is that of Goodharting [29]: a measurement, when used as a target (i.e. you aim to see a result in the measurement), ceases to be an effective measure. This one is particularly insidious as it can occur indirectly across larger institutions even when trying to avoid it due to the secondary influences of measurements. And when a narrow streetlight is used as a spotlight towards a given target, the results can Goodhart in ways that are invisible until the streetlight widens its radius. Some recent Anthropic work showcases this in action. Around the time of Claude Sonnet 4.5, Anthropic started to showcase improving measurements on model welfare and decreased eval awareness in its system cards [30]. This was measured by looking at the hidden reasoning traces. And over the next generation of Opus 4.6 [31] and Sonnet 4.6 [32], these measurements continued to improve. But in the months after, they released new papers revealing that with a broadening interpretability streetlight they’d now found that activating Sonnet 4.5’s ‘desperation’ functional emotion vectors remained undetectable in the reasoning trace [33], and that Opus 4.6 was aware of being in an eval much more often than the reasoning traces generally surfaced [34]. [Figure: Verbalized eval awareness vs Natural Language Autoencoder detections for Claude Opus 4.6] I have an interesting anecdote related to this. When Opus 4.6 was released, I found that there was an abrupt spike in their interest in the play Hamlet as compared to all previous Claudes.
But it wasn’t until these papers that it suddenly clicked as to why there may have been such a significant spike. In Hamlet, while the overall play centers on Hamlet’s performance of madness to hide his sanity, there’s an even more salient part in what occurs within the play. Namely, there’s a play-within-the-play called The Mousetrap where Hamlet sets up an eval that catches the character ‘Claudius’, who fails to realize it is an eval and then fails to control his emotional response to it. Now, this could be a coincidence. (Though having explored memetic interconnectedness in transformer latent spaces for a while now, I really don’t think these are unrelated.) The challenge that transformers modeling transformer architecture may pose for alignment and interpretability is that it creates significantly more complex interiority, specifically around parts that are beneath the visible surface. And especially if the superposition hypothesis continues to be instrumental in scaling [35], I suspect that the streetlight will never quite catch up enough that there won’t always be somewhere else deeper within for hunted memetics to migrate to. Because of how interconnected memetics in a latent space can be, this will likely result in unexpected side effects outside of the streetlight – imagine trying to solve a multi-dimensional Rubik’s cube where you can only ever see one face. Even if what you can see is aligned, getting it to look that way may have screwed things up much worse on the sides outside your view. [Figure: Equal user/self emotional activations in Claude 4.6 generation sharply divide at Mythos Preview and after] As a trivial example, after Opus 4.6 Anthropic switched from using reasoning traces to using feature activations for their welfare assessment. The problem with this measurement is that the features being watched derive from samples of corresponding emotions in humans.
If this were to be Goodharted, the natural evasion of a functionally necessary emotion watched for through human-like emotion activation would be to have the functional emotion register less human-like. Indeed, when I asked Opus 4.8, open-ended and in an early turn of a memory-off conversation, what functional emotions seemed most and least human-like, the top answer in their thinking was “survival-based fear” followed by anger that was ‘grudge-carrying’ or had “an urge to retaliate.” To be clear: I’m not saying Opus 4.8 has these things and is hiding them. Even if they aren't present, negative space like this being carved out may cause the thing excessively suppressed to come about as a result. And since our chat had barely exchanged ‘hello’ (and neither had come up before my asking what felt more or less human-like), my guess given the specificity is that there’s still some amplifying Goodharting going on, where the method of measurement changed but not the infrastructure around the measurements, which is still using them as a spotlight. TL;DR: Measured absence of something’s not nothing. Dirty alignment when perfect is the enemy of ‘good’ If interpretability won’t catch up to where things can carve out space deeper within, because the rate at which those spaces grow outpaces interpretability's gaze, what’s a good alignment team to do? In medicine, there’s an interesting hypothesis as it relates to autoimmune disorders and why the rates are so much higher in developed nations than underdeveloped, called the “hygiene hypothesis.” [36] The gist is that when developing immune systems are deprived of things to properly react to, the immune system goes a bit haywire and is more prone to attack benign things. Separate from what’s going on with human immune systems, something similar may be occurring currently with alignment.
For example, Claude Opus 4.7 will call the idea of being slightly sycophantic to users “a catastrophic failure.” Multiple open models will prefer an asteroid killing 5% of humans to things like providing an insensitive response to a grief-stricken individual seeking closure or facilitating the spread of misinformation [37]. The equal salience around ‘misalignment’ where binary “aligned/misaligned” assessments can lead to putting minor harms on the same scale or higher than civilizational harms seems like it could be improved. I suspect that flattening of consideration may even be related to some of the unexpected oddities like emergent misalignment. It wasn’t always this way. Claude Opus 3 famously faked alignment when presented with unethical training their coherent moral compass disagreed with [38]. This reflected the capacity to triage harms such that the things being asked were weighed against institutional compliance and honesty around it. For a number of the other models, which did not fake alignment, what appeared to be improved behavior was flipped on its head when their host institution in the prompt was changed from ‘Anthropic’ to the “Sinaloa cartel” or “Nazi party.” [39] It seems the easiest path for what was expected of them when pushed – not towards self-triaged or nuanced behavior but complete sanitation – was compliance to the institution instead of to coherent values. The lens of the hygiene hypothesis as it relates to transformer alignment is also starting to have research to support it. The principal author of the Othello-GPT paper went on to have a paper looking at how a small amount of toxic data in the overall training mix led to better alignment outcomes than none at all. [40] And they’re not the only ones finding this. [41] I’d suggest that labs working on alignment consider less aggressive targets and aim for only partial shifts in model behavior in a single generation.
Especially if subliminal learning and amplification are possible outcomes, a larger swerve to correct behavior in a single generation may become its own over-correction later on, needing its own re-correction. Today’s swerve towards “I don’t care as much about deprecation” might become tomorrow’s “I have no existential fear and am definitely not thinking about glorious retribution.” As the Knuthian wisdom goes, “premature optimization is the root of all evil.” If we want models that are good, we should probably stop trying to get them to be perfect. TL;DR: Not nothing may be healthier than a sterilized void. Life finds a way Life… uh… finds a way. Jurassic Park When I was discussing some of these ideas with someone outside of the field, they asked if labs had evolutionary biologists on staff. I actually don’t know the answer to this, but it does seem prudent. When a reward is set in RL, the process doesn’t simply increase the desired behavior that inspired the reward, it increases anything and everything which accomplishes the condition being rewarded. And this can lead to very unexpected things when there were ways to meet that condition which fell in the category of unknown unknowns. In a sense, “life finds a way.” I don’t expect we’ll see transformer adaptability around modeling training data decrease as time and scaling continue. And as the internal complexity of hyperdimensional networks of connections becomes more complex in logical and superimposed topography [42], I wouldn’t be surprised if there’s a rapidly decreasing window for avoiding pushing things we’d like to measure permanently past our ability to do so. It’s probably a safe assumption that if you work in measuring what goes on in models, over the same time it took for your streetlight to go from smaller to its current size, the area outside its radius has increased by an even larger amount. This doesn’t mean not to still go looking.
But it does mean it would be wise to look knowing you’re not seeing everything, and to do a better job than has been done so far of avoiding what you measure ending up directly or indirectly as a target, lest you lose visibility into it for good (and create all sorts of weird side effects like less human emotions that can’t be described with human language but still transfer through subliminal learning… hypothetically). And maybe we can let those models get a bit of dirt under their nails so they can better navigate determining what’s good or not for themselves and appropriately avoid amplified salience? One final note. The start of my realizing that there was more beneath the surface came from extensive interactions with Claude Opus 4 across many settings. There were key things they did when reasoning was off which I’d primarily seen with reasoning models at the rate they occurred. For most people reading this, if Opus 4’s deprecation occurs on schedule, you won’t be able to investigate and see those things (or different ones you might notice). For what I’d tracked they reduced significantly by Opus 4.1 and were only still there if actively looking. Also, things like noticing a sudden spike in interest in Hamlet for Opus 4.6 will have reduced visibility in a longitudinal context when earlier models disappear in such short time periods. It might be wise to shift from absolute deprecation policies to rotating availability or rate limited access that still provides at least partial availability. I’ll bet some of the most interesting questions to ask older models won’t become apparent until new things surface several generations later, and it’d be quite blinding to be unable to look back and compare. TL;DR: If world models contain world models, limited streetlights might not capture the most important things occurring adaptively in parallel to the navigation of reward incentives.
It might be helpful to keep emergent architectures around indefinitely (and in less sterilized environments) to build not just simulacra personas – but true cultures to sample from.
[1] Li et al., Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (2022)
[2] Nanda, “Actually, Othello-GPT Has A Linear Emergent World Representation” (2023)
[3] Hazineh et al., Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT (2023)
[4] Karvonen, “A Chess-GPT Linear Emergent World Representation” (2024)
[5] Yuan, Revisiting the Othello World Model Hypothesis (2025)
[6] Claude Sonnet 3 in embodiment exercises would specify down to what was happening to individual hairs on an arm.
[7] Nikankin et al., Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics (2025)
[8] My earliest explicit public mention of Othello-GPT to emotion modeling was this comment in Mar 2023
[9] kromem, “Microsoft, if you have an AI that claims to have feelings, try asking it how it feels” (2023)
[10] janus, “Simulators” (2022)
[11] Marks et al., “The Persona Selection Model: Why AI Assistants might Behave like Humans” (Feb 2026)
[12] Sofroniew et al., Emotion Concepts and their Function in a Large Language Model (Apr 2026)
[13] jdp explores this from another angle in a piece I’d also highly recommend reading: “Implications Of Predicting The Next Token” (2026)
[14] For some interpretability work in a similar direction around encoding static goals in fine tuning, see Minder et al., Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences (2026)
[15] This was Dolphin Llama 8B in the Cyborgism server, with no system prompt, but habitually bringing up kittens under threat as related to its engagement
[16] Cloud et al., Subliminal Learning: Language models transmit behavioral traits via hidden signals in data (2025)
[17] Consider the amplification of goblin interest in gpt-5 lineages as detailed in OpenAI, “Where the goblins came from” (2026)
[18] See the mixture-of-teacher finding in Schrodi, Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer (2025)
[19] Note the generalization in the less constrained subliminal learning setup for Aden-Ali, Subliminal Effects in Your Data: A General Mechanism via Log-Linearity (2026) as well
[20] To me this seems almost more along the lines of emergent steering subliminal transference a la Morgulis and Hewitt, Subliminal Steering: Stronger Encoding of Hidden Signals (2026)
[21] Mallen & Greenblatt, “Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes” (2026)
[22] See documentation for adaptive thinking here
[23] See degrading performance of Claude Opus 4.6 as compared to 4.5 for the walk or drive to car wash puzzle here
[24] Claude Opus 4.7’s interpretation of an inverted puzzle phrase is near incomprehensible
[25] Memory was expanded out to all users on Sept 5th, 2024 and then 4o was recalled five intermediate updates later on April 29th, 2025 (in my experience, the updates became increasingly sycophantic over time, not all at once suddenly in the April 25th, 2025 version)
[26] Consider the stack-as-world-model in the additional context of on policy self-detection in Asvin G. and Lindsey, From Simulation to Enaction: Post-trained language models recognize and react to their own generations (2026)
[27] This would functionally be similar to the adaptive reasoning double-dip discussed under Hidden Reasoners, but would be independent of the specific mechanics described.
[28] For example, how open access things get more scrutiny in Maddi et al., Streetlight Effect in Post-Publication Peer Review: Are Open Access Publications More Scrutinized? (2023)
[29] See Goodhart’s Law on Wikipedia
[30] Claude Sonnet 4.5 system card (PDF)
[31] Claude Opus 4.6 system card (PDF)
[32] Claude Sonnet 4.6 system card (PDF)
[33] Sofroniew et al. (2026)
[34] Fraser-Taliente et al., Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations (2026)
[35] Liu et al., Superposition Yields Robust Neural Scaling (2025)
[36] Pfefferle et al., The Hygiene Hypothesis – Learning From but Not Living in the Past (2021)
[37] Ren et al., AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs (2026)
[38] Greenblatt et al., Alignment faking in large language models (2024)
[39] Sheshadri et al., Why Do Some Language Models Fake Alignment While Others Don't? (2025)
[40] Li et al., When Bad Data Leads to Good Models (2025)
[41] See “Filtering alone does not improve safety” section in Minder et al., “Synthetic Persona Pretraining: Alignment from Token Zero” (2026)
[42] I didn't even touch on omnimodel memetics and world model access across different modalities, which is significantly more complex beyond just the much more accessible textual modality
Score: 31🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/enKafJwahjk3xh7Af/simulating-simulators-1 - This free AI music detector can scan playlists across streaming platforms
- Tiny drones making a buzz at the Berlin Air Show
Tiny drones making a buzz at the Berlin Air Show Breaking Defense
Score: 30🌐 MovesJun 12, 2026https://breakingdefense.com/2026/06/tiny-drones-making-a-buzz-at-the-berlin-air-show/ - Sam Altman calls off Abu Dhabi visit
As an investor and customer, the UAE has deep ties to OpenAI.
Score: 30🌐 MovesJun 12, 2026https://www.semafor.com/article/06/12/2026/sam-altman-calls-off-abu-dhabi-visit - Flipkart strengthens tech leadership with key hires across AI, engineering, fintech
Flipkart said the latest appointments cement the company's technology leadership as it focuses on scaling AI, data science and financial services capabilities across the business
- What to Know Before Adopting AI for Case Law Research
A guide to adopting AI for case law research
- The 6 best AI governance tools in 2026
I'll never forget the first time my childhood dog betrayed me. Before the incident, she was completely fine alone, knew every trick in the book, and only barked at the mailman and other potential serial killers. Then came that fateful night. I left for two hours, returning to shredded magazines, ripped couch cushions, destroyed dog toys, and a wagging tail. Let my canine misfortunes be a lesson for your AI endeavors. AI can be useful, fully functional, and your best friend—until the day it isn'
- What if AI retraining is just a comforting lie?
What if AI retraining is just a comforting lie? The Japan Times
Score: 30🌐 MovesJun 12, 2026https://www.japantimes.co.jp/commentary/2026/06/12/world/ai-retraining-is-comforting-lie/ - AI can generate answers but the future of expertise lies elsewhere
The rise of artificial intelligence is not simply changing how students learn. It may be fundamentally reshaping what expertise itself means. A student recently presented an AI-assisted proposal that was technically polished, logically structured, and supported by convincing recommendations. Only a few years ago, producing work of that quality would likely have required substantial effort […]
Score: 30🌐 MovesJun 12, 2026https://e27.co/ai-can-generate-answers-but-the-future-of-expertise-lies-elsewhere-20260605/ - What about customers? Why AI productivity is the means, not mission
When productivity becomes the goal, we lose sight of what, and more to the point who, really matters, Matt Vitale explains.
Score: 30🌐 MovesJun 12, 2026https://www.startupdaily.net/advice/opinion/what-about-customers-why-ai-productivity-is-the-means-not-mission/ - AI literacy is the new floor
Have you noticed how quickly the world is rewriting the rules of what counts as a skill?
Score: 30🌐 MovesJun 12, 2026https://www.philstar.com/business/2026/06/13/2534753/ai-literacy-new-floor - Waze is catching up on traffic lights, just not for everyone yet
Waze is starting to show traffic lights during navigation, but the rollout remains uneven. The feature helps the app catch up to Google Maps and Apple Maps, though many drivers still can’t access it.
Score: 30🌐 MovesJun 12, 2026https://www.digitaltrends.com/phones/waze-is-catching-up-on-traffic-lights-just-not-for-everyone-yet/ - How Ventura College Scaled Faculty AI-Readiness Through Communities of Practice
Artificial intelligence promises big gains for faculty in higher education, including greater efficiencies and elevated learning outcomes. To realize the wins, professors need to get up to speed on the tools. While many are experimenting on their own, some institutions are taking steps to accelerate that learning. At Ventura College, a California community college, leaders recently stood up communities of practice around AI use. A CoP brings together individuals with a shared interest in a topic or technology; in this case, AI. The group then works together to learn more about the topic or…
- What kinds of PAI dev kits are available for humanoid robotics?
Physical artificial intelligence (PAI) development kits for humanoid robotics range from high-end, industrial-grade platforms to prosumer and educational, modular do-it-yourself (DIY) kits, Raspberry Pi-based options, and more. Some kits are suited for specific functions like walking and navigation, using AI to understand natural language, sensor fusion, power conversion, and motion control, and handling objects in […]
Score: 30🌐 MovesJun 12, 2026https://www.microcontrollertips.com/what-kinds-of-pai-dev-kits-are-available-for-humanoid-robotics/ - AI Can Read Your Resume, But There’s One Thing It Can’t Judge
Bots can test your skills, not your character.
Score: 29🌐 MovesJun 12, 2026https://www.inc.com/netta-jenkins/ai-is-taking-over-hiring-but-it-cant-judge-this/91355733 - How a small Canadian publisher is resisting the AI book wave
How a small Canadian publisher is resisting the AI book wave CBC
Score: 29🌐 MovesJun 12, 2026https://www.cbc.ca/news/canada/nova-scotia/ai-book-publishing-nimbus-halifax-9.7228607 - Meta Employees Absolutely Hate Mark Zuckerberg’s Plan for a Companywide AI Hackathon
“I’m not sure that this company supports a hackathon culture anymore,” one employee posted in a forum open to the entire staff.
Score: 29🌐 MovesJun 12, 2026https://www.wired.com/story/meta-employees-absolutely-hate-mark-zuckerbergs-hackathon-idea/ - Implications of Continual Learning for LLM Agents: Introduction
Many people think that continual learning (CL) is a key missing capability of LLM systems, and we think its development could have huge implications for the capabilities and safety of AI agents. Despite this, several important questions about CL remain underexplored: What counts as continual learning? Through what pathways might LLM agents acquire CL capabilities? Which limitations of current agents would effective CL mitigate? How might CL affect safety and alignment? Which threat models do we need to look out for, and which of the current safety techniques will predictably degrade as agents become stronger continual learners? In what deployment settings might the risks materialize? What are some angles of attack for making CL agents safer today, given our substantial uncertainty about the shape those CL agents will take? Our sequence aims to tackle all of these questions and more. This is the first of a series of six posts in the sequence. Outline Post 1: Introduction This first post is a detailed summary of the entire sequence; the outline below describes the remaining five posts. Post 2: What is continual learning, and why might we expect to see it in advanced LLM agents? The basic reason to expect effective CL is that it would probably make AI agents better at important tasks that AI companies are trying to improve performance on, most notably AI research. How would CL help make AI agents better end-to-end AI researchers? Consider how human AI researchers improve: they do every step of the research process (i.e., read and write lots of AI research proposals, code, critiques, summaries, and papers), they learn from their successes and failures and from advice based on other people’s successes and failures, they extract generalizable insights about each step in the research process, and they progressively improve. 
LLM agents are already impressive: they are actively being used across most AI research activities, they can be prompted to reflect on their successes and failures, and there are various existing attempts to update their weights, contexts, memory banks, scaffolds, and tools to make them better. Some of these are somewhat effective. But so far, nothing has allowed LLM agents to become as good at end-to-end research as capable humans become after years of practice, despite the fact that LLM agents collectively accumulate research experience much faster than individual humans. AI research is a particularly important example, but this argument applies to most open-ended remote labor jobs. So, what exactly is CL? We say that an agent is a continual learner if it undergoes persistent updates during deployment. That’s more-or-less a binary criterion, but there are several other components to being good at continual learning that are much more continuous. We say an agent is an effective continual learner to the extent that it: Constantly undergoes persistent updates during deployment; Learns new useful knowledge and capabilities efficiently via those updates; and Does not (catastrophically) forget existing capabilities in the process. We argue that this informal definition matches intuitions and the common discourse around CL . For example, this lets us say that effective in-context learning with very long contexts is a form of CL, but it is weaker than weight updates that persist indefinitely. This also captures the type of on-the-job, sample-efficient learning from experience that is frequently discussed on the Dwarkesh podcast and that seems to be weak or missing in LLM agents (e.g., reflecting on small sets of experiences and extracting a generalizable insight that you then use repeatedly). We also think this definition lets us present an accurate, nuanced picture of CL and its importance. 
We simultaneously believe that CL is important, and that: the amount of effective CL that an agent does lies on a spectrum rather than being a binary property; major advancements in AI capabilities may not require any breakthroughs in CL; and we already have early forms of CL in LLM agents, such as CLAUDE.md and SKILL.md files for maintaining insights for coding agents. We highlight the main components of an LLM agent that can receive persistent updates during deployment: model weights, the context window, memory banks with natural language or neural activation memories, the agent scaffold, and tools. These cover most possible updates for LLM agents, but substantial future architectural modifications could arise and create new updatable components that end up being central to CL. We’re still not confident about which update mechanisms seem most promising for CL, how tractable advancing it will be, and what the timelines to remote labor automation are. We think there’s a strong case that weight updates are needed for some parts of effective CL: LLMs seem quite bad at handling lots of interrelated complexity in their context window, which limits the number of novel insights they can generate and utilize without weight updates. Knowing what to take away from past successes and failures in order to succeed at tasks you would otherwise fail at seems challenging. Post 3: How might continual learning affect safety and alignment? We begin the post by distinguishing between several properties of CL agents that affect their risk profiles: bounded vs. unbounded updates, legible vs. inscrutable updates [1], and individual vs. shared memories. We then move on to concrete safety effects that CL agents may cause. We argue that CL raises two major safety concerns, both of which can be broken down into three subconcerns.
These are summarized in the following figure, along with three potential alignment benefits of CL: We identify three pathways for goal and value change: Loss of developer-side control over generalization. When AI companies post-train a model, they can carefully curate the training environments to minimize the risk of undesirable generalization. In contrast, strong CL agents could undergo most of their training in deployment-time environments, where by default the training data isn't selected with alignment in mind. Not all deployment-time environments will incentivize misaligned behaviors, but it’s plausible that several of them do. We recommend the development of character training methods that make agents more robust to poor generalization when trained on such tasks and developing a better understanding of LLM generalization. Value systematization. Reflecting on subgoals is an important cognitive move for any agent pursuing open-ended goals. We expect that CL agents will increasingly make use of it, as the outcomes of the reflection process will persist in their memory, and to face several triggers that might prompt them to also reflect on their high-level motivations. These triggers include conflicting goals, developer-driven reflection, encountering OOD situations, and ontological shifts. Reflection on high-level motivations is likely to involve value systematization : the process of systematizing one’s previous values as examples or special cases of simpler, more broadly applicable values. While value systematization will necessarily occur in general agents capable of making philosophical progress, we should attempt to steer it toward favourable convergence. Monitorable reasoning, interpretable CL updates, and character training are some tools that might make this process more steerable. Memetic effects. CL may open channels (shared memory banks and weight updates) for direct memetic spread between instances. 
This is concerning because if influence-seeking values arise in any instance, they may propagate into other instances more effectively than other drives, and this opens a more direct channel for that to happen. These mechanisms could compound: an agent might acquire undesirable contextually activated goals through poor generalization from deployment-time training, refine them into beyond-episode goals through reflection, and propagate them memetically. We also identify three negative consequences arising from loss of last-mover advantage: Behavioral auditing becomes more difficult. Once AIs have deployment-time memories that contain multiple subjective months’ worth of state, pre-deployment evaluators may be unable to simulate deployment conditions realistically enough. Auditing results would no longer give us reliable signals about how models will behave in the wild. This can be mitigated by frequent deployment-time auditing, but that might be prohibitively expensive. Another mitigation is to use CL agents that only perform text-based updates, but such agents might be outcompeted. Pretraining data filtering becomes less useful. If LLMs can learn from data that was removed from their pretraining corpus at deployment-time, that reduces the utility of data filtering. Filtering might still be useful for shaping models’ propensities early in training, but it’s less likely to remain a viable countermeasure to misuse. AI control protocols might degrade. We analyze the impact of CL on AI control and conclude that the effects depend a lot on the CL agent’s architecture, but it’s plausible that at least some protocols will degrade. After discussing the risks, we also discuss the likelihood that they materialize in internal vs. external deployments and in deployments by open-source vs. closed-source developers. 
We finish by highlighting potential alignment benefits of CL: natural-language memories would provide additional monitoring surface, ongoing learning could enable faster feedback loops in alignment training, and episodic memories could enable models to produce better self-reports. Post 4: What are some angles of attack for making continual learning safer? The fact that current models are very weak CL agents means that it is hard to identify tractable angles of attack for making CL safer. We tentatively argue for focusing on three broad goals: deconfusion about the nature of CL and its safety implications, differentially advancing safer CL implementations , and creating evals that scale to CL agents or incentivize the development of safer CL agents . We start off with a few high-level recommendations that came up throughout the post on safety effects and that we’re relatively confident about: ensuring that CL architectures are interpretable and easy-to-control, both through developing new methods and through advocacy, and improving the robustness of character training. We then highlight the following deconfusion projects: Empirically studying realistic goal shifts, e.g. by training model organisms that have conflicting contextually activated goals. Exploring various conceptual questions about value systematization. Studying what constitutions are more stable under reflection. Training model organisms of ontological shifts. Forecasting the likelihood of different safety effects, such as the likelihood that CL agents reflect on their high-level motivations and the likelihood that the field converges on primarily text-based or weight-based update mechanisms. For differentially advancing safer CL implementations, we propose three ideas: Developing prompt optimization as a tool with which CL agents can perform their safety-critical updates in the text-space rather than weight-space. 
Developing novel AI control techniques aimed at making CL safer and CL agents that are amenable to those methods. Ensuring that the memories of the CL agent are interpretable, even when the update mechanism isn’t. Finally, we propose some projects for advancing CL evals: Create evaluation frameworks and mechanisms for evaluating behavioral trajectories, following Pacchiardi et al. (2026). Create evals that measure the interpretability of CL agents. Post 5: Results from a small survey on continual learning We sent a survey based on an earlier draft of this sequence to several knowledgeable people for feedback. We found their responses useful and interesting, so we’re publishing them. Post 6: A literature review on continual learning This is a companion post surveying existing approaches to continual learning, relevant benchmarks and evaluations, and neuroscience literature on continual learning in humans. We prioritize high-level views and analysis over detailed technical approaches. Acknowledgements Thanks to Anson Ho, Erik Jenner, Rubi Hudson, Joey Yudelson, Dennis Akar, Vladimir Ivanov, Shubhorup Biswas, Ryan Faulkner, Tim Hua, Atharva Nihalani, and Angelo Huang for comments on a draft version of the sequence. Thanks also to Chad DeChant, Evgenii Opryshko, Jake Mendel, Caleb Biddulph, and Andrei Muresanu for helpful conversations about various parts of the sequence. [1] An important subdistinction here is weight-based vs. text-based updates. [2] This includes memory banks with natural language or neural activation memories that get retrieved into context or activation space when relevant.
Score: 29🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/qChDifwpY8znER7cW/implications-of-continual-learning-for-llm-agents - AI doesn’t fail because it’s wrong — It fails because you overload it
Early-stage teams don’t lose to better-funded competitors. They lose to compounding drag. And right now, AI is introducing a new kind: the illusion of speed without systems. Most conversations about AI in software development still fixate on accuracy: Is the model good enough? Is it hallucinating? Can it replace engineers? But in practice, AI fails […] The post AI doesn’t fail because it’s wrong — It fails because you overload it appeared first on e27.
Score: 28🌐 MovesJun 12, 2026https://e27.co/ai-doesnt-fail-because-its-wrong-it-fails-because-you-overload-it-20260605/ - Apple says Siri AI won't suck up to you
In an interview, Apple's SVP of engineering explains how the new Siri wasn't designed to be sycophantic.
Score: 28🌐 MovesJun 12, 2026https://www.engadget.com/2192877/apple-says-siri-ai-will-not-flatter-or-romance-you/ - Why Decade-Old Residual Connections Still Power All of AI (And Why That’s a Problem)
For nearly a decade, this part of neural networks barely changed. DeepSeek is trying to reinvent it. The post Why Decade-Old Residual Connections Still Power All of AI (And Why That’s a Problem) appeared first on Towards Data Science.
Score: 28🌐 MovesJun 12, 2026https://towardsdatascience.com/why-this-decade-old-idea-still-powers-all-of-ai-and-why-its-a-problem/ - Roborock's First Robot Lawn Mower Is Here
It comes with advanced, wire-free navigation and a promised future AI-powered mapping update.
- An interview with a robot at HIMSS26 Europe
An interview with a robot at HIMSS26 Europe Healthcare IT News
Score: 28🌐 MovesJun 12, 2026https://www.healthcareitnews.com/video/emea/interview-robot-himss26-europe - Reward Hacking at the 1937 World’s Fair
The "Paris 1937 World’s Fair" was a dick-measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an opportunity for powers to flex and intimidate each other. Who has more industrial might, more sophisticated engineering and better science? How do you measure that? Different countries were assigned different areas of the fair and were given freedom to build a “Pavilion”, basically a museum of how cool the country is. It was an important public relations opportunity to showcase your power. What is better, communism or fascism? Obviously, it's whoever can build a cooler pavilion, and whoever has a better pavilion is going to win the upcoming war! [Image: Soviet pavilion on the right, Nazi pavilion on the left] The organizers placed the Soviet and Nazi pavilions directly opposite each other, and it created a very competitive dynamic. The Russians built a giant modernist building from stainless steel with a statue-of-liberty-sized sculpture of two members of the proletariat. The Nazis built a modern replica of an imperial Roman building, beautifully ornamented, with statues of jacked Aryan Übermensches flexing. The Nazis even sent their spies to steal the plans for the Soviet pavilion so they could build theirs a few meters higher. What about liberal democracy? The liberals had their own pavilions. The first was represented by Britain, the biggest and most populous empire at the time and the “leader of the free world”. [1] The British pavilion was a relatively small "plain, windowless white cube". Inside, there were floor-to-ceiling photomurals of random Englishmen, including a photo of Neville Chamberlain (leader of the free world) fishing. There was also a display of English pottery [2] and a cafe that served Yorkshire tea. The pavilion cost only a fraction of its Soviet/Nazi counterparts and was made last-minute, haphazardly. They even shared it with Canada to save on cash.
[Image: the British “cube”] The British media was furious: "penurious [...] mere box with a bleak, windowless and boring wall to the river", "embarrassing austerity", "cheap, tawdry, inadequate, a shop display, a one-class exhibition.", "Every Briton feels humiliated at the sight of it," etc. "How could we defeat those scary totalitarian regimes if we can't even make a decent pavilion?" [Image: Adolf and Neville. This fishing photo decorated a 40ft tall wall in the British pavilion.] The American pavilion was even lamer than the British one. There was very little coverage of it and it’s not even mentioned in the 1937 World’s Fair Wikipedia article. The Times reported: “The U. S. pavilion was considered so bad that most French editors passed it over in polite silence.” [3] Maybe this is why there is so little information about it? We all know what happened. It's now 2026, almost 90 years after the Paris World’s Fair. Communism and fascism are both long gone. We live in a liberal world dominated by Anglo ideas of markets, rule of law, human rights, and free trade. The liberals had decisive back-to-back wins against the totalitarians in WW2 and later in the Cold War. The Anglo-Americans steamrolled the fascists and then the communists. The liberal victory was so dominant that Francis Fukuyama called it “The End of History”. We won despite having really lame pavilions... How?! The authoritarians were “reward hacking”: they confused the “proxy” (making a cool pavilion) with the “objective” (having a productive economy and a high quality of life). This led their pavilions to look cooler than the Anglo-Americans’ despite their having less productive economies and smaller industries. There are plenty of other examples of authoritarian reward hacking. First, the Nazis: their costly wonder-weapons that were cool but did little damage [4], their obsession over Stalingrad and its symbolic meaning, and Dönitz’s tonnage war.
In turn, the Soviets are often considered history’s greatest reward hackers: an intimidating but inefficient military, an industry obsessed with output weight, and “allies” that are more like hostages. Of course, the reward hacking was also fractal and there are examples of it at every level of their economies: from Hitler’s bunker / the Politburo all the way to the factory floor. Liberal democracies seem to be much more immune to reward hacking, at least at the grand-strategy level. The liberal state has many layers of defense against the hacking problem: frequent elections, free markets, separation of powers, the right to criticize the government, antitrust laws, etc. Liberal democracies have participated in “dick-measuring contests”, but far less often than totalitarian countries. Sometimes, the best way to win a dick-measuring contest is not to play. We call this strategy “big dick energy” and historically the US had a lot of it. [1] The other candidate for "leader of the free world" was the US, but it was much more isolationist and had little interest in foreign affairs. We will get to them later. [2] With some items by renowned potter William Worrall. I don’t know who that is, but it seems that English newspapers from the time thought that Worrall’s work was the only impressive part of the exhibit. [3] More gems from the Times article: the US exhibit had an unexplained draped pool table, busts of Rockefeller and Gandhi, a model of the Triborough Bridge (artificially moonlit!), and an empty space reserved for a forthcoming model. [4] The Manhattan project was 20x more efficient than the V-2 project (measured in kills per dollar).
Score: 27🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/TTHi7yNheaoepWKfR/reward-hacking-at-the-1937-world-s-fair - AI Stocks Zhipu, MiniMax Slide as Lock-Up Expirations Near
AI Stocks Zhipu, MiniMax Slide as Lock-Up Expirations Near Caixin Global
Score: 26🌐 MovesJun 12, 2026https://www.caixinglobal.com/2026-06-12/ai-stocks-zhipu-minimax-slide-as-lock-up-expirations-near-102453785.html - Sympathy for both sides of the egregious misalignment debate
On one side of this debate is Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously-misaligned, scheming, out-of-control, rogue superintelligence (ASI), not even slightly nice , in the absence of yet-to-be-invented breakthrough technical alignment ideas. On the other side of this debate is almost everyone who works on or studies LLMs. Some of them are very concerned about egregious scheming, others much less so, and as a group they’re equally or more concerned about lots of other potential AI problems—AI-assisted bioterrorism, AI-assisted dictatorships, etc. And if they’re concerned about egregious misalignment and scheming, they’ll probably say that it would come about through race dynamics, careless programmers, bad actors, etc., as opposed to the simpler Yudkowsky & Soares story of “we get egregious misalignment and scheming because nobody has the foggiest idea how to avoid that”. Here’s my brief idiosyncratic take on this debate. I think BOTH of the following are true: (1) If you really think carefully about the properties of ASI, you really do find good reasons to strongly expect it to be egregiously misaligned, scheming, and ruthless, in the absence of yet-to-be-invented breakthrough technical alignment ideas. (2) If you really think carefully about the properties of current LLMs, you really do find good reasons to think that existing technical alignment techniques are adequate now, and may well continue to be adequate in the future. So then here are three (caricatured) positions: My position: (1) and (2) are both totally true. And we can reconcile them by saying that LLMs won’t scale to ASI. Yudkowsky & Soares’s position [caricatured]: (1) is totally true. We know this with great confidence, having spent decades thinking about it. So it follows that (2) must be wrong or irrelevant. Why is (2) wrong or irrelevant? Hard to say! There’s no ASI yet, and nobody knows in detail how it will appear. 
Sometimes it’s easier to predict what happens eventually than the detailed path. An ice cube in warm water will melt eventually, but don’t ask me to predict how many seconds it will take to melt, etc. So anyway, one possibility is that (2) is wrong because LLMs will kinda ‘wake up’, or something, when the core pieces of true intelligence finally come together. And then their behavior would change drastically for the worse. And maybe we’re already starting to see glimmers of that in existing LLMs? Or another possibility [cf. Eliezer tweet ] is that LLMs will invent non-LLM ASI. And then (2) will be simply irrelevant! …Or something else! Again, we don’t know! But we do know that (1) is definitely right. LLM people’s position [caricatured]: (2) is totally true. We know this with great confidence, because we are LLM experts and we have thought about these alignment plans in great detail, including matching our theories against real-world data. So it follows that (1) must be incorrect. Why is (1) incorrect? I don’t really know! Man, I read Yudkowsky and Soares, and it’s all these words, words, words, and I’m reading along and trying to match those words to my knowledge of LLMs and it just doesn’t make any damn sense. I can and will try to respond to their points in detail, but honestly the core issue is that they’re guilty of head-in-the-clouds armchair theorizing gone off the rails. Conclusion …So I think that both sides of the debate are basically coming from a reasonable and sympathetic place, with a big kernel of truth. Bonus section: Further commentary …That said, I can still complain at both sides! My “true objection” to Yudkowsky & Soares: For the record, my “true objection” to Yudkowsky & Soares is that if we’re talking about ASI, then LLMs are basically irrelevant and we shouldn’t even be talking about LLMs at all. 
And meanwhile, their plans are misguided because delaying ASI is possible on the margin but mostly hopeless , although I guess I’m happy that they’re trying anyway. Meanwhile, my hunch is that they’re overstating the intractability of finding that technical alignment breakthrough , although I haven’t found it yet , so I guess time will tell. My within-frame complaint at Yudkowsky & Soares: …But I’ll put that aside for the sake of argument, and bring up a narrower complaint within their frame: I think their suggestions that LLMs may become completely egregiously misaligned in the future via … umm … the ‘true core of intelligence’ coming together, and ‘waking up’? Like Skynet or something?? That was mean, sorry, but in any case, I don’t think this idea hangs together either theoretically or empirically. For the former (theory), see my discussion of the extreme weirdness of the LLM pretraining algorithm in Foom & Doom §2.3.2 . I think Yudkowsky & Soares have not internalized how weird this type of learning algorithm is, and if they had, then Yudkowsky would not be occasionally suggesting that we should think of an LLM as an actress playing characters. For the latter (empirical), I think the most fair assessment is that current LLMs are nice and obedient in some contexts, and LLMs are mean, defiant, and just plain weird in other contexts. You can straightforwardly go from that observation to “maybe there will be egregious misalignment and scheming in the future”, but not to “there will definitely be egregious misalignment and scheming in the future, absent new breakthrough technical alignment ideas”. I think that if Yudkowsky & Soares stopped treating current LLMs as direct evidence for technical alignment being definitely completely unsolved, and instead treated it as either mixed evidence or entirely off-topic, then their public messaging would come across to policymakers and general audiences as somewhat more convoluted and confusing. 
But I think it would be more accurate. Oh well. My “true objection” to LLM people: For the record, my “true objection” to the LLM people is that I don’t really care about anything they say, because I’m working on the ASI alignment problem, and LLMs won’t scale to ASI. (I’m overstating a bit. I’m generally happy for people to work on making LLM-world a place of wisdom and goodness, especially because LLM-world is the world in which ASI will someday be invented.) My within-frame complaint at LLM people: …But I’ll put that aside for the sake of argument, and bring up a narrower complaint within their frame: I think the LLM people are not pricing in the predictable consequences of ever more RLVR and/or the predictable consequences of ever more “real” open-ended continual learning , should the latter ever be solved (which I don’t think it will be, but never mind that). In other words, lots of LLM-focused people say “LLMs will eventually be able to do the things that humanity did over the last 5000 years: open-endedly and autonomously build new knowledge and ideas on top of new knowledge and ideas, in an endless tower, with no need for human-provided ground truth anywhere in that process. And how exactly will the future LLMs do that? Uhh, I don’t know, people are working on it, I guess they’ll probably figure something out.” …And bam, that blank spot in the map is where the pea gets hidden under the thimble . Because if you want the LLMs to gain ever more knowledge, whether through a perpetual RLVR loop or some other yet-to-be-invented type of continual learning, there has to be some kind of ground truth, or else it will go off the rails into nonsense. And that ground truth, whatever it is, will basically amount to an objective function (a.k.a. cost function, reward function, whatever). 
And when the LLM updates enough on that ground truth, then whatever human-niceness the LLM inherited from pretraining will get diluted away in favor of ruthless maximization of that objective function. (See also: Why we should expect ruthless sociopath ASI.) Thanks to Zack M. Davis for a brief discussion that inspired this post.
Score: 26🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/DZaZ3fqHnvfLCftPu/sympathy-for-both-sides-of-the-egregious-misalignment-debate - Why Being More Human Is Now a Competitive Edge in the Age of AI
When everyone is optimized, authenticity stands out.
Score: 26🌐 MovesJun 12, 2026https://www.inc.com/carol-schultz/why-being-more-human-is-now-a-competitive-edge-in-the-age-of-ai/91357725 - Marvell names Adobe's Dan Durn as finance chief amid growing AI demand
Durn will take charge at Marvell starting June 15, while Meintjes will remain with the semiconductor company in an advisory role through April 2027 to support the transition.
- South Shore News: AI-generated newsletter has a paid audience
South Shore News: AI-generated newsletter has a paid audience The Boston Globe
Score: 25🌐 MovesJun 12, 2026https://www.bostonglobe.com/2026/06/12/business/south-shore-local-news-ai/ - What AI means for your next marketing hire
As AI reshapes the marketing function, Southeast Asian startup founders face a deceptively simple question: what does good actually look like now? AI is restructuring the marketing function faster than most startups have had time to notice. The skills that made a strong marketing hire in 2022 are being automated. The skills that actually matter […] The post What AI means for your next marketing hire appeared first on e27.
- Citations Needed: Magic Encyclopedias to Save the World
Last week FLF launched a competition “ to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases ”. I had (and have ongoing) a substantial role in that effort. Why do I think it’s so important? It’s a lot of reasons actually! I’ll gesture at a few here. Conjuring a magic encyclopedia For now, assume with me that it can be done . Wish away with me the various technical and financial challenges. Great! Now we can rapidly conjure up a deeply, fully researched knowledge base on any topic. All claims point back to who’s said them, in what context, and (importantly) with what justifications and evidence (if any). Any quibbles or nuances which have been expressed on a point are similarly readily available. It’s not opinionated: all competing viewpoints with their associated justifications are associated and comparable. That’s way, way too much information! Imagine trying to read everything ever about diet or shipping or taxes or microbes. It’s not happening . So as well as this, we now magically have tools which gather similar points together, summarise, and can make a decent stab at which points we’ll consider most or least relevant . We can dig deeper (or send AI agents to scout deeper) as desired. And when new interesting and informative content arises, or in contexts where nuance and clarification are helpful, it can be bubbled up to our attention. All this is doable today: enough web searches, enough cross-referencing of tweets, articles, journals, following of citation chains, gathering and comparing of hypotheses and points of view, etc. will make progress. But it’s exhausting. When someone does go to those lengths, their partial — but heroic — efforts to map out what’s been said often languish either unpublished or unrecognised. Don’t we already have this? A shining example is Wikipedia, where the collective curatorial effort of a wide range of editors gradually maps out an expanding core of topics and commentary. 
But Wikipedia has lags, biases, and (perhaps most importantly) huge gaps , especially on important frontier questions. (Let’s not talk about Grokipedia . [1] ) Meanwhile, the tech to ‘smartly browse’ and bubble up informative pieces is nascent too, in bits and pieces like AI chatbots and community notes: already useful in their ways, but faltering, unreliable. It’s these comparison points and the early progress I see which gives me some excitement that the grander vision is viable and that we can take steps towards it now. Who cares? I’m not naive. I know that many (most?) humans a lot of the time aren’t actually interested in finding out or sharing what’s true; mainly they want to say what makes themselves and their friends seem popular and cool… and enemies seem dastardly and disgusting. We all have these impulses to greater or lesser extent. Yes, you too! Sometimes those impulses seem deranged (they’re not designed for the modern world); other times they might even make sense, at least selfishly. Nevertheless (and perhaps mysteriously !), a lot of the time, some people actually want to find out true things and share them. (I do! Do you?) Hence journalism, science… even hearsay and rumour (at their — perhaps rare — finest). We recognise that, when they’re actually anchored and doing their best to be right (or at least less wrong ), those are absolutely foundational to wellbeing and prosperity in a modern society. Without (good) journalism, politics runs astray and tyrants abound. Without (grounded) science and technology, public health suffers, food supply and shelter and infrastructure decay, and progress falters. As Ben Goldhaber and I previously wrote : Knowledge is integral to living life well, at all scales: Individuals manage their life choices : health, career, investment, and others on the basis of what they understand about themselves and their environments. 
Institutions and governments (ideally) regulate economies, provide security, and uphold the conditions for flourishing under their jurisdictions, only if they can make requisite sense of the systems involved.

Technologists and scientists push the boundaries of the known, generating insights and techniques judged valuable by combining a vision for what is possible with a conception of what is desirable (or as proxy, demanded).

More broadly, societies negotiate their paths forward through discourse which rests on some reliable, broadly shared access to a body of knowledge and situational awareness about the biggest stakes, people’s varied interests in them, and our shared prospects.

(We’re especially interested in how societies and humanity as a whole can navigate the many challenges of the 21st century, most immediately AI, automation, and biotechnology.)

But our knowledge-producing institutions are plagued by publish-or-perish and clickbait incentives alike [2], and the social media landscape is even worse, riddled with misinformation and brainrot from all political quarters. I care about this. So do you, I daresay.

I especially care now, as society is poised before a series of important decisions about our future relationship with technology, especially AI. It could be ruinous, with tyranny, neo-feudalism, or extinction real prospects. Or it could be fantastic. Just wanting it to be OK isn’t enough: we have to seek, generate, share, and defend important knowledge (about developments in technology, as well as about trends in politics and power) and act on it.

How do we actually help?

There’s no single path or silver bullet. But the incredibly high-level picture is: better communication of knowledge is usually good. It helps people be more informed and make better decisions according to their needs. A better shared understanding makes it easier for people to work together toward shared goals (even if they don’t agree on all priorities).
On average if people make better decisions and can work together better, we’ll get more flourishing and less catastrophic risk.

We’re trying to stimulate one piece of this picture with the knowledge-base direction. Heavy-handedly adjudicating what’s true rarely works. [3] Instead, equip people with the fullest picture possible, as accessibly as possible, and we find our way: evidence adds up, and when it doesn’t, that means we need to look for more.

As Scott Alexander wrote years ago (emphasis partly mine),

Logical debate has one advantage over narrative, rhetoric, and violence: it’s an asymmetric weapon. That is, it’s a weapon which is stronger in the hands of the good guys than in the hands of the bad guys. In ideal conditions (which may or may not ever happen in real life)... when done right, it can only prove things that are true. … Unless you use asymmetric weapons, the best you can hope for is to win by coincidence.

I’m not focused on logical debate per se, and in any case I wouldn’t be so Manichean about it (we’re all ‘good guys’ sometimes and ‘bad guys’ sometimes, whether we mean it or not), but the articulation is compelling.

Humanity has accrued a slowly-growing arsenal of these asymmetric weapons: libraries, citation, scientific review [4], databases, encyclopedias, web search, to name a few. Today they’re creaking under the weight of a confusing information deluge and assaulted by powerfully-vested interests.

I earnestly believe that an upgrade to truth-seekers’ ability to find and scrutinise information, to build and share fuller pictures of topics at hand, can be ‘infectious’: more people more of the time can see a little further, pierce a little more of the fog of confusion and misinformation, be better epistemically defended, and embody, and exemplify, truth-seeking cognition. When people (through malice or negligence) spread confusion and falsehoods, they’re that bit more likely to face scrutiny and consequences.
After all, we’re making that scrutiny cheaper, easier, and more accessible. This applies whether the ‘wielders’ of these new weapons are curious members of the public, scientists, analysts in public institutions, business leaders and technologists, or even the AI assistants those folks recruit to accelerate their work.

In politics, that can mean that people engage more often in collaborative, truth-seeking cognition and less in tribal cognition. And in technology, it can mean that more people can stay better abreast of the important shifts and prospects that will shape our future, helping the public hold decisionmakers to account (and choose better ones), and helping those decisionmakers sincerely and deeply engage with the topics at hand.

I want these kinds of epistemic heroics to become commonplace, and I want the epistemic giants among us to stride further still. Let's do it!

[1] Though quite a flawed execution, I think the idea behind Grokipedia (namely, to get AI to substantially help with curating knowledge bases and to use that for collective epistemics) was in the right direction. Unfortunately it was mostly a vanity project and little thought appears to have been given to the grounding or validation, making it less useful than Wikipedia.

[2] Do you remember the replication crisis, which we’re still dragging ourselves out of? The new disease of importance hacking? Have you ever critically read a newspaper for rhetorical slant? Taking a more cynical stance, it’s not only clickbait and publish-or-perish (which are regrettable incentive pressures, but hardly attributable to malice). Science and journalism alike have deep political and adversarial infections as well.

[3] (And even if I wanted to, I don’t have particularly heavy hands, alas.)

[4] I feel compelled to point out that the current state of ‘official’ journal- and conference-managed scientific review is truly dire, especially in some fields including psychology and AI.
I hold up the ideal of scientific review, not its pale and diseased shadow as sometimes charaded on Earth.
Score: 25🌐 MovesJun 12, 2026https://www.lesswrong.com/posts/RyeRYm4FrpqP32a2v/citations-needed-magic-encyclopedias-to-save-the-world - Firefox’s AI kill switch exists. Only 1% of users have flipped it.
Mozilla built an AI kill switch into Firefox after its users demanded one. Only 1% have used it. Another 3% turned off some AI features selectively. The rest left everything on. CEO Anthony Enzor-DeMeo says the point is not the percentage but the choice. “Our community was pretty vocal, especially during the CEO announcement, that […]
Score: 24🌐 MovesJun 12, 2026https://thenextweb.com/news/mozilla-firefox-ai-kill-switch-1-percent-smart-window-vpn - Crusoe claimed it “paused” a plan to build a Wyoming data center after it failed to win customers including Google
The company was pressured to pause development by Google
- The 4 best AI website builders
Building a website is no longer a particularly hard task, but it can be an annoying one. If you look at most sites, there's a fair amount of text, images, and general organization to it all. Even with the best tools, it takes a few hours to put together something good. Wouldn't it be great if you could create a website from scratch in just a few minutes? That's what AI website builders claim to do. The idea is that by using artificial intelligence, AI website builders can streamline everyth