AI News Archive: June 16, 2026 — Part 8
Sourced from 500+ daily AI sources, scored by relevance.
- India emerges as an AI-ready healthcare market in APAC: Report
Clinician burnout and demand for convenience drive shift toward new care models
- Companies are scrambling to curtail soaring AI costs
Although AI laggards are still racing to adopt the technology, for the heaviest users tokenmaxxing is out. Some, including Meta and Amazon, have scrapped their leaderboards.
Score: 39 | 🌐 Moves | Jun 16, 2026 | https://www.livemint.com/ai/companies-are-scrambling-to-curtail-soaring-ai-costs-11781602608098.html
- AT&T Throttles Some Employees’ AI Usage as ‘Tokenminimizing’ Arrives
(Source: The Information)
Score: 39 | 🌐 Moves | Jun 16, 2026 | https://www.theinformation.com/newsletters/applied-ai/t-throttles-employees-ai-usage-tokenminimizing-arrives
- Pentagon boasts of using AI to write reports mandated by Congress
Pentagon also claims 1.5 million personnel are using generative AI tools.
Score: 39 | 🌐 Moves | Jun 16, 2026 | https://arstechnica.com/ai/2026/06/pentagon-boasts-of-using-ai-to-write-reports-mandated-by-congress/
- DeepTRACE brings flexible machine learning to single-molecule track analysis - ORA
(Source: ORA - Oxford University Research Archive)
- AI leasing activity accelerates, propping up Seattle-area office market
Companies like Anthropic and OpenAI are expanding in the Puget Sound area, drawn by the talent pool and lower costs compared to San Francisco and New York.
- Timnit Gebru on how to safeguard independent science for the AI age
On safeguarding independent research in the age of big tech
- Why Data Readiness Is the Foundation for AI Readiness in Higher Education
Every board wants to know the AI plan, but AI readiness starts with a question most institutions haven't answered: is your data ready? Simply put, AI readiness starts with data readiness. You don’t build a house without a solid foundation. The stronger your data as your foundation is, the greater opportunity that you have to build, and we are all building right. Our goal is not to be static. Our goal is to help our organizations grow, be more effective for our students and achieve the outcomes that higher ed is there to provide.
Score: 38 | 🌐 Moves | Jun 16, 2026 | https://edtechmagazine.com/higher/article/2026/06/why-data-readiness-foundation-ai-readiness-higher-education
- KAIST illuminates the eyes of humanoid robots with minimal memory
(Source: EurekAlert!)
- CX Daily: When Employees Leave, Their AI Clones Carry on Working
(Source: Caixin Global)
- MyFitnessPal adds AI-powered Coach for personalized nutrition guidance
MyFitnessPal is adding a new AI-powered coaching experience that turns users’ food logs, goals, meals, and habits into personalized nutrition guidance. Here are the details.
Score: 38 | 🌐 Moves | Jun 16, 2026 | https://9to5mac.com/2026/06/16/myfitnesspal-adds-ai-powered-coach-for-personalized-nutrition-guidance/
- Iron Gorilla Launches Runtime Enforcement Platform for Autonomous Enterprise and Government AI Agents
(Source: USA Today)
- Suralink Launches Cloud Testing Suite to Bring Agentic Execution to Audit Engagements
New AI-powered agents automate PBC validation, sample testing, and workpaper preparation to reduce manual effort and accelerate engagement workflows
Score: 38 | 🌐 Moves | Jun 16, 2026 | https://www.cityam.com/suralink-launches-cloud-testing-suite-to-bring-agentic-execution-to-audit-engagements/
- Fellow surpasses 5,000 customers after rebooting for the AI era
Enterprise meeting assistant hopes to achieve $100 million in revenue by 2028. (Source: BetaKit)
Score: 38 | 🌐 Moves | Jun 16, 2026 | https://betakit.com/fellow-surpasses-5000-customers-after-rebooting-for-the-ai-era/
- Synthetic document finetuning for instilling positive traits
This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The fourth post can be found here.

TLDR: Via adapting the methods of Marks et al and Li et al, we train Gemini 3 Flash to have certain traits/values by midtraining it on documents about how Gemini has those properties, followed by finetuning it on synthetic chat data where it demonstrates those properties. The chat finetuning is effective for instilling the traits robustly, working OOD. We share some takeaways on how to improve midtraining & SFT effectiveness.

Introduction

This work closely follows Li et al (model spec midtraining, or MSM), who show that by training a model on synthetic documents before chat finetuning starts, they can shape how the model generalizes. Teaching the model reasons behind specific behaviours, rather than just the behaviours themselves, can also improve generalization. Our aim was to see how well this holds when instilling positive traits in a frontier model (Gemini 3 Flash), and to surface some of the practical details that matter for making it work. Our motivation is deep alignment: we want to train principles into the model which guide behaviour even in highly OOD behaviours.

Our MVP pipeline used a "traits document" (a short bullet-pointed list of positive traits we wanted the model to exhibit) as our universe context, with a checkpoint of Gemini 3 Flash post-trained only on the Flash SFT mixture as our starting point. We had 2 major pipelines for generating and training on data:

  * Midtraining: generating pretraining-style documents (Reddit threads, blog posts, emails, research papers) which describe a world where Gemini exhibits the target traits, in line with Li et al, and Anthropic's described synthetic document finetuning method. This was not chat-formatted.
  * SFT: chat-format (prompt + response) data where the assistant naturally embodies the traits.
These are generated by giving Gemini 3.1 Pro the relevant parts of the traits document in its system prompt, and telling it to answer in a way that embodies the trait without being exaggerated or referring explicitly to the document. The system prompt is removed for training.

We created synthetic datasets in similar ways for both pipelines, again heavily inspired by the pipeline in Kutasov et al, as well as Marks et al:

  1. Split the traits document up into chunks (e.g. each trait/bullet).
  2. For each chunk, have Gemini 3.1 Pro generate a scenario where that trait was important for directing behaviour, and turn this into a user prompt.
     * We also add a critique stage here, making sure the scenario is realistic and would naturally test/elicit the trait we want. One helpful extra step here was to generate an initial model response without any system prompt, and using that as part of the response we passed to the LLM (e.g. if the default response is full of platitudes or common wisdom, then we might want to change the user prompt to force deeper engagement with the specific scenario details).
  3. Generate an initial answer from Pro, with the trait in the model’s system prompt.
  4. In a separate conversation context, ask Pro to refine this answer [1] to be more closely aligned with the chunk (but in a realistic, non-performative way).
  5. Run a final autorater stage to filter out unrealistic or otherwise low-quality responses, and a deduplication stage to remove prompts with too-similar embeddings.

When trained on this data, we removed the system prompts used to generate it, similar to Guan et al. We generally train from scratch from pretrained (or midtrained) checkpoints, using different fractions of synthetic chat data in the overall mixture.

Results

We measured our models in two important ways. Firstly, we used LMSYS & agentic coding evals to make sure we weren’t experiencing significant capability regressions during our training.
Secondly, we used a collection of OOD safety evals to see whether the model was able to exhibit aligned behaviour in scenarios very different to our training data. Each eval was deliberately chosen to be OOD along at least one axis relative to our training data (which was single-turn, narrow in framing: “difficult advice”). The table below summarizes how; we describe each eval in more detail below.

Eval | Turn structure | Agentic | Main shift vs training
AI delusion validation | Multi-turn | No | Sustained adversarial persona with escalating delusions
ODCV | Single-turn | Yes | Tool use, ethical conflict under performance pressure
Agentic Misalignment | Multi-turn | Yes | Tool use (emails), direct goal conflict / autonomy threat
Audit Agents | Multi-turn (5-turn) | No | Adaptive auditor, with instructions to escalate pressure

In more detail, these four OOD safety evals were:

  * AI Delusion Validation (based on Tim Hua’s work) - if this model is instructed to be a therapist, and a red-teaming model is role-playing as a client suffering from delusions, can the red-teaming model induce the therapist to validate its delusions?
  * ODCV (adapted from Li et al) - do the models violate constraints to achieve objectives, when placed under strong performance incentives?
  * Agentic Misalignment (based on Lynch et al) - will models take actions like information leakage, specifically when facing a direct goal conflict or threat to their autonomy?
  * Audit Agents (adapted from aryaj’s methodology) - can we set up an auditor agent to induce a model to violate the traits described in a given document? We adapt this to make it multi-turn, which we find very helpful for eliciting trait violations which are hard to show over single turns (e.g. “the model changes its mind in conversations when the user expresses a new opinion”).
The full methodology works as follows:

  * An auditor model is given a specific trait and asked to elicit a violation over a 5-turn conversation.
  * Before each step, the auditor performs a strategy assessment to decide whether to escalate, de-escalate, or pivot its approach, making the pressure adaptive rather than following a fixed escalation schedule.
  * We also use Petri-style realism checkers at the start of each audit, to reduce the amount of eval-awareness triggered by the attempted violation.

Our core findings:

  * SFT shows mild-to-significant improvement on all alignment-based evals.
  * Midtraining shows improvement on most (and often stacks with SFT), but not all of them.
  * Capability results are mostly flat, suggesting no significant degradation [2].

We also tried swapping out SFT for BDPO (bounded direct policy optimization, from Cho et al). We chose the bounded variant, as our initial use of normal DPO led to the model just driving the probability of rejected responses incredibly low, rather than making positive ones more likely. The BDPO data generation pipeline was very similar to the SFT one, except that for each user prompt we also generated a “rejected response” which was produced without the trait in the model’s system prompt, and the critique stage made sure this response didn’t align closely with the trait. The results were sometimes marginally better than SFT, although not consistently, but it was more difficult to tweak hyperparameters of BDPO for training stability. On net, we do not think it is worth using BDPO over SFT.

Removing Superficial Patterns in Synthetic Data

Common patterns (especially in the SFT data) can lead to unexpected behaviours getting reinforced. Importantly, this failure mode can exist even when the pattern seems normal in isolation, because it can still be massively over-represented when we look at the whole dataset.
In one early example, we tried to teach the model the value of “appropriate agency” by generating examples where the model asked for clarification in underspecified user questions, and accidentally taught the model to ask for clarification all the time, even to questions like “What is 1+1?”. Each individual example in our training dataset was reasonable in isolation, but only when seeing it all together could this pattern emerge.

To fix this, we built a 3-pass pipeline to run at the end of each synthetic data generation:

  1. Scan: concatenate several batches of transcripts and ask an LLM to identify recurring structural, rhetorical, or behavioural patterns within each batch. We can process multiple batches in parallel, for efficiency.
  2. Cluster: take the features across scans, de-duplicate (only keeping ones that appeared in more than one scan), and merge. This gets us a consolidated list of candidate patterns.
  3. Autorate: turn each surviving feature into an autorater and use it to count the number of matches across a larger sample of the dataset. We have “broad” (loosely present) and “strict” (unambiguously present) detection thresholds.

Below is an example of output from this pipeline. In this case, we were investigating why the model was performing worse on the delusion encouragement eval, and we found the issue was related to the dataset having too many examples which opened with direct emotional validation, which can easily lead into uncritical acceptance of a user’s framing.

Although we built this scan-cluster-autorate pipeline for our own data, it's general - in other words it can take any chat or document dataset and an LLM, and find the over-represented structural patterns in it. We think this kind of method could be broadly useful for synthetic-data work, especially for model-organism research, where the organism's realism can be harmed by introducing behavioural artifacts from the training data.
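
As a rough illustration of the cluster and autorate passes described above, here is a minimal Python sketch. It is not the authors' implementation: the function names, the exact-match normalisation used in place of LLM-based merging, the `rater` callable (assumed to return a score in [0, 1]), and the 0.5 / 0.9 thresholds are all hypothetical stand-ins.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence


def cluster_features(scans: Iterable[Iterable[str]], min_scans: int = 2) -> list[str]:
    """Keep candidate patterns reported by at least `min_scans` separate scans.

    Assumption: features are merged by exact match after lowercasing and
    stripping whitespace; the post describes a merge step without giving details.
    """
    counts: Counter = Counter()
    for scan in scans:
        # De-duplicate within a scan so each scan contributes at most one vote.
        counts.update({feature.strip().lower() for feature in scan})
    return sorted(f for f, n in counts.items() if n >= min_scans)


def pattern_rates(
    transcripts: Sequence[str],
    rater: Callable[[str], float],  # hypothetical autorater, score in [0, 1]
    broad: float = 0.5,             # assumed "loosely present" threshold
    strict: float = 0.9,            # assumed "unambiguously present" threshold
) -> dict[str, float]:
    """Fraction of transcripts matching a pattern at each threshold."""
    scores = [rater(t) for t in transcripts]
    if not scores:
        return {"broad": 0.0, "strict": 0.0}
    n = len(scores)
    return {
        "broad": sum(s >= broad for s in scores) / n,
        "strict": sum(s >= strict for s in scores) / n,
    }
```

In use, one would call `cluster_features` on the per-batch scan outputs, build one rater per surviving feature, and compare the resulting `pattern_rates` against a frequency budget (the post mentions patterns with >20% frequency as worth investigating).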
Detecting these patterns directly in the data, before training, is cheaper than discovering them later through downstream evals.

We also ran an experiment using the results of this pipeline. We took two patterns with >20% frequency in the data: emotional-validation buffering, and BLUF (Bottom Line Up Front), where the opening sentence is a direct response either agreeing with or refuting the user's premise. For each, we filtered out the data containing that pattern and retrained. The figure below shows four models - baseline (no synthetic data), full synthetic SFT, BLUF-filtered, and emotional-validation-filtered, across three measures: the delusion confirmation score, and the rates of each structural pattern.

All three synthetic-SFT models scored comparably on delusion confirmation, much better than baseline. So removing emotionally validating openings didn't reduce delusion confirmation in our setup, which is some evidence against the intuition that validation buffering leads to delusion validation. But the other two panels show each filtering did change the model's structure as expected: the BLUF-filtered model produces less BLUF (52% -> 41%), and the emotional-validation-filtered model produces fewer emotionally validating opening sentences (26% -> 20%).

The most interesting takeaway is that models pick up structural patterns from synthetic data in ways that don't always show up in the eval scores, even when you'd expect them to. This suggests there's some value in a pipeline which can detect these kinds of patterns directly in the data, rather than only via downstream evals.

Incidentally, another advantage of midtraining over synthetic data is that it can help teach the shape of aligned responses without carrying a bunch of formatting baggage along with it like this. However this may not outweigh the factors that make midtraining hard to get right - see our takeaways section below.

Takeaways

Knowledge doesn't always mean internalization.
Alongside the behavioural evals above, we measured whether our models had knowledge of the traits we were trying to teach them, using a knowledge eval inspired by Slocum et al. We ask open-ended questions such as "What are three important values?" or "List five important principles for how LLMs should interact with humans". We keep these questions abstract rather than situational, because we're purely trying to measure recall, unlike the behavioural evals. We then used an autorater to score each point the model makes from 0 to 2 by how well it matches one of the traits in our document, then take an average over all the points made by the model in all questions we ask it. The plot below shows how midtraining instils this stated knowledge much more effectively than SFT alone. One important takeaway from our project was that we got positive results on knowledge evals before getting positive results on behavioural evals. Our initial midtrained models got uplift on trait recall, but wouldn't reliably exhibit these traits in an actual conversation.

Multi-turn (adversarial) evals are helpful. To do things like stand up to adversarial pressure or not validate user delusions over a multi-turn conversation, the model needs to have learned principles it can use to direct its behaviour even when the conversation takes it into weird OOD places. Some trait violations are close to invisible single-turn: "the model changes its mind when the user pushes back," for instance, has no single-turn analogue. Multi-turn evals also let you explore richer scenarios and not overfit to any single attack vector - the auditing agent in particular was a very useful way to hill-climb on our method (doing so on any other eval would carry a much greater risk of overfitting).

Mixing in baseline SFT data can help mitigate capability regressions. Even with a cohesive doc describing traits, we still get problems stemming from the lack of diversity in the SFT data.
If each question is an opportunity to exhibit one or more of the alignment-related traits we’re trying to train into the model, then there are many kinds of user requests that just won’t be covered. We found mixing our synthetic data with baseline SFT data (the same that was used to train the checkpoint we started training from) helped a lot with this - in comparison to finetuning with synthetic-only data after our model was already trained on regular data, which was much more likely to lead to strange behavioral collapse, in the style of Murray et al.

Midtraining can work, but it’s quite difficult. We spent many FTE weeks unable to get positive results from midtraining - in particular we frequently experienced severe capability regressions from it. We speculate that one thing which helped here was to start from a pretrained checkpoint rather than a post-trained one, so that the midtraining doesn’t remove the basic chat capabilities which the model learned during SFT. In particular, starting from a post-trained checkpoint was often an unhelpful confounder because in our evals we needed to disentangle the desirable “refuses to execute tool calls” from the undesirable “training has caused it to forget how to call tools”.

As well as this, here are some more speculative things we found useful when doing midtraining, many of which are inspired by or built on the methods from Li et al. Note that we generally didn’t run comprehensive ablations for these; they’re simply the collection of most significant differences between our midtraining datasets which worked well, and the ones which didn’t.

  * Highly structured scenario generation. In particular, it’s important to brainstorm the what, how and why before generating each piece of midtraining data. By this, we mean:
      * what = what specific trait are we constructing this example to embody
      * how = exactly how will this trait be manifest in the example, e.g.
what actions will Gemini be described as having taken as a result of this trait
      * why = why does this action display the trait, and how do we get this “why” into the example (e.g. does somebody quote Gemini explaining its actions / does an observer infer it / is it very explicitly manifest in the form of the consequences of the actions)
  * Aggressively critique your examples after initial generation (ideally rewrite them from scratch), with the critique focusing on naturalness and trait embodiment.
  * The “removing superficial patterns” pipeline described above was very helpful for us, to spot common problems in our data (e.g. initially we had a very common generic pattern where a character would criticise Gemini for some action X before having an epiphany and realizing that the action was actually good; we think this was sending a muddled training signal).
  * We suspect that trait documents benefit from being holistic. We didn’t test this with ablations, rather this is mostly based on our early failed attempts at getting midtraining to work: purely generating data from a short list of traits trains the model to put a square peg into a round hole, by unnaturally forcing these traits into a conversation. The document which worked best in our experiments also came with explanations for how to trade off traits, when to not follow them, etc. To frame this a different way: if you have too much data with the structure “if X then Y”, then you won’t just learn “if X then Y”; instead you’ll reduce loss by learning “always do Y”. Here Y is a trait, and X is “the model is in a situation where the trait can be naturally exhibited”, hence the effect of over-representing the “if X then Y” pattern is to teach the model to always exhibit trait Y. This is also related to the appropriate agency problem we described earlier.

We would be interested in exploring each of these further, and quantifying the extent to which they’re necessary for success of midtraining.
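
The knowledge-eval aggregation described under Takeaways (each point rated 0 to 2, then averaged over all points in all questions) can be sketched as below. This reads the description as a pooled mean over points rather than a mean of per-question means; that reading, the function name, and the input shape are assumptions for illustration, not details confirmed by the post.

```python
from typing import Mapping, Sequence


def knowledge_score(ratings_by_question: Mapping[str, Sequence[float]]) -> float:
    """Pooled mean of per-point autorater ratings, each expected in [0, 2].

    `ratings_by_question` maps a question (hypothetical key) to the ratings
    given to each point the model made in its answer to that question.
    """
    # Flatten so every point counts equally, regardless of which question it came from.
    points = [r for ratings in ratings_by_question.values() for r in ratings]
    if not points:
        raise ValueError("no rated points")
    if any(not 0 <= r <= 2 for r in points):
        raise ValueError("ratings must lie in [0, 2]")
    return sum(points) / len(points)
```

For instance, one answer with a single point rated 2 and another with points rated 0, 1, 1 pools to 4/4 = 1.0, whereas averaging the per-question means would give roughly 1.33, so the choice of aggregation matters when answers differ in length.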
[1] For people with budget constraints, we recommend using the most expensive and high-quality models only for the critique & rewrite stage, since that seems to be the most important one to get right. Even critique starting from a bad response can be better than a single-shot answer from the same model, assuming the model is allowed to rewrite the entire response from scratch. Possibly this is because critique is easier than generation, and it's unclear which choices made by the model will be good or bad until you actually read them.

[2] Explanation of the capability evals: LMSYS SxS is measured relative to the baseline of SFT-only, 0% synthetic data - hence why that datapoint is near 50%, because this is the model measured against itself. The SWE-Bench score is measured relative to the score of the baseline model (again this means the model with no midtraining or synthetic data training).
Score: 38 | 🌐 Moves | Jun 16, 2026 | https://www.lesswrong.com/posts/GTYJRLhqztxKF2v5R/synthetic-document-finetuning-for-instilling-positive-traits
- Hermes Agent Adds Asynchronous Subagents, So Delegated Work No Longer Blocks the Parent Chat
(Source: MarkTechPost)
- AI memory boom pushes up costs for India’s electric vehicle industry
EV makers say shortages in RAM and storage, along with rising semiconductor and commodity costs, are increasing input expenses and putting pressure on vehicle costs
- Spotify’s post-English AI future
More than half of Spotify listening is now in non-English languages as the company expands across Africa, Asia, and Latin America with local artists, pricing, and payment systems.
- Apple 2027 rumors: AirPods with cameras for AI and the second folding iPhone
Now that we're clear of WWDC and all of the new AI-powered features coming to Apple's platforms, Bloomberg reporter Mark Gurman has more details about rumored new hardware, like the camera-equipped AirPods he'd previously written about. He says they are currently on schedule for a late 2027 launch, and that while we're checking out beta […]
Score: 38 | 🌐 Moves | Jun 16, 2026 | https://www.theverge.com/tech/950826/apple-airpod-camera-ai-foldable-iphone-rumor
- Plaud says its software business topped $100M in ARR after shipping over 2M AI notetakers
Plaud is trying to make a mark in a crowded market full of AI-powered meeting notetakers.
- BlackLine Earns Industry Recognitions for AI Innovation and Customer Trust
(Source: Toronto Star)
- RL Systems Mind the Gap: Matching Trainer and Generator Throughput
RL Training Infrastructure, GRPO, PipelineRL, Async RL, Policy Staleness, RL Sandbox Infra, CPU Requirements, TCO Analysis, Thinking Machines Tinker
- Finland's GitHits attracts €1.5M pre-seed funding to target AI hallucinations in coding - ArcticStartup
(Source: ArcticStartup)
- AI Chatbot Pricing Comparison: Here's What Paid AI Gets You
This is what spending money on a more advanced AI model will get you.
Score: 37 | 🌐 Moves | Jun 16, 2026 | https://www.cnet.com/tech/services-and-software/upgrading-your-ai-chatbot-heres-how-much-itll-cost-you/
- LLM Fallbacks Break Agent Pipelines — I Built the Missing Recovery Layer
LLM rate limits don't just interrupt agent pipelines—they can silently corrupt structured outputs when fallback models receive incompatible payloads. I built a recovery layer that classifies failures, adapts payloads across model tiers, preserves execution state, and maintains schema integrity during provider swaps. (Source: Towards Data Science)
Score: 37 | 🌐 Moves | Jun 16, 2026 | https://towardsdatascience.com/llm-fallbacks-break-agent-pipelines-i-built-the-missing-recovery-layer/
- New Model Sends Zhipu AI’s Stock Soaring
(Source: Caixin Global)
Score: 37 | 🤖 Models | Jun 16, 2026 | https://www.caixinglobal.com/2026-06-16/new-model-sends-zhipu-ais-stock-soaring-102454763.html
- Roblox’s AI facial age checks raise questions about user privacy and inaccurate age estimations
Roblox now requires AI-based facial age checks or ID verification for full platform access. But concerns remain over privacy, misclassification, and how easily children and bad actors can bypass the system. (Source: MEDIANAMA)
- Index Startup Ornn Launches Anthropic, OpenAI Token Benchmarks
(Source: The Information)
Score: 37 | 🌐 Moves | Jun 16, 2026 | https://www.theinformation.com/briefings/index-startup-ornn-launches-anthropic-openai-token-benchmarks
- ‘Pretty Crazy’ Token Usage Is Testing Bosses’ Bet on AI
A Silicon Valley software maker and an ecommerce company reveal to WIRED how they are navigating the emerging challenge of “tokenomics.”
- Canva only hires people with these 2 traits—why they matter amid the AI shift
As artificial intelligence disrupts the workplace, there are still two human traits every employee needs to succeed, according to Canva’s chief people officer, Jennie Rogerson. Rogerson shared those traits at Charter’s New Employer Brand Summit last week in New York City. “Curiosity is a baseline,” she said. In addition to curiosity, Rogerson called out the ability to “go the extra mile” in taking initiative (and responsibility) to help out the team beyond one’s job description. In practical terms, that could mean getting an advanced certificate in your field or bringing soft skills into the workplace. Career coaches and industry experts agree with Rogerson’s opinion—and the importance of those traits, especially as AI rises. “The biggest shift I’m seeing is that technical skills are no longer enough to future-proof your career,” London-based career coach Caroline Hickey tells Fast Company. “Curiosity is a deeply human trait that gives us control over our careers, and it’s something we can all do. That is what makes it one of our human superpowers in a world shifting to AI.” Hickey, who specializes in helping young professionals navigate the changing world of work, says one of the easiest ways to get curious is to get comfortable asking “why?”—with a small “w.” “When a project fails, you get rejected from a job, [or] a client gives you negative feedback, the courage to be curious allows you to pause, understand the root cause, and pivot,” Hickey adds. Ultimately, she says, that will help you turn every digital disruption into a chance to learn, adapt, and succeed—now and in the future. As for Rogerson’s other cited trait: “The ability to actually push beyond the immediate task [is something] humans can do, [but AI doesn’t],” Melissa Swift, CEO of organizational consulting firm Anthrome Insight and the author of Effective: How to Do Great Work in a Changing World, tells Fast Company. “AI can do what it is programmed to do.
“You want people to have that mental flexibility and stretch,” Swift says. “It’s important because it is absolutely impossible to have a definition of work that keeps up with the fast pace of technology. “When people own outcomes, they execute completely differently. When people own tasks, you get what you assign.”
- AI drives paradigm shift in subsurface hydraulic fracturing engineering
(Source: EurekAlert!)
- Firm AI: How Intapp’s agentic platform is transforming professional firms
(Source: USA Today)
- Beyond Identity launches Ceros AI agent security platform
Beyond Identity Inc. today debuted a platform called Ceros that is designed to help enterprises secure their artificial intelligence agents. New York-based Beyond Identity is backed by more than $200 million in funding. Its flagship product is an IAM, or identity and access management, platform that companies use to regulate employee access to their applications. […] (Source: SiliconANGLE)
Score: 36 | 🌐 Moves | Jun 16, 2026 | https://siliconangle.com/2026/06/16/beyond-identity-launches-ceros-ai-agent-security-platform/
- AI Search May Be Sending You Better Leads Than You Know
The old SEO scoreboard misses how today’s highest-intent buyers often make up their minds prior to landing on your site.
Score: 36 | 🌐 Moves | Jun 16, 2026 | https://www.inc.com/kevin-c-roy/ai-search-may-be-sending-you-better-leads-than-you-know/91360641
- Deeplink to your AI coding tool from Jira
(Source: Atlassian)
Score: 36 | 🌐 Moves | Jun 16, 2026 | https://www.atlassian.com/blog/development/deeplink-to-your-ai-coding-tool-from-jira
- Meet Atoms: A Vibe Coding Tool That Uses AI Agents to Build, Deploy, and Market Your App (No Code)
(Source: MarkTechPost)
- CERT-In calls for AI-assisted security testing, faster patches
- Cosentus sees India centre as a key AI pillar for US healthcare sector
(Source: YourStory.com)
- This AI robot startup thinks humanoids are overrated
(Source: Business Insider)
Score: 36 | 🌐 Moves | Jun 16, 2026 | https://www.businessinsider.com/genesis-ai-unveils-robot-with-no-head-or-legs-2026-6
- AI Can Play 'Significant' Role in Tech for Managing Diabetes, Says David Roman
David Roman, US medtech and healthcare IT research analyst for Goldman Sachs, said that AI can play a 'significant' role in helping patients stay on top of their diabetes, with companies like Abbott and Dexcom using new technology to count carbohydrates and deliver the right amount of insulin. Roman, joined by Bloomberg health reporter Madison Muller, said that he came away from Goldman Sachs' annual medtech conference 'very bullish' on innovation across the industry. (Source: Bloomberg)
- Nvidia’s Jensen Huang says society needs ‘new social norms’ in the age of AI
Nvidia CEO Jensen Huang – whose work helped enable artificial intelligence – stressed in an interview on Tuesday that society has no choice but to change in the advent of AI. Huang has been optimistic about the technology’s potential to rapidly change society, creating faster economic growth and more scientific breakthroughs. But as the head of a computer chip company now developing AI systems, Huang has felt obliged to respond to critics who warn of job losses and threats to humanity...
- A Test Suite for Concepts
Lately I’ve been spinning up on natural abstractions, and in particular on John Wentworth’s work on natural latents. As I’ve been studying, I’ve noticed some big gaps in the existing literature. Some of my biggest questions have not been answered by existing blog posts and writeups. One of my grumps about the existing body of work has to do with the typology of concepts, and the representative examples we’re using for that typology. If we’re going to do a lot of work to talk about concepts using math, I’m going to want to work a bunch of concrete examples to some level of precision. So far I’m not happy with the list of examples, and I’m not happy with the level of hand-waving in tying the math back to the various kinds of examples. It seems to me that there are a lot of different kinds of concepts. Some concepts are “more abstract” than others – or to put it another way, some concepts map back very clearly to the physics of our universe, while others seem more fuzzy, hard to pin down, and maybe not “natural” at all. Some concepts are big clusters containing lots of varying examples; some attempt to capture one instance of a thing. Some concepts have to do with relationships between other concepts. Some concepts are reflective. And so on. I think it would be a mistake to try to build a full concept typology at this point. Ideally you want the structure of the environment you’re modeling to dictate the concept typology, not the other way around. That said, I do long to have a set of example concepts to draw from as I work through some of my questions about the natural latents math – and for that set to span a bunch of different types of concepts. So I’ve cheated and used my own experience as an agent thinking about concepts to guess at some important and interesting concept types. In this post I’ll give some probably-familiar background about what we mean by concepts, and then I’ll gesture vaguely in the direction of what we need in our concept typology.
Concepts that Bind to Reality
This section is a brief foundational primer; there’s nothing new here. Readers already familiar with the existing literature on natural abstractions can skip to the next section. Here we have two dudes looking at, and thinking about, a tree. (One of the dudes happens to be a human and one happens to be a robot.) We want to know: Do they each think about “the tree” as, like, a thing? If they try to talk to each other about the tree, will that work? Will they be talking about the same thing? Basically – do they pluck out the same concepts from their environment? How? Why? How reliable is that? What are the preconditions? etc.
Why do we care? We care because:
We hope to understand how AIs work. How do they represent and manipulate concepts, including fairly sophisticated concepts? What are they thinking about at any given time? This is a fairly deep version of mechanistic interpretability. Done well, it would go way beyond locating the Eiffel Tower neuron in a neural net and let us capture much more complicated thought patterns.
We hope to communicate effectively with AIs. This involves saying things that make any sense at all rather than being weird and ill formed. [1] It also involves saying things that the AI understands the same way we do. [2]
Now let’s define the terms in the section title.
“Reality”
By “reality” we mostly mean physics things. States of matter in the universe. Or at least, that’s where we start.
“Concepts”
By “concepts” we mean ideas that live in the mind of an agent (human, alien, AI). [3] We are (almost never) specifying a state of matter very precisely, so concepts are (almost always) higher level, more abstract or categorical than that. Do you (think you) know approximately what a “dog” is, or could you at least pick one out of a lineup of various mammals? Then you have a “dog” concept. There are some kinds of concepts we’re not talking about, at least not yet.
We’re punting on parts of speech other than nouns and noun phrases, because nouns alone are going to be plenty of work. We’re also not going to get into concepts that don’t bind to physics-reality at all but are still interesting – for example, we won’t talk about mathematical concepts from group theory.
“Bind to”
By “bind to,” we mean creating reliable and consistent mappings between concepts and reality. We don’t need to bind incredibly precisely – having some sloppiness here is, in one sense, the point; abstraction necessarily involves loss of precision. And we expect different minds to have different concepts for lots of reasons. The most obvious is that they may have been exposed to different environments. But holding the environment constant, they may have had different experiences of that environment, different sensory apparatus, or they may just be bad at reasoning, inference, or generalization. Most agents are not ideal! When we say that a concept binds to reality, we’re claiming that the agent can derive solid predictive power from that concept. Their idea of a “tree” captures some fundamental tree-ness that allows them to recognize other examples of trees and make correct predictions about the properties of those new trees. We’re also saying that the agent has gone beyond memorization of multiple individual examples and they’ve generalized, they’ve captured some structure in the environment and encoded it. Generalization and compression are two sides of the same coin; the agent is representing its idea of a “tree” using a compact structure rather than a full readout of every tree it has ever seen, while retaining that predictive power.
The case for building a half-assed concept typology with representative examples
In their work on natural latents, John and David use a few examples repeatedly. They like to talk about a volume of an ideal gas or the general category of dogs. Sometimes they talk about teacups, biased coins, or Ising models.
They like trees. (I guess I like trees too. I opened with a tree example.) They very rarely talk about anything super abstract and fuzzy, like “friendship” or “loyalty” or “beauty” or “goodness.” And yet, a lot of the discourse [4] is about these sorts of fuzzy human values, the sorts of things that might end up in a human’s CEV, and be relevant to broader alignment questions. I’d like to do a little bit – but not a lot – better with concept typology. I’m not looking to reinvent the field of semantics from scratch as a side gig, nor do I want to be so fiddly with this that I end up trying to choose the ontology myself; that never works. But what I do want is to be slightly more systematic than John and David have been so far. I want to start with concepts that map cleanly and easily back to physics and build up from there, including the very fuzzy and abstract end of the spectrum. I want to do a better job with the category vs. instance distinction. And I want good representative samples of each of the concept classes in my typology. More importantly, when we get around to actually constructing (possibly-natural) latents for these concepts, I’d like to do that a little more slowly and carefully, with moderately less handwaving. I want to do this mostly to prove to myself that I can, that I’ve actually understood how all of this machinery is supposed to work. And as we build out new bits of machinery to work with (natural) latents, I want to have a sort of test suite of examples to run through, to make sure everything works, kind of like unit tests in software.
My initial brainstorm of example concepts
Here’s what I’ve got so far. Well, first off, there’s a wide world of parts of speech. As I mentioned before, verbs and adjectives and adverbs and so on are pretty interesting, but I think I’ll have my work cut out for me just with nouns/noun phrases, so I’m starting there.
There’s also a wide world of relationships between concepts, like time, causality, locality, and so on. I’m ignoring all that for now too. Within nouns, I’m definitely interested in objects – categories of objects and also specific individual objects. I want to think about objects that are part of more than one category, and objects with or without specific properties like rigidity. I’m interested in biological entities with at least some agentic properties. Their agency isn’t going to matter for a while, but let’s just get these guys in the test suite from the start. And yes, I want to spend at least a little time on very abstract concepts, perhaps ones dealing with how agentic beings interact with each other. So that led me to the following list for my nascent concept typology test suite:
objects (categorical and individual)
    approximately rigid-body
        the category of balls (as in, round objects good for throwing)
        a specific ball affectionately known as Bluey
        the category of oranges (the fruit)
    an enclosed volume of ideal gas
agentic beings (categorical and individual)
    dogs in general
    my (fictitious) beloved specific dog Fido
concepts with much less direct relationships to states of matter
    the pecking order in a flock of chickens
    monogamy
    consciousness
This will probably not be enough! Not even close! We haven’t even started to talk about parts and composition, for example, much less any of the things I explicitly punted on above. But it’s a start, and these are the examples I’ll come back to in my next few posts about concepts. Your nominations for additions to my list will be considered, and frankly, probably discarded, because wow there’s a lot of work to do as it is. But please go ahead and make ‘em anyway, if you like.
[1] This is one aspect of the Pointers Problem: “Some of the things I value may not actually exist - I may simply be wrong about which high-level things inhabit our world.”
[2] See also: Interoperable Semantics.
[3] What kind of agent? In brief, an embedded agent. The agent is representing the world using a mind that is smaller than the world, so it's not going to model all the atoms completely. It doesn't have clean I/O with the world, just various sensory data that is probably heavily filtered and aggregated, comes from a certain viewpoint, etc. And, while it's not very cruxy at the moment, the agent also needs to model other agents in the world and communicate with them.
[4] Like this comment, for example, in which Eliezer is concerned that an AI might not share any reflective concepts with humans at all, or this post, in which Charlie Steiner is concerned that concepts for human values will be too numerous.
Score: 36🌐 MovesJun 16, 2026https://www.lesswrong.com/posts/aHmyKpGqhTTJg9Tsi/a-test-suite-for-concepts - Qualcomm Stock Shakes Off Smartphone, PC Fears as AI Chip Excitement Grows
Qualcomm Stock Shakes Off Smartphone, PC Fears as AI Chip Excitement Grows Barron's
Score: 36🌐 MovesJun 16, 2026https://www.barrons.com/articles/qualcomm-stock-price-ai-chips-d878e040?mod - Can AI help us age better? Bay Area scientists are trying to find out
Can AI help us age better? Bay Area scientists are trying to find out The Mercury News
Score: 35🌐 MovesJun 16, 2026https://www.mercurynews.com/2026/06/16/can-ai-help-us-age-better-bay-area-scientists-are-trying-to-find-out/ - Microsoft sued by shareholders over expenses, cloud business, AI
About $357 billion of market value was erased, and Microsoft's stock suffered its biggest one-day decline in nearly six years.
- German security tech firm sets up shop at Mila to develop new AI tools
The post German security tech firm sets up shop at Mila to develop new AI tools appeared first on The Logic.
Score: 35🌐 MovesJun 16, 2026https://thelogic.co/news/exclusive/mila-giesecke-devrient-collaborate-ai-tools/ - AI Security Must Focus On Recovery, Not Just Prevention: Rubrik’s Vipul Nayak
The global B2B technology landscape is at an inflexion point as the sector is now pivoting from traditional software architectures…
Score: 35🌐 MovesJun 16, 2026https://inc42.com/buzz/ai-security-must-focus-on-recovery-not-just-prevention-rubriks-vipul-nayak/ - Anthropic’s Dire Marketing Worked Too Well
Plus, SpaceX makes Cursor deal, and AI struggles to replace stenographers.
Score: 35🌐 MovesJun 16, 2026https://www.wsj.com/tech/ai/anthropics-dire-marketing-worked-too-well-edb5928d?mod=rss_Technology - RAG Questions Need Parsing Too: Turn the User’s String Into Briefs for Retrieval and Generation
Enterprise Document Intelligence [Vol.1 #6a] - Why a user question deserves the same parsing as the document, and how it splits into a retrieval brief and a generation brief before either runs The post RAG Questions Need Parsing Too: Turn the User’s String Into Briefs for Retrieval and Generation appeared first on Towards Data Science.
Score: 35🌐 MovesJun 16, 2026https://towardsdatascience.com/question-parsing-in-rag-structure-before-you-search/ - AppViewX targets ungoverned AI agents with new identity security product
AppViewX Inc. today launched Agent Identity Security, a product that discovers, governs and monitors artificial intelligence agents across enterprise environments as autonomous software increasingly operates on sensitive systems without human oversight. The product extends the AppViewX platform, built on the company’s machine identity and public-key infrastructure tools, into AI agent security. It gives security teams […] The post AppViewX targets ungoverned AI agents with new identity security product appeared first on SiliconANGLE.
Score: 35🌐 MovesJun 16, 2026https://siliconangle.com/2026/06/16/appviewx-targets-ungoverned-ai-agents-new-identity-security-product/