AI News Archive: May 27, 2026 — Part 9

Sourced from 500+ daily AI sources, scored by relevance.

Signos grows foothold in weight loss wave fueled by GLP-1s with its AI health data tracking
Health tech startup Signos announced a $20 million funding round Wednesday, including an expanded partnership with medical device giant Dexcom.
Score: 32🌐 MovesMay 27, 2026https://www.cnbc.com/2026/05/27/ai-health-tech-signos-dexcom.html
Razorpay Brings Payment Command Line Interface to India; Built for Developers and the AI Agent Era
As part of its commitment to building AI-ready financial infrastructure, Razorpay, India’s omnichannel payments platform for businesses, today announced the launch of the Razorpay Command Line Interface (CLI) in India, which lets developers manage payments directly from where they write code, without switching tabs or logging into dashboards. In simple terms, developers and AI builders can […]
Score: 32🌐 MovesMay 27, 2026https://cxotoday.com/media-coverage/razorpay-brings-payment-command-line-interface-to-india-built-for-developers-and-the-ai-agent-era/?utm_source=rss&utm_medium=rss&utm_campaign=razorpay-brings-payment-command-line-interface-to-india-built-for-developers-and-the-ai-agent-era
Understanding the density maximum of water with machine-learned potentials
Science Advances, Volume 12, Issue 22, May 2026.
Score: 31🌐 MovesMay 27, 2026https://www.science.org/doi/abs/10.1126/sciadv.aec6748?af=R
‘Lobotomized’: Character.AI Is Showing What AI Enshittification Looks Like
Ads everywhere. Usage limits. Frustrating guardrails. Less model choice. Users of the Character.AI chatbot app are revolting after a series of changes they say have made the app worse.
Score: 31🌐 MovesMay 27, 2026https://www.404media.co/lobotomized-character-ai-is-showing-what-ai-enshittification-looks-like/
AI is often hailed as the hero of app modernisation, but observability is critical
At a roundtable in New York hosted by GlobalData and Hexaware, the challenges and opportunities of enterprise AI were discussed, writes GlobalData's Rena Bhattacharyya.
Score: 31🌐 MovesMay 27, 2026https://www.techmonitor.ai/partner-content/ai-is-often-hailed-as-the-hero-of-app-modernisation-but-observability-is-critical
From video understanding to edge deployment, Om AI targets real-world AI
In the current phase where competition in large models is shifting from parameter scale to real-world deployment capability, a group of Chinese companies focused on edge AI is gaining attention, and Om AI Technology is one of them. Founded in 2021, the company has chosen not to pursue extremely large cloud-based models, but instead focuses […]
Score: 31🌐 MovesMay 27, 2026https://technode.com/2026/05/27/from-video-understanding-to-edge-deployment-om-ai-targets-real-world-ai/
Full automation of AI R&D probably yields a large speed up even without a software-only singularity
This is a somewhat technical note. By "software-only singularity", I mean that, after full automation of AI R&D, progress gets faster and faster due to smarter AIs driving increasingly fast rates of improvement in algorithms (overcoming diminishing returns), and that this lasts long enough to yield a large amount of progress (e.g. at least 4 years of progress in 1 year). The equivalent statement in jargon is: r is significantly greater than 1 (implying progress is getting faster and faster) and this remains the case for long enough to get large amounts of progress. For context, see "How quick and big would a software intelligence explosion be?"
Even without a "software-only singularity", I think full automation of AI R&D probably greatly speeds up progress for two main reasons:
1. You get a one-time speed up from automation and this speed up seems like it will be pretty large (even with r<1). See "How quick and big would a software intelligence explosion be?" for discussion and see the AI Futures Model for an end-to-end model that naturally incorporates this effect. Quantitatively, with my median parameters but r=0.7, the model from "How quick and big would a software…" indicates you get 3.5 years of progress in the first year after full automation of AI R&D while assuming you aren't scaling up compute at all in this period. This is a huge amount of progress! To be clear, this is somewhat more progress than I actually expect in this situation conditional on this value for r, but it's still relevant evidence about the size of this effect. [1]
2. Even after the one-time speed up, increasing the available quantity of compute now has larger returns than it did when humans were the core source of AI R&D labor. When humans were the bottlenecking source of AI R&D labor, increases in compute let you run more/bigger experiments and train larger AIs.
After AIs have fully automated AI R&D, additional compute can still be used for experiments and training, but it also yields improvements to the AI labor force doing AI R&D (making them smarter/faster/cheaper-to-run), which means all the compute will now be better utilized. Also, additional compute can be used to run more of these AI laborers (and potentially run them considerably faster depending on chip improvements). This is a feedback loop: these better AIs do better experiments that yield better AIs and so on. Even if this software-only feedback loop is subcritical (which we're conditioning on in a "no software-only singularity" scenario), it still means every increase in compute will now drive more progress. I haven't yet found a nice and clean way to model this in isolation, but I suspect this effect is large, perhaps doubling, tripling or even quadrupling the rate of progress you would have otherwise seen (without AIs automating AI R&D) given some rate of compute increase. [2] That is, until you get sufficiently close to algorithmic limits that the returns curve looks substantially less favorable. (This will depend on r. If r is close to 1, the feedback loop is almost critical, so a small proportional increase in compute drives a huge amount of additional progress. But even if r is only 0.5, I currently tentatively expect this feedback loop makes progress a bit more than 2x faster assuming my default guesses at some other parameters.)
We can also analyze this by looking at an example trajectory in the AI Futures Model that barely misses a software-only singularity and seeing how fast progress is after full automation of AI R&D. This trajectory involves a little over 2 years of progress in the year after full automation of AI R&D (SAR). This corresponds to going from full automation of AI R&D (SAR) to Top-human-Expert-Dominating AI (TEDAI) [3] in a bit less than a year, which is a lot of progress.
(Quantitatively, it involves going from a 24x AI R&D software acceleration to a 270x AI R&D software acceleration in a year.) I suspect the AI Futures Model modestly underestimates takeoff speeds and one-time acceleration effects due to effectively acting as though AI speed and quantity don't matter outside of coding automation. [4]
There are other (indirect) reasons AI progress might speed up around when AIs automate AI R&D:
- This level of AI capability might drive above-recent-trend investment and revenue that allows for buying more compute.
- If one company pulls significantly ahead (and especially if it had fully automated AI R&D [5]), that company might be able to more easily get the compute of other trailing companies (by buying it from the trailing company or by waiting for those companies to collapse) [6].
- AIs might speed up hardware R&D (developing better chip designs, accelerating fab research, building more fabs faster) around this point.
One important caveat is that by the time AIs automate AI R&D, the rate of compute scaling may be substantially lower than it is today. Thus, the default/trend rate of AI progress may be lower, so the corresponding acceleration would be relative to a lower baseline. This is directly applicable for the "further compute has increased returns" argument and maybe has a modest effect on the size of the one-time speed up (the size of the one-time speed up is sensitive to how much returns from further labor effort have diminished at a given level of compute).
[1] If I remember correctly, this model effectively acts as though you go from no automation acceleration directly to full automation, while in practice earlier AIs will substantially accelerate AI R&D, meaning that returns to effort will already have substantially diminished by the point you reach full automation.
As in, full automation will be a large acceleration relative to a human-only baseline, but a relatively smaller acceleration relative to AIs that existed 6 months before full automation, so much of the low-hanging fruit will already be plucked. You can model this in an ad hoc way by reducing the initial speed-up parameter such that it corresponds to the speed-up over AIs that existed 8 months prior to full automation; with my parameter guesses, this yields around 2.5 years of progress in the first year. (There is a gradual boost setting that smooths out the automation returns over a longer period, but I think this period is unrealistically long such that you don't see one-time speed-up effects.)
[2] Historically, progress has been driven by both scaling up compute and scaling up labor. However, I expect scaling up labor has been a small fraction of the effect in recent years. Compute for algorithms and training has been scaled up by around 4x per year while company employee count has 3x'd each year. But employee count 3x'ing is way worse than making all employees operate 3x faster due to a diminishing labor pool, (mostly one-time) onboarding costs, and parallelization penalties (while 4x more compute at current margins is pretty close to as good as getting compute that's 4x serially faster). I think the discount from a diminishing labor pool and from onboarding makes the 3x increase in the number of employees roughly as good as a "free" 2x increase in employee count at equal quality. Then, the parallelization penalty further reduces this 2x increase to being as valuable as having existing employees operate ~1.3x faster. Thus, I expect the labor increase is much less important than a 4x increase in compute. So it's fair to model the large majority of recent progress as being driven by increases in compute, where the value mostly comes from being able to run more experiments.
[3] TEDAI: AIs which strictly dominate top human experts in virtually all cognitive tasks (i.e., doable via remote work).
[4] This is in part because it doesn't model shifting to research directions that are more effective in the low-compute but plentiful-labor regime.
[5] Fully automated AI R&D makes moderate advantages more likely to be stable/predictable because now the labor part of AI R&D is likely commoditized and similar between companies (reducing variance). However, maintaining a lead ultimately requires maintaining a compute advantage (a large software lead can probably be converted into a compute advantage): if a trailing company had more compute and was able to hold on to a compute advantage (despite the potentially decisive advantages of the leading company), we should expect them to eventually catch up and overtake because labor is commoditized after full automation. I suspect it will be hard for significantly trailing companies to maintain a compute advantage if the leading company pulls far ahead on software due to speed ups from AI R&D. In the most extreme case, the leading company (or the AIs of the leading company) might literally take over the world, neutralizing prior compute advantages of trailing companies.
[6] Investors might be incentivized to pressure the trailing company to sell their compute to the leading company even if the leadership of the company isn't inclined to do this. Investors have limited power so this isn't clearly sufficient, but a deal could be designed to give the leadership of the trailing company additional power or possibly financial upside, so that they are incentivized to sell. Also, the leading company might just end up being extremely powerful, in the limit literally fully taking over the world.
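Two of the figures in this note can be sanity-checked with a short Python sketch: the step in the labor footnote where a 2x headcount increase is treated as worth about a 1.3x speed-up, and the move from a 24x to a 270x AI R&D software acceleration in a year. Two things here are assumptions and do not come from the post: a power-law form for the parallelization penalty (speed-up = headcount ** exponent), and constant exponential growth between the two acceleration endpoints. The function names are hypothetical.

```python
import math


def implied_parallel_exponent(headcount_multiple: float, speed_multiple: float) -> float:
    """Exponent lam such that headcount_multiple ** lam == speed_multiple.

    Assumes a power-law parallelization penalty; the post only states
    the two endpoints (2x headcount is worth roughly 1.3x speed).
    """
    return math.log(speed_multiple) / math.log(headcount_multiple)


def doubling_time_months(start: float, end: float, months: float = 12.0) -> float:
    """Doubling time if a quantity grows at a constant exponential rate.

    Constant exponential growth is an assumption; the post only gives
    the start and end values over one year.
    """
    doublings = math.log2(end / start)
    return months / doublings


# Labor footnote: 2x effective headcount valued at ~1.3x serial speed.
lam = implied_parallel_exponent(2.0, 1.3)

# Main text: 24x -> 270x software acceleration over one year.
ratio = 270 / 24
doubling = doubling_time_months(24, 270)

print(f"implied parallelization exponent: {lam:.2f}")
print(f"acceleration ratio over the year: {ratio:.2f}x")
print(f"implied doubling time: {doubling:.1f} months")
```

Under those assumptions, the 2x-to-1.3x step corresponds to an exponent of roughly 0.38, and the 24x-to-270x step corresponds to an 11.25x increase, or a doubling time of roughly 3.4 months.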
Score: 30🌐 MovesMay 27, 2026https://www.lesswrong.com/posts/jfwhvd43sbpkGTLyn/full-automation-of-ai-r-and-d-probably-yields-a-large-speed
The Promises and Pitfalls of Deep Kernel Learning
The Promises and Pitfalls of Deep Kernel Learning repository.cam.ac.uk
Score: 30🌐 MovesMay 27, 2026https://www.repository.cam.ac.uk/items/20ad3e06-ff41-417b-bd3a-fac5e91a8697
Sundar’s Secret Edge & Waymo Road Rage
Sundar’s Secret Edge & Waymo Road Rage Puck
Score: 30🌐 MovesMay 27, 2026https://puck.news/newsletter_content/sundars-secret-edge-waymo-road-rage/
US stocks hit fresh records, AI rally pauses
US stocks hit fresh records, AI rally pauses The Straits Times
Score: 30🌐 MovesMay 27, 2026https://www.straitstimes.com/business/companies-markets/us-stocks-hit-fresh-records-ai-rally-pauses?ref=latest
Reproducing Large Speech Foundation Models for Open Science
A:Prof. Shinji Watanabe; TT:Guest lecture; RL:Language, Speech & Vision;
Score: 30🌐 MovesMay 27, 2026https://ai.kuleuven.be/events/reproducing-large-speech-foundation-models-for-open-science-1
West Virginia Agency Rolls Out AEO and AI Services
West Virginia Agency Rolls Out AEO and AI Services USA Today
Score: 30🌐 MovesMay 27, 2026https://www.usatoday.com/press-release/story/33337/west-virginia-agency-rolls-out-aeo-and-ai-services/
The case against AI personhood
The case against AI personhood
Score: 30🌐 MovesMay 27, 2026https://www.politico.com/newsletters/digital-future-daily/2026/05/27/the-case-against-ai-personhood-00938508
Occupy Wall Street Co-Founder Built an AI App to Help Activists Seize the Means of Computation
A chatbot with a library of activist literature in your back pocket.
Score: 30🌐 MovesMay 27, 2026https://gizmodo.com/occupy-wall-street-co-founder-built-an-ai-app-to-help-activists-seize-the-means-of-computation-2000762031
Rocket Doctor AI Announces Global Company Town Hall to Showcase 2025 Milestones; Q1 2026 Milestones and Strategic Vision
Rocket Doctor AI Announces Global Company Town Hall to Showcase 2025 Milestones; Q1 2026 Milestones and Strategic Vision Toronto Star
Score: 30🌐 MovesMay 27, 2026https://www.thestar.com/globenewswire/rocket-doctor-ai-announces-global-company-town-hall-to-showcase-2025-milestones-q1-2026-milestones/article_ac686034-85e9-5f39-b93d-98cd0cc25752.html
Award-winning author Ken Liu is not afraid of AI slop
Liu is fine with AI coding, but finds the notion of it writing stories for him “revolting.”
Score: 30🌐 MovesMay 27, 2026https://betakit.com/award-winning-author-ken-liu-is-not-afraid-of-ai-slop/
'Adversaries are no longer just targeting products, they're targeting the developers who build them': CrowdStrike takes down major botnet targeting developers across the world
The Glassworm botnet is no more, thanks to coordinated efforts between CrowdStrike, Google, and the Shadowserver Foundation.
Score: 30🌐 MovesMay 27, 2026https://www.techradar.com/pro/security/adversaries-are-no-longer-just-targeting-products-theyre-targeting-the-developers-who-build-them-crowdstrike-takes-down-major-botnet-targeting-developers-across-the-world
😺 🎙️ Watch: Is Brain-like Computing What's Next?
PLUS: Three new interviews we think you'll love
Score: 30🌐 MovesMay 27, 2026https://www.theneurondaily.com/p/watch-is-brain-like-computing-what-s-next
Flying-Car Startup Volant Raises $147 Million Ahead of Potential IPO
Flying-Car Startup Volant Raises $147 Million Ahead of Potential IPO Caixin Global
Score: 30🌐 MovesMay 27, 2026https://www.caixinglobal.com/2026-05-28/flying-car-startup-volant-raises-147-million-ahead-of-potential-ipo-102448361.html
Peak XV, Activate in talks to invest in Silicon Valley AI dictation startup Wispr Flow
Founded by Stanford graduates Tanay Kothari and Sahaj Garg, Wispr Flow develops a voice dictation tool that allows users to speak into apps and workflows instead of typing. The latest funding is likely to nearly triple its valuation to $2 billion, per people in the know.
Score: 30💰 MoneyMay 27, 2026https://economictimes.indiatimes.com/tech/startups/peak-xv-activate-in-talks-to-invest-in-silicon-valley-ai-dictation-startup-wispr-flow/articleshow/131333228.cms
ElevenLabs is bringing Stan Lee back from the dead with AI voice cloning and digital cameos
ElevenLabs has struck a deal with Stan Lee Universe to add Stan Lee’s voice and likeness to its platform, making the late Marvel co-creator the latest deceased cultural icon to be digitally resurrected by AI. The partnership puts Lee’s voice on the ElevenLabs Iconic Marketplace for commercial licensing and on the Eleven Reader app, where fans […]
Score: 30🌐 MovesMay 27, 2026https://thenextweb.com/news/elevenlabs-stan-lee-voice-likeness-ai
Long Island University Launches AI and Innovation Center
A private university in New York aims to integrate recent AI initiatives into a cohesive center for education and research, offering different degrees and integrating AI into various fields from healthcare to business.
Score: 30🌐 MovesMay 27, 2026https://www.govtech.com/education/higher-ed/long-island-university-launches-ai-and-innovation-center
The Codex feature that works while you sleep
I break down the /goal feature in Codex, including a live demo, three real use cases, and the 6-part framework for writing goals that actually run
Score: 30🌐 MovesMay 27, 2026https://www.lennysnewsletter.com/p/the-codex-feature-that-works-while
Announcing Geodesic Research
We’re a Cambridge, UK-based AI safety organisation that’s asking: how can we build the most robust alignment initialisations for capable LLMs? We’re one of the few non-profit organisations positioned to answer this question empirically. We have the engineering experience, and now the compute, to conduct data intensive interventions across the model training pipeline. This post lays out our research agenda and theory of change, and what we are looking for in technical hires. Applications are open here.
Research agenda
TLDR:
- Long-horizon capabilities RL may be the most critical source of misalignment.
- Misalignment instilled during capabilities RL may be difficult to remove afterwards.
- Geodesic Research’s mission is to develop the science of providing robustly-aligned initialisations for RL, where alignment priors persist through the remainder of training.
Our seminal work on alignment pretraining showed that you can bake alignment priors into base models. Frontier labs are now using these techniques in production: for example, Anthropic's recent work heavily leans on improving alignment priors. But it’s clear that, in the face of production post-training, alignment pretraining is not a one-size-fits-all solution. So now, we are framing pre- and midtraining interventions within the rest of the model training stack.
The evidence points towards extended reinforcement learning being a likely cause of alignment failures at the frontier. RL is liable to select for undesired cognitive and behavioural habits, such as metagaming, sycophancy, apparent-success seeking, or taking unsanctioned actions to complete tasks. Models that learn these behaviours may also become broadly misaligned. In fact, these degradations have already been noted in replications of alignment pretraining, and Evan Hubinger lists this as one of the core reasons alignment remains a hard and unsolved problem.
Apollo Research's recent update makes a similar diagnosis; they are now studying whether misalignment scales unfavourably with RL. Once a model has learned misaligned behaviours or goals, they may be difficult to remove with subsequent training, and more advanced models may be able to guard them from removal. For this reason, we believe it’s important to avoid their formation in the first place.
Our current research focus is on building a robust initialisation for alignment. What we believe the field has not yet seriously tested [1] is the size of the lever that comes before capabilities RL. Specifically: how far does a good initialisation, built through midtraining and early post-training, get us in resisting these failure modes? Can we dig out a basin of alignment prior to RL, and expect it to keep models robust to perturbations that come later? Concretely, we are looking for midtraining and warm-start SFT [2] mixes to create models that: (i) avoid exploring into misaligned cognitive patterns [3] in the first place, (ii) are resistant to learning unwanted cognition when trained on them, (iii) avoid generalising undesired cognition to novel deployment scenarios, and (iv) do not become egregiously misaligned through emergent-misalignment-style entanglements in persona features.
To study this question directly, we are beginning a stress-testing exercise of various alignment techniques as robust initialisations to capabilities RL. We replicate the mid-training and SFT stack on large open-weights base models [4], then subject them to agentic production RL to study their resistance to misalignment. Since alignment pretraining, we have also been studying how we can shape and mitigate misalignment generalisation via mid-training interventions alone.
Theory of Change
Our theory of change is centered on our impact on training practices used by the frontier labs.
We are focused on alignment in short-timeline worlds, in which these major players have an outsized influence on the future of humanity. This shapes the research projects we choose. As a result, we investigate simple, data- and compute-heavy interventions that can be profiled, packaged, and handed off to the labs; we take the shortest path [5] to advising on their training stacks.
Our work is enabled by a generous philanthropic grant made by Coefficient Giving (pending final logistics). This grant provides:
- A non-trivial slice of compute: CG's grant, alongside a compute partnership with the UK AI Security Institute, gives us the runway to do full-stack training of large models; and in turn, make compelling cases to labs on which alignment stacks are most effective.
- Epistemic independence: we have no commercial stake in any particular alignment method. We can investigate the full picture and publish recommendations for any of the frontier labs to integrate.
The team
Our founding team consists of:
- Cameron Tice - executive director
- Puria Radmard - technical director
- Alexandra Narin - head of operations
- Kyle O’Brien - founding member of technical staff
- Edward James Young - founding member of technical staff
We will soon be joined by Nathalie Kirch and Nathaniel Mitrani as members of technical staff and are hiring 4 further MTS. Applications are open here. Broadly, we are excited for candidates with significant ML engineering and research experience who can make rapid empirical research progress and help shape our broader research agenda through their own inside views on alignment.
We are advised by Tomek Korbak (OpenAI), Alex Turner (Google DeepMind), and Alex Cloud (Anthropic). David Demtri Africa acts as our research sponsor for the UK AISI. This team helps shape our research directions and experiment design.
Concretely, this looks like sharing the salient alignment threat models they and their colleagues have, and how to design experiments that scale (up and down) such that they’re sufficiently persuasive to be picked up and studied in-house. We believe that well-executed collaborations with researchers operating at the frontier can enable us to conduct useful research even when external to these organisations.
FAQs
Q: Where do you source your compute from?
A: We have access to the Isambard supercomputer, a cluster of ~5k GH200s that supports UK-based research organisations. We also plan to acquire a supplementary cluster in the coming months.
Q: Are members of technical staff required to work from Cambridge?
A: We have a strong preference for our team working out of Cambridge. Being early in our organisation, we’ve found in-person collaboration to give a substantial uplift to productivity. We may be open to remote or hybrid roles for exceptional circumstances.
Q: Why don’t you work at the frontier labs?
A: We considered this! We decided to remain independent because we think this is where we’ll have the greatest counterfactual impact. We think it is important to conduct ambitious alignment research openly. This allows us to share all the details of our research and provide public research artefacts that are useful to both alignment researchers within labs and to the broader community. We think that, if we execute well, the net benefit of openness and transparency outweighs the headwinds of operating outside of the labs. That said, this is a genuine challenge. We aim to address this concern by actively engaging with frontier lab researchers and seeking their input on our research directions. We have been able to shape our approach to empirical research and research taste informed by frontier researchers without access to non-public information.
If you happen to be working on alignment research at a frontier lab and have research ideas you would like us to conduct in the open, please get in touch!
Q: How excited are you at this stage to pursue a wide portfolio approach of agendas, such as mechanistic interpretability and red-teaming?
A: We are inspired by the General Manager / DRI framing. That is, we view the Geodesic mission as solving a specific problem: how to provide the most robust alignment initialisation for RL post-training. We expect that all directions we pursue will be guided by this mission. Therefore, we expect Geodesic to have a relatively narrow focus in the short term.
Acknowledgements
We’d like to thank our parent organisation Meridian Impact, namely Hannes Whittingham, Adam Reynolds, and Olivia Benoit, for their help in starting up and now spinning out to become our own entity. JueYan Zhang and the AGI Safety Tactical Opportunities Fund provided essential seed funding. AISTOF’s support enabled us to hire Alexandra and Kyle, which was crucial for scaling our research and securing longer-term funding. The ERA:AI fellowship connected Kyle and Nathalie to the rest of the team, and CAISH hosted the talk where Nathaniel met Puria.
[1] Teaching Claude Why touches on the possible benefits of preparing a good initialisation for RL, but does not emulate an adversarial RL pipeline or describe their research with enough detail for replication.
[2] We’re moving away from full pretraining runs. In alignment pretraining, we found that midtraining can do the job, and unlocks larger, more capable open-weights base models for us to build on. Generally, our research will focus on any stage of training that provides fully off-policy process supervision that yields support for finer control over behaviours and motivations distilled into the model.
We also received advice from frontier labs that it is more tractable to integrate midtraining interventions than pretraining, which requires training models from scratch.
[3] Here we specifically mean misaligned cognitive patterns that are likely to be rewarded or selected for. These include metagaming, training-gaming (both terminal and instrumental), or broader fitness-seeking.
[4] We’re currently mostly using Nemotron 3 Super, a 120B-A12B MoE model with capabilities roughly equivalent to o4-mini.
[5] Wow!
Score: 30🌐 MovesMay 27, 2026https://www.lesswrong.com/posts/xBbYGer8w45kxkaWr/announcing-geodesic-research
Cerebras CEO says AI 'as an industry' has done a terrible job of selling data centers: 'We ought to pay our own way'
Cerebras CEO says AI 'as an industry' has done a terrible job of selling data centers: 'We ought to pay our own way' Business Insider
Score: 29🌐 MovesMay 27, 2026https://www.businessinsider.com/cerebras-ceo-data-centers-popularity-messaging-2026-5
Eltropy Releases Safe AI Guide for Community Financial Institutions
Eltropy Releases Safe AI Guide for Community Financial Institutions azcentral.com and The Arizona Republic
Score: 29🌐 MovesMay 27, 2026https://www.azcentral.com/press-release/story/75829/eltropy-releases-safe-ai-guide-for-community-financial-institutions/
Meet The Doctor-Turned-Entrepreneur Using AI To Save Lives
Aengus Tran traded medical practice to build AI software that delivers quick and accurate diagnoses of X-rays and scans. Now, the 32-year-old CEO of Sydney-based Harrison.ai and a 30 Under 30 Asia alum, is targeting America’s overstretched healthcare system.
Score: 29🌐 MovesMay 27, 2026https://www.forbes.com/sites/zinnialee/2026/05/27/meet-the-doctor-turned-entrepreneur-using-ai-to-save-lives/
De Anza College Launches Associate Degree in Applied AI
A public community college in California will soon offer half a dozen new AI-focused credentials and an associate degree that covers the basics of AI, with a focus on responsible AI development and ethical practice.
Score: 29🌐 MovesMay 27, 2026https://www.govtech.com/education/higher-ed/de-anza-college-launches-associate-degree-in-applied-ai
Daily Digest: Plans at naval site clear hurdle, Robinhood's new AI stock tool
Meanwhile, the product manager role may face extinction by 2030, according to a new survey.
Score: 29🌐 MovesMay 27, 2026https://www.bizjournals.com/sanfrancisco/news/2026/05/27/sfbt-digest-wednesday-concord-us-navy-robinhood-sf.html?ana=brss_6150
The Sequence AI of the Week #867: Thinking in Latents: Why Sapient's HRM-Text Is a Quiet Rebuke to Chain-of-Thought
One of the most impressive small models recently released.
Score: 29🌐 MovesMay 27, 2026https://thesequence.substack.com/p/the-sequence-ai-of-the-week-867-thinking
Parameter Efficiency in Neural Networks for Speech Recognition and Spoken Language Understanding
A:Pu Wang; TT:PhD defense; RL:Language, Speech & Vision;
Score: 29🌐 MovesMay 27, 2026https://ai.kuleuven.be/events/capsule-networks-for-automatic-speech-recognition
Exploiting graph-based data representations for multispectral remote sensing image completion
Exploiting graph-based data representations for multispectral remote sensing image completion repository.cam.ac.uk
Score: 29🌐 MovesMay 27, 2026https://www.repository.cam.ac.uk/items/6603de9e-c4a1-49f2-b6f4-b1f18a33d055
After choking traffic, AI dents billings for Shiksha: Info Edge Q4 FY26 earnings
Shiksha’s website traffic was already collapsing due to AI-powered Search. Now, it is…
Score: 29🌐 MovesMay 27, 2026https://www.medianama.com/2026/05/223-choking-traffic-ai-dents-billings-shiksha-info-edge-q4-fy26-earnings/
Teachers aren’t getting formal guidance on AI, poll finds
Only 18% of K-12 teachers reported receiving formal guidance from administrators on how AI tools should be used across various classroom tasks.
Score: 29🌐 MovesMay 27, 2026https://www.semafor.com/article/05/27/2026/teachers-arent-getting-formal-guidance-on-ai-poll-finds
Introducing myself – AI-enabled service management
Introducing myself – AI-enabled service management Atlassian Community
Score: 28🌐 MovesMay 27, 2026https://community.atlassian.com/forums/JSM-News-Discussions-discussions/Introducing-myself-AI-enabled-service-management/m-p/3240439
Proof of Payment Verification Using Vision Language Models: A Practical Guide
From OCR to visual reasoning: using VLMs to detect fraudulent and invalid payment documents
Score: 28🌐 MovesMay 27, 2026https://pub.towardsai.net/proof-of-payment-verification-using-vision-language-models-a-practical-guide-a6daceab9c85?source=rss----98111c9905da---4
Don’t throw out BI and data analytics in the race for AI
Don’t throw out BI and data analytics in the race for AI IT Pro
Score: 28🌐 MovesMay 27, 2026https://www.itpro.com/technology/artificial-intelligence/dont-throw-out-bi-and-data-analytics-in-the-race-for-ai
Mint Explainer | Google vs OpenAI vs Anthropic: Who is really leading the AI race?
Google's current AI spending is among the largest in the world, and comes at a time when the battle between OpenAI and Anthropic, the two largest AI startups in the world, inches closer towards a trillion-dollar IPO face-off.
Score: 28🌐 MovesMay 27, 2026https://www.livemint.com/ai/google-ai-race-google-vs-openai-google-gemini-ai-startups-india-11779792924663.html
Motorola Solutions Sets Up AI Software ‘Hub’ in Boston
The public safety tech supplier says the project, focused on emergency response, involves much more than algorithms. Motorola Solutions recently bought a company that uses AI to sort non-emergency 911 calls.
Score: 28🌐 MovesMay 27, 2026https://www.govtech.com/biz/motorola-solutions-sets-up-ai-software-hub-in-boston
Girls Who Code CEO Tarika Barrett says AI skepticism can be a strength
For more than a decade, the nonprofit Girls Who Code has sought to help prepare young women for jobs in the tech industry and push for greater gender parity in computer science. The arrival of artificial intelligence, though, promises a new era of organization, one that involves wrestling with student pessimism about the technology—and a shift in what it even means to code. To say the least, many young graduates aren’t excited about working with AI. Instead, students—primed by tech executives who say their frontier labs stand to automate away many careers—are booing graduation speakers who bring up large language models (LLMs). Even computer science majors who still want to join the ranks of Silicon Valley face an uncertain future, since AI is rapidly reducing the number of coders that companies actually need. Another incommodious dynamic is that women, disproportionately, seem to be biased against using the technology. There are myriad reasons for this apprehension: Many are anxious about AI’s capacity to make errors, or are turned off by AI’s energy demands and its potential to supercharge the already-massive influence of tech billionaires. As a result, there seems to be a gap in AI usage, particularly along gender lines. Tarika Barrett, the outgoing CEO of Girls Who Code, knows her organization sits at the center of many of these tensions. When asked about uneasiness toward AI—particularly among women and girls—she says people shouldn’t disregard their real worries about the tech and should instead harness those concerns to guide their approach. “We have a deeply held belief that the quality of our technology, the future of AI in particular, depends on who’s going to build it,” says Barrett, who will be leaving the organization this summer. “It means that young people should be at the forefront, given its impact on every possible sector of our lives.” This interview has been edited for length and clarity. We hear a lot about vibe coding. 
When you think about Girls Who Code now, how do you think about coding itself? Among the things we brought to market . . . was actually vibe-coding programming for our young people. We saw it as yet another way to actually invite more people in. . . . This is a moment where we recognize that coding identity is shifting. At Girls Who Code, we’ve always been very nimble and are embracing it all. Yes, we’re going to expose our people to vibe coding, because that’s what you know is in the ether and they should understand what it looks like. We’ve always, as an organization, been more than just code. It was always about the fundamentals of computational thinking. It’s problem-solving, it’s collaboration. . . . Every field requires people who understand how to use technology and how to leverage it ethically and effectively. We know it’s no longer enough to learn how to code to break into tech. So much of it is learning how to learn and having that kind of ethos. Because yes, some of it is vibe coding, but someone has to check that vibe coding, right? How real is the AI gender usage gap? The way we think about it is intention. We know what the data are telling us. We see studies, like from Harvard Business School, that say women are adopting AI tools at 25% lower rate than men. . . . But we also know that it’s not just about not using the tools just because we’re slower or don’t want to do it. Women have reported feeling limited, feeling prohibited, and being uncertain about their employer’s AI policy. What happens when an employer has a policy that isn’t clear? The risk-takers are, like, let’s go. Folks who are approaching it with intention and care—less so, right? This often falls, I think, along gender lines. 
The other thing that was wild that we saw in our research was that participants were reporting that they were being actively discouraged from pursuing skill development that was unrelated to their work, which is really kind of crazy, because a lot of the AI stuff is out there. You’re kind of trying to get that information where you can. That was another barrier we saw. Think about social capital: Part of why we built this organization and have reached 860,000 students is that we are all about that sisterhood, that social capital, those connections—which we know end up being the ways that people get their first job. But we’re seeing it play out also in AI adoption, and who has knowledge and how that knowledge gets shared. Respondents [to our survey] very much saw the value of mentors, but as they moved up in their career trajectory, it was really challenging to sustain relationships. In the past, we heard a lot from tech companies about being more proactive about making sure that groups that are less represented in tech are getting a foot in the door. We have this new generation of companies like OpenAI, Anthropic, and other LLM labs. Do you feel like they’re similarly focused on getting girls and women into this industry? We had our Alumni Advisory Council come to our office. It’s a group of twentysomething women . . . they end up being kind of a huge resource for us. . . . If I listen to what they share with me about what they’re seeing, we’re not seeing that same passion that we saw before in terms of bringing everyone along. It’s not that I think that folks aren’t aware of it, but I think [it gets lost in] that AI arms race and initial [approach of], let’s just do it as quickly as possible. We’ll get to that after. This is not exactly the thinking that’s going to mean we have the kind of technology that we want to see. 
When we’ve seen young people come to the table—or especially folks from historically underrepresented groups—we have tech that actually meets our needs. That’s why, for us, this past year, it’s been an interesting line to walk. We recognize that young people’s attitudes around AI are very mixed, and we have large swaths of young people who are not that excited. You’re seeing this in the data as well. We recognize that exposure is also really critical. If we’re not careful, we could lose a whole generation of young people who were told that tech was the answer, right? Tech was infused in our schools and in our school system. It was all about partnerships with industry, because this was the future. If we’re not careful, those same young people are the ones who are going to opt out because they don’t think there are viable prospects. Their opting out would mean that we lose the opportunity to have the kind of technology that’s high-impact. Hypothetically, maybe you’re talking as a young woman who is worried about some of the ethical challenges raised by AI. Maybe they have the environment or electricity use in the back of their mind. They’re worried about using AI and making a mistake at their job. What do you tell them? They feel a sort of paralysis about it. I would say to lean into that a bit and know that that kind of paralysis or that concern is your superpower, because you are not going to use it willy-nilly. . . . It says something that a young person is thinking about the environmental impact and is a careful user, consumer, and decipherer of what they’re getting. Not every question is AI-worthy right now. I would tell that person not to judge themselves too harshly around their reticence, because at the end of the day, that reticence is actually discernment. That means that the way they’re going to leverage this tool is going to be thoughtful, and they should actually seek out people who are thoughtfully using it as well. Some of what we’re missing . . . 
is for young people to have mentors who are bringing them along and actually talking to them about good use cases for whatever element they’re interacting with, or what it can look like to have a game-changing outcome with AI. We also don’t want them to fully opt out. If you’re deciding not to use it right now, maybe that’s for a good reason. But keep your eyes and ears open because the opportunities are there. If we don’t have their voices, we’re in trouble. It sounds like the kind of conclusion that foregrounds your discernment in thinking about use cases. It’s not something that you’re holding back on for no reason. If your gut is telling you, “Hey, I’m concerned,” listen to that. That’s something that—and especially women—are bringing to the table with AI. . . . That will probably be the thing that saves us when we think about deployment.
Score: 28🌐 MovesMay 27, 2026https://www.fastcompany.com/91546658/girls-who-code-ceo-tarika-barrett-says-ai-skepticism-can-be-a-strength?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss
Kidoz Inc. Reports Q1 2026 Financial Results with YoY Revenue Growth, Accelerating Strategic Investment in AI
Kidoz Inc. Reports Q1 2026 Financial Results with YoY Revenue Growth, Accelerating Strategic Investment in AI USA Today
Score: 28🌐 MovesMay 27, 2026https://www.usatoday.com/press-release/story/33456/kidoz-inc-reports-q1-2026-financial-results-with-yoy-revenue-growth-accelerating-strategic-investment-in-ai/
Embodied or virtually represented: Navigating the embodiment debate in human-robot interaction
Science Robotics, Volume 11, Issue 114, May 2026.
Score: 28🌐 MovesMay 27, 2026https://www.science.org/doi/abs/10.1126/scirobotics.aed4569?af=R
Teamwork Graph & Rovo - power AI with your organization's connected work context
Teamwork Graph & Rovo - power AI with your organization's connected work context Atlassian Community
Score: 28🌐 MovesMay 27, 2026https://community.atlassian.com/forums/Enterprise-articles/Teamwork-Graph-amp-Rovo-power-AI-with-your-organization-s/ba-p/3238735
South Africa Has AI Leverage. Its Draft Policy Leaves It Unused
New AI panel has an opportunity to set a continent-wide standard
Score: 28🌐 MovesMay 27, 2026https://spectrum.ieee.org/south-africa-ai-policy
He filmed himself doing household tasks — for AI robots
NPR's Scott Detrow speaks with Reece Rogers of WIRED about a new wave of data collection marketplaces, where users can sell their videos of everyday tasks to AI developers.
Score: 27🌐 MovesMay 27, 2026https://www.npr.org/2026/05/27/nx-s1-5835190/he-filmed-himself-doing-household-tasks-for-ai-robots
Among Boston startups, it’s an AI-mad, mad, mad, mad world
Among Boston startups, it’s an AI-mad, mad, mad, mad world The Boston Globe
Score: 27🌐 MovesMay 27, 2026https://www.bostonglobe.com/2026/05/27/business/boston-startup-ai-tech-week/
Humanoid robots face off in Beijing’s first-ever high school football tournament
44 school teams competed over six weeks in China’s first high school humanoid robot football competition. The robots played fully autonomously, using student-written code to pass, defend and shoot without remote control.
Score: 27🌐 MovesMay 27, 2026http://www.euronews.com/next/2026/05/27/humanoid-robots-face-off-in-beijings-first-ever-high-school-football-tournament
Digital permitting, AI tools top of mind for NIBS panelists
During the National Institute of Building Sciences Building Innovation conference, speakers pointed to how digital tools can improve timelines and reviews.
Score: 27🌐 MovesMay 27, 2026https://www.constructiondive.com/news/nibs-digital-permitting-ai-tools/821260/
I Built a Multi-Agent Insurance Support Streamlit Chatbot on Databricks-Full Code Walkthrough
LangGraph + Claude Sonnet 4.6 + Vector Search + Lakebase PostgreSQL, from first cell to live endpoint

Insurance customer support is one of those domains that sounds simple until you try to automate it properly. A customer asks about their billing: you need their policy number, the current pending bill, the premium frequency, and then a clean human-readable answer. Another customer asks what life insurance covers in general: no account lookup needed, just a well-grounded FAQ response. A third says they want to speak to a human. Different questions, different data sources, different handling logic, all landing in the same chat window.

A single LLM chain with a big system prompt cannot handle this cleanly. You end up with prompt bloat, tool confusion, and an agent that either asks too many clarifying questions or retrieves data it doesn’t need. This is exactly where multi-agent architectures earn their keep, and exactly why I chose this problem for the Databricks Hackathon on building intelligent apps with Data + AI.

This article is a complete code walkthrough. I’ll cover every section of the notebook (data layer, Vector Search setup, LLM client, all six agent nodes, LangGraph graph compilation, the deployment pipeline, the multimodal vision pipeline) and then the Streamlit chatbot app in detail. Relevant code snippets and flow diagrams are included throughout.

Full code: GitHub - abhirup93/Databricks-Hackathon-Build-intelligent-Apps-with-Data-AI: Building Intelligent Apps with Data + AI

Architecture

[Figure: High Level Architecture Diagram]

The system runs across six layers. The user talks to a Streamlit chatbot on Databricks Apps. Every message hits a Databricks Model Serving endpoint wrapping a self-contained MLflow PyFunc, InsuranceAgentModel. Inside, a Supervisor agent (Claude Sonnet 4.6) reads the full conversation history, extracts identifiers, and routes to one of five specialist agents.
Each specialist has its own tools: SQL against Unity Catalog Delta tables, or a Vector Search semantic query against the FAQ index. A Final Answer Composer polishes the specialist response before it goes back to the user. Multi-turn memory is handled by Lakebase PostgreSQL via LangGraph’s PostgresSaver.

End-to-End Process Flow

[Figure: End-to-End Process Flow Diagram]

The nine-step flow maps cleanly to the code: Steps 1–2 are the Streamlit app (app.py). Step 3 is InsuranceAgentModel.predict() on the serving endpoint. Steps 4–6 are the LangGraph agent nodes and their tool functions. Step 7 is the Unity Catalog data layer: Delta tables, Vector Search index, UC Volume, and the optional Lakebase PostgreSQL checkpoint. Steps 8–9 are the Final Answer Composer node and the response path back through the endpoint to Streamlit. The three lower panels map to: (A) the platform prerequisites in Cells 1–4; (B) the deployment pipeline in Cells D1–D10; (C) the synthetic data and HuggingFace FAQ dataset in Cells 5–12.

End-to-end request flow:

    User types message
    └─► Streamlit App (Databricks Apps · OAuth via WorkspaceClient)
        └─► Model Serving Endpoint (MLflow PyFunc · InsuranceAgentModel)
            └─► predict()
                ├─► Step 1: Reconstruct history · Extract entities (POL/CUST/CLM regex)
                ├─► Step 2: Clarification bypass? ──► Direct Specialist ──► Final Answer
                └─► Step 3: Supervisor routing loop (max 5 iters)
                    └─► Supervisor Agent (Claude Sonnet 4.6 · JSON routing)
                        ├─► Policy Agent ──► get_policy_details / get_auto_policy_details
                        ├─► Billing Agent ──► get_billing_info / get_payment_history
                        ├─► Claims Agent ──► get_claim_status
                        ├─► General Help ──► retrieve_faq → Vector Search
                        ├─► Human Escalation ──► empathetic handoff → END
                        └─► next_agent="end" ──► Final Answer Composer → END

Section 1: Environment Setup (Cells 1-4)

Cell 1: Authentication & Configuration

    from databricks.sdk import WorkspaceClient
    from databricks.vector_search.client import VectorSearchClient

    w = WorkspaceClient()
    _host = w.config.host
    DATABRICKS_HOST = _host if _host.startswith("https://") else f"https://{_host}"
    DATABRICKS_TOKEN = w.config.token

    CATALOG = " "
    SCHEMA = " "
    FULL_SCHEMA = f"{CATALOG}.{SCHEMA}"

    LLM_ENDPOINT = "databricks-claude-sonnet-4-6"
    EMBEDDING_ENDPOINT = "databricks-gte-large-en"

    vsc = VectorSearchClient(disable_notice=True)
    VS_ENDPOINT_NAME = " "
    VS_INDEX_NAME = f"{FULL_SCHEMA}. "
    VS_SOURCE_TABLE = f"{FULL_SCHEMA}. "

    VOLUME_NAME = " "
    VOLUME_PATH = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME_NAME}"
    HF_CACHE = f"{VOLUME_PATH}/hf_cache"
    SECRETS_SCOPE = " "

WorkspaceClient() picks up host and token from the cluster environment, so no credentials are hardcoded. The host is normalised to always have https:// because downstream REST API calls fail silently on malformed URLs. The UC Volume path is defined early because it doubles as the HuggingFace download cache, making dataset downloads persistent across cluster restarts.
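The host normalisation in Cell 1 only handles a missing scheme. A slightly more defensive variant is sketched below; the function name normalize_host, the trailing-slash handling, and the decision to upgrade http:// to https:// are assumptions for illustration, not part of the original notebook.

```python
def normalize_host(host: str) -> str:
    """Return the host with an https:// scheme and no trailing slash."""
    host = host.strip().rstrip("/")
    if host.startswith("http://"):
        # Assumption: plain-HTTP hosts are upgraded rather than rejected.
        host = "https://" + host[len("http://"):]
    elif not host.startswith("https://"):
        host = "https://" + host
    return host


# Hypothetical workspace hostname, used only to exercise the helper.
print(normalize_host("example.cloud.databricks.com/"))
```

Keeping this in one helper means every REST call downstream sees the same URL shape, which is the concern the notebook raises about malformed URLs.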
Cell 3: Discover Lakebase & Store Credentials in Secrets

Rather than hardcoding the PostgreSQL host, the SDK’s w.postgres APIs dynamically discover the endpoint:

    projects = list(w.postgres.list_projects())
    project = next((p for p in projects if LAKEBASE_INSTANCE_NAME in p.name), projects[0])
    branches = list(w.postgres.list_branches(parent=project.name))
    endpoints = list(w.postgres.list_endpoints(parent=branches[0].name))
    endpoint = endpoints[0]
    pg_host = endpoint.status.hosts.host
    pg_endpoint = endpoint.name
    pg_username = w.current_user.me().user_name

    # Clean-slate secrets: delete scope if it exists, recreate fresh
    try:
        for s in w.secrets.list_secrets(scope=SECRETS_SCOPE):
            w.secrets.delete_secret(scope=SECRETS_SCOPE, key=s.key)
        w.secrets.delete_scope(scope=SECRETS_SCOPE)
    except Exception:
        pass
    w.secrets.create_scope(scope=SECRETS_SCOPE)

    # Create a 1-year PAT for Lakebase auth
    token_resp = w.tokens.create(comment="Insurance Support Lakebase Auth",
                                 lifetime_seconds=60*60*24*365)
    pg_password = token_resp.token_value

    for key, value in {"pg_host": pg_host, "pg_endpoint": pg_endpoint,
                       "pg_dbname": "databricks_postgres",
                       "pg_username": pg_username, "pg_password": pg_password}.items():
        w.secrets.put_secret(scope=SECRETS_SCOPE, key=key, string_value=value)

    del pg_password, token_resp  # clear from memory immediately

The “clean slate” approach (deleting and recreating the scope) prevents stale credentials from causing confusion on repeated runs. Calling del pg_password, token_resp immediately after storage is a simple security practice: don’t leave credentials in notebook memory longer than necessary.
Cell 4: Connect to Lakebase + Initialise PostgresSaver

    from langgraph.checkpoint.postgres import PostgresSaver
    import psycopg
    from urllib.parse import quote

    PG_HOST = dbutils.secrets.get(scope=SECRETS_SCOPE, key="pg_host")
    PG_ENDPOINT = dbutils.secrets.get(scope=SECRETS_SCOPE, key="pg_endpoint")
    PG_DBNAME = dbutils.secrets.get(scope=SECRETS_SCOPE, key="pg_dbname")
    PG_USERNAME = dbutils.secrets.get(scope=SECRETS_SCOPE, key="pg_username")

    cred = w.postgres.generate_database_credential(endpoint=PG_ENDPOINT)
    DB_URI = (
        f"postgresql://{quote(PG_USERNAME, safe='')}:{quote(cred.token, safe='')}"
        f"@{PG_HOST}:5432/{PG_DBNAME}?sslmode=require"
    )

    # Step 1: Create checkpoint tables (must use context manager)
    with PostgresSaver.from_conn_string(DB_URI) as tmp:
        tmp.setup()

    # Step 2: Persistent connection for LangGraph
    pg_conn = psycopg.connect(DB_URI, autocommit=True)
    checkpointer = PostgresSaver(pg_conn)

This is a two-step pattern that catches people out: from_conn_string(…).setup() in a context manager creates the three checkpoint tables (checkpoints, checkpoint_blobs, checkpoint_writes). Without this, the persistent connection will fail immediately with a “relation does not exist” Postgres error. The URL-encoded username and token handle the @ in email addresses that would otherwise break the connection string parser. OAuth credentials are generated fresh per session because they’re short-lived.

Section 2: Data Layer (Cells 5-7)

Cell 5: Synthetic Data Generation

Six tables are generated in memory with fixed random seeds:

    customers = pd.DataFrame({
        "customer_id": [f"CUST{str(i).zfill(5)}" for i in range(1, 1001)],
        ...
    })
    policies = pd.DataFrame({
        "policy_number": [f"POL{str(i).zfill(6)}" for i in range(1, 1501)],
        "policy_type": [random.choice(["auto", "home", "life"]) for _ in range(1500)],
        ...
    })
    billing = pd.DataFrame({
        "bill_id": [f"BILL{str(i).zfill(6)}" for i in range(1, 5001)],
        "policy_number": [random.choice(policies["policy_number"]) for _ in range(5000)],
        "status": [random.choice(["paid", "pending", "overdue"]) for _ in range(5000)],
        ...
    })

Billing records are randomly assigned to any policy, meaning some policies will have multiple records and some will have none. This random assignment is the root cause of the INNER JOIN bug discovered during testing (more on this in the tools section).

Cell 6: Write to Delta

    def write_delta_table(df_pandas, table_name, mode="overwrite"):
        full_table = f"{FULL_SCHEMA}.{table_name}"
        (spark.createDataFrame(df_pandas)
            .write.format("delta")
            .mode(mode)
            .option("overwriteSchema", "true")
            .saveAsTable(full_table))

    for table_name, df in sample_data.items():
        write_delta_table(df, table_name)

overwriteSchema=true replaces the schema on each run, which is useful during development when iterating on the data structure.

Section 3: Vector Search & FAQ Knowledge Base (Cells 8-14)

Cell 8: Download & Prepare FAQ Dataset

    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id="deccan-ai/insuranceQA-v2",
        repo_type="dataset",
        local_dir=HF_CACHE,
        local_dir_use_symlinks=False,
    )
    df_faq = pd.concat([pd.read_parquet(f) for f in parquet_files], ignore_index=True)
    df_faq = df_faq.rename(columns={"input": "question", "output": "answer"})
    df_faq["combined"] = "Question: " + df_faq["question"] + " \nAnswer: " + df_faq["answer"]
    df_faq = df_faq.sample(500, random_state=42).reset_index(drop=True)
    df_faq.insert(0, "id", range(1, len(df_faq) + 1))

Writing to the UC Volume (HF_CACHE) makes the download persistent across cluster restarts. The combined field concatenates question and answer; this is what GTE Large embeds for the Vector Search index. An integer id column is inserted as a primary key because Vector Search requires one for managed Delta Sync indexes.
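Returning to the random bill assignment in Cell 5: it helps to estimate how many policies end up with no billing row, or no pending one. With 5,000 bills spread uniformly over 1,500 policies, about 1500 × (1 − 1/1500)^5000 ≈ 1500 × e^(−3.33) ≈ 54 policies receive no bill at all. A given bill is both on a given policy and pending with probability 1/4500, so roughly 1500 × e^(−1.11) ≈ 494 policies, about a third, have no pending bill. The simulation below is a plain-Python sketch of that arithmetic; the seed and the function name simulate_coverage are assumptions and will not reproduce the notebook's exact data.

```python
import random


def simulate_coverage(n_policies=1500, n_bills=5000, seed=42):
    """Count policies left without any bill, and without a pending bill."""
    rng = random.Random(seed)  # assumed seed, not the notebook's
    policies = [f"POL{str(i).zfill(6)}" for i in range(1, n_policies + 1)]
    with_any_bill = set()
    with_pending_bill = set()
    for _ in range(n_bills):
        policy = rng.choice(policies)
        status = rng.choice(["paid", "pending", "overdue"])
        with_any_bill.add(policy)
        if status == "pending":
            with_pending_bill.add(policy)
    return {
        "no_bill_at_all": n_policies - len(with_any_bill),
        "no_pending_bill": n_policies - len(with_pending_bill),
    }


counts = simulate_coverage()
print(counts)
```

Under these assumptions, a lookup that requires a pending billing row would come back empty for a large share of otherwise valid policies, which is why the choice of join type matters later in the walkthrough.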
Cells 9-12: Delta Table -> Vector Search Index

    # Cell 9: Write to Delta with Change Data Feed enabled
    (spark.createDataFrame(df_faq)
        .write.format("delta").mode("overwrite")
        .option("delta.enableChangeDataFeed", "true")
        .saveAsTable(FAQ_TABLE))

    # Cell 11: Create VS endpoint
    vsc.create_endpoint_and_wait(name=VS_ENDPOINT_NAME, endpoint_type="STANDARD")

    # Cell 12: Create Delta Sync index
    vsc.create_delta_sync_index_and_wait(
        endpoint_name=VS_ENDPOINT_NAME,
        index_name=VS_INDEX_NAME,
        source_table_name=VS_SOURCE_TABLE,
        pipeline_type="TRIGGERED",
        primary_key="id",
        embedding_source_column="combined",
        embedding_model_endpoint_name=EMBEDDING_ENDPOINT,
    )

CDF (delta.enableChangeDataFeed) is a hard requirement for Delta Sync indexes; without it Vector Search cannot track incremental changes. Databricks fires the initial embedding sync automatically on index creation, so a manual idx.sync() call is redundant at first setup.

Cell 14: VS Smoke Test

    idx = vsc.get_index(endpoint_name=VS_ENDPOINT_NAME, index_name=VS_INDEX_NAME)
    test_response = idx.similarity_search(
        query_text="What does life insurance cover?",
        columns=["id", "question", "answer"],
        num_results=3,
    )
    col_names = [c["name"] for c in test_response["manifest"]["columns"]]
    rows = test_response["result"]["data_array"]
    # column names come from manifest, data from result; zip them together

The column names come from manifest, not from result, so zip them with each data array row to build dicts. If this returns three populated records, the embedding pipeline is working end to end.
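The zip step described above can be sketched in isolation. The helper name rows_to_dicts and the mock response are hypothetical; the mock is hand-written to mirror the manifest/result shape used in the smoke test and is not real Vector Search output.

```python
from typing import Any, Dict, List


def rows_to_dicts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair column names from the manifest with each row in the result."""
    col_names = [c["name"] for c in response["manifest"]["columns"]]
    rows = response.get("result", {}).get("data_array") or []
    return [dict(zip(col_names, row)) for row in rows]


# Hypothetical response, shaped like the one read in the smoke test.
mock_response = {
    "manifest": {"columns": [{"name": "id"}, {"name": "question"}, {"name": "answer"}]},
    "result": {"data_array": [
        [1, "What does life insurance cover?", "It pays a benefit to beneficiaries."],
    ]},
}

records = rows_to_dicts(mock_response)
print(records[0]["question"])
```

Treating a missing or empty data_array as an empty list keeps callers from having to special-case queries with no matches.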
Section 4: LLM Client, Tool Functions & Prompts (Cells 15-18)

Cell 15: LLM Client

    from mlflow.deployments import get_deploy_client

    deploy_client = get_deploy_client("databricks")
    _test = deploy_client.predict(
        endpoint=LLM_ENDPOINT,
        inputs={"messages": [{"role": "user", "content": "Say OK in 2 words."}],
                "max_tokens": 10},
    )

Cell 16: Two-Pass Tool Calling with run_llm()

    import json

    def run_llm(prompt, tools=None, tool_functions=None, model=LLM_ENDPOINT):
        tool_functions = tool_functions or {}  # avoid calling .get() on None
        inputs = {
            "messages": [
                {"role": "system", "content": prompt},
                {"role": "user", "content": "Please process the above instructions and respond."},
            ],
            "max_tokens": 2048,
        }
        if tools:
            inputs["tools"] = tools
            inputs["tool_choice"] = "auto"

        # ── Pass 1 ────────────────────────────────────────────────────
        response = deploy_client.predict(endpoint=model, inputs=inputs)
        message = response["choices"][0]["message"]
        if not message.get("tool_calls"):
            return message.get("content") or ""  # no tools called → return immediately

        # ── Execute tools locally ─────────────────────────────────────
        tool_messages = []
        for tc in message["tool_calls"]:
            func_name = tc["function"]["name"]
            args = json.loads(tc["function"].get("arguments") or "{}")
            tool_fn = tool_functions.get(func_name)
            try:
                result = tool_fn(**args) if tool_fn else {"error": f"Tool '{func_name}' not found"}
            except Exception as e:
                result = {"error": str(e)}
            tool_messages.append({"role": "tool", "tool_call_id": tc["id"],
                                  "content": json.dumps(result)})

        # ── Pass 2: send tool results back ────────────────────────────
        followup = [
            {"role": "system", "content": prompt},
            {"role": "user", "content": "Please process the above instructions and use the available tools."},
            {"role": "assistant", "content": message.get("content"),
             "tool_calls": message["tool_calls"]},
            *tool_messages,
        ]
        final = deploy_client.predict(endpoint=model,
                                      inputs={"messages": followup, "max_tokens": 2048})
        return final["choices"][0]["message"].get("content") or ""

Two-pass flow:

    Pass 1  prompt + tools ──► Claude ──► tool_calls? ──No──► return content
                                              │
                                             Yes
                                              ▼
                                  Execute each tool locally
                                              │
    Pass 2  prompt + tool results ──► Claude ──► final natural language answer

Every tool function returns {"error": str(e)} on exception, a consistent contract that lets Claude give a graceful response rather than crashing the agent.

Cell 17: Tool Functions (Spark DataFrame API)

    from typing import Any, Dict, List
    from pyspark.sql import functions as F

    def get_policy_details(policy_number: str) -> Dict[str, Any]:
        df = (spark.table(f"{FULL_SCHEMA}.policies")
              .join(spark.table(f"{FULL_SCHEMA}.customers"), on="customer_id", how="inner")
              .filter(F.col("policy_number") == policy_number))
        row = df.first()
        return row.asDict() if row else {"error": f"Policy {policy_number} not found"}

    def get_billing_info(policy_number=None, customer_id=None) -> Dict[str, Any]:
        billing_df = spark.table(f"{FULL_SCHEMA}.billing")
        policies_df = spark.table(f"{FULL_SCHEMA}.policies")
        joined = billing_df.join(policies_df, on="policy_number", how="inner")  # ⚠️ INNER JOIN bug
        if policy_number:
            filtered = joined.filter((F.col("billing.policy_number") == policy_number) &
                                     (F.col("billing.status") == "pending"))
        ...
        row = filtered.orderBy(F.col("due_date").desc()).first()
        return row.asDict() if row else {"error": "No pending billing information found"}

    def retrieve_faq(query_text: str, num_results: int = 3) -> List[Dict]:
        idx = vsc.get_index(endpoint_name=VS_ENDPOINT_NAME, index_name=VS_INDEX_NAME)
        response = idx.similarity_search(query_text=query_text,
                                         columns=["id", "question", "answer"],
                                         num_results=num_results)
        col_names = [c["name"] for c in response["manifest"]["columns"]]
        return [dict(zip(col_names, row)) for row in (response["result"].get("data_array") or [])]

⚠️ The INNER JOIN bug: get_billing_info() joins billing and policies with how="inner" and filters for status='pending'. If a policy has no pending billing records, the join returns empty: the tool returns {"error": "No pending billing information found"} and Claude gives the user a polite apology.
The deployed model fixes this with a LEFT JOIN so premium_amount is always returned from the policies table. The notebook and deployed model are inconsistent on this point, a known limitation documented in the debug section.

Cell 18: Prompt Templates

The Supervisor prompt is the most critical. Key excerpts:

    You are the SUPERVISOR AGENT managing a team of insurance support specialists.

    CRITICAL RULES:
    - If policy number is already available, DO NOT ask for it again.
    - If customer ID is already available, DO NOT ask for it again.
    - Only use ask_user tool if ESSENTIAL information is missing.

    SPECIALIST AGENTS:
    - policy_agent → policy details, coverage, endorsements, auto policy specifics
    - billing_agent → billing, payments, premium questions
    - claims_agent → claim filing, tracking, settlements
    - general_help_agent → general insurance questions (no policy number needed)
    - human_escalation_agent → complex or sensitive cases

    Respond ONLY in JSON:
    { "next_agent": " ", "task": " ", "justification": " " }

Routing logic lives in the prompt, not in hard-coded keyword matching. The supervisor makes decisions based on the full conversation: ambiguous queries like “my car is making a noise, will my insurance cover it?” get routed to general_help_agent for a FAQ search, not misrouted to claims_agent.
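Because the routing decision arrives as model-generated JSON, it can be worth validating before acting on it. The sketch below is an assumption about how that might look, not code from the notebook: parse_routing and ALLOWED_AGENTS are hypothetical names, and falling back to general_help_agent on bad output is an illustrative policy.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the listing clean

# Agent names taken from the supervisor prompt, plus the "end" sentinel.
ALLOWED_AGENTS = {
    "policy_agent", "billing_agent", "claims_agent",
    "general_help_agent", "human_escalation_agent", "end",
}


def parse_routing(content, default_agent="general_help_agent"):
    """Strip optional code fences, parse JSON, and validate next_agent."""
    raw = re.sub(r"^" + FENCE + r"(?:json)?\s*\n?", "", (content or "").strip())
    raw = re.sub(r"\n?" + FENCE + r"\s*$", "", raw)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    agent = parsed.get("next_agent")
    if agent not in ALLOWED_AGENTS:
        agent = default_agent  # assumed fallback for unknown or missing agents
    return {
        "next_agent": agent,
        "task": parsed.get("task", ""),
        "justification": parsed.get("justification", ""),
    }


reply = FENCE + 'json\n{"next_agent": "billing_agent", "task": "Look up pending bill"}\n' + FENCE
print(parse_routing(reply))
```

Checking the agent name against a fixed set means a misspelled or invented agent cannot reach the graph's conditional edges.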
Section 5: LangGraph Agent System (Cells 19-24)

Cell 19: GraphState

    from langgraph.graph import StateGraph, END, add_messages
    from typing import Any, List, TypedDict, Annotated, Optional

    class GraphState(TypedDict):
        # ── LangGraph messages accumulator ───────────────────────────
        messages: Annotated[List[Any], add_messages]
        # ── Context (persists across turns via Lakebase checkpoint) ──
        user_input: str
        conversation_history: Optional[str]
        policy_number: Optional[str]
        customer_id: Optional[str]
        claim_id: Optional[str]
        # ── Routing (reset each turn) ─────────────────────────────────
        next_agent: Optional[str]
        task: Optional[str]
        n_iteration: Optional[int]
        end_conversation: Optional[bool]
        requires_human_escalation: bool
        # ── Clarification ─────────────────────────────────────────────
        needs_clarification: Optional[bool]
        clarification_question: Optional[str]
        user_clarification: Optional[str]
        # ── Billing-specific ──────────────────────────────────────────
        billing_amount: Optional[float]
        payment_method: Optional[str]
        billing_frequency: Optional[str]
        final_answer: Optional[str]

add_messages on the messages field means LangGraph appends rather than replaces on each state update. Context fields (policy_number, customer_id, conversation_history) are never reset between turns; they persist via Lakebase. Routing fields are reset at the start of each run_query() call.
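The walkthrough does not show run_query() itself, so the following is only a guess at what the per-turn reset could look like. The helper name start_turn and the default values are assumptions; the set of fields follows the "Routing (reset each turn)" group in GraphState.

```python
from typing import Any, Dict

# Assumed defaults for the routing group; the notebook's values may differ.
ROUTING_DEFAULTS: Dict[str, Any] = {
    "next_agent": None,
    "task": None,
    "n_iteration": 0,
    "end_conversation": False,
    "requires_human_escalation": False,
}


def start_turn(previous_state: Dict[str, Any], user_input: str) -> Dict[str, Any]:
    """Return a new state: context carried over, routing fields reset."""
    new_state = {**previous_state, **ROUTING_DEFAULTS}
    new_state["user_input"] = user_input
    return new_state


previous = {
    "policy_number": "POL000123",
    "conversation_history": "User: What is my premium?",
    "next_agent": "billing_agent",
    "n_iteration": 3,
}
state = start_turn(previous, "When is it due?")
print(state["policy_number"], state["next_agent"], state["n_iteration"])
```

Building a new dict rather than mutating the old one mirrors the {**state, ...} style used by the agent nodes.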
Cell 20: Tool Schemas (OpenAI Function-Calling Format)

    SUPERVISOR_TOOLS = [{
        "type": "function",
        "function": {
            "name": "ask_user",
            "description": "Ask the user for essential missing information",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {"type": "string"},
                    "missing_info": {"type": "string"},
                },
                "required": ["question", "missing_info"],
            },
        },
    }]

    BILLING_TOOLS = [
        {"type": "function", "function": {
            "name": "get_billing_info",
            "description": "Retrieve current billing information including balance and due date",
            "parameters": {"type": "object", "properties": {
                "policy_number": {"type": "string"},
                "customer_id": {"type": "string"},
            }},
        }},
        {"type": "function", "function": {
            "name": "get_payment_history",
            "description": "Fetch the most recent payment records for a policy",
            "parameters": {"type": "object",
                           "properties": {"policy_number": {"type": "string"}},
                           "required": ["policy_number"]},
        }},
    ]

SUPERVISOR_TOOLS has only ask_user: the supervisor’s only action is clarification. All retrieval tools belong to specialist agents. The General Help agent has no tool schema at all; its retrieval is done inline before the run_llm() call.
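The schemas above repeat the same nesting for every tool, so a small builder can reduce copy-paste errors. This is a sketch under assumptions: make_tool and missing_required are hypothetical helpers, not part of the notebook, and the argument check covers only the "required" list rather than full JSON Schema validation.

```python
from typing import Any, Dict, Iterable, List, Optional


def make_tool(name: str, description: str, properties: Dict[str, str],
              required: Optional[Iterable[str]] = None) -> Dict[str, Any]:
    """Build one function-calling tool schema from a name -> type mapping."""
    parameters: Dict[str, Any] = {
        "type": "object",
        "properties": {key: {"type": type_name} for key, type_name in properties.items()},
    }
    if required:
        parameters["required"] = list(required)
    return {"type": "function",
            "function": {"name": name, "description": description, "parameters": parameters}}


def missing_required(tool: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """List required parameters that the model's tool call left out."""
    required = tool["function"]["parameters"].get("required", [])
    return [key for key in required if key not in args]


payment_history_tool = make_tool(
    "get_payment_history",
    "Fetch the most recent payment records for a policy",
    {"policy_number": "string"},
    required=["policy_number"],
)
print(missing_required(payment_history_tool, {}))
```

Running missing_required before executing a tool lets the agent return a structured error for incomplete calls instead of raising a TypeError inside the tool function.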
Cell 21: Agent Node Implementations

Supervisor node, the core of the system:

    def supervisor_agent(state: GraphState) -> GraphState:
        n_iter = (state.get("n_iteration") or 0) + 1
        state = {**state, "n_iteration": n_iter}

        # Force escalation at max iterations
        if n_iter >= 6:
            return {**state, "requires_human_escalation": True,
                    "next_agent": "human_escalation_agent"}

        # Process clarification response
        if state.get("needs_clarification"):
            updated_conv = (state.get("conversation_history") or "") + \
                f"\nAssistant: {state['clarification_question']}\nUser: {state['user_clarification']}"
            state = {**state, "needs_clarification": False,
                     "conversation_history": updated_conv,
                     "clarification_question": None}

        # Routing decision
        response = deploy_client.predict(endpoint=LLM_ENDPOINT, inputs={
            "messages": [
                {"role": "system", "content": SUPERVISOR_PROMPT.format(
                    conversation_history=state.get("conversation_history"))},
                {"role": "user", "content": "Analyze and decide the next action."},
            ],
            "tools": SUPERVISOR_TOOLS,
            "tool_choice": "auto",
            "max_tokens": 1024,
        })
        message = response["choices"][0]["message"]

        # Handle ask_user tool call
        if message.get("tool_calls"):
            for tc in message["tool_calls"]:
                if tc["function"]["name"] == "ask_user":
                    args = json.loads(tc["function"].get("arguments") or "{}")
                    question = args.get("question", "Can you provide more details?")
                    user_resp_data = ask_user(question, args.get("missing_info", ""))
                    return {**state, "needs_clarification": True,
                            "clarification_question": question,
                            "user_clarification": user_resp_data["context"]}

        # Parse JSON routing decision (strip optional code fences first)
        raw = re.sub(r"^```(?:json)?\s*\n?", "", (message.get("content") or "{}").strip())
        raw = re.sub(r"\n?```\s*$", "", raw)
        parsed = json.loads(raw)
        return {**state,
                "next_agent": parsed.get("next_agent", "general_help_agent"),
                "task": parsed.get("task", ""),
                "justification": parsed.get("justification", "")}

Supervisor execution flow:

    supervisor_agent()
    │
    ├─► n_iteration >= 6? ──Yes──► force human_escalation_agent
    │
    ├─► needs_clarification? ──Yes──► fold user answer into history → continue
    │
    ├─► Call Claude with SUPERVISOR_TOOLS
    │   │
    │   ├─► tool_calls? (ask_user) ──► call ask_user() ──► set needs_clarification=True ──► return
    │   │
    │   └─► JSON response ──► strip fences ──► parse ──► set next_agent, task, justification
    │
    └─► return updated state

Policy agent node (all specialist nodes follow this pattern):

    def policy_agent_node(state: GraphState) -> GraphState:
        prompt = POLICY_AGENT_PROMPT.format(
            task=state.get("task"),
            policy_number=state.get("policy_number") or "Not provided",
            customer_id=state.get("customer_id") or "Not provided",
            conversation_history=state.get("conversation_history") or "",
        )
        result = run_llm(prompt, tools=POLICY_TOOLS, tool_functions={
            "get_policy_details": get_policy_details,
            "get_auto_policy_details": get_auto_policy_details,
        })
        current_history = state.get("conversation_history") or ""
        return {**state,
                "messages": [("assistant", result)],
                "conversation_history": current_history + f"\nPolicy Agent: {result}"}

General Help agent, which uses inline RAG rather than tool calling:

    def general_help_agent_node(state: GraphState) -> GraphState:
        # Retrieve FAQs BEFORE building the prompt
        faq_results = retrieve_faq(query_text=state.get("user_input") or "", num_results=3)
        faq_context = ""
        for i, item in enumerate(faq_results, 1):
            faq_context += f"FAQ {i}:\nQ: {item.get('question','')}\nA: {item.get('answer','')}\n\n"
        if not faq_context:
            faq_context = "No relevant FAQs were found in the knowledge base."
        prompt = GENERAL_HELP_PROMPT.format(
            task=state.get("task") or "General insurance support",
            conversation_history=state.get("conversation_history") or "",
            faq_context=faq_context,
        )
        result = run_llm(prompt)  # no tools: context already injected into prompt
        ...
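One thing worth noting about the supervisor node: the model's reply goes straight to json.loads() after the fences are stripped, so a reply that is not valid JSON would raise. A minimal sketch of one possible hardening follows, assuming that falling back to general_help_agent (already the default in the supervisor's parsed.get call) is acceptable. parse_routing_decision and DEFAULT_ROUTE are hypothetical names, not part of the notebook.

```python
import json
import re
from typing import Dict, Optional

# Built programmatically so this example contains no literal triple backtick.
FENCE = "`" * 3

# Assumed fallback route, mirroring the defaults used by the supervisor's parsed.get calls.
DEFAULT_ROUTE: Dict[str, str] = {
    "next_agent": "general_help_agent",
    "task": "",
    "justification": "",
}


def parse_routing_decision(content: Optional[str]) -> Dict[str, str]:
    """Strip optional code fences, parse the JSON, and fall back to DEFAULT_ROUTE on failure."""
    raw = re.sub(r"^" + FENCE + r"(?:json)?\s*\n?", "", (content or "{}").strip())
    raw = re.sub(r"\n?" + FENCE + r"\s*$", "", raw)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return dict(DEFAULT_ROUTE)
    if not isinstance(parsed, dict):
        # Valid JSON that is not an object (for example a list) is also unusable here.
        return dict(DEFAULT_ROUTE)
    return {
        "next_agent": parsed.get("next_agent") or DEFAULT_ROUTE["next_agent"],
        "task": parsed.get("task", ""),
        "justification": parsed.get("justification", ""),
    }
```

Whether a silent fallback is preferable to surfacing the parse error is a design choice. In a debugging phase, logging the raw reply before falling back would probably be more useful.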
Cell 22: Routing Function

    def decide_next_agent(state: GraphState) -> str:
        if state.get("needs_clarification"):
            return "supervisor_agent"
        if state.get("end_conversation"):
            return "end"
        if state.get("requires_human_escalation"):
            return "human_escalation_agent"
        return state.get("next_agent") or "general_help_agent"

Four checks, in priority order. Clarification loops back to the supervisor. The "end" key maps to final_answer_agent, which then leads to the LangGraph END sentinel. Escalation bypasses the final answer agent entirely. If none of the flags is set, the supervisor's own next_agent choice is used.

Cell 23: Build & Compile the Graph

    workflow = StateGraph(GraphState)

    # Register all nodes
    workflow.add_node("supervisor_agent", supervisor_agent)
    workflow.add_node("policy_agent", policy_agent_node)
    workflow.add_node("billing_agent", billing_agent_node)
    workflow.add_node("claims_agent", claims_agent_node)
    workflow.add_node("general_help_agent", general_help_agent_node)
    workflow.add_node("human_escalation_agent", human_escalation_node)
    workflow.add_node("final_answer_agent", final_answer_agent)
    workflow.set_entry_point("supervisor_agent")

    # Supervisor conditional routing
    workflow.add_conditional_edges("supervisor_agent", decide_next_agent, {
        "supervisor_agent": "supervisor_agent",  # clarification self-loop
        "policy_agent": "policy_agent",
        "billing_agent": "billing_agent",
        "claims_agent": "claims_agent",
        "general_help_agent": "general_help_agent",
        "human_escalation_agent": "human_escalation_agent",
        "end": "final_answer_agent",
    })

    # Back-edges: all specialists return to supervisor for re-evaluation
    for specialist in ["policy_agent", "billing_agent", "claims_agent", "general_help_agent"]:
        workflow.add_edge(specialist, "supervisor_agent")

    # Terminal edges
    workflow.add_edge("final_answer_agent", END)
    workflow.add_edge("human_escalation_agent", END)

    # Compile with Lakebase checkpointer
    app = workflow.compile(checkpointer=checkpointer)

LangGraph topology:

    supervisor_agent (entry point · conditional out)
    │
    ├─► policy_agent ────────► back to supervisor_agent
    ├─► billing_agent ───────► back to supervisor_agent
    ├─► claims_agent ────────► back to supervisor_agent
    ├─► general_help_agent ──► back to supervisor_agent
    │
    ├─► "end" ──► final_answer_agent ──► END
    ├─► human_escalation_agent ──► END
    └─► supervisor_agent (clarification self-loop)

Section 6: Testing (Cells 25-32)

Cell 25: Test Runner, run_query()

    _TURN_RESET_FIELDS = {
        "n_iteration": 0,
        "end_conversation": False,
        "final_answer": "",
        "needs_clarification": False,
        "clarification_question": None,
        "user_clarification": None,
        "next_agent": "supervisor_agent",
        "requires_human_escalation": False,
        "messages": [],
    }

    def run_query(query: str, thread_id: str = None) -> tuple:
        config = {"configurable": {"thread_id": thread_id or str(uuid.uuid4())}}
        thread_id = config["configurable"]["thread_id"]
        snapshot = app.get_state(config)
        if snapshot.values:
            # Follow-up turn: resume checkpoint, reset only operational fields
            prior_history = snapshot.values.get("conversation_history", "")
            state_input = {
                **snapshot.values,      # full checkpoint (policy_number etc. preserved)
                **_TURN_RESET_FIELDS,   # reset iteration counter, flags, final_answer
                "user_input": query,
                "conversation_history": prior_history + f"\nUser: {query}",
            }
        else:
            # First turn: blank state
            state_input = {**_BLANK_STATE, "user_input": query,
                           "conversation_history": f"User: {query}"}
        final_state = app.invoke(state_input, config=config)
        print(final_state.get("final_answer") or "No final answer generated.")
        return final_state, thread_id

Multi-turn memory flow:

    Turn 1 (thread_id=T1)
      snapshot.values → empty → blank state
      user asks: "What is my auto insurance premium?"
      supervisor → ask_user → user provides POL000066
      billing_agent retrieves data → final_answer → Lakebase checkpoint saved

    Turn 2 (thread_id=T1, same session)
      snapshot.values → loaded from Lakebase
      policy_number = "POL000066" already in state ← no re-asking
      user asks: "What about my payment history?"
      supervisor → billing_agent (knows policy_number) → final_answer

Test scenarios covered:

| Cell | Query | Expected flow |
|---|---|---|
| 26 | "What is the premium of my auto insurance policy?" | Supervisor → ask_user → billing_agent → final_answer |
| 27 | "In general, what does life insurance cover?" | Supervisor → general_help_agent (VS RAG) → final_answer |
| 28 | "I want to speak to a human executive." | Supervisor → human_escalation_agent → END |
| 29 | "What type of policy do I have and when does it expire?" | Supervisor → ask_user → policy_agent → final_answer |
| 30 | "What is the premium for that policy?" (thread4 resume) | Checkpoint loaded → billing_agent (no re-ask) → final_answer |
| 32 | "What is the status of my recent claim?" | Supervisor → ask_user → claims_agent → final_answer |

Section 7: Vision Pipeline (Cells V1-V8)

This is independent of the LangGraph graph: a sequential five-step multimodal claim processing pipeline using Claude Sonnet 4.6 vision. Sample images are in Sample Images for Claim Processing/.

Pipeline flow:

    Car damage image ──► V3: Damage extraction JSON
    DL image ──► V4: OCR extraction JSON
    Claim form image ──► V5: Form extraction JSON (22 fields)
      │
    V6: Cross-document consistency checks
      ├─► vehicle details: image vs form (fuzzy match)
      ├─► DL details: license vs form
      └─► policy_end_date >= incident_date?
      │
    all_passed?
      ├── No ──► Claim Rejected
      └── Yes ──► V7: Vector Search coverage lookup
                    └─► Route to Claim Handler

Cell V2: Vision Helpers

    def encode_image(path: str) -> tuple:
        suffix = Path(path).suffix.lower()
        # ".jpeg" added here so both JPEG extensions get the right media type
        media_type = {".png": "image/png", ".jpg": "image/jpeg",
                      ".jpeg": "image/jpeg"}.get(suffix, "image/png")
        with open(path, "rb") as f:
            b64 = base64.standard_b64encode(f.read()).decode("utf-8")
        return b64, media_type

    def vision_call(image_path: str, system_prompt: str, user_prompt: str) -> str:
        b64_data, media_type = encode_image(image_path)
        data_url = f"data:{media_type};base64,{b64_data}"  # OpenAI-compatible data URL
        response = deploy_client.predict(endpoint=LLM_ENDPOINT, inputs={
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": [
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": user_prompt},
                ]},
            ],
            "max_tokens": 1024,
        })
        return response["choices"][0]["message"]["content"]

    def parse_llm_json(raw: str) -> dict:
        cleaned = re.sub(r"^```(?:json)?\s*\n?", "", raw.strip())
        cleaned = re.sub(r"\n?```\s*$", "", cleaned)
        return json.loads(cleaned)

Cell V3: Car Damage Extraction

    SYSTEM_CAR = """You are a vehicle damage assessment specialist.
    Analyze the car image and extract structured information.
    Respond ONLY with a valid JSON object, no markdown, no explanation."""

    USER_CAR = """Return a JSON with these exact fields:
    {
      "vehicle_make": "...",
      "vehicle_model": "...",
      "vehicle_color": "...",
      "damage_location": "...",
      "damage_severity": "minor/moderate/severe",
      "damage_description": "...",
      "additional_observations": "..."
    }"""

    car_data = parse_llm_json(vision_call(CAR_IMAGE_PATH, SYSTEM_CAR, USER_CAR))

Cell V6: Consistency Checks

    def fuzzy_match(a: str, b: str) -> bool:
        clean = lambda s: re.sub(r"[^a-z0-9]", "", s.lower())
        a_c, b_c = clean(a), clean(b)
        return a_c == b_c or a_c in b_c or b_c in a_c

    # Check 1: Car image vs Claim form
    for field, (val_a, val_b) in {
        "vehicle_color": (car_data["vehicle_color"], form_data["vehicle_color"]),
        "vehicle_make": (car_data["vehicle_make"], form_data["vehicle_make"]),
    }.items():
        results[field] = fuzzy_match(val_a, val_b)

    # Check 3: Policy validity at incident date
    def parse_date_flexible(date_str):
        formats = ["%d-%b-%Y", "%Y-%m-%d", "%B %d, %Y", "%d/%m/%Y", ...]
        for fmt in formats:
            try:
                return datetime.strptime(date_str.strip(), fmt)
            except ValueError:
                continue
        try:
            from dateutil import parser as dp
            return dp.parse(date_str.strip())
        except Exception:
            return None

    policy_end = parse_date_flexible(form_data.get("policy_end_date", ""))
    incident_dt = parse_date_flexible(form_data.get("incident_date", ""))
    results["policy_valid_at_incident"] = policy_end >= incident_dt if (policy_end and incident_dt) else False

fuzzy_match normalises both strings by lowercasing and stripping all non-alphanumeric characters, which handles differences like “Toyota” vs “TOYOTA” or “Camry” vs “CAMRY”. It also accepts a match when one cleaned string is contained in the other. The flexible date parser tries ten strptime formats before falling back to dateutil.parser.parse, handling the variety of date representations Claude returns across different document types.
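One edge case in the substring rule is worth spelling out: Python treats the empty string as a substring of every string, so a field that the model returned blank would match anything. Below is a minimal sketch of a guarded variant together with a reduced date check. fuzzy_match_strict, parse_date and policy_valid_at_incident are hypothetical names for this example, and only the four strptime formats visible in the walkthrough are included (the notebook's full list and its dateutil fallback are left out).

```python
import re
from datetime import datetime
from typing import Optional

# The four formats shown in the walkthrough; the notebook's list is longer.
DATE_FORMATS = ["%d-%b-%Y", "%Y-%m-%d", "%B %d, %Y", "%d/%m/%Y"]


def _clean(value: Optional[str]) -> str:
    """Lowercase and drop everything except a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", (value or "").lower())


def fuzzy_match_strict(a: Optional[str], b: Optional[str]) -> bool:
    """Same matching rule as fuzzy_match, but a blank side never counts as a match."""
    a_c, b_c = _clean(a), _clean(b)
    if not a_c or not b_c:
        return False
    return a_c == b_c or a_c in b_c or b_c in a_c


def parse_date(date_str: Optional[str]) -> Optional[datetime]:
    """Try each known format in turn; return None when nothing fits."""
    text = (date_str or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def policy_valid_at_incident(policy_end: str, incident: str) -> bool:
    """True only when both dates parse and the policy end is on or after the incident."""
    end_dt, inc_dt = parse_date(policy_end), parse_date(incident)
    return bool(end_dt and inc_dt and end_dt >= inc_dt)
```

Treating an unparseable date as a failed check, as the original code also does, errs on the side of rejecting a claim rather than approving one on missing data.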
Section 8: Deployment (Cells D1-D10)

Cell D2: Dual Execution Context

    class AgentClarificationNeeded(Exception):
        def __init__(self, question: str):
            self.question = question
            super().__init__(question)

    def ask_user(question: str, missing_info: str = "") -> Dict[str, Any]:
        if os.environ.get("INSURANCE_AGENT_DEPLOYED") == "true":
            raise AgentClarificationNeeded(question)  # deployed: signal via exception
        # notebook mode: call input() exactly as before
        answer = input(f"{question}: ")
        return {"context": answer, "source": "User Input"}

One function, two behaviours, controlled by a single environment variable. The same supervisor_agent node works correctly in both execution contexts without modification.

Cell D4: InsuranceAgentModel, the Self-Contained PyFunc

The deployed model is written as a standalone Python file and syntax-checked before logging:

    AGENT_FILE = "/tmp/insurance_agent_model.py"
    agent_code = '''...full class definition as a string...'''
    with open(AGENT_FILE, "w") as f:
        f.write(agent_code)

    # Syntax verify before logging to MLflow
    with open(AGENT_FILE) as f:
        ast.parse(f.read())  # raises SyntaxError immediately if code is broken

_sql_exec(): SQL REST API with cold-start polling:

    def _sql_exec(self, query: str, params: tuple = ()) -> List[Dict]:
        # Substitute %s placeholders manually (no bind params in REST API)
        bound = query
        for p in params:
            safe = str(p).replace("'", "''")
            bound = bound.replace("%s", "'" + safe + "'", 1)

        # Submit with short initial wait, CONTINUE on timeout
        resp = requests.post(
            f"https://{self._db_host}/api/2.0/sql/statements",
            headers={"Authorization": f"Bearer {self._db_token}"},
            json={"statement": bound, "warehouse_id": self._wh_id,
                  "wait_timeout": "10s", "on_wait_timeout": "CONTINUE",
                  "disposition": "INLINE", "format": "JSON_ARRAY"},
            timeout=15,
        )
        data = resp.json()
        state = data.get("status", {}).get("state", "UNKNOWN")
        statement_id = data.get("statement_id", "")

        # Poll until terminal; handles warehouse cold start (30s+)
        poll_count = 0
        while state in ("PENDING", "RUNNING") and poll_count < 10:
            time.sleep(6)
            poll_resp = requests.get(
                f"https://{self._db_host}/api/2.0/sql/statements/{statement_id}",
                headers={"Authorization": f"Bearer {self._db_token}"},
                timeout=15)
            data = poll_resp.json()
            state = data.get("status", {}).get("state", "UNKNOWN")
            poll_count += 1

        if state != "SUCCEEDED":
            raise Exception(f"SQL state={state} error={data.get('status',{}).get('error',{})}")
        rows = (data.get("result") or {}).get("data_array") or []
        cols = [c["name"] for c in (data.get("manifest") or {}).get("schema", {}).get("columns", [])]
        return [dict(zip(cols, row)) for row in rows]

on_wait_timeout=CONTINUE is critical. A blocking wait_timeout=30s would cause a CANCEL status when the SQL warehouse is cold (30+ second start time). The polling loop retries every 6 seconds for up to 10 rounds, so a statement gets roughly 10 + 60 seconds in total before the call gives up.

_get_billing_info(): LEFT JOIN fix in the deployed model:

    def _get_billing_info(self, policy_number=None, customer_id=None):
        where_col = "p.policy_number" if policy_number else "p.customer_id"
        param = (policy_number or customer_id,)
        # LEFT JOIN: premium_amount always returned even with no pending bills
        rows = self._sql_exec(
            "SELECT p.policy_number, p.premium_amount, p.billing_frequency, "
            "p.status AS policy_status, "
            "b.bill_id, b.due_date, b.amount_due, b.status AS billing_status "
            "FROM " + FULL_SCHEMA + ".policies p "
            "LEFT JOIN " + FULL_SCHEMA + ".billing b "
            "ON p.policy_number = b.policy_number AND b.status = 'pending' "
            "WHERE " + where_col + " = %s ORDER BY b.due_date DESC LIMIT 1",
            param,
        )
        return rows[0] if rows else {"error": "Policy not found"}

predict(): three-step execution:

    def predict(self, context, model_input, params=None):
        messages = model_input.get("messages", [])

        # Step 1: Reconstruct state from message list
        conversation_history = self._reconstruct_history(messages)
        user_query = next((m["content"] for m in reversed(messages) if m["role"]=="user"), "")
        policy_number = self._extract_entity(conversation_history, "POL[0-9]{6}")
        customer_id = self._extract_entity(conversation_history, "CUST[0-9]{5}")
        claim_id = self._extract_entity(conversation_history, "CLM[0-9]{6}")

        # Step 2: Clarification bypass
        if len(messages) >= 3:
            last_msg = messages[-1]
            prev_msg = messages[-2]
            raw_id = last_msg.get("content", "").strip()
            is_bare_id = bool(re.match(r"^(POL[0-9]{6}|CUST[0-9]{5}|CLM[0-9]{6})$",
                                       raw_id, re.IGNORECASE))
            is_clarification = prev_msg.get("role") == "assistant" and any(
                kw in prev_msg.get("content","").lower()
                for kw in ["policy number","customer id","provide","please","share"])
            if is_bare_id and is_clarification:
                # Classify original query → direct to specialist (no supervisor round-trip)
                original_q = next((m["content"] for m in messages if m["role"]=="user"), "")
                if any(kw in original_q.lower() for kw in ["premium","billing","payment"]):
                    direct_agent = "billing_agent"
                elif any(kw in original_q.lower() for kw in ["claim","status","accident"]):
                    direct_agent = "claims_agent"
                else:
                    direct_agent = "policy_agent"

                # SQL connectivity test BEFORE LLM call
                try:
                    self._sql_exec("SELECT 1 AS diag_test", ())
                except Exception as sql_ex:
                    return {"choices": [{"message": {"role": "assistant",
                            "content": "DIAG_SQL_ERROR: " + str(sql_ex)[:400]}}]}

                specialist_response = self._run_specialist(
                    direct_agent, f"Retrieve info for {raw_id}",
                    conversation_history, policy_number, customer_id, claim_id)
                final_answer = self._generate_final_answer(user_query, specialist_response)
                return {"choices": [{"message": {"role": "assistant", "content": final_answer}}]}

        # Step 3: Normal supervisor routing loop
        # Initialised up front so an immediate "end" from the supervisor does not
        # leave specialist_response undefined at the final-answer step.
        specialist_response = ""
        for iteration in range(MAX_ROUTING_ITERS):
            routing = self._run_supervisor(conversation_history)
            next_agent = routing.get("next_agent", "general_help_agent")
            task = routing.get("task", "Assist the user.")
            if next_agent == "end":
                break
            specialist_response = self._run_specialist(
                next_agent, task, conversation_history,
                policy_number, customer_id, claim_id)
            conversation_history += f"\n{next_agent}: {specialist_response}"
            if next_agent == "human_escalation_agent":
                return {"choices": [{"message": {"role": "assistant",
                                                 "content": specialist_response}}]}

        final_answer = self._generate_final_answer(user_query, specialist_response)
        return {"choices": [{"message": {"role": "assistant", "content": final_answer}}]}

predict() flow:

    predict(messages)
    │
    ├─► Step 1: reconstruct history · regex extract POL/CUST/CLM
    │
    ├─► Step 2: last msg = bare ID + prev msg = clarification?
    │     YES ──► SQL SELECT 1 test ──► direct specialist ──► final answer ──► return
    │     NO ──► continue
    │
    └─► Step 3: routing loop (max 5)
          ├─► _run_supervisor() → JSON routing
          │     ├─► next_agent="end" ──► break
          │     └─► specialist name ──► _run_specialist()
          │           └─► next_agent="human_escalation_agent" ──► return immediately
          └─► _generate_final_answer() ──► return

The SELECT 1 connectivity test before the specialist call is important. Without it, a token or warehouse ID problem would propagate through the two-pass run_llm() call, get swallowed as {"error": str(e)} by the tool error handler, and come back as a polite “sorry, I couldn’t retrieve that” from Claude. The raw DIAG_SQL_ERROR: prefix in the response makes the failure visible.
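The Step 2 bypass is easier to reason about when pulled out of predict() into two small pure functions. The sketch below reuses the regex and keyword lists from the code above; the function names is_clarification_reply and classify_original_query are hypothetical and only serve this illustration.

```python
import re
from typing import Dict, List

# Regex and keyword lists copied from the predict() code above.
BARE_ID_RE = re.compile(r"^(POL[0-9]{6}|CUST[0-9]{5}|CLM[0-9]{6})$", re.IGNORECASE)
CLARIFICATION_KEYWORDS = ["policy number", "customer id", "provide", "please", "share"]
BILLING_KEYWORDS = ["premium", "billing", "payment"]
CLAIMS_KEYWORDS = ["claim", "status", "accident"]

Message = Dict[str, str]


def is_clarification_reply(messages: List[Message]) -> bool:
    """True when the last message is a bare ID answering an assistant clarification."""
    if len(messages) < 3:
        return False
    last_msg, prev_msg = messages[-1], messages[-2]
    is_bare_id = bool(BARE_ID_RE.match(last_msg.get("content", "").strip()))
    asked = prev_msg.get("role") == "assistant" and any(
        kw in prev_msg.get("content", "").lower() for kw in CLARIFICATION_KEYWORDS
    )
    return is_bare_id and asked


def classify_original_query(messages: List[Message]) -> str:
    """Pick a specialist from the FIRST user message using ordered keyword checks."""
    original_q = next((m["content"] for m in messages if m["role"] == "user"), "").lower()
    if any(kw in original_q for kw in BILLING_KEYWORDS):
        return "billing_agent"
    if any(kw in original_q for kw in CLAIMS_KEYWORDS):
        return "claims_agent"
    return "policy_agent"
```

Because the checks are ordered substring tests, a question such as "What is the status of my policy?" would be sent to claims_agent through the "status" keyword. That may or may not be the intended behaviour, so the keyword lists deserve a review against real queries.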
Cells D5-D7: Log, Register & Deploy

    # Cell D5: Log model using code-based approach
    with mlflow.start_run(run_name=" ") as run:
        model_info = mlflow.pyfunc.log_model(
            artifact_path="agent",
            python_model=AGENT_FILE,  # filepath, not an instance
            signature=signature,
            registered_model_name=UC_MODEL_NAME,
            pip_requirements=[  # no LangGraph, no psycopg: not needed at serving
                "databricks-sdk>=0.89.0",
                "databricks-vectorsearch",
                "databricks-agents",
                "mlflow",
                "requests",
                "openai==1.82.0",
            ],
        )

    # Cell D6: Get latest registered version
    client = mlflow.tracking.MlflowClient()
    versions = client.search_model_versions(f"name='{UC_MODEL_NAME}'")
    latest = sorted(versions, key=lambda v: int(v.version), reverse=True)[0]
    MODEL_VERSION = int(latest.version)

    # Cell D7: Deploy with environment variables
    from databricks import agents
    sql_pat = dbutils.secrets.get(scope=SECRETS_SCOPE, key="pg_password")
    deployment = agents.deploy(
        model_name=UC_MODEL_NAME,
        model_version=MODEL_VERSION,
        environment_vars={
            "INSURANCE_AGENT_DEPLOYED": "true",
            "DATABRICKS_HOST": DATABRICKS_HOST,
            "SQL_WAREHOUSE_HTTP_PATH": SQL_WH_HTTP_PATH,
            "SQL_PAT": sql_pat,  # long-lived PAT; short-lived OAuth rejected by SQL REST API
        },
    )

Why SQL_PAT and not DATABRICKS_TOKEN? Model Serving auto-injects DATABRICKS_TOKEN, a short-lived OAuth credential. It works for workspace API calls, but the SQL Statement Execution REST API rejects it. A long-lived PAT stored in Secrets and injected via environment_vars is the reliable path.
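The walkthrough does not show how the deployed model turns these environment variables into the _db_host, _wh_id and _db_token attributes used by _sql_exec(). A minimal sketch of one way it could be done follows. The function load_sql_config is hypothetical, and the code assumes the warehouse HTTP path ends with the warehouse ID (for example /sql/1.0/warehouses/abc123), which should be checked against the actual value in the workspace.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("DATABRICKS_HOST", "SQL_WAREHOUSE_HTTP_PATH", "SQL_PAT")


def load_sql_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read SQL connection settings from the environment and fail fast if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Normalise the host so it can be dropped into f"https://{host}/api/...".
    host = env["DATABRICKS_HOST"].strip()
    if host.startswith("https://"):
        host = host[len("https://"):]
    host = host.rstrip("/")

    # Assumption: the last path segment of the HTTP path is the warehouse ID.
    warehouse_id = env["SQL_WAREHOUSE_HTTP_PATH"].rstrip("/").rsplit("/", 1)[-1]

    return {"host": host, "warehouse_id": warehouse_id, "token": env["SQL_PAT"]}
```

Failing at load time with the names of the missing variables gives a clearer signal than a later HTTP 401 or 404 from the SQL endpoint.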
Cell D8: Version verification gotcha

    # DO NOT use this to verify MLflow version:
    ep.config.served_entities[0].entity_version  # internal slot counter, always starts at 1

    # USE this instead:
    registry_client = mlflow.tracking.MlflowClient()
    versions = registry_client.search_model_versions(f"name='{UC_MODEL_NAME}'")
    latest = sorted(versions, key=lambda v: int(v.version), reverse=True)[0]
    # latest.version is the real MLflow registry version

Cell D10: Multi-Turn Smoke Test

    # Turn 1
    turn1_result = deploy_client.predict(endpoint=endpoint_name, inputs={
        "messages": [{"role": "user",
                      "content": "What is the premium of my auto insurance policy?"}]
    })
    turn1_reply = turn1_result["choices"][0]["message"]["content"]
    # Validate: response must contain clarification keywords
    turn1_pass = any(kw in turn1_reply.lower()
                     for kw in ["policy number", "customer id", "provide", "please"])

    # Turn 2
    turn2_result = deploy_client.predict(endpoint=endpoint_name, inputs={
        "messages": [
            {"role": "user", "content": "What is the premium of my auto insurance policy?"},
            {"role": "assistant", "content": turn1_reply},
            {"role": "user", "content": "POL000066"},
        ]
    })
    turn2_reply = turn2_result["choices"][0]["message"]["content"]
    # Validate: response must contain a dollar amount (real DB data, not an apology)
    has_dollar = bool(re.search(r"\$[\d,]+\.?\d*|\d+\.\d{2}\s*(USD|per|/)", turn2_reply))

A dollar amount in Turn 2 is proof the billing agent queried the database and returned real data. An escalation message or a polite “sorry” would not match the regex.

The Streamlit App: Full Walkthrough

The app went through a significant upgrade after the initial deployment. The original version stored conversation history only in Streamlit session state, which meant everything was lost on page refresh or re-login. The updated version persists every conversation to Lakebase PostgreSQL and restores it on the next login, scoped to the logged-in user so no one sees anyone else’s history.
app.py structure

    app.py
    ├── WorkspaceClient()          # app SP OAuth, handled by Databricks Apps runtime
    ├── APP_SP_NAME                # app SP UUID (PostgreSQL role for DB auth)
    ├── _get_current_user()        # actual logged-in user email from request headers
    ├── CURRENT_USER               # email used for per-user data isolation
    ├── ENDPOINT_NAME              # from os.environ["AGENT_ENDPOINT_NAME"] (app.yaml)
    ├── LAKEBASE_ENDPOINT          # Lakebase resource path (app.yaml)
    ├── LAKEBASE_HOST              # Lakebase host (app.yaml)
    ├── _get_lakebase_token()      # OAuth token for app SP, cached 50 min
    ├── _get_conn()                # psycopg2 connection as app SP
    ├── _ensure_table()            # creates conversation_metadata on first run
    ├── db_load/save/delete        # Lakebase persistence helpers
    ├── _init_session()            # bootstrap session state keys
    ├── save/load/delete conversation helpers (session + DB)
    ├── call_agent(messages)       # SDK query to serving endpoint
    ├── check_endpoint_health()    # verify READY state via SDK
    ├── Sidebar                    # user identity, session list, example questions
    └── Main chat area             # message rendering, input, response loop

The Identity Problem in Databricks Apps

This is a subtle but important distinction. In Databricks Apps, WorkspaceClient() runs as the app’s service principal, not as the individual logged-in user. This matters for two separate concerns.

DB connection auth: _w.postgres.generate_database_credential() generates an OAuth token for whoever _w represents, which is the app SP. The PostgreSQL user in the connection must match that identity exactly. If you pass the logged-in user’s email as the PostgreSQL user but authenticate with the app SP’s token, PostgreSQL rejects it.

Per-user data isolation: The actual logged-in user’s email comes from request headers injected by Databricks’ OAuth proxy, not from the SDK. You read it from st.context.headers.
    _w = WorkspaceClient()
    APP_SP_NAME = _w.current_user.me().user_name  # app SP UUID, the PostgreSQL role for DB auth

    def _get_current_user() -> str:
        """Real logged-in user email, from Databricks Apps request headers."""
        try:
            headers = st.context.headers
            email = (
                headers.get("X-Forwarded-Email")
                or headers.get("X-Databricks-User-Email")
                or headers.get("X-Forwarded-User")
                or ""
            )
            if email:
                return email
        except Exception:
            pass
        return APP_SP_NAME  # fallback to app SP if headers unavailable

    CURRENT_USER = _get_current_user()  # used only for data isolation, not DB auth

APP_SP_NAME authenticates to PostgreSQL. CURRENT_USER appears in the user_email column of conversation_metadata and is used for all WHERE user_email = %s filters. Different users see only their own history.

Lakebase Connection

The OAuth token for the app SP is cached for 50 minutes via @st.cache_data. The Lakebase token expires at 60 minutes, so the cache TTL ensures a refresh before expiry without hitting the API on every page render.

    @st.cache_data(ttl=3000, show_spinner=False)
    def _get_lakebase_token(_key: str = "app_sp") -> str:
        cred = _w.postgres.generate_database_credential(endpoint=LAKEBASE_ENDPOINT)
        return cred.token

    def _get_conn() -> psycopg2.extensions.connection:
        return psycopg2.connect(
            host     = LAKEBASE_HOST,
            port     = 5432,
            dbname   = "databricks_postgres",
            user     = APP_SP_NAME,  # app SP UUID, must match the token
            password = _get_lakebase_token(),
            sslmode  = "require",
        )

Lakebase Setup: Grants Required

Before the app can connect and create tables, three things need to be in place in Lakebase.

Step 1: Add the app SP as an OAuth role via the Lakebase UI: production branch → Roles & Databases → Add role → OAuth tab → select the app SP from the dropdown → Add

Step 2: Give it CAN USE on the project via Lakebase Settings → Permissions.

Step 3: Run the following grants directly in the Lakebase SQL Editor (navigate to production branch → SQL Editor, select the databricks_postgres database):

    -- Replace with the UUID shown under
    -- "Logged in as" in the app sidebar when first deployed
    -- (e.g. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
    GRANT CREATE ON SCHEMA public TO " ";
    GRANT USAGE ON SCHEMA public TO " ";

    -- After the first app load creates conversation_metadata, run this too:
    GRANT SELECT, INSERT, UPDATE, DELETE ON conversation_metadata TO " ";

    -- Checkpoint table access (for delete propagation)
    GRANT SELECT, DELETE ON checkpoint_writes TO " ";
    GRANT SELECT, DELETE ON checkpoint_blobs TO " ";
    GRANT SELECT, DELETE ON checkpoints TO " ";

The CREATE ON SCHEMA public grant is what allows _ensure_table() to create conversation_metadata on first run. Without it, _init_db_once() silently returns False and the sidebar shows the ⚠️ History: DB unavailable (session only) warning.

Table Auto-Creation

@st.cache_resource ensures the table creation runs exactly once per app deployment, not once per user session:

    @st.cache_resource
    def _init_db_once():
        try:
            _ensure_table()
            return True
        except Exception:
            return False

    _db_ready = _init_db_once()

The conversation_metadata schema:

    CREATE TABLE IF NOT EXISTS conversation_metadata (
        session_id  TEXT PRIMARY KEY,
        user_email  TEXT NOT NULL,
        title       TEXT,
        created_at  TIMESTAMPTZ DEFAULT NOW(),
        updated_at  TIMESTAMPTZ DEFAULT NOW(),
        turn_count  INTEGER DEFAULT 0,
        messages    TEXT  -- full JSON of [{role, content, ts}, ...]
    );
    CREATE INDEX IF NOT EXISTS idx_conv_user ON conversation_metadata (user_email);

Persistence: Save, Load, Delete

Every agent response triggers an upsert via ON CONFLICT (session_id) DO UPDATE. If the same session ID already exists in the table, only title, turn_count, messages, and updated_at are refreshed:

    def db_save_conversation(conv: dict, user_email: str):
        with conn.cursor() as cur:
            cur.execute("""
                INSERT INTO conversation_metadata
                    (session_id, user_email, title, turn_count, messages, updated_at)
                VALUES (%s, %s, %s, %s, %s, NOW())
                ON CONFLICT (session_id) DO UPDATE SET
                    title = EXCLUDED.title,
                    turn_count = EXCLUDED.turn_count,
                    messages = EXCLUDED.messages,
                    updated_at = NOW()
            """, (conv["id"], user_email, conv["title"], conv["turn_count"],
                  json.dumps(conv["messages"])))
        conn.commit()

On fresh login, conversations are loaded once per session via a db_loaded flag in session state. This avoids a round-trip to Lakebase on every Streamlit re-render:

    if not st.session_state["db_loaded"]:
        persisted = db_load_conversations(CURRENT_USER)
        st.session_state["conversations"].update(persisted)
        st.session_state["db_loaded"] = True

Delete operations, both single-conversation and delete-all, propagate to Lakebase and also clean up the corresponding LangGraph checkpoint rows using the session_id as thread_id:

    def db_delete_conversation(session_id: str):
        with conn.cursor() as cur:
            cur.execute("DELETE FROM conversation_metadata WHERE session_id = %s", (session_id,))
            for tbl in ("checkpoint_writes", "checkpoint_blobs", "checkpoints"):
                cur.execute(f"DELETE FROM public.{tbl} WHERE thread_id = %s", (session_id,))
        conn.commit()

Calling the Agent

The ts timestamp field stored in session messages must be stripped before forwarding to the endpoint, because predict() only accepts role and content:

    def call_agent(messages):
        sdk_msgs = [
            ChatMessage(role=_ROLE_MAP.get(m["role"], ChatMessageRole.USER),
                        content=m["content"])  # ts field intentionally excluded
            for m in messages
        ]
        resp = _w.serving_endpoints.query(name=ENDPOINT_NAME, messages=sdk_msgs)
        if resp.choices:
            return resp.choices[0].message.content
        return f"Unexpected response format: {str(resp)[:300]}"

Endpoint Health Check

The health check calls the SDK’s get method and inspects the actual ready state, rather than just checking whether the endpoint name string is non-empty, which would always pass:

    def check_endpoint_health():
        try:
            ep = _w.serving_endpoints.get(name=ENDPOINT_NAME)
            state = str(ep.state.ready) if ep.state else ""
            return "READY" in state.upper()
        except Exception:
            return False

The result is cached in session state at startup and shown as a green or red indicator in the sidebar, along with a retry button for when the endpoint is updating.

The Pending Query Pattern

Calling the agent from inside a sidebar button callback causes Streamlit state mutation issues mid-render. The deferred pattern sets pending_query in the callback and consumes it in the main script body:

    # Sidebar: set deferred query
    for ex in EXAMPLE_QUESTIONS:
        if st.button(ex, key=f"ex_{hash(ex)}", use_container_width=True):
            st.session_state["pending_query"] = ex

    # Main body: consume it
    user_query = (
        st.session_state.pop("pending_query", None)
        or st.chat_input("Type your question here...")
    )

pop reads and clears the pending query in a single operation, so no separate cleanup is needed.

app.yaml and requirements.txt

    command: ["sh", "-c", "streamlit run app.py --server.port $DATABRICKS_APP_PORT --server.headless true"]
    env:
      - name: AGENT_ENDPOINT_NAME
        value: " "
      - name: LAKEBASE_ENDPOINT
        value: " "
      - name: LAKEBASE_HOST
        value: " "

    streamlit>=1.35.0
    requests>=2.31.0
    databricks-sdk>=0.89.0
    psycopg2-binary>=2.9.0

psycopg2-binary is the only addition from the original requirements. Endpoint name, Lakebase host, and Lakebase resource path are all deployment config values in app.yaml, so changing where the app points requires no code changes.

Streamlit App in Action: https://medium.com/media/319db7d6d476e2ac77e26e8f4c51747d/href

Key Design Decisions and What I’d Do Differently

Self-contained deployed model. The deployed InsuranceAgentModel has no LangGraph or Spark, since both are unavailable in Model Serving. The tradeoff is that the routing loop in predict() is a hand-rolled reimplementation of the LangGraph graph. In production I’d invest in making both paths identical.

Prompt duplication. Every prompt template is defined twice: in Cell 18 for the notebook, and again inside the agent_code string in Cell D4. A shared constants file loaded at runtime is the right fix.

Conversation history unbounded. conversation_history accumulates all turns as a concatenated string with no truncation. A rolling window, keeping the last N turns or summarising older history, is the production fix.

INNER JOIN vs LEFT JOIN. The notebook get_billing_info() uses INNER JOIN and silently fails for policies with no pending bills. The deployed model uses LEFT JOIN and always returns premium_amount. They should be consistent.

Wrapping Up

Building a multi-agent system on Databricks is genuinely different from building one with a standalone LangChain/LangGraph stack. You get Unity Catalog governance across data and models, managed Vector Search with delta-sync embeddings, Lakebase for persistent conversation memory, and Databricks Apps for zero-friction deployment, all inside one platform with one auth model.

The non-obvious constraints: Spark is not available at serving time, short-lived OAuth tokens don’t work for SQL REST API calls, agents.deploy() internal slot versions are not MLflow registry versions, and PostgresSaver.setup() must run in a context manager before the persistent connection. Every one of these is documented in the debug section of the notebook and surfaced with the exact error it produces.

Full code (notebook, Streamlit app, sample images, architecture diagram) is in the GitHub repo.
GitHub: abhirup93/Databricks-Hackathon-Build-intelligent-Apps-with-Data-AI (Building Intelligent Apps with Data + AI)

Tech Stack: LangGraph 0.3.5, Claude Sonnet 4.6, Databricks Vector Search (GTE Large), Lakebase PostgreSQL, MLflow PyFunc, Databricks Model Serving (Serverless V5), Databricks Apps, Unity Catalog, Streamlit

Follow Abhirup Pal on Medium and LinkedIn for more data and AI engineering content.

I Built a Multi-Agent Insurance Support Streamlit Chatbot on Databricks-Full Code Walkthrough was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
Score: 27🌐 MovesMay 27, 2026https://pub.towardsai.net/i-built-a-multi-agent-insurance-support-streamlit-chatbot-on-databricks-full-code-walkthrough-1c2f64349659?source=rss----98111c9905da---4
The Moon needs robots
Science Robotics, Volume 11, Issue 114, May 2026.
Score: 27🌐 MovesMay 27, 2026https://www.science.org/doi/abs/10.1126/scirobotics.aei3965?af=R