I Crammed RAG, a Vector Database, and a Gemma LLM into a Mobile App. Here’s What Happened.

No cloud. No API keys. No excuses.

The full on-device pipeline, from writing a note to getting an answer. Nothing in this flow touches a network after the initial model download.

It started with a paranoid thought. I was taking meeting notes on my phone (project decisions, research fragments, half-formed ideas), and I wanted to ask a question across all of them. What did I decide about the API design last month? What did that article say about PostgreSQL indexing?

The obvious answer was to pipe everything into one of the big AI APIs. But then I thought: every note I've ever written would be traveling to someone else's server. My research. My half-baked ideas. My meeting minutes. I didn't like that.

So I did what any reasonable developer does when they don't like the obvious answer: I spent considerably more time building the unreasonable one.

The Constraint That Changed Everything

I gave myself one rule: after the initial model download, the app works with the network radio completel…

Source

https://pub.towardsai.net/i-crammed-rag-a-vector-database-and-a-gemma-llm-into-a-mobile-app-heres-what-happened-6e1a270e6d44?source=rss----98111c9905da---4