Claude Code Free with Local AI

[Figure: "5 Steps to Zero-Cost Claude Code". Step 1, Install Claude Code; Step 2, Install Ollama; Step 3, Pull the Right Model; Step 4, Connect Claude Code to Your Local Model; Step 5, Expand the Context Window to 64K Tokens. Source: Image by the author.]

Claude’s standard subscription costs $17/month, while the Max plan hits $100/month. But even with an active subscription, context limits run out fast when you are working inside large codebases. The moment your session hits its ceiling, you’re stuck either waiting or paying more.

A better option is to run Claude Code against a completely free, open-source model, “Qwen 3.5 9B”, running locally on your own machine via Ollama. This means no cloud bills, no rate limits eating into your workflow, and completely private code handling.

Note: You still need to be on at least a Claude Pro plan to access and initialize the Claude Code tool itself; what this local setup eliminates is the heavy API model usage/inference costs.

System Prerequisites

Before diving into the terminal commands, make sure your hardware meets the baseline requirements. Local LLMs depend heavily on system memory and compute.

OS: Windows 10 or 11 (PowerShell 7+ recommended).
RAM: 16GB minimum. With only 8GB, the model will offload aggressively to system storage, causing severe lag.
Storage: At least 10GB of free SSD space to host the weights and system dependencies.

Step 1: Install Claude Code & Configure Your Path

Open PowerShell and run the installation command for Claude Code. During installation, pay close attention to the terminal output. It will display the folder where the executable is being stored, looking something like this:

    C:\Users\ \.local\bin

Crucial missing step: the installer does not always append this folder to your Path environment variable automatically.
If you skip this, your terminal will throw a “command not found” or “path not recognized” error the first time you try to launch Claude Code.

1. Press the Windows key, type “Environment Variables”, and select Edit the system environment variables.
2. Click the Environment Variables button at the bottom.
3. Under User variables, select Path and click Edit.
4. Click New and paste your path: C:\Users\ \.local\bin (replace the blank user-folder segment with your actual Windows account name).
5. Click Move Up to push it to the top of the list so it is searched first.
6. Click OK to save and close the dialogs.

Step 2: Install and Verify Ollama

Download and run the Ollama installer for Windows: https://ollama.com/download/windows

Once the installation finishes, open a fresh PowerShell terminal and verify that Ollama is responsive by typing:

    ollama

If it returns a list of available commands, it’s alive and ready. You can check whether you already have any models downloaded by running:

    ollama list

Source: Image by the author.

Step 3: Pull and Test the Base Model

For a standard 16GB RAM machine, Qwen 3.5 9B is the sweet spot. It is highly capable at handling complex coding syntax, refactoring, and logical reasoning without choking your hardware. To download the model, run:

    ollama pull qwen3.5:9b

(The download is roughly 6.6 GB, so give it a few minutes depending on your internet speed.)

Once it finishes, give it a quick sanity check to make sure it responds properly:

    ollama run qwen3.5:9b

Type a quick prompt (e.g., "Write a fast sorting function in Python"). Once it responds correctly, exit the model prompt using Ctrl + D or by typing /exit.

Step 4: Creating a Custom Modelfile

Here is a critical catch that almost every guide on the internet skips: if you just run the base model, Ollama caps its default context length at only 16,384 tokens.
While 16K is fine for a basic chat interface, it is a death sentence for a terminal agent like Claude Code. Claude Code needs to read entire directory structures, parse multiple files simultaneously, and keep earlier context in view. With the default limit, the model will quickly lose track mid-task and start hallucinating or failing.

Qwen 3.5 natively supports up to 256K tokens, so we are going to expand our local context to a comfortable 64K tokens (65,536).

1. Open a plain text editor (like Notepad) and paste the following two lines:

    FROM qwen3.5:9b
    PARAMETER num_ctx 65536

2. Save this file exactly as Modelfile (make sure it has no .txt file extension) inside your active working directory.
3. Back in your terminal, build your new high-context model by running:

    ollama create qwen3.5-9b-64k -f Modelfile

Verify that it was created by typing ollama list. Your new qwen3.5-9b-64k model should show up at the top of the list.

Note: If you are tight on storage space, you can remove the original base model using ollama rm qwen3.5:9b, since your new custom build handles the heavy lifting.

Step 5: Launch Claude Code with Your Local Model

Now it’s time to link them up. Navigate to the local coding project you want to work on using the cd command, and launch Claude Code pointed directly at your high-context local model:

    ollama launch claude --model qwen3.5-9b-64k

Claude Code will initialize right inside your project directory. It will ask whether you trust the project folder as a quick security check. Confirm it, and you’re in.

How to Verify It’s Working: To make sure your requests aren’t going to Anthropic’s cloud and being billed to your account, type the following command inside the Claude Code interface:

    /model

You will see that all model slots, including Sonnet, Opus, and Haiku, point back to your local qwen3.5-9b-64k model. Your local machine is now driving the entire developer environment.
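For a second check from outside Claude Code, the sketch below asks the local Ollama server which models it has installed and looks for the custom build. It assumes Ollama is listening on its default address, http://localhost:11434, and uses its /api/tags endpoint; the function names are illustrative, not part of any Ollama or Claude Code tooling. It only confirms that the server is up and the model exists, not where Claude Code is routing requests.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

# Assumption: Ollama is running on its default local address and port.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags endpoint."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """True if `wanted` is installed; models created without a tag are listed as '<name>:latest'."""
    candidates = {wanted, f"{wanted}:latest"}
    return any(name in candidates for name in model_names(tags_payload))


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> Optional[dict]:
    """Return the /api/tags payload, or None if the server is unreachable or replies with non-JSON."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    payload = fetch_tags()
    if payload is None:
        print("Ollama is not reachable; start it and try again.")
    elif has_model(payload, "qwen3.5-9b-64k"):
        print("qwen3.5-9b-64k is installed and the local server is responding.")
    else:
        print("Server is up, but qwen3.5-9b-64k was not found:", model_names(payload))
```

If the script reports that Ollama is not reachable, that is the same condition as the “Connection Refused” case in the troubleshooting notes.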
Real-World Performance

If you want to track how your hardware is coping while you code, open a separate terminal window and run:

    ollama ps

This shows your currently loaded models and their VRAM/RAM footprints. On a mid-tier 16GB system, expect your resources to average roughly 45% CPU utilization and 55% GPU utilization during intensive generation cycles.

Because it’s running locally, the model will take longer to “think” when executing complex repository sweeps. For instance, if you paste the URL of a large GitHub repository and ask it to clone the codebase, map its entire architectural layout, find critical code flaws, and calculate deployment specifications, it can take up to 10 or 11 minutes of continuous processing. The trade-off is worth it: the output is thorough, entirely private, not rate-limited, and costs exactly $0 in API fees.

Troubleshooting Common Windows Challenges

Source: Image by the author.

Even when everything appears to be set up correctly, running local AI tools on Windows can occasionally throw a wrench in your workflow. If your terminal gives you trouble, here are the most common hurdles and how to fix them:

1. “Command Not Found” (Even After Updating the Path)

If you have added C:\Users\ \.local\bin to your environment variables but typing claude still throws an error, don't panic. Windows terminals do not refresh their environment variables in sessions that are already open.

Fix: Close your current PowerShell window entirely and open a brand-new instance. If you are using an integrated terminal inside an IDE like VS Code, you will need to restart the entire IDE for the changes to take effect.

2. The Invisible .txt Extension Trap

When you create your Modelfile, Notepad tends to save it as Modelfile.txt, and Windows File Explorer hides known extensions from your view by default.
If Ollama tells you it can't find or read the file during the ollama create step, this hidden extension is almost always the culprit.

Fix: Open File Explorer, click on View (or Options, depending on your Windows version), and check the box for File name extensions. If you see Modelfile.txt, right-click it, select Rename, and delete the .txt so the file is named just Modelfile.

3. Sluggish Performance and Intense System Lag

If your entire machine locks up, freezes, or responds at a snail’s pace while Claude Code is processing a request, your system memory is likely the bottleneck. This happens when your hardware is forced to offload model weights from fast VRAM/RAM onto your slower system drive.

Fix: Close heavy background apps like Google Chrome tabs, Discord, or Docker containers before launching your local model. If you are limited to 16GB of RAM and resource contention persists, consider dropping the context limit in your Modelfile from 65536 down to 32768 (32K). This still gives you double the default capacity while significantly reducing the active memory footprint.

4. The “Connection Refused” Ollama Error

If Claude Code reports that it cannot establish a connection to your local backend, the Ollama background service has either crashed or hasn’t started yet.

Fix: Look at your Windows system tray (the small arrow icon in the bottom right corner of the taskbar) and verify that the Ollama llama icon is visible. If it isn’t there, search for Ollama in the Start Menu and launch it manually to start the local server before running Claude Code again.

Thanks for sticking around to the end. If you found these checkpoints helpful, a few claps go a long way. Let me know your thoughts on this article in the comments below.

I am a tech writer and the founder of AnalystHQ, where I simplify data analytics and AI workflows.
If you want to dive deeper into practical data skills, you can check out my complete playbook, the SQL & Excel Data Analyst Course.

Claude Code Free with Local AI was originally published in Towards AI on Medium.


Source

https://pub.towardsai.net/claude-code-free-with-local-ai-ffbc92208006?source=rss----98111c9905da---4