# Run Your Own Coding Agent on Your Laptop (for Free)


**The Rundown:** In this guide, you will learn how to download a coding LLM to your laptop using Ollama and wire it into Claude Code or Codex with a single command. You will be able to run your coding agent for free on simple tasks and keep proprietary code entirely on your machine.

## Who This Is Useful For

- **Solo devs and indie founders** paying $100-200/mo for Claude Code Max or Codex Pro on work that does not need frontier reasoning
- **Students, hobbyists, and new coders** who want a working agentic setup to practice on for free
- **Anyone working on proprietary or NDA code** who wants the whole loop to stay on their laptop

## What You Will Build

A local Claude Code (or Codex, or OpenCode) session pointed at a free Ollama model running on your own hardware. Same agent, same interface, zero per-token cost, nothing leaving your machine.

## What You Need to Get Started

- Ollama installed. If you do not have it, follow our Ollama install guide first
- Claude Code signed in and working, or OpenCode (installer in Step 5)
- A terminal and a project folder you actually want to work in
- 16 GB of RAM is ideal. 8 GB works with smaller models. 32 GB+ is better.

## Step 1: Check Your Hardware and Ask an LLM What to Run

Before you pull anything, know what your machine can handle.

- On Mac: Apple menu > **About This Mac** > screenshot the specs panel.
- On Windows: **Settings** > **System** > **About** > screenshot.

Drop that screenshot into Claude, ChatGPT, or any LLM, and ask: "Which Ollama coder models can I realistically run on this machine with Claude Code or Codex?" It will read the RAM and chip off your screenshot and give you a short list.

Rough guide to what fits where:

| Your RAM | Recommended size | Example |
|---|---|---|
| 8 GB | 3B params or smaller | `qwen3-coder:3b` |
| 12 GB | 4-7B params | `gemma4:e2b`, `qwen3-coder:7b` |
| 16 GB | 7-12B params | `qwen3-coder:7b`, `gemma4:e4b` |
| 32 GB+ or GPU | 20B+ | `qwen3-coder:32b`, `gpt-oss:20b`, `gemma4:26b` |

**Pro tip:** Download the biggest version you can reasonably fit.
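As a rough sketch, the RAM brackets in the table above can be wrapped in a small shell helper. The brackets come from the table; the function name, exact output strings, and example picks are ours, not official Ollama guidance:

```shell
# Sketch: map installed RAM (in GB) to the model-size bracket
# from the table above. POSIX sh, no dependencies.
suggest_size() {
  ram_gb=$1
  if [ "$ram_gb" -ge 32 ]; then
    echo "20B+ params (e.g. qwen3-coder:32b)"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "7-12B params (e.g. qwen3-coder:7b)"
  elif [ "$ram_gb" -ge 12 ]; then
    echo "4-7B params (e.g. qwen3-coder:7b)"
  else
    echo "3B params or smaller (e.g. qwen3-coder:3b)"
  fi
}

suggest_size 16   # prints: 7-12B params (e.g. qwen3-coder:7b)
```

Treat the output as a starting point, not a guarantee: the OS, browser tabs, and the agent itself all compete for the same RAM.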
Bigger means more reliable tool calls, which is the thing that actually breaks small models inside Claude Code.

## Step 2: Pick a Coder Model That Supports Agentic Tools

Browse ollama.com/search?q=coder and open the page for a model your LLM recommended.

**Critical check:** scroll to the Applications section on the model page and confirm it lists Claude Code, Codex, OpenCode, or OpenClaw. If it lists none of them, the model does not support the tool calls agentic coding requires. Skip it.

The three that held up best in our testing:

- **qwen3-coder** is the purpose-built coder pick. Strong raw code generation at its size.
- **gemma4** has explicit tool-use and thinking training. Handles multi-step tool chains more reliably.
- **gpt-oss** is OpenAI's open-weights MoE model with strong agentic support.

**Pro tip:** If you cannot choose between two candidates, pull both. Disk is cheap, and you can swap between them with the `--model` flag. Keep the one that actually holds up on your workflow.

## Step 3: Launch Your Coding Agent With the Local Model

From the model's Ollama page, copy the launch command. It looks like:

```
ollama launch claude --model gemma4:e4b
```

Open a terminal in your actual project folder, paste the command, and hit enter. Confirm the download when it prompts you, then wait for the weights to pull the first time. After that, Ollama drops you into Claude Code pointed at the local model instead of Anthropic's API.

Inside the session, type `/model` to confirm which model is wired in. Every response from here costs zero tokens.

**Pro tip:** Run `ollama ps` in a second terminal to see what is actually running. It shows the active model, RAM in use, and GPU utilization. 100% GPU means you are fully accelerated. Anything lower means part of the model is spilling to CPU, and responses will be slower.

## Step 4: Bump the Context Window Before You Do Anything Serious

This is the single most important setting in the whole setup.
By default, Ollama allocates only **4K of context** per model, which is far too small for agentic coding. Claude Code will read one file, fill the buffer, and immediately start forgetting the rest of the conversation.

Fix it once and never think about it again:

1. Open the **Ollama app**
2. Ollama menu > **Settings**
3. Find the **Context** slider
4. Bump it to **32K** to start, or higher if your specs allow

**Pro tip:** Ask your LLM what context size your specs can safely handle. Maxing it out can push the model past your GPU's limits and crash things. Start at 32K, verify with `ollama ps`, then raise it if there is headroom.

## Step 5: Try OpenCode If Claude Code Feels Too Heavy

Claude Code is a sophisticated harness with a lot of tools exposed at once. Small local models sometimes get confused by the volume of choices. OpenCode is a lighter-weight coding agent built for this case, and it uses the same `ollama launch` pattern.

Install it on Mac with one line:

```
curl -fsSL https://opencode.ai/install | bash
```

Then launch it the same way:

```
ollama launch opencode --model gemma4:e4b
```

**Pro tip:** Smaller models work better when you ask them to think less. Hit Shift+Tab inside Claude Code or OpenCode to toggle plan mode (the agent writes its approach before touching files). Lower the reasoning effort in Settings if the agent keeps over-thinking simple tasks.

## Going Further

The single highest-leverage move once the basics are working is a **hybrid setup**. Keep paying for Claude Code or Codex as your main agent, but configure your local model as a cheap subagent it orchestrates. The frontier model handles architecture and planning; the local model grinds through the boilerplate. You keep the smart work and offload the cheap work, and the token meter slows way down.

Two more places to take this:

- **Remote-control your local session from your phone.** Run `claude remote-control` inside the session and scan the QR code with the Claude mobile app. You can kick off a task at your desk and supervise it from anywhere.
- **Put the model on a dedicated machine.** An old Mac mini or repurposed PC makes a great always-on local inference box. Hit it from your laptop over your network and stop fighting your daily driver for RAM.

Local models are not yet a full replacement for frontier cloud coding agents. Treat this as a practice environment, a privacy layer for sensitive projects, and a budget hedge on the tasks that do not need frontier reasoning. The gap closes every few months.
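For the dedicated-machine setup, Ollama's `OLLAMA_HOST` environment variable controls both where the server listens and where the CLI connects. A minimal sketch, assuming the box is reachable as `mac-mini.local` (a placeholder for your machine's hostname or IP):

```shell
# On the inference box: bind the Ollama server to all interfaces
# instead of the default 127.0.0.1 (port 11434).
OLLAMA_HOST=0.0.0.0 ollama serve

# On your laptop: point the Ollama CLI at the remote box.
export OLLAMA_HOST=http://mac-mini.local:11434
ollama ps   # now reports models running on the box, not locally
```

Only expose the server on a network you trust; Ollama's API has no authentication built in.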
