path: root/gemfeed/2025-08-05-local-coding-llm-with-ollama.md
Diffstat (limited to 'gemfeed/2025-08-05-local-coding-llm-with-ollama.md')
-rw-r--r-- gemfeed/2025-08-05-local-coding-llm-with-ollama.md | 157
1 files changed, 0 insertions, 157 deletions
diff --git a/gemfeed/2025-08-05-local-coding-llm-with-ollama.md b/gemfeed/2025-08-05-local-coding-llm-with-ollama.md
index 3e4a06de..bb8f11a5 100644
--- a/gemfeed/2025-08-05-local-coding-llm-with-ollama.md
+++ b/gemfeed/2025-08-05-local-coding-llm-with-ollama.md
@@ -155,163 +155,6 @@ aider --model ollama_chat/qwen2.5-coder:14b-instruct
[https://aider.chat](https://aider.chat)
[https://opencode.ai](https://opencode.ai)
-# Local LLM for Coding with Ollama on macOS
-
-> Published at 2025-08-04T16:43:39+03:00
-
-```
- [::]
- _| |_
- / o o \ |
- | ∆ | <-- Ollama / \
- | \___/ | / \
- \_______/ LLM --> / 30B \
- | | / Qwen3 \
- /| |\ / Coder \
- /_| |_\_________________/ quantised \
-```
-
-## Table of Contents
-
-* [⇢ Local LLM for Coding with Ollama on macOS](#local-llm-for-coding-with-ollama-on-macos)
-* [⇢ ⇢ Why Local LLMs?](#why-local-llms)
-* [⇢ ⇢ Hardware Considerations](#hardware-considerations)
-* [⇢ ⇢ Basic Setup and Manual Code Prompting](#basic-setup-and-manual-code-prompting)
-* [⇢ ⇢ ⇢ Installing Ollama and a Model](#installing-ollama-and-a-model)
-* [⇢ ⇢ ⇢ Example Usage](#example-usage)
-* [⇢ ⇢ Agentic Coding with Aider](#agentic-coding-with-aider)
-* [⇢ ⇢ ⇢ Installation](#installation)
-* [⇢ ⇢ ⇢ Agentic coding prompt](#agentic-coding-prompt)
-* [⇢ ⇢ ⇢ Compilation & Execution](#compilation--execution)
-* [⇢ ⇢ ⇢ The code](#the-code)
-* [⇢ ⇢ In-Editor Code Completion](#in-editor-code-completion)
-* [⇢ ⇢ ⇢ Installation of `lsp-ai`](#installation-of-lsp-ai)
-* [⇢ ⇢ ⇢ Helix Configuration](#helix-configuration)
-* [⇢ ⇢ ⇢ Code completion in action](#code-completion-in-action)
-* [⇢ ⇢ Conclusion](#conclusion)
-
-With all the AI buzz around coding assistants, and being a bit concerned about depending on third-party cloud providers, I decided to explore the capabilities of local large language models (LLMs) using Ollama.
-
-Ollama is a tool for running AI models directly on your own hardware, so you can enjoy the benefits of intelligent assistance without relying on cloud services. This post outlines my initial setup and experiences with Ollama, with a focus on coding tasks and agentic coding.
-
-[https://ollama.com/](https://ollama.com/)
-
-## Why Local LLMs?
-
-Using local AI models through Ollama offers several advantages:
-
-* Data Privacy: Keep your code and data completely private by processing everything locally.
-* Cost-Effective: Reduce reliance on expensive cloud API calls.
-* Reliability: Works seamlessly even with spotty internet or offline.
-* Speed: Avoid network latency while coding. In practice, though, I mostly found Ollama slower than commercial LLM providers, which may change as models and hardware evolve.
-
-## Hardware Considerations
-
-Running large language models locally is currently limited by consumer hardware capabilities:
-
-* GPU Memory: Most consumer-grade GPUs (even in 2025) top out at 16–24GB of VRAM, making it challenging to run larger models such as 30B (30 billion) parameter LLMs (and models go up to 100 billion parameters and more).
-* RAM Constraints: On my MacBook Pro with M3 CPU and 36GB RAM, I chose a 14B model (`qwen2.5-coder:14b-instruct`) as it represents a practical balance between capability and resource requirements.
-
-For reference, here are some key points about running large LLMs locally:
-
-* Models larger than 30B: I don't even think about running them locally. A model with several hundred billion parameters (e.g. from Qwen, DeepSeek or Kimi K2) could match the "performance" of commercial LLMs (Claude Sonnet 4, etc.). Still, for personal use, the hardware demands are just too high (unless you temporarily "rent" the hardware via the public cloud).
-* 30B models: Require at least 48GB of GPU VRAM for full inference without quantisation. Currently only feasible on high-end professional GPUs (or an Apple-silicon Mac with enough unified RAM).
-* 14B models: Can run with 16–24GB of GPU memory (VRAM), suitable for consumer-grade hardware (which can alternatively run a quantised larger model).
-* 7B-13B models: Best fit for mainstream consumer hardware, requiring minimal VRAM and running smoothly on mid-range GPUs, but with limited capabilities compared to larger models and more hallucinations.
-
-The model I'll be mainly using in this blog post (`qwen2.5-coder:14b-instruct`) is particularly interesting as:
-
-* `instruct`: Indicates this is the instruction-tuned variant, optimised for diverse tasks including coding
-* `coder`: Tells me that this model was trained on a mix of code and text data, making it especially effective for programming assistance
-
-[https://ollama.com/library/qwen2.5-coder](https://ollama.com/library/qwen2.5-coder)
-[https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct)
-
-For general thinking tasks, I found `deepseek-r1:14b` to be useful (in the future, I also want to try other `qwen` models here). For instance, I utilised `deepseek-r1:14b` to format this blog post and correct some English errors, demonstrating its effectiveness in natural language processing tasks. Additionally, it has proven invaluable for adding context and enhancing clarity in technical explanations, all while running locally on the MacBook Pro. Admittedly, it was a lot slower than "just using ChatGPT", but still within a minute or so.
-
-[https://ollama.com/library/deepseek-r1:14b](https://ollama.com/library/deepseek-r1:14b)
-[https://huggingface.co/deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)
-
-A quantised LLM (as mentioned above) is one whose weights have been converted from high-precision representations (typically 16- or 32-bit floating point) to lower-precision formats, such as 8-bit integers. This reduces the overall memory footprint of the model, making it significantly smaller and enabling it to run more efficiently on hardware with limited resources or to achieve higher throughput on GPUs and CPUs. The benefits of quantisation include reduced storage and faster inference times due to simpler computations and better memory bandwidth utilisation. However, quantisation can introduce a drop in model accuracy because the lower numerical precision means the model cannot represent parameter values as precisely. In some cases, it may lead to instability or unexpected outputs in specific tasks or edge cases.
-
-## Basic Setup and Manual Code Prompting
-
-### Installing Ollama and a Model
-
-To install Ollama, I performed these steps (this assumes that you have already installed Homebrew on your macOS system):
-
-```sh
-brew install ollama
-rehash
-ollama serve
-```
-
-This started the Ollama server with output like this (the screenshot already shows some requests made):
-
-[![Ollama serving](./local-coding-LLM-with-ollama/ollama-serve.png "Ollama serving")](./local-coding-LLM-with-ollama/ollama-serve.png)
-
-And then, in a new terminal, I pulled the model with:
-
-```sh
-ollama pull qwen2.5-coder:14b-instruct
-```
-
-With that, I was ready to go! It wasn't so difficult. Now, let's see how I used this model for coding tasks.
-
-### Example Usage
-
-I ran the following command to get a Go function for calculating Fibonacci numbers:
-
-```sh
-time echo "Write a function in golang to print out the Nth fibonacci number, \
- only the function without the boilerplate" | ollama run qwen2.5-coder:14b-instruct
-
-Output:
-
-func fibonacci(n int) int {
- if n <= 1 {
- return n
- }
- a, b := 0, 1
- for i := 2; i <= n; i++ {
- a, b = b, a+b
- }
- return b
-}
-
-Execution Metrics:
-
-Executed in 4.90 secs fish external
- usr time 15.54 millis 0.31 millis 15.24 millis
- sys time 19.68 millis 1.02 millis 18.66 millis
-```
-
-> Note, after having written this blog post, I tried the same with the newer model `qwen3-coder:30b-a3b-q4_K_M` (which "just" came out, and it's a quantised 30B model), and it was much faster:
-
-```
-Executed in 1.83 secs fish external
- usr time 17.82 millis 4.40 millis 13.42 millis
- sys time 17.07 millis 1.57 millis 15.50 millis
-```
-
-[https://ollama.com/library/qwen3-coder:30b-a3b-q4_K_M](https://ollama.com/library/qwen3-coder:30b-a3b-q4_K_M)
-
-## Agentic Coding with Aider
-
-### Installation
-
-Aider is a tool that enables agentic coding by leveraging AI models (including local ones, as in our case). Setting up OpenAI Codex and OpenCode with Ollama proved challenging: those tools either didn't know how to work with "tools" (the capability to execute external commands or edit files, for example) or didn't connect to Ollama at all for some reason. Aider, by contrast, worked smoothly.
-
-To get started, all I had to do was install it via Homebrew, initialise a Git repository, and then start Aider with the Ollama model `ollama_chat/qwen2.5-coder:14b-instruct`:
-
-```sh
-brew install aider
-mkdir -p ~/git/aitest && cd ~/git/aitest && git init
-aider --model ollama_chat/qwen2.5-coder:14b-instruct
-```
-
-[https://aider.chat](https://aider.chat)
-[https://opencode.ai](https://opencode.ai)
[https://github.com/openai/codex](https://github.com/openai/codex)
### Agentic coding prompt