author: Paul Buetow <paul@buetow.org> 2025-08-05 09:56:25 +0300
committer: Paul Buetow <paul@buetow.org> 2025-08-05 09:56:25 +0300
commit: ecfde341e7194246956790f25e07190b16abb1bc
tree: 0c21e106e7bd2311a4629a30a07e4416584552cb /gemfeed
parent: 9e56919e6d38cd12902f90881616b5ac6387193b
Update content for md
Diffstat (limited to 'gemfeed')
-rw-r--r-- gemfeed/2025-08-05-local-coding-llm-with-ollama.md 14
1 files changed, 9 insertions, 5 deletions
diff --git a/gemfeed/2025-08-05-local-coding-llm-with-ollama.md b/gemfeed/2025-08-05-local-coding-llm-with-ollama.md
index d5102faf..4a2d819f 100644
--- a/gemfeed/2025-08-05-local-coding-llm-with-ollama.md
+++ b/gemfeed/2025-08-05-local-coding-llm-with-ollama.md
@@ -64,14 +64,16 @@ For reference, here are some key points about running large LLMs locally:
The model I'll be mainly using in this blog post (`qwen2.5-coder:14b-instruct`) is particularly interesting as:
-* `instruct`: Indicates this is the instruction-tuned variant of QWE, optimised for diverse tasks including coding
+* `instruct`: Indicates this is the instruction-tuned variant, optimised for diverse tasks including coding
* `coder`: Tells me that this model was trained on a mix of code and text data, making it especially effective for programming assistance
+[https://ollama.com/library/qwen2.5-coder](https://ollama.com/library/qwen2.5-coder)
[https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct)
-For general thinking tasks, I found `deepseek-r1:14b` to be useful. For instance, I utilised `deepseek-r1:14b` to format this blog post and correct some English errors, demonstrating its effectiveness in natural language processing tasks. Additionally, it has proven invaluable for adding context and enhancing clarity in technical explanations, all while running locally on the MacBook Pro. Admittedly, it was a lot slower than "just using ChatGPT", but still within minutes.
+For general thinking tasks, I found `deepseek-r1:14b` to be useful (in the future, I also want to try other `qwen` models here). For instance, I utilised `deepseek-r1:14b` to format this blog post and correct some English errors, demonstrating its effectiveness in natural language processing tasks. Additionally, it has proven invaluable for adding context and enhancing clarity in technical explanations, all while running locally on the MacBook Pro. Admittedly, it was a lot slower than "just using ChatGPT", but still within a minute or so.
[https://ollama.com/library/deepseek-r1:14b](https://ollama.com/library/deepseek-r1:14b)
+[https://huggingface.co/deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)
A quantised (as mentioned above) LLM which has been converted from high-precision connection (typically 16- or 32-bit floating point) representations to lower-precision formats, such as 8-bit integers. This reduces the overall memory footprint of the model, making it significantly smaller and enabling it to run more efficiently on hardware with limited resources or to allow higher throughput on GPUs and CPUs. The benefits of quantisation include reduced storage and faster inference times due to simpler computations and better memory bandwidth utilisation. However, quantisation can introduce a drop in model accuracy because the lower numerical precision means the model cannot represent parameter values as precisely. In some cases, it may lead to instability or unexpected outputs in specific tasks or edge cases.
@@ -396,6 +398,8 @@ content = "{CODE}"
As you can see, I have also added other models, such as Mistral Nemo and DeepSeek R1, so that I can switch between them in Helix. Other than that, the completion parameters are interesting. They define how the LLM should interact with the text in the text editor based on the given examples.
+If you want to see more `lsp-ai` configuration examples, they are some for Vim and Helix in the `lsp-ai` git repository!
+
### Code completion in action
The screenshot shows how Ollama's `qwen2.5-coder` model provides code completion suggestions within the Helix editor. The LSP auto-completion is triggered by typing `<CURSOR>` in the code snippet, and Ollama responds with relevant completions based on the context.
@@ -406,15 +410,15 @@ In the LSP auto-completion, the one prefixed with `ai - ` was generated by `qwen
I found GitHub Copilot to be still faster than `qwen2.5-coder:14b`, but the local LLM one is actually workable for me already. And, as mentioned earlier, things will likely improve in the future regarding local LLMs. So I am excited about the future of local LLMs and coding tools like Ollama and Helix.
-After trying `qwen3-coder:30b-a3b-q4_K_M` (following the publication of this blog post), I found it to be significantly faster and more capable than the previous model, making it a promising option for local coding tasks. Experimentation reveals that even current local setups are surprisingly effective for routine coding tasks, offering a glimpse into the future of on-machine AI assistance.
+> After trying `qwen3-coder:30b-a3b-q4_K_M` (following the publication of this blog post), I found it to be significantly faster and more capable than the previous model, making it a promising option for local coding tasks. Experimentation reveals that even current local setups are surprisingly effective for routine coding tasks, offering a glimpse into the future of on-machine AI assistance.
## Conclusion
-Will there ever be a time we can run larger models (60B, 100B, ...and larger) on consumer hardware, or even on our phones? We are not quite there yet, but I am optimistic that we will see significant improvements in the next few years. As hardware capabilities improve and/or become cheaper, and more efficient models are developed, the landscape of local AI coding assistants will continue to evolve.
+Will there ever be a time we can run larger models (60B, 100B, ...and larger) on consumer hardware, or even on our phones? We are not quite there yet, but I am optimistic that we will see improvements in the next few years. As hardware capabilities improve and/or become cheaper, and more efficient models are developed (or new techniques will be invented to make language models more effective), the landscape of local AI coding assistants will continue to evolve.
For now, even the models listed in this blog post are very promising already, and they run on consumer-grade hardware (at least in the realm of the initial tests I've performed... the ones in this blog post are overly simplistic, though! But they were good for getting started with Ollama and initial demonstration)! I will continue experimenting with Ollama and other local LLMs to see how they can enhance my coding experience. I may cancel my Copilot subscription, which I currently use only for in-editor auto-completion, at some point.
-However, truth be told, I don't think the setup described in this blog post currently matches the performance of commercial models like Claude Code (Sonnet 4, Opus 4), Gemini 2.5 Pro, and others. Maybe we could get close if we had the high-end hardware needed to run the largest Qwen Coder model available. But, as mentioned already, that is out of reach for occasional coders like me. Furthermore, I want to continue coding manually to some degree, as otherwise I will start to forget how to write for-loops, which can be awkward... However, do we always need the best model when AI can help generate boilerplate or repetitive tasks even with smaller models?
+However, truth be told, I don't think the setup described in this blog post currently matches the performance of commercial models like Claude Code (Sonnet 4, Opus 4), Gemini 2.5 Pro, the OpenAI models and others. Maybe we could get close if we had the high-end hardware needed to run the largest Qwen Coder model available. But, as mentioned already, that is out of reach for occasional coders like me. Furthermore, I want to continue coding manually to some degree, as otherwise I will start to forget how to write for-loops, which can be awkward... However, do we always need the best model when AI can help generate boilerplate or repetitive tasks even with smaller models?
E-Mail your comments to `paul@nospam.buetow.org` :-)