Best local LLM setup for Coding on a MacBook Air M1 (8GB RAM)?
Posted by Foxtor@reddit | LocalLLaMA | 4 comments
Hey everyone,
I’m looking to set up a local LLM environment on my MacBook Air M1 with only 8GB of RAM, specifically for coding assistance (Python, JS, etc.).
I know 8GB is the absolute bare minimum and swap memory will be an issue, so I’m looking for the most efficient setup possible that won't brick my VS Code while running.
My main questions:
- Which app/backend should I use? I've heard about Ollama, LM Studio, and llama.cpp. Since I have Apple Silicon, is it worth hunting for MLX-native apps, or is Ollama’s metal support enough for 8GB?
- Best models for code (under 8B)? I’m looking for models that punch above their weight. Is DeepSeek-Coder-V2-Lite-Instruct (MoE) viable here, or should I stick to something like Llama-3.1-8B or Stable-Code?
- Quantization tips: For 8GB, should I strictly stay at Q4_K_M or can I push to Q5 if the model is small enough?
- Workflow: What’s the best way to integrate this into VS Code? (Continue.dev? Codeium?)
Any tips on how to manage the RAM of these models so I can still have a browser and a code editor open would be greatly appreciated!
Thanks in advance!
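For reference, here's my rough napkin math on what should fit — the bits-per-weight numbers are approximations for GGUF K-quants, and the overhead figure is a guess, not a measurement:

```python
# Rough estimate of a quantized model's memory footprint.
# Bits-per-weight values are approximate for GGUF K-quants;
# the KV-cache/runtime overhead is a ballpark allowance.

def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate resident size of the weights alone, in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

overhead_gb = 1.0  # rough allowance for KV cache + runtime buffers

for name, params, bpw in [
    ("Llama-3.1-8B @ Q4_K_M", 8.0, 4.5),
    ("Llama-3.1-8B @ Q5_K_M", 8.0, 5.5),
    ("7B coder model @ Q4_K_M", 7.0, 4.5),
]:
    total = model_size_gb(params, bpw) + overhead_gb
    print(f"{name}: ~{total:.1f} GB")
```

If macOS plus a browser already eats 3–4 GB, the Q5 row is basically the whole machine, which is why I'm nervous about pushing past Q4.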
United_Razzmatazz769@reddit
https://huggingface.co/prism-ml/Bonsai-8B-mlx-1bit
It's based on Qwen3 and it's quite coherent. The model page has a link to an mlx_lm fork that supports this model.
Foxtor@reddit (OP)
Are you using it? I saw the page and I'm thinking about trying it out. One more question, have you ever tried creating a plan with a large model and then using a tiny model to execute the steps?
winna-zhang@reddit
for 8GB on M1 I’d keep it simple and optimize for stability over max quality
what worked for me:
– backend: Ollama (Metal is good enough, less setup pain than MLX)
– models: stick to 6–7B range, DeepSeek-Coder Lite or Llama 3.1 8B quantized
– quant: Q4_K_M is the safe spot, Q5 starts to push RAM too hard
the biggest thing though is workflow:
don’t try to run it like a full coding assistant — use it for smaller, scoped tasks (functions, snippets, debugging ideas)
once you keep context small, it feels way more usable on 8GB
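to make the "keep context small" part concrete: with Ollama you can cap the context window in a Modelfile so the KV cache stays small (the model tag below is just an example — use whatever tag you actually pulled):

```
# Modelfile: cap the context window so the KV cache stays small
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER num_ctx 2048
```

then `ollama create llama3-small-ctx -f Modelfile` and point your editor plugin at that model name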
spaceman_@reddit
Not feasible - even 16GB is very, very limited, and a base M1 is simply not fast enough.
You could get code completion with a small Qwen2.5-1.5B but that's about it.
You are not going to get meaningful coding assistance out of that hardware. You can barely get it to output anything coherent at unreasonably slow speeds.
Not worth the effort to try.
At the very least, to get ANYWHERE reasonable even just with chat (no agentic workflows or tool calling), you need a 24GB Mac, and preferably a much faster chip (an M4 or M5, or an older Pro or Max).
It really only starts getting good at 48GB and above for Apple silicon devices.