Best local LLM setup for Coding on a MacBook Air M1 (8GB RAM)?

Posted by Foxtor@reddit | LocalLLaMA

Hey everyone,

I’m looking to set up a local LLM environment on my MacBook Air M1 with only 8GB of RAM, specifically for coding assistance (Python, JS, etc.).

I know 8GB is the absolute bare minimum and swap will be an issue, so I'm looking for the most efficient setup possible, one that won't grind VS Code to a halt while a model is running.

My main questions:

  1. Which app/backend should I use? I've heard about Ollama, LM Studio, and llama.cpp. Since I'm on Apple Silicon, is it worth hunting for MLX-native apps, or is Ollama's Metal support good enough at 8GB?
  2. Best models for code (under 8B parameters)? I'm looking for models that punch above their weight. Is DeepSeek-Coder-V2-Lite-Instruct viable here (it's MoE with only ~2.4B active parameters, but all ~16B weights still have to fit in memory), or should I stick to something like Llama-3.1-8B or Stable-Code?
  3. Quantization tips: For 8GB, should I strictly stay at Q4_K_M or can I push to Q5 if the model is small enough?
  4. Workflow: What’s the best way to integrate this into VS Code? (Continue.dev? Codeium?)
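For context on question 3, here's the back-of-envelope math I've been using to sanity-check model sizes. The bits-per-weight figures are rough approximations for the GGUF K-quants (not exact, and the fixed overhead for KV cache and runtime varies with context length), so treat this as a sketch:

```python
def est_ram_gb(params_billions, bits_per_weight, overhead_gb=1.0):
    """Rough RAM estimate: quantized weights plus a flat
    allowance for KV cache and runtime overhead."""
    return params_billions * bits_per_weight / 8 + overhead_gb

# Approximate bits-per-weight for common K-quants (rough figures):
QUANTS = {"Q4_K_M": 4.85, "Q5_K_M": 5.69}

for name, bpw in QUANTS.items():
    print(f"8B model @ {name}: ~{est_ram_gb(8, bpw):.1f} GB")
```

By this estimate an 8B model at Q4_K_M already eats most of 8GB once macOS and a browser are loaded, which is why I'm asking whether Q5 is realistic at all.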

Any tips on how to manage the RAM of these models so I can still have a browser and a code editor open would be greatly appreciated!
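One thing I was planning to try (assuming I've understood Ollama's Modelfile syntax correctly) is shrinking the context window, since the KV cache is a big chunk of the memory footprint. Something like:

```
# Modelfile: derive a lower-memory variant of an existing model
FROM llama3.1:8b
# Smaller context window = smaller KV cache = less RAM used
PARAMETER num_ctx 2048
```

and then `ollama create llama3-smallctx -f Modelfile`. Does that actually help enough in practice, or does the win get swallowed by swap anyway?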

Thanks in advance!