How to connect Claude Code CLI to a local llama.cpp server

Posted by StrikeOner@reddit | LocalLLaMA


I’ve seen a lot of people struggling to get Claude Code working with a local llama.cpp setup, so here’s a quick guide that worked for me.


1. CLI (Terminal)

Add this to your .bashrc (or .zshrc). The token values are dummies: llama.cpp doesn't validate credentials unless you start it with --api-key, but Claude Code refuses to run without these variables set:

export ANTHROPIC_AUTH_TOKEN="not_set"
export ANTHROPIC_API_KEY="not_set_either!"
export ANTHROPIC_BASE_URL="http://<your-llama.cpp-server>:8080"

Reload your shell:

source ~/.bashrc

Then run the CLI, passing the name of the model your llama.cpp server is serving:

claude --model Qwen3.5-35B-Thinking
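If Claude Code complains about authentication or can't reach the server, it's usually because the variables weren't exported in the current shell. A quick sketch to sanity-check them (the values here are the placeholders from this guide, with localhost standing in for your server address); you can also probe the server itself with curl against llama.cpp's /health endpoint:

```shell
# Sketch: confirm the variables Claude Code reads are actually exported.
# Placeholder values from this guide; adjust the URL to your server.
export ANTHROPIC_AUTH_TOKEN="not_set"
export ANTHROPIC_API_KEY="not_set_either!"
export ANTHROPIC_BASE_URL="http://localhost:8080"

for v in ANTHROPIC_AUTH_TOKEN ANTHROPIC_API_KEY ANTHROPIC_BASE_URL; do
  # printenv only prints variables that are exported, not merely set
  if [ -n "$(printenv "$v")" ]; then
    echo "$v is set"
  else
    echo "$v is MISSING"
  fi
done

# Optional reachability check (only works once llama-server is running):
# curl "$ANTHROPIC_BASE_URL/health"
```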

2. VS Code setup with the Claude Code extension installed

Edit your user settings file (this is the Linux path; on macOS it lives under ~/Library/Application Support/Code/User/):

$HOME/.config/Code/User/settings.json

Add:

"claudeCode.environmentVariables": [
  {
    "name": "ANTHROPIC_BASE_URL",
    "value": "http://<your-llama.cpp-server>:8080"
  },
  {
    "name": "ANTHROPIC_AUTH_TOKEN",
    "value": "dummy"
  },
  {
    "name": "ANTHROPIC_API_KEY",
    "value": "sk-no-key-required"
  },
  {
    "name": "ANTHROPIC_MODEL",
    "value": "gpt-oss-20b"
  },
  {
    "name": "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "value": "Qwen3.5-35B-Thinking-Coding"
  },
  {
    "name": "ANTHROPIC_DEFAULT_OPUS_MODEL",
    "value": "Qwen3.5-27B-Thinking-Coding"
  },
  {
    "name": "ANTHROPIC_DEFAULT_HAIKU_MODEL",
    "value": "gpt-oss-20b"
  },
  {
    "name": "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
    "value": "1"
  }
],
"claudeCode.disableLoginPrompt": true
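Note that the snippet above is a fragment: it has to be merged into the top-level object of settings.json, not pasted at the end of the file. A minimal complete file (model names are just the examples from this guide) would look like:

```json
{
  "claudeCode.environmentVariables": [
    { "name": "ANTHROPIC_BASE_URL", "value": "http://<your-llama.cpp-server>:8080" },
    { "name": "ANTHROPIC_AUTH_TOKEN", "value": "dummy" },
    { "name": "ANTHROPIC_API_KEY", "value": "sk-no-key-required" },
    { "name": "ANTHROPIC_MODEL", "value": "gpt-oss-20b" },
    { "name": "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC", "value": "1" }
  ],
  "claudeCode.disableLoginPrompt": true
}
```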

Notes