Claude Code can now connect directly to llama.cpp server

Posted by tarruda@reddit | LocalLLaMA | View on Reddit | 15 comments

Support for the Anthropic Messages API was merged into llama.cpp today, which allows Claude Code to connect to llama-server: https://github.com/ggml-org/llama.cpp/pull/17570

I've been playing with Claude Code + gpt-oss 120b and it seems to work well at 700 t/s prompt processing and 60 t/s generation. I don't recommend trying slower LLMs, because the prompt processing time is going to kill the experience.
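For anyone who wants to try it, here's a rough sketch of how to wire the two up. The model file, port, and flags are placeholders for your own setup, and `ANTHROPIC_BASE_URL` is Claude Code's documented way of pointing it at an alternate endpoint; check the PR for the exact details of the new API support:

```shell
# Start llama-server locally (model path and port are example values)
llama-server -m gpt-oss-120b.gguf --port 8080

# In another terminal, point Claude Code at the local server
export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
claude
```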