siegevjorn

Qwen 3.6 coding choice–27B vs 35B quants

Posted by siegevjorn@reddit | LocalLLaMA | 96 comments
Qwen 35b a3b surprises me

Posted by siegevjorn@reddit | LocalLLaMA | 38 comments
Which inference engine to choose for mlx?

Posted by siegevjorn@reddit | LocalLLaMA | 17 comments
"DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but NOT anywhere near the ratios people have suggested)" says Anthropic's CEO

Posted by siegevjorn@reddit | LocalLLaMA | 446 comments
What agentic cli do you use for local models?

Posted by siegevjorn@reddit | LocalLLaMA | 12 comments
Llama.cpp: vlm access via llama-server causes cuda OOM error after processing 15k images.

Posted by siegevjorn@reddit | LocalLLaMA | 2 comments
llm-compressor: vLLM AWQ quant with multiple GPUs keep causing errors

Posted by siegevjorn@reddit | LocalLLaMA | 2 comments
Claude code Max vs. Mac Studio M4 Max 128gb running open code

Posted by siegevjorn@reddit | LocalLLaMA | 40 comments
Fine-tuning llms on dgx spark from nvidia webpage

Posted by siegevjorn@reddit | LocalLLaMA | 2 comments
Analyzing email thread: hallucination

Posted by siegevjorn@reddit | LocalLLaMA | 10 comments
M2 to PCIEx16 adaptor safety

Posted by siegevjorn@reddit | LocalLLaMA | 2 comments
'My Productivity Is At Zero': Meme Frenzy On Social Media As ChatGPT Goes Down Globally

Posted by siegevjorn@reddit | LocalLLaMA | 6 comments
Curious how AMD (Radeon) GPUs can handle LLMs

Posted by siegevjorn@reddit | LocalLLaMA | 34 comments
Dual 3090 configurations—Are used 3090s reliable enough?

Posted by siegevjorn@reddit | LocalLLaMA | 3 comments
Myth about nvlink

Posted by siegevjorn@reddit | LocalLLaMA | 10 comments
Why is it hard to find LLM size that fits consumer-grade GPUs?

Posted by siegevjorn@reddit | LocalLLaMA | 57 comments
Everyone and their mother knows about DeepSeek

Posted by siegevjorn@reddit | LocalLLaMA | 360 comments
Open-R1: a fully open reproduction of DeepSeek-R1 from huggingface

Posted by siegevjorn@reddit | LocalLLaMA | 8 comments
Laptop inference speed on Llama 3.3 70B

Posted by siegevjorn@reddit | LocalLLaMA | 75 comments
Red Hat Announces Definitive Agreement to Acquire Neural Magic (vLLM)

Posted by siegevjorn@reddit | LocalLLaMA | 50 comments
GPU poor's dilemma: 3060 12GB vs. 4060 Ti 16GB

Posted by siegevjorn@reddit | LocalLLaMA | 27 comments
Boy is 5090 beautiful

Posted by siegevjorn@reddit | LocalLLaMA | 1 comment
Is 7900 xt basically identical to M2 ultra chips in terms of token generation speed?

Posted by siegevjorn@reddit | LocalLLaMA | 2 comments
Model size increase in ollama when context size increases

Posted by siegevjorn@reddit | LocalLLaMA | 16 comments
What is your acceptable TG speed?

Posted by siegevjorn@reddit | LocalLLaMA | 50 comments
New year (2025) poll: At what MSRP would you purchase RTX 5090 32GB?

Posted by siegevjorn@reddit | LocalLLaMA | 37 comments
What is your acceptable PP speed?

Posted by siegevjorn@reddit | LocalLLaMA | 9 comments