siegevjorn
-
Qwen 3.6 coding choice–27B vs 35B quants
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 96 comments
-
Qwen 35b a3b surprises me
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 38 comments
-
Which inference engine to choose for mlx?
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 17 comments
-
"DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but NOT anywhere near the ratios people have suggested)" says Anthropic's CEO
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 446 comments
-
What agentic CLI do you use for local models?
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 12 comments
-
Llama.cpp: vlm access via llama-server causes cuda OOM error after processing 15k images.
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 2 comments
-
llm-compressor: vLLM AWQ quant with multiple GPUs keeps causing errors
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Claude code Max vs. Mac Studio M4 Max 128gb running open code
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 40 comments
-
Fine-tuning llms on dgx spark from nvidia webpage
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Analyzing email thread: hallucination
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 10 comments
-
M2 to PCIEx16 adaptor safety
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 2 comments
-
'My Productivity Is At Zero': Meme Frenzy On Social Media As ChatGPT Goes Down Globally
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Curious how AMD (Radeon) GPUs can handle LLMs
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 34 comments
-
Dual 3090 configurations—Are used 3090s reliable enough?
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Myth about nvlink
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 10 comments
-
Why is it hard to find LLM size that fits consumer-grade GPUs?
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 57 comments
-
Everyone and their mother knows about DeepSeek
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 360 comments
-
Open-R1: a fully open reproduction of DeepSeek-R1 from huggingface
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 8 comments
-
Laptop inference speed on Llama 3.3 70B
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 75 comments
-
Red Hat Announces Definitive Agreement to Acquire Neural Magic (vLLM)
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 50 comments
-
GPU poor's dilemma: 3060 12GB vs. 4060 Ti 16GB
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 27 comments
-
Boy is 5090 beautiful
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Is 7900 XT basically identical to M2 Ultra chips in terms of token generation speed?
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Model size increase in ollama when context size increases
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 16 comments
-
What is your acceptable TG speed?
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 50 comments
-
New year (2025) poll: At what MSRP would you purchase RTX 5090 32GB?
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 37 comments
-
What is your acceptable PP speed?
Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 9 comments