Secure_Reflection409

I can't believe it actually runs - Qwen 235b @ 16GB VRAM

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 123 comments
vLLM is kinda awesome

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 50 comments
DGX NVLINK/RPC benchmarks

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 0 comments
GLM-4.6-UD-IQ2_M b0rked?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 2 comments
Qwen 480 speed check

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 3 comments
Initial results with gpt120 after rehousing 2 x 3090 into 7532

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 10 comments
Qwen Next vLLM fail @ 48GB

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 20 comments
vLLM - What are your preferred launch args for Qwen?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 24 comments
Llama.cpp --verbose

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 17 comments
roo tested and top models: 24 - 48GB VRAM

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 19 comments
Llama.cpp - so we're not fully offloading to GPU?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 10 comments
Do all models crash when looking at chat templates?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 6 comments
vscode + roo + Qwen3-30B-A3B-Thinking-2507-Q6_K_L = superb

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 31 comments
4090 48GB for UK - Where?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 13 comments
Llama.cpp - non-AVX processors?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 14 comments
2 cards, 1 quant

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 3 comments
Do we need a new 0.6b (2507) draft model for Qwen?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 9 comments
3090Ti - 38 tokens/sec?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 9 comments
Is a heavily quantised Q235b any better than Q32b?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 43 comments
Qwen 235b @ 16GB VRAM - specdec - 9.8t/s gen

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 20 comments
New top of the table - MMLU-Pro

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 2 comments
5090 benchmarks - where are they?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 33 comments
Qwen3 in LMStudio @ 128k

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 9 comments
What's the best mobile handset for donkeying with LLMs atm?

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 4 comments
DeepSeek dethroned on MMLU-Pro leaderboard

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 1 comment
This why you're never getting a 5090

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 6 comments
Llama3.2_1b_coder_instruct

Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 5 comments