Secure_Reflection409
-
I can't believe it actually runs - Qwen 235b @ 16GB VRAM
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 123 comments
-
vLLM is kinda awesome
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 50 comments
-
DGX NVLINK/RPC benchmarks
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 0 comments
-
GLM-4.6-UD-IQ2_M b0rked?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Qwen 480 speed check
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Initial results with gpt120 after rehousing 2 x 3090 into 7532
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 10 comments
-
Qwen Next vLLM fail @ 48GB
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 20 comments
-
vLLM - What are your preferred launch args for Qwen?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 24 comments
-
Llama.cpp --verbose
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 17 comments
-
roo tested and top models: 24 - 48GB VRAM
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 19 comments
-
Llama.cpp - so we're not fully offloading to GPU?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 10 comments
-
Do all models crash when looking at chat templates?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 6 comments
-
vscode + roo + Qwen3-30B-A3B-Thinking-2507-Q6_K_L = superb
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 31 comments
-
4090 48GB for UK - Where?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 13 comments
-
Llama.cpp - non-AVX processors?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 14 comments
-
2 cards, 1 quant
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Do we need a new 0.6b (2507) draft model for Qwen?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 9 comments
-
3090Ti - 38 tokens/sec?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 9 comments
-
Is a heavily quantised Q235b any better than Q32b?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 43 comments
-
Qwen 235b @ 16GB VRAM - specdec - 9.8t/s gen
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 20 comments
-
New top of the table - MMLU-Pro
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 2 comments
-
5090 benchmarks - where are they?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 33 comments
-
Qwen3 in LMStudio @ 128k
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 9 comments
-
What's the best mobile handset for donkeying with LLMs atm?
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 4 comments
-
DeepSeek dethroned on MMLU-Pro leaderboard
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 1 comment
-
This is why you're never getting a 5090
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Llama3.2_1b_coder_instruct
Posted by Secure_Reflection409@reddit | LocalLLaMA | View on Reddit | 5 comments