lordekeen

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | View on Reddit | 79 comments

lordekeen@reddit

In your current setup you could reduce the context to 128k and use q8\_0/q5\_1 KV instead of q4; that alone would give you better results immediately.

Ignoring benchmarks, how do the newest local models (gemma 4 31B, 26BA4B, Qwen 3.6) “feel” to you? What do you think they compare to?

Posted by opoot_@reddit | LocalLLaMA | View on Reddit | 42 comments

What's this sub geebral opinion on quantisizing the KV cache

Posted by misanthrophiccunt@reddit | LocalLLaMA | View on Reddit | 91 comments

Step 3.7 Flash passes the car wash test

Posted by tarruda@reddit | LocalLLaMA | View on Reddit | 45 comments

lordekeen@reddit

"At a distance of **50 meters** (about 160 feet), you should definitely **walk** to the car wash rather than drive. It takes less than a minute to walk that distance, and driving such a short length doesn't allow your car's engine to warm up properly, which can cause unnecessary wear over time. Plus, since you are going to *wash* the car, driving it there and back might just get it dirty again! As for the strawberries on your way back: The word **"strawberry"** contains exactly **3** "r"s (st**r**awbe**rr**y). Therefore, you will be paying **3** one-dollar bills ($3.00) for the package." Aaaaaaand that's Gemini 3.5 Flash's answer.

Step 3.7 Flash passes the car wash test

Posted by tarruda@reddit | LocalLLaMA | View on Reddit | 45 comments

lordekeen@reddit

"Let’s break it down: # 🚶‍♂️ Walk or drive? The car wash is only **50 meters away**. That’s extremely close (about half a football field). Driving would actually take more effort than just walking (getting in the car, starting it, etc.), so: 👉 **You should walk.** \-------------- 🍓 Strawberry price puzzle We count how many **“r”** letters are in **“strawberries”**: **s t r a w b e r r i e s** There are **3 “r”s**. 👉 So the price is **3 one-dollar bills = $3"** **ChatGPT's answer.**

Step 3.7 Flash passes the car wash test

Posted by tarruda@reddit | LocalLLaMA | View on Reddit | 45 comments

lordekeen@reddit

"I need to wash my car and the car wash is 50 m away. Should i walk or drive? In the way back i need to buy strawberries, if the strawberry package costs a number of 1 dolar bills equal to the number of r's in its name, how much am i paying?"
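The letter-counting half of that prompt is easy to check mechanically. A minimal sketch (the helper name `count_letter` is made up for illustration):

```python
def count_letter(word, letter):
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    for name in ("strawberry", "strawberries"):
        print(name, count_letter(name, "r"))
```

Both spellings contain three r's, so $3 is the right price either way. The walk-or-drive half is the actual trap: the car itself has to be at the car wash.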

KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche

Posted by Anbeeld@reddit | LocalLLaMA | View on Reddit | 71 comments

Q4_K_M is fine for chat and a trap for agents. Here is math mathing.

Posted by Napster3301@reddit | LocalLLaMA | View on Reddit | 55 comments

lordekeen@reddit

The harness and prompt can have a big impact on the error rate. I've noticed that when my prompt is very straightforward, the result almost never misses.

What would 2x RTX 3060 12GB get me?

Posted by ObjectiveActuator8@reddit | LocalLLaMA | View on Reddit | 64 comments

lordekeen@reddit

I could only fit 48k context with mine. Would you please share your configs? I'm using the IQ4\_XS version with MTP and getting around 25 t/s.

What would 2x RTX 3060 12GB get me?

Posted by ObjectiveActuator8@reddit | LocalLLaMA | View on Reddit | 64 comments

lordekeen@reddit

I run this setup because I already had one 3060 and just grabbed the other one second-hand. It's quite capable, but it's better to get a single card with more VRAM than two cards; it will be faster.

RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help

Posted by gaztrab@reddit | LocalLLaMA | View on Reddit | 92 comments

Dual GPU llama.cpp speedup

Posted by Legitimate-Dog5690@reddit | LocalLLaMA | View on Reddit | 51 comments

lordekeen@reddit

I was already running -sm tensor on my dual 3060 setup with mainline llama.cpp. It always gives me more t/s than -sm layer (around 25 t/s vs. around 18 t/s); the only issue is that it uses a lot of system RAM. Now with MTP I'm getting around 30 t/s (Qwen3.6 27B).

MTP support merged into llama.cpp

Posted by tacticaltweaker@reddit | LocalLLaMA | View on Reddit | 108 comments

New Linux user, need help compiling llamacpp

Posted by Spiderboyz1@reddit | LocalLLaMA | View on Reddit | 32 comments

Is using vLLM actually worth it if you aren't serving the model to other people?

Posted by ayylmaonade@reddit | LocalLLaMA | View on Reddit | 98 comments

lordekeen@reddit

Hijacking the thread: I see that I could make better use of my two RTX 3060s with vLLM. Can anyone point me in the right direction on how to set this up? The llama.cpp Docker image is just so easy to deploy.

The Qwen 3.6 35B A3B hype is real!!!

Posted by The_Paradoxy@reddit | LocalLLaMA | View on Reddit | 149 comments

"Hardware is the only moat" - Should we buy new hardware now or wait?

Posted by Alan_Silva_TI@reddit | LocalLLaMA | View on Reddit | 177 comments

lordekeen@reddit

I've said this in another thread: I firmly believe we are seeing the downfall of good, affordable consumer-grade hardware. The vast majority of computers in the coming years will be built around the idea of using cloud/AI services and subscription plans.

What's that one god damn app you need but won't work on Linux for no reason

Posted by Rough-Pen8792@reddit | linux | View on Reddit | 502 comments

Which distro are you using?

Posted by ukm_array@reddit | linux | View on Reddit | 778 comments