lordekeen

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | View on Reddit | 79 comments

lordekeen@reddit

In your current setup you could reduce the context to 128k and use q8\_0/q5\_1 KV instead of q4. That alone would give you better results immediately.
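To get a rough feel for that trade-off, here is a back-of-the-envelope sketch that estimates KV cache size from context length and cache type. The bits-per-element values follow the GGML block layouts as I understand them (32 elements per block). The architecture numbers in `ARCH` are hypothetical placeholders, not the real Qwen 3.6 27B values, and the function name `kv_cache_gib` is mine. Only layers that actually keep a KV cache should be counted, so plug in the numbers from your model's metadata before trusting the output.

```python
# Rough KV cache size estimate for llama.cpp-style cache types.
# Bits per element, derived from GGML block sizes (32 elements per block).
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 8.5,  # 34 bytes per block
    "q5_1": 6.0,  # 24 bytes per block
    "q5_0": 5.5,  # 22 bytes per block
    "q4_0": 4.5,  # 18 bytes per block
}


def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, k_type, v_type):
    """Return the estimated KV cache size in GiB.

    n_layers should count only the layers that keep a KV cache.
    """
    # K and V each store this many elements over the whole context.
    elements = n_ctx * n_layers * n_kv_heads * head_dim
    total_bits = elements * (BITS_PER_ELEMENT[k_type] + BITS_PER_ELEMENT[v_type])
    return total_bits / 8 / 1024**3


# HYPOTHETICAL architecture numbers, for illustration only.
ARCH = dict(n_layers=16, n_kv_heads=4, head_dim=256)

if __name__ == "__main__":
    configs = [
        ("262k, q4_0/q4_0", 262144, "q4_0", "q4_0"),
        ("128k, q8_0/q5_1", 131072, "q8_0", "q5_1"),
        ("128k, f16/f16", 131072, "f16", "f16"),
    ]
    for label, n_ctx, k_type, v_type in configs:
        size = kv_cache_gib(n_ctx, k_type=k_type, v_type=v_type, **ARCH)
        print(f"{label}: {size:.2f} GiB")
```

The estimate only covers the cache itself. Weights, compute buffers, and any per-GPU overhead come on top of it.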

Ignoring benchmarks, how do the newest local models (gemma 4 31B, 26BA4B, Qwen 3.6) “feel” to you? What do you think they compare to?

Posted by opoot_@reddit | LocalLLaMA | View on Reddit | 42 comments

lordekeen@reddit

Gemma 4 is good for non-coding tasks like summarization, information retrieval, web search, etc.

What's this sub's general opinion on quantizing the KV cache

Posted by misanthrophiccunt@reddit | LocalLLaMA | View on Reddit | 91 comments

lordekeen@reddit

I'm using q8\_0/q5\_1 with unsloth's IQ4\_NL on this model. Managed to squeeze 64k context onto two 3060s, and it's working well so far.

Step 3.7 Flash passes the car wash test

Posted by tarruda@reddit | LocalLLaMA | View on Reddit | 45 comments

"At a distance of **50 meters** (about 160 feet), you should definitely **walk** to the car wash rather than drive. It takes less than a minute to walk that distance, and driving such a short length doesn't allow your car's engine to warm up properly, which can cause unnecessary wear over time. Plus, since you are going to *wash* the car, driving it there and back might just get it dirty again! As for the strawberries on your way back: The word **"strawberry"** contains exactly **3** "r"s (st**r**awbe**rr**y). Therefore, you will be paying **3** one-dollar bills ($3.00) for the package." Aaaaaaand that's Gemini 3.5 Flash's answer.

Step 3.7 Flash passes the car wash test

Posted by tarruda@reddit | LocalLLaMA | View on Reddit | 45 comments

lordekeen@reddit

"Let’s break it down: # 🚶‍♂️ Walk or drive? The car wash is only **50 meters away**. That’s extremely close (about half a football field). Driving would actually take more effort than just walking (getting in the car, starting it, etc.), so: 👉 **You should walk.** \-------------- 🍓 Strawberry price puzzle We count how many **“r”** letters are in **“strawberries”**: **s t r a w b e r r i e s** There are **3 “r”s**. 👉 So the price is **3 one-dollar bills = $3"** **ChatGPT's answer.**

Step 3.7 Flash passes the car wash test

Posted by tarruda@reddit | LocalLLaMA | View on Reddit | 45 comments

lordekeen@reddit

"I need to wash my car and the car wash is 50 m away. Should i walk or drive? In the way back i need to buy strawberries, if the strawberry package costs a number of 1 dolar bills equal to the number of r's in its name, how much am i paying?"
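For reference, the letter-count half of that prompt can be checked mechanically. A minimal sketch (the helper name `count_letter` is mine, not something from the thread):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    # Check both the singular and the plural spelling of the product name.
    for name in ("strawberry", "strawberries"):
        print(name, count_letter(name, "r"))
```

The walk-or-drive half is the actual trap, since the car has to be at the car wash to get washed, and that part can't be checked with a one-liner.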

KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche

Posted by Anbeeld@reddit | LocalLLaMA | View on Reddit | 71 comments

lordekeen@reddit

Llama router makes it easy to switch models back and forth and try multiple configs.

Q4_K_M is fine for chat and a trap for agents. Here is math mathing.

Posted by Napster3301@reddit | LocalLLaMA | View on Reddit | 55 comments

lordekeen@reddit

The harness and prompt can have a big impact on the error rate. I've noticed that when my prompt is very straightforward, the result almost never misses.

What would 2x RTX 3060 12GB get me?

Posted by ObjectiveActuator8@reddit | LocalLLaMA | View on Reddit | 64 comments

lordekeen@reddit

I could only fit 48k context with mine. Would you please share your configs? I'm using the IQ4\_XS version with MTP and get around 25 t/s.

What would 2x RTX 3060 12GB get me?

Posted by ObjectiveActuator8@reddit | LocalLLaMA | View on Reddit | 64 comments

lordekeen@reddit

I run this setup because I already had one 3060 and just grabbed the other one second hand. It's quite capable, but it's better to get a single card with more VRAM than two cards, since it will be faster.

RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help

Posted by gaztrab@reddit | LocalLLaMA | View on Reddit | 92 comments

lordekeen@reddit

Did you notice any difference with or without the reasoning flags?

Dual GPU llama.cpp speedup

Posted by Legitimate-Dog5690@reddit | LocalLLaMA | View on Reddit | 51 comments

lordekeen@reddit

I was already running -sm tensor on my dual 3060 setup with mainline llama.cpp. It always gives me more t/s than -sm layer (roughly 25 t/s vs. 18 t/s); the only issue is that it uses a lot of system RAM. Now with MTP I'm getting around 30 t/s (Qwen3.6 27B).

MTP support merged into llama.cpp

Posted by tacticaltweaker@reddit | LocalLLaMA | View on Reddit | 108 comments

lordekeen@reddit

Finally! After I spent three hours trying to build the Docker image from the PR, haha.

New Linux user, need help compiling llamacpp

Posted by Spiderboyz1@reddit | LocalLLaMA | View on Reddit | 32 comments

lordekeen@reddit

No need to compile, just use the Docker image.

Is using vLLM actually worth it if you aren't serving the model to other people?

Posted by ayylmaonade@reddit | LocalLLaMA | View on Reddit | 98 comments

lordekeen@reddit

Hijacking the thread: I see that I could make better use of my two RTX 3060s with vLLM. Can anyone point me in the right direction on how to set it up? The llama.cpp Docker image is just so easy to deploy.

The Qwen 3.6 35B A3B hype is real!!!

Posted by The_Paradoxy@reddit | LocalLLaMA | View on Reddit | 149 comments

lordekeen@reddit

Have you experimented with the MTP model or just the DFlash?

"Hardware is the only moat" - Should we buy new hardware now or wait?

Posted by Alan_Silva_TI@reddit | LocalLLaMA | View on Reddit | 177 comments

lordekeen@reddit

I've said this in another thread: I firmly believe we are seeing the downfall of good, affordable consumer-grade hardware. The vast majority of computers in the coming years will be built around the idea of using cloud/AI services and subscription plans.

What's that one god damn app you need but won't work on Linux for no reason

Posted by Rough-Pen8792@reddit | linux | View on Reddit | 502 comments

lordekeen@reddit

After Effects. There's no good alternative for it yet.

Which distro are you using?

Posted by ukm_array@reddit | linux | View on Reddit | 778 comments

lordekeen@reddit

I'm on Rocky Linux 9, coming from CentOS 7