1080 Ti in 2026 - 11GB is still (barely) enough to stay relevant
Posted by srodland01@reddit | LocalLLaMA | 11 comments
I’m still daily driving a 1080 Ti. Not because I’m a masochist; I just haven't been able to justify a 4090/5090 upgrade yet.
For anyone wondering how this holds up:
Qwen 2.5 7B and Llama 3.1 8B (Q4_K_M) still get me about 8-9 tokens per second. It’s not "fast", but it keeps up with reading speed. I can even run Mistral 7B at Q5_K_S fully on the card if I keep the context window short.
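If anyone wants to reproduce that kind of setup, a bare-bones llama-cpp-python sketch looks roughly like this (the filename and numbers are placeholders, not my exact config):

```python
# Minimal sketch: a ~4GB 7B quant fits entirely in 11GB, so offload every layer.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen2.5-7b-instruct-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,  # -1 = offload all layers to the GPU
    n_ctx=4096,       # keep the context modest; this is where the remaining VRAM goes
)

out = llm("In one sentence, why is 11GB of VRAM still workable?", max_tokens=128)
print(out["choices"][0]["text"])
```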
The 11GB VRAM is the only reason this card isn't in a bin. But the limits are getting obvious:
- Anything 13B or larger requires heavy offloading, and the speed falls off a cliff immediately.
- Context is the real killer. Past 4k tokens, the memory pressure makes the whole system crawl.
- No tensor cores, so none of the FP16 and flash-attention speedups the newer cards get.
It’s fine for a basic daily driver if you stick to the small stuff, but the second you want to do more than one thing at a time or run a decent sized prompt, it feels its age.
Who else is still holding onto "old" mid-tier VRAM cards (2060 12GB, 3060, even old AMD stuff)?
What’s your actual daily-use model right now, and what was the specific moment you realized the hardware was finally holding you back?
nakitastic@reddit
I have an ancient XPS laptop running Linux, with 16GB RAM and a mobile GTX 1050 with 4GB VRAM, and with llama.cpp I get over 20 tps using Qwen3.5-2B.Q4_K_M.gguf at 64k context with reasoning on.
Mistral 7B Q4_K_M with 32k context: 6 tps.
You've got almost triple my VRAM, so you should be able to beat that substantially, I'd have thought.
JaffyCaledonia@reddit
I have dual 1080 Tis in my machine and even a single GPU is getting leagues better performance than yours.
I'm using Unsloth's Qwen 3.5 9B IQ3_XXS (I wanted to squeeze it alongside another model in VRAM for a home assistant persona) and that managed a pretty consistent 40 t/s using just 5.5GB of VRAM.
Can you share your LLM config and setup? Surely there will be some way to improve what you have there.
MR_-_501@reddit
These models are ancient though. Maybe try something like Qwen 3.5 9B?
gandhi_theft@reddit
Gemma 4 E4B is nice too.
My_Unbiased_Opinion@reddit
Just for fun, you should try Qwen 3.6 27B at UD IQ2_XXS. Set the KV cache to Q8 and see how much context you can fit entirely in VRAM.
I have a 24GB P40 that is 100% viable for quantized Qwen 3.6 35B.
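With llama-cpp-python that would look roughly like this, assuming its flash_attn, type_k and type_v parameters and the GGML_TYPE_Q8_0 constant it exposes (the filename and context size are just placeholders):

```python
# Rough sketch of a Q8 KV cache so more context fits in VRAM; values are illustrative.
from llama_cpp import Llama, GGML_TYPE_Q8_0

llm = Llama(
    model_path="./qwen3.6-27b-ud-iq2_xxs.gguf",  # placeholder filename
    n_gpu_layers=-1,         # keep the whole model on the card
    n_ctx=16384,             # push this up until VRAM runs out
    flash_attn=True,         # a quantized V cache generally needs flash attention enabled
    type_k=GGML_TYPE_Q8_0,   # 8-bit keys
    type_v=GGML_TYPE_Q8_0,   # 8-bit values, roughly half the KV memory of FP16
)
```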
AppealSame4367@reddit
I get 16 tps at the start of generation and 7 tps by 1,000 tokens of output on a laptop with an RTX 2060 (6GB VRAM) plus 32GB of system RAM, running Qwen 3.6 35B in ik_llama.
Something is wrong with your configuration; my card is much slower than yours and still manages that.
Prudent-Ad4509@reddit
This was a great GPU for old-style 4K 60 FPS games. I have two of them with an SLI link. Maybe, just maybe, I will put them into my work PC for some local inference. But inference on them is pretty slow.
Client_Hello@reddit
Something is wrong; you should get 5x more tps on those small models.
sterby92@reddit
Not sure if you did something wrong, but last year I was still running gemma3-12b (Unsloth Q4_XL) on a 1080 Ti and getting more like 20 tokens per second, so you could probably go faster.
Equivalent_Bit_461@reddit
You could try a dual build with two mid-range GPUs. It's not easy, but you can get as much VRAM as a single expensive GPU, or even more, and while it's not ideal for speed it can still be cheaper.
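With llama-cpp-python, splitting one model across two cards would look roughly like this, assuming its tensor_split parameter (the filename and the 50/50 ratio are just illustrative):

```python
# Rough sketch of a two-GPU split; the model is divided across both cards.
from llama_cpp import Llama

llm = Llama(
    model_path="./some-13b-q4_k_m.gguf",  # placeholder: something too big for either card alone
    n_gpu_layers=-1,                      # offload everything, spread across both GPUs
    tensor_split=[0.5, 0.5],              # proportion of the model placed on each device
    n_ctx=8192,
)
```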
uti24@reddit
Most certainly for games, not for LLMs.
Still, those numbers look slow for a 1080 Ti? I expected something like 30-50 t/s.