1080 Ti in 2026 - 11GB is still (barely) enough to stay relevant
Posted by srodland01@reddit | LocalLLaMA | 11 comments
I’m still daily driving a 1080 Ti. Not because I’m a masochist; I just haven't been able to justify a 4090/5090 upgrade yet.
For anyone wondering how this holds up:
Qwen 2.5 7B and Llama 3.1 8B (Q4_K_M) still get me about 8-9 tokens per second. It’s not "fast", but it keeps up with reading speed. I can even run Mistral 7B at Q5_K_S fully on the card if I keep the context window short.
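If anyone wants to reproduce that kind of setup, a bare-bones llama-cpp-python sketch looks roughly like this (the filename and numbers are placeholders, not my exact config):

```python
# Minimal sketch: a ~4GB 7B quant fits entirely in 11GB, so offload every layer.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen2.5-7b-instruct-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,  # -1 = offload all layers to the GPU
    n_ctx=4096,       # keep the context modest; this is where the remaining VRAM goes
)

out = llm("In one sentence, why is 11GB of VRAM still workable?", max_tokens=128)
print(out["choices"][0]["text"])
```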
The 11GB VRAM is the only reason this card isn't in a bin. But the limits are getting obvious:
- Anything 13B or larger requires heavy offloading, and the speed falls off a cliff immediately.
- Context is the real killer. Past 4k tokens, the memory pressure makes the whole system crawl.
- No tensor cores, so none of the FP16 and flash-attention speedups the newer cards get.
It’s fine for a basic daily driver if you stick to the small stuff, but the second you want to do more than one thing at a time or run a decent sized prompt, it feels its age.
Who else is still holding onto "old" mid-tier VRAM cards (2060 12GB, 3060, even old AMD stuff)?
What’s your actual daily-use model right now, and what was the specific moment you realized the hardware was finally holding you back?
nakitastic@reddit
I have an ancient XPS laptop running Linux, with 16GB RAM and a mobile GTX 1050 with 4GB VRAM, and with llama.cpp I get over 20 tps using Qwen3.5-2B.Q4_K_M.gguf at 64k context with reasoning on.
Mistral 7B Q4_K_M with 32k context: 6 tps.
You've got almost triple my VRAM, so you should be able to beat that substantially, I'd have thought.
JaffyCaledonia@reddit
I have dual 1080 Tis in my machine and even a single GPU is getting leagues better performance than yours.
I'm using Unsloth's Qwen 3.5 9B IQ3_XXS (I wanted to squeeze it alongside another model in VRAM for a home assistant persona) and that managed a pretty consistent 40 t/s using just 5.5GB of VRAM.
Can you share your LLM config and setup? Surely there will be some way to improve what you have there.
MR_-_501@reddit
These models are ancient though. Maybe try something like Qwen 3.5 9B?
gandhi_theft@reddit
Gemma 4 E4B is nice too.
My_Unbiased_Opinion@reddit
Just for fun, you should try Qwen 3.6 27B at UD IQ2_XXS. Set the KV cache to Q8 and see how much context you can fit entirely in VRAM.
I have a 24GB P40 that is 100% viable for quantized Qwen 3.6 35B.
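With llama-cpp-python that would look roughly like this, assuming its flash_attn, type_k and type_v parameters and the GGML_TYPE_Q8_0 constant it exposes (the filename and context size are just placeholders):

```python
# Rough sketch of a Q8 KV cache so more context fits in VRAM; values are illustrative.
from llama_cpp import Llama, GGML_TYPE_Q8_0

llm = Llama(
    model_path="./qwen3.6-27b-ud-iq2_xxs.gguf",  # placeholder filename
    n_gpu_layers=-1,         # keep the whole model on the card
    n_ctx=16384,             # push this up until VRAM runs out
    flash_attn=True,         # a quantized V cache generally needs flash attention enabled
    type_k=GGML_TYPE_Q8_0,   # 8-bit keys
    type_v=GGML_TYPE_Q8_0,   # 8-bit values, roughly half the KV memory of FP16
)
```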
AppealSame4367@reddit
I get 16 tps at the start of generation and 7 tps by 1,000 tokens of output on a laptop with an RTX 2060 (6GB VRAM) plus 32GB of system RAM, running Qwen 3.6 35B in ik_llama.
Something is wrong with your configuration; my card is much slower than yours and still manages that.
Prudent-Ad4509@reddit
This was a great GPU for old-style 4K 60 FPS games. I have two of them with an SLI link. Maybe, just maybe, I will put them into my work PC for some local inference. But inference on them is pretty slow.
Client_Hello@reddit
Something is wrong; you should get 5x more tps on those small models.
sterby92@reddit
Not sure if you did something wrong, but last year I was still running gemma3-12b (Unsloth Q4_XL) on a 1080 Ti and getting more like 20 tokens per second, so you could probably go faster.
Equivalent_Bit_461@reddit
You could try a dual build with two mid-range GPUs. It's not easy, but you can get as much VRAM as a single expensive GPU, or even more, and while it's not ideal for speed it can still be cheaper.
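With llama-cpp-python, splitting one model across two cards would look roughly like this, assuming its tensor_split parameter (the filename and the 50/50 ratio are just illustrative):

```python
# Rough sketch of a two-GPU split; the model is divided across both cards.
from llama_cpp import Llama

llm = Llama(
    model_path="./some-13b-q4_k_m.gguf",  # placeholder: something too big for either card alone
    n_gpu_layers=-1,                      # offload everything, spread across both GPUs
    tensor_split=[0.5, 0.5],              # proportion of the model placed on each device
    n_ctx=8192,
)
```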
uti24@reddit
Most certainly for games, not for LLMs.
Still, those numbers look slow for a 1080 Ti? I expected something like 30-50 t/s.