Anyone was able to run gpt-oss 20b on a 5090?

[-]

anzzax@reddit

works fine in updated LM Studio, openai/gpt-oss-20b, 205.70 tok/sec on simple prompt

Reply

[-]

Great_Guidance_8448@reddit

I am getting around 90 tok/sec when generating code and 330 tok/sec summarizing text on my laptop 5090 24gig

Reply

[-]

Chance-Studio-8242@reddit

I am curious how the thermals are when running llms on laptop 5090

Reply

[-]

Chance-Studio-8242@reddit

Any luck here with 5090 and vllm?

Reply

[-]

Great_Guidance_8448@reddit

Works fine with the latest LM Studio on my laptop' RTX 5090 24 gig

Reply

[-]

Sorry_Ad191@reddit

thats llama.cpp backend, we are trying to get it to work in vllm or sglang because the throughput is 10x compared to llama.cpp

Reply

[-]

mxforest@reddit

How much context are you able to fit on a single 5090? I mean total context across multiple users?

Reply

[-]

Sorry_Ad191@reddit

Any news? other than llama.cpp working? I mean for vLLM or Sglang?

Reply

[-]

celsowm@reddit (OP)

No :( only llamacpp and ollama yet, the problem is triton and flash attention

Reply

[-]

Sorry_Ad191@reddit

someone somewhere must have figured it out :-) if not soonTM

Reply

[-]

sleepingsysadmin@reddit

In my testing on lm studio, you cant have flash attention enabled. This model has a different sort of attention system going on. The lm studio blog explains it.

Reply

[-]

Green-Ad-3964@reddit

I get an error with ollama (5090 here)

Reply

[-]

ForsookComparison@reddit

Llama CPP's update to run gpt-oss models was only merged into Ollama 3 hours ago. If you haven't updated since then, you'll need to do so.

Reply

[-]

Green-Ad-3964@reddit

I updated before testing...yet I get the error https://preview.redd.it/mzkq0j131ahf1.png?width=1204&format=png&auto=webp&s=84ad904e61c218d9b4f37c7aafbee958ca4fd93e

Reply

[-]

ForsookComparison@reddit

download llama cpp so you can see in the foreground what's actually failing

Reply

[-]

Failing on my Pro 6000 too. Seems blackwell support isn't working like they announced :( Especially frustrating since the model was designed to leverage the fp4 compute blackwell has [https://blog.vllm.ai/2025/08/05/gpt-oss.html](https://blog.vllm.ai/2025/08/05/gpt-oss.html)

Reply

Anyone was able to run gpt-oss 20b on a 5090?

Reply to Post

18 Comments

anzzax@reddit

Great_Guidance_8448@reddit

Chance-Studio-8242@reddit

General-Cookie6794@reddit

Chance-Studio-8242@reddit

celsowm@reddit (OP)

Great_Guidance_8448@reddit

Sorry_Ad191@reddit

mxforest@reddit

Sorry_Ad191@reddit

celsowm@reddit (OP)

Sorry_Ad191@reddit

sleepingsysadmin@reddit

Green-Ad-3964@reddit

ForsookComparison@reddit

Green-Ad-3964@reddit

ForsookComparison@reddit

Prestigious_Thing797@reddit