Was anyone able to run gpt-oss 20b on a 5090?
Posted by celsowm@reddit | LocalLLaMA | View on Reddit | 18 comments
Hi!
I tried using the new vLLM Docker image, but I got the error "Sinks are only supported in FlashAttention 3".
Any hints?