GPT-OSS-120B vs DGX Spark
Posted by AdamLangePL@reddit | LocalLLaMA | 18 comments
Just curious what your best speeds are with that model. The peak I have reached using vLLM is 32 tokens/s (output) on, I think, Q4_K_S. Is there any way to make it faster without losing response quality?
18 Comments
AdamLangePL@reddit (OP)
hurdurdur7@reddit
AdamLangePL@reddit (OP)
hurdurdur7@reddit
AdamLangePL@reddit (OP)
hurdurdur7@reddit
Odd-Ordinary-5922@reddit
AdamLangePL@reddit (OP)
AdamLangePL@reddit (OP)
Odd-Ordinary-5922@reddit
prescorn@reddit
inevitabledeath3@reddit
Ok_Appearance3584@reddit
Narrow-Belt-5030@reddit
pmttyji@reddit
ImportancePitiful795@reddit
AdamLangePL@reddit (OP)
pontostroy@reddit