luckyj

Can't get over 250TPS on RTX5090 with Qwen3.5-4B

Posted by luckyj@reddit | LocalLLaMA | 30 comments

Problem parsing thinking tokens on Openwebui with qwen3.6 on LM Studio

Posted by luckyj@reddit | LocalLLaMA | 6 comments