Chuyito

FYI: llama.cpp server can now hot-swap models in under 30 sec

Posted by Chuyito@reddit | LocalLLaMA | View on Reddit | 5 comments
125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar

Posted by Chuyito@reddit | LocalLLaMA | View on Reddit | 101 comments
Adding asyncio.sleep(0) kept my data pipeline (150 ms) from spiking to 5500 ms

Posted by Chuyito@reddit | Python | View on Reddit | 39 comments
Flux.1 on a 16GB 4060ti @ 20-25sec/image

Posted by Chuyito@reddit | LocalLLaMA | View on Reddit | 53 comments