ailee43

Qwen3.5-27B-IQ3_M, 5070ti 16GB, 32k context: ~50t/s

Posted by ailee43@reddit | LocalLLaMA | 32 comments

Model parallelism for inference?

Posted by ailee43@reddit | LocalLLaMA | 3 comments