TurboQuant - Extreme KV Cache Quantization · ggml-org/llama.cpp · Discussion #20969

Posted by pmttyji@reddit | LocalLLaMA | 23 comments

14+ independent validators now, across Metal, CUDA, HIP, Vulkan, and MLX: Apple Silicon, NVIDIA (4090, 5090, H100, A100, V100, 1080 Ti), and AMD (RX 9070 XT, RX 6600), from M1 to Blackwell.
This is what open source research looks like: the data converges.

- u/Pidtom

This is an all-in-one thread collecting the discussions and benchmarks on TurboQuant.
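For context, mainline llama.cpp already supports quantized KV caches via the `--cache-type-k` and `--cache-type-v` flags; a minimal sketch of using them is below. This shows the existing quantized cache types (e.g. `q8_0`), not TurboQuant itself, and the model path is a placeholder:

```shell
# Quantized KV cache with mainline llama.cpp (existing q8_0 cache types,
# not TurboQuant): q8_0 for both K and V roughly halves KV cache memory
# versus the default f16.
# Note: quantizing the V cache generally requires flash attention to be
# enabled; check `llama-server --help` for the flag on your build.
llama-server \
  -m ./models/your-model.gguf \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -c 32768
```

Lower-bit cache types (e.g. `q4_0`) trade more quality for memory; the benchmarks linked in this thread are the place to check how far that trade can be pushed.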