Qwen3 27B FP8 + TurboQuant on RTX 5090 - anyone tried?

Posted by Clasyc@reddit | LocalLLaMA | View on Reddit | 23 comments

Do I understand correctly, based on this comment, that I could potentially fit the Qwen 3 27B FP8-precision model with around 256K of context entirely in my RTX 5090's VRAM, with the help of TurboQuant compression? What state is that support in llama.cpp right now? Is it usable, and has anyone tried it?
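
For reference, here is the rough back-of-the-envelope math I'm working from. This is only a sketch: the FP8 weight size is just parameters × 1 byte, and the layer/head/dim numbers below are placeholders I made up, not the model's actual config (substitute the real values from the model's config.json). It also ignores activations, CUDA context overhead, and whatever the KV compression scheme actually achieves.

```python
# Rough VRAM estimate -- all architecture numbers below are placeholders, not real config values.

def weights_gib(n_params_billion: float, bytes_per_param: float = 1.0) -> float:
    """Approximate weight memory in GiB (FP8 ~ 1 byte/param)."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30

def kv_cache_gib(ctx_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: float) -> float:
    """Approximate KV-cache memory in GiB: 2 (K and V) * layers * kv_heads * head_dim per token."""
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return ctx_tokens * bytes_per_token / 2**30

if __name__ == "__main__":
    vram_gib = 32            # RTX 5090
    ctx = 256 * 1024         # target context length

    w = weights_gib(27, bytes_per_param=1.0)   # ~27B params at FP8

    # Placeholder architecture guesses -- replace with the model's actual numbers.
    kv_fp16 = kv_cache_gib(ctx, n_layers=64, n_kv_heads=8, head_dim=128, bytes_per_elem=2.0)
    kv_4bit = kv_cache_gib(ctx, n_layers=64, n_kv_heads=8, head_dim=128, bytes_per_elem=0.5)

    print(f"weights (FP8):           {w:6.1f} GiB")
    print(f"headroom after weights:  {vram_gib - w:6.1f} GiB")
    print(f"KV cache @256K, FP16:    {kv_fp16:6.1f} GiB")
    print(f"KV cache @256K, ~4-bit:  {kv_4bit:6.1f} GiB")
```

With those placeholder numbers the weights alone take roughly 25 GiB, so whether 256K of context fits comes down entirely to how far the KV cache can be compressed, which is exactly what I'm asking about.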