Current SOTA Text-to-Text LLM?
Posted by 1GewinnerTwitch@reddit | LocalLLaMA | View on Reddit | 8 comments
What is the best model I can run on my 4090 for non-coding tasks? Which quantized models can you recommend for 24GB of VRAM?
Serveurperso@reddit
Don't forget GLM 4 32B, which people overlook because of GLM 4.5 Air (that one needs DDR5 at minimum, since it overflows our VRAM sizes). The 32B fits with the right quant (I run it in Q6, but I have 32GB). Very, very good.
lly0571@reddit
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct
https://huggingface.co/Qwen/Qwen3-32B
https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507
https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507
https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
Maybe IQ4_XS for Seed-36B; Q4_K_M, Q4_K_XL, or the official AWQ quant for Qwen-32B; and Q5 for Qwen3-30B on a 4090.
You can also try Mistral Small 3.2 or Gemma3-27B, which may be better for writing than Qwen-32B. Maybe Q5 for Gemma3 or Q6 for Mistral?
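The quant suggestions above can be sanity-checked with some back-of-the-envelope arithmetic: weight size ≈ parameter count × bits-per-weight ÷ 8, leaving headroom for KV cache and context. A minimal sketch, using approximate (commonly cited, not exact) bits-per-weight figures for GGUF quant types; real file sizes vary because different tensors use different quant formats:

```python
# Rough VRAM estimate for quantized model weights (illustrative only;
# actual GGUF files mix quant types per tensor, so sizes differ somewhat).
APPROX_BITS_PER_WEIGHT = {  # approximate values, not authoritative
    "IQ4_XS": 4.3,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
}

def model_size_gib(params_billion: float, quant: str) -> float:
    """Approximate weight size in GiB for a model with the given
    parameter count (in billions) at the given quant type."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 2**30

# A 32B model at Q4_K_M: roughly 17.9 GiB of weights,
# leaving a few GiB of a 24GB card for KV cache and activations.
print(round(model_size_gib(32, "Q4_K_M"), 1))  # → 17.9
```

This is why ~30B models at Q4–Q5 are the sweet spot for 24GB cards, while Q6 on a 32B model (≈ 24.6 GiB of weights alone) spills past 24GB, matching the comment above about needing 32GB for GLM 4 32B at Q6.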
1GewinnerTwitch@reddit (OP)
This seems very good, thanks.
bjodah@reddit
What languages?
marisaandherthings@reddit
...hmm, I guess Qwen3 Coder with 6-bit quantization could fit in your GPU VRAM and run at a relatively good speed...
AtomicDouche@reddit
https://huggingface.co/openai-community/gpt2
Namra_7@reddit
😂😂🙏
marisaandherthings@reddit
??? 😭