Current SOTA Text-to-Text LLM?
Posted by 1GewinnerTwitch@reddit | LocalLLaMA | View on Reddit | 8 comments
What is the best model I can run on my 4090 for non-coding tasks? Which quantized models can you recommend for 24GB of VRAM?
Serveurperso@reddit
Don't forget GLM 4 32B, which people overlook because of GLM 4.5 Air (that one needs DDR5 at minimum, since it overflows our VRAM sizes). The 32B fits with the right quant (I run it in Q6, but I have 32GB). Very, very good.
lly0571@reddit
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct
https://huggingface.co/Qwen/Qwen3-32B
https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507
https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507
https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
Maybe IQ4_XS for Seed-36B; Q4_K_M, Q4_K_XL, or the official AWQ quant for Qwen-32B; and Q5 for Qwen3-30B on a 4090.
You can also try Mistral Small 3.2 or Gemma3-27B, which may be better for writing than Qwen-32B. Maybe Q5 for Gemma3 or Q6 for Mistral?
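The quant suggestions above can be sanity-checked with some back-of-the-envelope arithmetic: weight size ≈ parameter count × bits-per-weight ÷ 8, leaving headroom for KV cache and context. A minimal sketch, using approximate (commonly cited, not exact) bits-per-weight figures for GGUF quant types; real file sizes vary because different tensors use different quant formats:

```python
# Rough VRAM estimate for quantized model weights (illustrative only;
# actual GGUF files mix quant types per tensor, so sizes differ somewhat).
APPROX_BITS_PER_WEIGHT = {  # approximate values, not authoritative
    "IQ4_XS": 4.3,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
}

def model_size_gib(params_billion: float, quant: str) -> float:
    """Approximate weight size in GiB for a model with the given
    parameter count (in billions) at the given quant type."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 2**30

# A 32B model at Q4_K_M: roughly 17.9 GiB of weights,
# leaving a few GiB of a 24GB card for KV cache and activations.
print(round(model_size_gib(32, "Q4_K_M"), 1))  # → 17.9
```

This is why ~30B models at Q4–Q5 are the sweet spot for 24GB cards, while Q6 on a 32B model (≈ 24.6 GiB of weights alone) spills past 24GB, matching the comment above about needing 32GB for GLM 4 32B at Q6.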
1GewinnerTwitch@reddit (OP)
This seems very good, thanks.
bjodah@reddit
What languages?
marisaandherthings@reddit
...hmm, I guess Qwen3 Coder with 6-bit quantization could fit in your GPU VRAM and run at a relatively good speed...
AtomicDouche@reddit
https://huggingface.co/openai-community/gpt2
Namra_7@reddit
😂😂🙏
marisaandherthings@reddit
??? 😭