Questions about parameter size & quantization

Posted by LeastExperience1579@reddit | LocalLLaMA | View on Reddit | 6 comments

If I run two models under the same VRAM usage (e.g. Gemma 3 4B at Q8 vs. Gemma 3 12B at Q2)

Which would be smarter or faster? What are the strengths of each?
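For context, a rough back-of-envelope check shows why these two setups land in a similar VRAM ballpark. This is only a weight-size sketch: it ignores KV cache, activations, and the per-block scale metadata that real quant formats (e.g. GGUF Q8_0 / Q2_K) add on top, and treats "4B" and "12B" as nominal parameter counts.

```python
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight-only memory estimate in GB: params * bits / 8.

    Ignores KV cache, activation memory, and quantization block
    overhead, so real files are somewhat larger than this.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# The two models from the question (nominal sizes):
print(f"Gemma 3 4B  @ Q8: ~{approx_weight_gb(4, 8):.1f} GB")   # ~4.0 GB
print(f"Gemma 3 12B @ Q2: ~{approx_weight_gb(12, 2):.1f} GB")  # ~3.0 GB
```

So the weights alone are comparable in size, which is what makes the "more parameters at lower precision vs. fewer parameters at higher precision" trade-off a fair question.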