Mixed Precision Quants

Posted by nikgeo25@reddit | LocalLLaMA | 5 comments

Is anybody using mixed-precision quantization regularly? For example, keeping one part of the model at 8-bit and another at 4-bit FP.
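
To make the idea concrete, here is a minimal sketch of a per-layer precision plan. Everything in it is an assumption for illustration: the layer names, the `PRECISION_PLAN` patterns, and the helper functions are hypothetical, and it uses plain symmetric integer round-to-nearest as a stand-in rather than a real 4-bit floating-point format.

```python
import fnmatch


def quantize_dequantize(weights, bits):
    """Symmetric round-to-nearest integer quantization, then dequantize.

    Stand-in for a real quantizer; not an FP4/FP8 implementation.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


# Hypothetical plan: (name pattern, bit width); the first matching pattern wins.
PRECISION_PLAN = [
    ("*.attn.*", 8),     # assumption: keep attention projections at 8-bit
    ("*.experts.*", 4),  # assumption: push expert weights down to 4-bit
]
DEFAULT_BITS = 4


def bits_for(layer_name):
    """Look up the bit width assigned to a layer by the plan."""
    for pattern, bits in PRECISION_PLAN:
        if fnmatch.fnmatch(layer_name, pattern):
            return bits
    return DEFAULT_BITS


def max_error(original, restored):
    """Largest absolute difference between original and restored weights."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Toy weights with made-up layer names, identical values for easy comparison.
layers = {
    "blocks.0.attn.q_proj": [0.50, -0.12, 0.33, -0.91, 0.07],
    "blocks.0.experts.3.up_proj": [0.50, -0.12, 0.33, -0.91, 0.07],
}

for name, weights in layers.items():
    bits = bits_for(name)
    restored = quantize_dequantize(weights, bits)
    print(f"{name}: {bits}-bit, max abs error {max_error(weights, restored):.4f}")
```

The plan here is hand-written; it only shows how different parts of a model could end up at different bit widths, not how to choose them.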

What methods are you using for deciding which layers / experts should be kept at higher precision?