Every new large model release for cheapos...
Posted by Vektor-Mem@reddit | LocalLLaMA | View on Reddit | 10 comments
LocalLLaMA-ModTeam@reddit
Rule 3
iMakeSense@reddit
r/povertyLocalLLaMA
sedikit-gila@reddit
Finally
Specter_Origin@reddit
Okay, that was just unexpected and uncalled for xD
bucolucas@reddit
12GB still feels like poverty eh? Guess I'll wait till I can spring for the 29-35B capable cards
SkyFeistyLlama8@reddit
There's VRAM and then there's weird janky-ass inference using barely supported laptop iGPUs or obscure CPUs.
Whatever works!
ready_to_fuck_yeahh@reddit
I feel violated
Long_comment_san@reddit
That's me with 12
Blastronomicon@reddit
Plastic-Stress-6468@reddit
Like, I know you can measure KL divergence from Unsloth's posts comparing their quants against others', but I genuinely don't think I'm able to tell that a quant is 5% or 10% "better" or closer to the bf16 version.
The moment I know which model is which, confirmation bias is going to hijack my subjective perception. The theoretically better model is going to feel like it's performing better, and vice versa for the technically worse one. And then my memory gets further hijacked by that one standout use case that left a random lasting impression on me for whatever reason.
Like, maybe I can tell a Q1 model from a Q8, but I have never used a Q1 or a Q8 anyway. I can't fit the Q8, and the Q1 is probably just a waste of time.
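For anyone curious what that KL-divergence comparison actually measures: it's the average "surprise" of the quantized model's next-token distribution relative to the bf16 one. A minimal sketch (the logit values here are made up for illustration, not from any real model):

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution over the vocab."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(P || Q) in nats: how much Q (quant) diverges from P (bf16)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits for a tiny 3-token vocab:
bf16_logits = [2.0, 1.0, 0.1]   # reference model
quant_logits = [1.9, 1.1, 0.2]  # slightly perturbed by quantization

kl = kl_divergence(softmax(bf16_logits), softmax(quant_logits))
# kl is a small positive number; averaged over many tokens, lower is better
```

In practice this is computed per token over a large text corpus and averaged; identical distributions give exactly 0, and bigger quantization damage shows up as a higher mean KL.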