examples : add llama-eval by ggerganov · Pull Request #21152 · ggml-org/llama.cpp

llama-impersonator@reddit

It would be nice if lcpp supported echo so lm-eval could work directly without some BS transformer integration.

Organic_Scarcity_495@reddit

Having a standardized eval script inside llama.cpp itself is great. It saves everyone from setting up their own janky benchmark pipeline that measures different things.

Luigi_Boy_96@reddit

Lol, I literally spent last week creating my own benchmarking repo with some nut tasks to see how fast and how accurate the models were. At least it was a fun experiment to see how some models reason.

ttkciar@reddit

Can attest to the truth of this, having written my own janky benchmark pipeline that measures weird things.

lumos675@reddit

I really don't care about the output time. Because think about it: at most, how many lines of code do you need to write in one go? 3000 lines? Even that isn't as time-consuming as prefilling a 150k context.

wektor420@reddit

Good find. Something similar for vLLM would be cool.

StorageHungry8380@reddit

`-c 4194304 -np 256`? That's not your grandpa's GPU... Not that it requires it, just... not the parameters I run at home. Very cool addition; I've been wanting to run benches easily at home while tinkering.

spaceman_@reddit

Likely running on CPU, given the high `np` value, no?

PANIC_EXCEPTION@reddit

why big gpu when many cpu do trick?

fiery_prometheus@reddit

Someone like him likely has donated datacenter GPUs; I can't imagine he wouldn't have those at this point.

perkia@reddit

The real home was the datacenters we slept in along the way

coherentspoon@reddit

Thanks for making us aware.

TheBlueMatt@reddit

Hopefully this leads to more formal (even if benchmaxxed) results for quantized models - just looking at divergence may or may not capture the quality of a quantization fully and this might help.

Eyelbee@reddit

Doesn't seem very good. Aren't the AIME datasets proprietary? Also, why do we need an LLM as a judge for AIME? I can't see the log-likelihood scoring either.

computehungry@reddit

Oh this is nice. Although it might look trivial, when I tried to bench some models, I found that so many benchmarks just ask for "API_KEY" without any (local) server option. Sure it's not too hard to vibe-hook them, but still pretty great to have out of the box.

Zc5Gwu@reddit

I hope it brings a little more rigor to people’s vibes about different quants.

ketosoy@reddit

Having fought with lm-eval for many days, I look forward to having an eval tool with some gg level elegance.

Dany0@reddit

ggs my friend

Chromix_@reddit

"now you can evaluate your models at home" -> now you can heat your home ;-) (Maybe slightly less when [restricting power usage](https://www.reddit.com/r/LocalLLaMA/comments/1tayu5t/stop_wasting_electricity/) and undervolting a bit.)

It's also nice that there is now a single, fixed way of evaluation. No more oddness with everyone adapting an existing benchmark to local models in a different way, running it with different versions of dependencies, and so on. The scores of the same model differed quite a bit depending on how it was evaluated, as I found with the SuperGPQA benchmark, and I'm not even talking about the regular variation between runs here.

Far-Low-4705@reddit

this was very very much needed

RIP26770@reddit

Dope 😎

a_beautiful_rhind@reddit

The tests take a while but it's a good benchmark to see if your LLM is underperforming. I had to reduce the simultaneous requests from the ridiculous number it does by default.
