Here it is: Benchmark-Yourself app - compete against open source LLMs and get your score - 5 benchmarks available - add your results to your CV or LinkedIn (if you dare)... or just paste them below for community shaming.

Posted by JLeonsarmiento@reddit | LocalLLaMA | View on Reddit | 21 comments

https://benchmark-yourself.streamlit.app/

BBQ is 🔥

Rule 4: Limit Self-Promotion - this is not self promotion
The 1/10th rule is a good guideline: self-promotion should not be more than 10% of your content. - my content is high quality and diversified
Affiliation must be disclosed: No engagement farming, No “I found this..”, etc. - I am not affiliated with Streamlit or oMLX or anything.

[-]

FatheredPuma81@reddit

I'm going to use AI to cheat to make myself look smart.

[-]

JLeonsarmiento@reddit (OP)

Do it and paste your score card here.

[-]

BitGreen1270@reddit

LoL my first question was on gene sequencing. I noped out of there so fast 😁

[-]

JLeonsarmiento@reddit (OP)

Try BBQ, that one is easy until it gets weird.

[-]

JLeonsarmiento@reddit (OP)

Qwen3.6-2b is taking your job at the gene sequencing factory.

[-]

Hey, can you build in something that aggregates the human scores so we get a human average? This would actually be really nice for scientific work, because then we would know the human baseline we are comparing to.

[-]

JLeonsarmiento@reddit (OP)

I thought about that, but I don’t want to collect anyone’s data, nor do I want to add an “accept terms” button to this.

[-]

Noxusequal@reddit

I see. I mean, you could just store the overall score, anonymized and mixed in with all other users' scores. In that case no data is stored that is traceable in any way. Otherwise, I guess one just has to go through a thread like this one and aggregate the scores.
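A minimal sketch of that idea, assuming a per-benchmark running count and sum are the only things kept. The names `RunningAverage`, `HumanBaseline`, `submit`, and `average` are hypothetical and not part of the app; no user identifier or individual score is retained.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RunningAverage:
    """Holds only a count and a sum; individual scores are never stored."""
    count: int = 0
    total: float = 0.0

    def add(self, score: float) -> None:
        self.count += 1
        self.total += score

    def mean(self) -> Optional[float]:
        # No submissions yet means there is no baseline to report.
        return self.total / self.count if self.count else None


@dataclass
class HumanBaseline:
    """Hypothetical aggregator: one RunningAverage per benchmark name."""
    per_benchmark: Dict[str, RunningAverage] = field(default_factory=dict)

    def submit(self, benchmark: str, score_pct: float) -> None:
        # Assumption: scores are percentages in [0, 100].
        if not 0.0 <= score_pct <= 100.0:
            raise ValueError("score must be a percentage between 0 and 100")
        self.per_benchmark.setdefault(benchmark, RunningAverage()).add(score_pct)

    def average(self, benchmark: str) -> Optional[float]:
        agg = self.per_benchmark.get(benchmark)
        return agg.mean() if agg else None


baseline = HumanBaseline()
baseline.submit("MMLU", 46.7)  # a score reported in this thread
baseline.submit("MMLU", 53.3)  # made-up second score, for illustration only
print(baseline.average("MMLU"))
```

Because only the sum and count survive, a single submission cannot be recovered once at least two scores have been mixed in.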

[-]

JLeonsarmiento@reddit (OP)

Could be, but I cannot tell who’s answering the test directly and who’s using an agent to go through it, so the data could end up polluted.

[-]

Noxusequal@reddit

Ah good point ^^

[-]

Obvious-Ad-2454@reddit

[-]

JLeonsarmiento@reddit (OP)

Good score, meatbag.

[-]

Ieafeator@reddit

Jesus Christ, MMLU is hard. Especially with all the missing context and weird units and USisms. At least I beat Qwen 3.5-2B with my 46.7%. Also lol at the random super simple question in the middle.

[-]

JLeonsarmiento@reddit (OP)

Yes, that one’s hard sometimes. Try BBQ, that one is difficult in an awkward way.

[-]

Borkato@reddit

What’s BBQ?

[-]

JLeonsarmiento@reddit (OP)

Social bias and prejudice test that I barely passed 😅

[-]

JLeonsarmiento@reddit (OP)

Social bias benchmark.

[-]

killerstreak976@reddit

I'm so goddam excited to do this