Here it is: Benchmark-Yourself app - compete against open source LLMs and get your score - 5 benchmarks available - add your results to your CV or LinkedIn (if you dare)... or just paste them below for community shaming.
Posted by JLeonsarmiento@reddit | LocalLLaMA | View on Reddit | 21 comments
https://benchmark-yourself.streamlit.app/
BBQ is 🔥
- Rule 4: Limit Self-Promotion - this is not self promotion
- The 1/10th rule is a good guideline: self-promotion should not be more than 10% of your content. - my content is high quality and diversified
- Affiliation must be disclosed: No engagement farming, No “I found this..”, etc. - I am not affiliated with Streamlit or oMLX or anything.
FatheredPuma81@reddit
I'm going to use AI to cheat to make myself look smart.
JLeonsarmiento@reddit (OP)
How did it go?
JLeonsarmiento@reddit (OP)
Do it and paste your score card here.
BitGreen1270@reddit
LoL my first question was on gene sequencing. I noped out of there so fast 😁
DeltaSqueezer@reddit
Same!
JLeonsarmiento@reddit (OP)
Try BBQ, that one is easy until it gets weird.
JLeonsarmiento@reddit (OP)
Try BBQ. That one is fun.
JLeonsarmiento@reddit (OP)
Qwen3.6-2b is taking your job at the gene sequencing factory.
Noxusequal@reddit
Hey, can you build in something that aggregates the human scores so we get a human average? This would actually be really nice for scientific work, because then we know the human baseline we are comparing to.
JLeonsarmiento@reddit (OP)
I thought about that, but I don’t want to collect anyone’s data, nor do I want to add an “accept terms” button to this.
Noxusequal@reddit
I see. I mean, you could just store the overall score, anonymized and mixed in with all other users' scores; in that case no data is stored that is traceable in any way. I guess otherwise one just has to go through a thread like this one and aggregate the scores.
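To make that concrete, here is a rough sketch of what I mean by storing only the aggregate. It assumes scores are percentages, and every name in it (`RunningScore`, `record`, `baselines`) is made up for illustration; it is not how the app works today. Only a count and a sum are kept per benchmark, so no individual result can be read back out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RunningScore:
    """Hypothetical aggregate: keeps a count and a sum, never single results."""
    count: int = 0
    total: float = 0.0

    def add(self, score_pct: float) -> None:
        # Assumption: scores are reported as percentages in [0, 100].
        if not 0.0 <= score_pct <= 100.0:
            raise ValueError("score must be a percentage between 0 and 100")
        self.count += 1
        self.total += score_pct

    @property
    def mean(self) -> Optional[float]:
        # No submissions yet means there is no baseline to report.
        return self.total / self.count if self.count else None


# One aggregate per benchmark name; nothing user-specific is stored.
baselines: Dict[str, RunningScore] = {}


def record(benchmark: str, score_pct: float) -> None:
    """Fold one finished run into the per-benchmark aggregate."""
    baselines.setdefault(benchmark, RunningScore()).add(score_pct)


if __name__ == "__main__":
    record("MMLU", 46.7)
    record("MMLU", 53.3)
    print(baselines["MMLU"].count, baselines["MMLU"].mean)
```

This does nothing about agents taking the test on someone's behalf, so the average would still only be as clean as the submissions.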
JLeonsarmiento@reddit (OP)
Could be, but I cannot tell who’s answering the test directly and who’s using an agent to go through it, so the data could end up polluted.
Noxusequal@reddit
Ah good point ^^
Obvious-Ad-2454@reddit
JLeonsarmiento@reddit (OP)
good score meatbag.
Ieafeator@reddit
Jesus christ MMLU is hard. Especially with all the missing context and weird units and USisms. At least I beat Qwen 3.5-2B with my 46.7%. Also lol at the random super simple question in the middle.
JLeonsarmiento@reddit (OP)
Yes, that one’s hard sometimes. Try BBQ, that one is difficult in an awkward way.
Borkato@reddit
What’s BBQ?
JLeonsarmiento@reddit (OP)
Social bias and prejudice test that I barely passed 😅
JLeonsarmiento@reddit (OP)
Social bias benchmark.
killerstreak976@reddit
I'm so goddam excited to do this