Gave Maverick another shot (much better!)
Posted by Conscious_Cut_6144@reddit | LocalLLaMA | 56 comments
For some reason, Maverick was hit particularly hard by the llama.cpp inference bug on my multiple-choice cybersecurity benchmark.
On the latest llama.cpp, it went from one of the worst models to one of the best.
1st - GPT-4.5 - 95.01% - $3.87
**2nd - Llama-4-Maverick-UD-Q4-GGUF-latest-Llama.cpp - 94.06%**
3rd - Claude-3.7 - 92.87% - $0.30
3rd - Claude-3.5-October - 92.87%
**5th - Meta-Llama3.1-405b-FP8 - 92.64%**
6th - GPT-4o - 92.40%
6th - Mistral-Large-123b-2411-FP16 - 92.40%
8th - Deepseek-v3-api - 91.92% - $0.03
9th - GPT-4o-mini - 91.75%
10th - DeepSeek-v2.5-1210-BF16 - 90.50%
11th - Meta-Llama3.3-70b-FP8 - 90.26%
12th - Qwen-2.5-72b-FP8 - 90.09%
13th - Meta-Llama3.1-70b-FP8 - 89.15%
14th - Llama-4-scout-Lambda-Last-Week - 88.6%
14th - Phi-4-GGUF-Fixed-Q4 - 88.6%
16th - Hunyuan-Large-389b-FP8 - 88.60%
17th - Qwen-2.5-14b-awq - 85.75%
18th - Qwen2.5-7B-FP16 - 83.73%
19th - IBM-Granite-3.1-8b-FP16 - 82.19%
20th - Meta-Llama3.1-8b-FP16 - 81.37%
**\*\*\* - Llama-4-Maverick-UD-Q4-GGUF-Old-Llama.cpp - 77.44%**
**\*\*\* - Llama-4-Maverick-FP8-Lambda-Last-Week - 77.2%**
21st - IBM-Granite-3.0-8b-FP16 - 73.82%
I'm not sure how much faith I put in the bouncing balls test, but Maverick does still struggle with that one.
So I'm guessing this is still not going to be a go-to for coding.
Still, this at least gives me a lot more hope for the L4 reasoner.