How does gemma3:4b-it-qat fare against OpenAI models on the MMLU-Pro benchmark? Try for yourself in Excel
Posted by Kapperfar@reddit | LocalLLaMA | View on Reddit | 26 comments
I made an Excel add-in that lets you run a prompt on thousands of rows of tasks. Might be useful for some of you to quickly benchmark new models when they come out. In the video I ran gemma3:4b-it-qat, gpt-4.1-mini, and o4-mini on an (admittedly tiny) subset of the MMLU-Pro benchmark. I think I understand now why OpenAI didn't include MMLU-Pro in their gpt-4.1-mini announcement blog post :D
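For example, a formula along these lines (the exact function name and signature may differ from the add-in's actual one) grades one benchmark question per row:

=PROMPT("Answer this multiple-choice question with only the letter of the correct option: " & A2)

Fill it down the question column and compare the outputs against the answer column to get an accuracy score.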
To try for yourself, clone the git repo at https://github.com/getcellm/cellm/, build with Visual Studio, and run the installer Cellm-AddIn-Release-x64.msi in src\Cellm.Installers\bin\x64\Release\en-US.
--Tintin@reddit
Is there a macOS alternative that works with local LLMs?
Kapperfar@reddit (OP)
Not that I am aware of, unfortunately. Say it also worked on macOS, what would you have used it for? Benchmarking models or something else?
--Tintin@reddit
I once used a closed product with closed LLMs in Excel. I did use it to ease some tasks that would otherwise be hard to solve. Say you have full address data in a cell and you just need the city name: =LLM(A1, "Only extract the city name"). Quite handy. But I stopped using it because of the closed manner of the process.
Kapperfar@reddit (OP)
What do you mean closed manner? That it is difficult to know how LLMs make decisions? Or the product was closed? If so, how was the product closed and how could it have been better?
--Tintin@reddit
Yes, sorry, I was a little unclear. I just didn't like that the LLM was some OpenAI model at the time; I wanted to use local models instead for cost and privacy reasons.
Kapperfar@reddit (OP)
Ok, for sure, that makes sense. Did you ever find a way to use local models?
--Tintin@reddit
No, unfortunately not.
Kapperfar@reddit (OP)
Ok, well, now you have: this tool supports local models.
--Tintin@reddit
Sure, but only on Windows. And I run macOS.
Kapperfar@reddit (OP)
Oh yeah, you mentioned that, I forgot. There is also gptforwork.com, which I think supports macOS.
asdfghvj@reddit
Do we need an API key to run this, or can it run locally?
zeth0s@reddit
Appreciate the effort, but there's no way I open Excel unless I am paid very well. Even if paid, I would most likely use Python to export a CSV...
Kapperfar@reddit (OP)
Because you don’t like Excel or because it is easier for you to quickly make a script?
zeth0s@reddit
Because Excel is good as a spreadsheet, but sheets are extremely difficult to maintain when complex logic and code are added.
I unfortunately had my fair share of how Excel is used in the real world, until I decided to make it clear that I don't work with Excel.
Kapperfar@reddit (OP)
Yeah, and we haven’t even talked about version control yet. But what real-world use made you go “never again”?
zeth0s@reddit
Almost every time I had to use it in industry... As soon as I see an if/else or a VLOOKUP, I get scared.
Local_Artichoke_7134@reddit
Is it the performance you hate, or the uncertainty of the data outputs?
zeth0s@reddit
It's spreadsheets used to do basic scientific computing/applied statistics. Literally everywhere. Spreadsheets are supposed to be a handy calculator replacement with basic data entry and visualization features.
People use them to build features of real, complex applications, and then complain that it doesn't work. Or worse, expect you to deal with it. It is impossible to manage.
It's a fault of the software, which allows too much while being too fragile.
I am happy that many people feel empowered by so many features; as long as they give me the data, I won't touch their sheets.
Kapperfar@reddit (OP)
😆
YearZero@reddit
I have Excel doing this natively without any add-ins. Just ask a large model to give you VBA code for an Excel function that takes any text or cell reference as a prompt. Host the model on llama.cpp and tell the large model the API endpoint. It works exactly like yours using the VBA that's built into Excel, no add-in needed.
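A minimal sketch of that approach, assuming a llama.cpp server running locally with its OpenAI-compatible /v1/chat/completions endpoint on port 8080 (the URL, port, and crude string parsing are illustrative assumptions, not robust code):

Public Function LLM(prompt As String) As String
    ' Send the prompt to a local llama.cpp server via its
    ' OpenAI-compatible chat completions endpoint (assumed URL below).
    Dim http As Object
    Dim body As String, response As String
    Dim startPos As Long, endPos As Long

    Set http = CreateObject("MSXML2.XMLHTTP")
    body = "{""messages"":[{""role"":""user"",""content"":""" & _
           Replace(prompt, """", "\""") & """}]}"

    http.Open "POST", "http://localhost:8080/v1/chat/completions", False
    http.setRequestHeader "Content-Type", "application/json"
    http.send body
    response = http.responseText

    ' Crude extraction of the assistant's message content; a proper
    ' JSON parser would be more robust.
    startPos = InStr(response, """content"":""") + Len("""content"":""")
    endPos = InStr(startPos, response, """")
    LLM = Mid(response, startPos, endPos - startPos)
End Function

Paste it into a VBA module and you can call =LLM("Extract the city: " & A1) directly from a cell.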
Kapperfar@reddit (OP)
Oh, that is very clever. What do you use it for?
YearZero@reddit
Same as you actually, benchmarks lol. I use it for SimpleQA at the moment; it’s just so easy without having to work with Python etc., as everything stays in Excel.
But I’m sure that if I ever have a messy list of things in Excel that needs some data extraction, it will come in handy.
TheRealMasonMac@reddit
Now I wonder if it's possible to store an LLM as a spreadsheet file...
SkyFeistyLlama8@reddit
Somebody made GPT-2 in an Excel file.
Crafty-Struggle7810@reddit
This looks like something teachers would use to grade student responses.
Kapperfar@reddit (OP)
Link to Excel sheet: https://docs.google.com/spreadsheets/d/1E0u2Ise7_Ne93Lm49SgQbxaVUeysMJlE/export?format=xlsx