GRaPE 2 Model Family
Posted by SweaterDog_YT@reddit | LocalLLaMA | View on Reddit | 16 comments
Today I announce the first two models I'm posting here! First off, hello all of r/LocalLLaMA, nice to join. I'd love to show off the General Reasoning Agent for Project Exploration, dubbed GRaPE. GRaPE is on its second generation and comes in two models:
- GRaPE Mini
- GRaPE Flash
These models are 5B and 9B respectively, and support 6 thinking modes for allocating reasoning budget, so you don't get the overthinking seen in the Qwen3.5 models. All of this is detailed in the Hugging Face repo at the end of this post. I've generally found medium or low is the sweet spot, but minimal exists if you can't bear thinking at all.
GRaPE 2 was trained on lots and lots of examples of acting as an agent (code agent, browser agent, etc.), and the models have decent coding performance!
Huge thanks to r/unsloth for making GRaPE 2 possible.
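Since the post doesn't show how a thinking mode is actually selected, here is a minimal sketch of one plausible approach: passing a budget directive in the system prompt. The mode names and the directive syntax are assumptions for illustration only; the real mechanism is in the Hugging Face repo.

```python
# Hypothetical sketch: choosing a GRaPE 2 thinking mode via the system prompt.
# The mode names and the "reasoning effort: ..." directive are assumptions,
# not taken from the model card.
THINKING_MODES = {"minimal", "low", "medium", "high", "max", "auto"}

def build_messages(user_prompt: str, mode: str = "medium") -> list[dict]:
    """Build a chat message list with a reasoning-budget directive prepended."""
    if mode not in THINKING_MODES:
        raise ValueError(f"unknown thinking mode: {mode!r}")
    return [
        {"role": "system", "content": f"reasoning effort: {mode}"},
        {"role": "user", "content": user_prompt},
    ]
```

The resulting list can be sent as-is to any OpenAI-compatible chat endpoint serving the model.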
logic_prevails@reddit
I would give my life for pakistan
No-Pineapple-6656@reddit
Bro, we're inside a goat
Chromix_@reddit
There are currently no benchmarks comparing this to the Qwen models these finetunes are based on, though the post says benchmarks are being worked on. It would be especially interesting to run each model through each benchmark not just once, but once for each of the 6 reasoning settings, to see the token-vs-score trade-off.
For the system prompt:
That will contribute to this model getting a rather low (or high, depending on how you see it) score on SpiralBench.
SweaterDog_YT@reddit (OP)
I haven't completed the bench suite for the models yet; I figured it wouldn't be a gigantic deal-breaker since they're small models. If you don't like them, it's not like you just downloaded a 50GB package.
Chromix_@reddit
It's not about whether or not I like them, but what their benefit is in practice. You introduced fine-grained reasoning levels. That sounds useful. But many things sound useful.
So, if you show a diagram with the average reasoning tokens and score per benchmark for the base model, and the same for the fine-tuned models at each reasoning setting, then one could clearly see the effect: the trade-off from using each of those settings, and when to use which.
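The aggregation described above can be sketched in a few lines. The record fields (`setting`, `reasoning_tokens`, `correct`) are hypothetical names for whatever the benchmark harness emits per question; the function reduces them to one (avg tokens, score) point per reasoning setting, ready for plotting.

```python
# Sketch of the token-vs-score aggregation described in the comment above.
# The record field names are assumptions; adapt them to the real harness output.
from collections import defaultdict
from statistics import mean

def tradeoff_table(results):
    """results: iterable of dicts with keys 'setting', 'reasoning_tokens',
    and 'correct'. Returns {setting: (avg_reasoning_tokens, score)}."""
    by_setting = defaultdict(list)
    for r in results:
        by_setting[r["setting"]].append(r)
    return {
        s: (mean(r["reasoning_tokens"] for r in rs),
            mean(1.0 if r["correct"] else 0.0 for r in rs))
        for s, rs in by_setting.items()
    }
```

Running this once per benchmark, for the base model and each finetune, gives exactly the per-setting curve the diagram would plot.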
Embarrassed_Soup_279@reddit
Interesting name... it would have looked a little better if it didn't have a lowercase "a".
Icy-Degree6161@reddit
how about an "o" instead of the "a"
SweaterDog_YT@reddit (OP)
What would the "o" stand for?
SweaterDog_YT@reddit (OP)
I chose GRaPE since, after asking some close friends which looked more like an acronym (GRAPE, grape, or GRaPE), they all chose GRaPE.
UpAndDownArrows@reddit
What an unfortunately picked name, huh? Really going confident with this one.
SweaterDog_YT@reddit (OP)
There could be worse.
fulgencio_batista@reddit
It stands for "The General Reasoning Agent (for) Project Exploration". OP really should have just dropped "Exploration" and gone with General Reasoning Agent Project (GRAP).
a_beautiful_rhind@reddit
Don't worry... it's just a mini and a flash. You won't feel a thing.
n8mo@reddit
Load bearing G
TastyStatistician@reddit
The Grapist
Tall-Ad-7742@reddit
Hmmm interesting