It's making a mess in LM Studio, and I've tried a bunch of different settings, which is weird because it's not the same at all on hugging face testing page.
Reminds me when I was a university student and I trained a neural network to determine if a number was prime or not. I was so excited when I saw 90%+ accuracy on millions of samples, then i realised it learned to just guess "not prime" for every number lol
the sleeper spec is 131k context on a 1.08B model, with only ~680M non-embedding params. that makes it more interesting as a local tool router than a chat model: cheap enough to sit in front of bigger models, long enough to carry repo/docs context, and enable_thinking=false gives you the fast path when you only need JSON/tool args.
koloved@reddit
I feels quite dull for the benchmarks it shows. Any larger model can already be used on the processor, which will give significantly better results.
bidutree@reddit
Model is available at Ollama for those who want to try it there.
alloxrinfo@reddit
It's making a mess in LM Studio, and I've tried a bunch of different settings, which is weird because it's not the same at all on hugging face testing page.
alloxrinfo@reddit
The MLX ones were working a bit better
Few_Water_1457@reddit
😃
coder543@reddit
Looking at the accuracy chart, it appears that the model refused to answer every question, so... that's something!
kevin_1994@reddit
Reminds me when I was a university student and I trained a neural network to determine if a number was prime or not. I was so excited when I saw 90%+ accuracy on millions of samples, then i realised it learned to just guess "not prime" for every number lol
MuDotGen@reddit
Maybe it was trained on contrarian works and found a loophole in every question's motives for being asked?
10minOfNamingMyAcc@reddit
AGI right here!
psylenced@reddit
Just invert the answer!
1337Captain@reddit
It's over fitted
Interpause@reddit
99% hallucination rate seems truly useful for RNG
coder543@reddit
This is 99% non-hallucination, not 99% hallucination.
sterby92@reddit
Did anyone get tool calling to work with llama.cpp and openwebui? For me it spits out broken, half finished toolcalls.
And1mon@reddit
yeah something seems of. Cannot enable thinking as well.
Prize_Negotiation66@reddit
what is the best quant for such models?
Healthy-Nebula-3603@reddit
So small :)
DaleCooperHS@reddit
Whats worng with it being small! MAybe it has other qualities, ... maybe is funny , and romantic, and caring. Nothing wrong with being small OK!
Healthy-Nebula-3603@reddit
Hehe
DigiDecode_@reddit
🤯🤯🤯
jake_that_dude@reddit
the sleeper spec is
131kcontext on a 1.08B model, with only ~680M non-embedding params. that makes it more interesting as a local tool router than a chat model: cheap enough to sit in front of bigger models, long enough to carry repo/docs context, andenable_thinking=falsegives you the fast path when you only need JSON/tool args.