stepfun-ai/Step3-VL-10B · Hugging Face
Posted by TKGaming_11@reddit | LocalLLaMA | View on Reddit | 29 comments
Chromix_@reddit
That's quite a step up compared to the larger models. Unfortunately there's no llama.cpp support yet, but given the model size it should run somewhat OK as-is with transformers on a 24 GB VRAM GPU.
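Not from the thread, but a minimal sketch of what "running it as-is with transformers" could look like. The exact classes depend on the model's remote code, so AutoModelForCausalLM/AutoProcessor with trust_remote_code is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "stepfun-ai/Step3-VL-10B"

# Processor/model classes are assumptions; the repo's remote code may expose custom ones.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~20 GB of weights for a 10B model in bf16
    device_map="cuda",           # keep everything on the single 24 GB GPU
    trust_remote_code=True,
)
```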
bronkape_@reddit
We're very keen on adding llama.cpp support, but our small team is currently at full capacity. We're aiming for early February. We highly encourage community contributions and would love to collaborate with anyone interested in leading this effort!
beneath_steel_sky@reddit
Merged and available in the latest binary https://github.com/ggml-org/llama.cpp/pull/21287
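Also not from the thread: once the architecture is in llama.cpp, a GGUF of the model can in principle be driven from Python as well. A rough sketch, assuming llama-cpp-python has picked up the new architecture and using a hypothetical quant filename; the vision path additionally needs the mmproj projector file, which is omitted here:

```python
from llama_cpp import Llama

# Hypothetical GGUF filename; any quant of the model would be loaded the same way.
llm = Llama(
    model_path="Step3-VL-10B-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=8192,
)

out = llm("Summarize what Step3-VL-10B is in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```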
Jazzlike-Result-2330@reddit
They've already created a GGUF file that can be used in LM Studio.
ZealousidealBadger47@reddit
Any link? I tried https://huggingface.co/seanbailey518/Step3-VL-10B-GGUF, but it's not working in LM Studio.
McVitas@reddit
Is there a quantized version of this one?
LegacyRemaster@reddit
Tested on an RTX 6000 96 GB. Very, very, very slow.
10 tokens/sec. Not bad for an 8k video card!
C:\llm>python teststep.py
CUDA available: True
GPU name: NVIDIA RTX PRO 6000 Blackwell Workstation Edition
Total GPU memory: 95.59 GB
Torchvision version: 0.25.0.dev20260115+cu128
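The teststep.py script itself isn't shown; a minimal sketch of a diagnostic that would print output like the above (the torchvision import is inferred from the last line):

```python
import torch
import torchvision

print("CUDA available:", torch.cuda.is_available())
print("GPU name:", torch.cuda.get_device_name(0))

props = torch.cuda.get_device_properties(0)
print(f"Total GPU memory: {props.total_memory / 1024**3:.2f} GB")
print("Torchvision version:", torchvision.__version__)
```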
vidibuzz@reddit
Something looks very fishy there. Not worth installing if performance is that bad.
AvocadoArray@reddit
There’s no way, those are CPU numbers for a 10B model. Or is there something about this model architecture that makes inference slow?
LegacyRemaster@reddit
100% GPU
Loskas2025@reddit
I read "GPU memory used"...
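Not from the thread, but the straightforward way to settle whether the weights (and not just the KV cache) are resident on the GPU is to inspect where the loaded model's parameters actually live. A sketch assuming the transformers setup from earlier:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical reload, as in the earlier sketch.
model = AutoModelForCausalLM.from_pretrained(
    "stepfun-ai/Step3-VL-10B",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
)

# Should print only cuda devices; any 'cpu' entry means layers were offloaded.
print({p.device for p in model.parameters()})
print(getattr(model, "hf_device_map", None))
```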
RnRau@reddit
What inference engines support this one?
bronkape_@reddit
vllm https://github.com/vllm-project/vllm/pull/32329
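A minimal offline-inference sketch for once that PR lands in a vLLM release; the chat template and image handling for this particular model are assumptions, so this only shows the text path:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="stepfun-ai/Step3-VL-10B", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Describe the Step3-VL-10B benchmark results."], params)
print(outputs[0].outputs[0].text)
```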
FullOf_Bad_Ideas@reddit
One of the first VLMs, if not the first, to use Meta's Perception Encoder (PE) as its vision encoder.
__Maximum__@reddit
So the catch is more inference time and VRAM for context? It's actually not a bad trade-off if it scales. There are many problems for which I am willing to wait if the quality of the answer is better.
SlowFail2433@reddit
Yes test-time compute is usually a fairly decent trade-off TBH
SlowFail2433@reddit
Parallel Coordinated Reasoning (PaCoRe) is the main novelty, I think. It also uses Meta's Perception Encoder, which is strong.
Alpacaaea@reddit
Is it really that hard to make a not horrible graph?
kaisurniwurer@reddit
Seeing as your post is "controversial" I assume there is a lot of personal preference in play here.
I like this one, to me it's more readable than colors while highlighting the model in question.
Top_Necessary7623@reddit
vllm
TheRealMasonMac@reddit
This actually looks like a good graph though. It doesn't distort the relative difference and it's easy to tell which model is which.
foldl-li@reddit
This is terrible. It drove me crazy when reading it. I don't know why, but my brain just struggled to extract any information from it.
Alpacaaea@reddit
I meant more that the other models are all grey
silenceimpaired@reddit
Grey with patterns… at a glance you can see how this model compares against all the other models, and with a closer look you can compare it against a specific one. Sure, they could have added more colors, but then you'd have to hunt and peck for the model being compared, and it would look a little garish.
Alpacaaea@reddit
I'd rather it be easy to read and accurate than look nice. More colors would make it easier to see which line is which model.
silenceimpaired@reddit
A fair counterpoint. :)
And1mon@reddit
Looks promising, but I bet real-life performance looks very different. Has anyone tried it yet?
lisploli@reddit
Wow, step bro, your vertical bar is huge!
Takashi728@reddit
Bars