model: support step3-vl-10b by forforever73 · Pull Request #21287 · ggml-org/llama.cpp
Posted by jacek2023@reddit | LocalLLaMA | 7 comments
STEP3-VL-10B is a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. Despite its compact 10B parameter footprint, STEP3-VL-10B excels in visual perception, complex reasoning, and human-centric alignment. It consistently outperforms models under the 10B scale and rivals or surpasses significantly larger open-weights models (10×–20× its size), such as GLM-4.6V (106B-A12B) and Qwen3-VL-Thinking (235B-A22B), as well as top-tier proprietary flagships like Gemini 2.5 Pro and Seed-1.5-VL.
Kahvana@reddit
It has been merged and released!
https://github.com/ggml-org/llama.cpp/releases/tag/b8705
Local-Cartoonist3723@reddit
Any comparisons done with the new 3.5 27B from Qwen? This is an exciting model based on these charts.
Skyline34rGt@reddit
It's a 3-month-old model. Not that interesting now.
coder543@reddit
The interesting aspect is that this PR is probably happening now to lay the groundwork for Step 3.6 Flash.
Skyline34rGt@reddit
Well, that will be interesting indeed.
Local-Cartoonist3723@reddit
Fair, I was still impressed by the benchmarks.
jacek2023@reddit (OP)