[Advice needed] Foreign language extraction using Qwen
Posted by Ok_Television_9000@reddit | LocalLLaMA
For an image like the one below, would it be possible to extract the vendor name and translate it into English? The constraint is limited VRAM (16 GB). I have tried Qwen3-VL-8B at a 4-bit quant.
Gemini 2.5 Pro works, but I need this to run locally.
Any advice and tips would be appreciated.
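For reference, this is roughly how I'm querying the model — a minimal sketch, assuming a local llama.cpp llama-server started with a Qwen3-VL model and its mmproj file; the host, port and image path are placeholders for my setup:

```python
import base64
import requests

# Minimal sketch, assuming a local llama-server (llama.cpp) exposing the
# OpenAI-compatible chat completions API. Host/port and the image path
# are placeholders for your own setup.
SERVER = "http://127.0.0.1:8080"

with open("receipt.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{SERVER}/v1/chat/completions",
    json={
        "temperature": 0,  # deterministic output helps for extraction tasks
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text",
                 "text": "Extract the vendor name from this document and "
                         "translate it into English. Reply with the name only."},
            ],
        }],
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```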

Chromix_@reddit
I've tested with Qwen3 VL 30B A3B Instruct, which gives good inference speed despite partial offload, as well as the dense Qwen3 VL 32B model. Their outputs are surprisingly different.
30B:
32B:
At least one of them seems quite mistaken.
Ok_Television_9000@reddit (OP)
Thanks for this. What quantisation are you running?
Chromix_@reddit
Q4_K_XL to Q6_K_XL for the models, with the full-quality BF16 mmproj. Results fluctuate quite a bit.
Ok_Television_9000@reddit (OP)
llama.cpp?
Chromix_@reddit
Exactly. The 32B model gave me excellent results in cases where DeepSeek OCR, MiniCPM, PaddleOCR and others failed. It doesn't seem to be universal, though.
Ok_Television_9000@reddit (OP)
How much VRAM do you have? Can you share the command you used?
Chromix_@reddit
This should maximize the usage of your 16 GB of VRAM:

```
llama-server -m Qwen3-VL-30B-A3B-Instruct-UD-Q4_K_XL.gguf --mmproj Qwen3-VL-30B-A3B-Instruct-mmproj_BF16.gguf -ngl -1 -fa on -c 4000 --jinja --n-cpu-moe 14
```

Adjust the "14" upwards (that many MoE expert layers stay on the CPU) in case your VRAM limit is exceeded.
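Once it's up, you can sanity-check the endpoint before sending images — a quick sketch, assuming llama-server's default host and port:

```python
import requests

# Quick smoke test against llama-server's /health endpoint
# (default host/port assumed). It reports ok once the model is loaded.
r = requests.get("http://127.0.0.1:8080/health", timeout=5)
print(r.status_code, r.text)
```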