[Advice needed] Foreign language extraction using Qwen

Posted by Ok_Television_9000@reddit | LocalLLaMA | View on Reddit | 7 comments

For an image like below, would it be possible to extract its vendor name and translate into english? The constraint is a small VRAM(16GB). I have tried using Qwen3VL-8B on 4bit quant

Gemini2.5 pro works, but i need it to be able to work locally.

Any advice and tips would be useful

[-]

Chromix_@reddit

I've tested with Qwen3 VL 30B A3B Instruct which allows for good inference speed despite partial offload, as well as the dense Qwen3 VL 32B model. Their outputs are surprisingly different.

Provide a full transcription. Maintain the original format.
Translate to English afterwards.

30B:

Taxi Fare
Tamar Haddad
Company 37540044
22224 Madini
4303526 Marshi Shii
37540044 Nehaga
1893 Rashboni Tamar
0.000 Payment
13:16 Fare Adjustment
13:16 25-05-25
-----------
Additional Charges at Time
180.0

32B:

Newsstand Branch
Ramat HaSharon Directory
37540044: Tel. No.
22224: ID No.
4303526: License No.
Ramat HaSharon Directory: Tel. No.
37540044: Tel. No.
Total Invoice
1893 Pharmacist
0.000 km Distance
13:16 Assistance Station
13:16 25-05-25
——————
Discount Tag
180.0 Amount

At least one of them seems quite mistaken.

[-]

Ok_Television_9000@reddit (OP)

Thanks for this. Are you running on any quantisation?

[-]

Chromix_@reddit

Q4_K_XL to Q6_K_XL for the models with full quality BF16 mmproj. Results fluctuate quite a bit.

[-]

Ok_Television_9000@reddit (OP)

llamma.cpp?

[-]

Chromix_@reddit

Exactly. The 32B model got me excellent results in cases where DeepSeek OCR, MiniCPM, PaddleOCR and others failed. Doesn't seem to be universal though.

[-]

Ok_Television_9000@reddit (OP)

How much VRAM do you have? Able to share the command used?

[-]

This should maximize the usage of your 16 GB VRAM: llama-server -m Qwen3-VL-30B-A3B-Instruct-UD-Q4_K_XL.gguf --mmproj Qwen3-VL-30B-A3B-Instruct-mmproj_BF16.gguf -ngl -1 -fa on -c 4000 --jinja --n-cpu-moe 14

Adjust the "14" upwards in case your VRAM limit is exceeded.