Turbo-OCR Update: Layout Model + Multilingual

Posted by Civil-Image5411@reddit | LocalLLaMA | 9 comments

Follow-up to my post 18 days ago about the C++/CUDA OCR server. Two additions:

What's New:

Layout model: Added PP-StructureV3 for layout detection
Multilingual: No longer Latin-only. Now supports Chinese, Japanese, Korean, Cyrillic, Arabic, and Latin-script languages.

Same stack: C++, TensorRT FP16, multi-stream, gRPC/HTTP, direct PDF endpoint.

Benchmarks (Linux / RTX 5090 / CUDA 13.2):

Very text-heavy images: 100+ img/s
Sparse/Low-text: 1,000+ img/s
FUNSD dataset: 270 pages/s

Source: github.com/aiptimizer/TurboOCR

hainesk@reddit

Requires NVIDIA driver 595+? CUDA 13.2? Is there support for 12.8?

Civil-Image5411@reddit (OP)

You can change docker/Dockerfile.gpu line 11 and pick a base image with CUDA 12.8 from NVIDIA's NGC release notes: https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/

CUDA 12.8 is the first version that supports sm_120 (Blackwell), so the existing CMake architectures list works as-is and that one swap is enough. If you want CUDA below 12.8, more changes are needed; see issue #4.

If you build natively instead of via Docker, you also need to relax the gate in scripts/install_native.sh line 34 and add a CUDA 12.8 to TensorRT mapping row around line 85, same pattern as in the comment on issue #4.

I'll try to get hold of an older GPU and build it for older CUDA versions.

Powerful_Ad8150@reddit

It looks great. I have a DGX Spark. Will it work?

Limp_Classroom_2645@reddit

made it run on a shitty ass 3050 6GB

Civil-Image5411@reddit (OP)

GPU-wise it should work, since Blackwell is supported. But I'm not sure about the CPU side: I haven't tested it on DGX Spark, and the current Docker image is x86 only. It would need an arm64 build.

Civil-Image5411@reddit (OP)

Yes, it should work. It uses CUDA 13, although it won’t be as fast. I benchmarked it on an RTX 5090.

Limp_Classroom_2645@reddit

curl -X POST http://127.0.0.1:8000/ocr/pdf \
  -F "file=@document.pdf"
{"error":"Backend unavailable"}

Civil-Image5411@reddit (OP)

The first start compiles the TRT engines, which can take a few minutes. Try curl http://127.0.0.1:8000/health; if that also fails, wait a minute and retry.
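If you want to script the wait instead of retrying by hand, here is a minimal Python sketch of the same idea: poll the health endpoint until it answers, then send the real request. The function names, the retry count, and the delay are my own illustrative choices, and treating an HTTP 200 from /health as "ready" is an assumption, not something the server documents in this thread.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200.

    Assumption: a 200 response means the backend is ready. Connection
    errors, timeouts, and non-2xx statuses all count as "not ready".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (non-2xx) is a subclass of URLError, so it lands here too.
        return False


def wait_until_ready(probe, attempts: int = 30, delay: float = 10.0,
                     sleep=time.sleep) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries.

    Returns True as soon as probe() succeeds, False if every attempt fails.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Usage would look roughly like `wait_until_ready(lambda: is_healthy("http://127.0.0.1:8000"))`, followed by the /ocr/pdf request once it returns True. The defaults give the server about five minutes, which is a guess based on "a few minutes" above; tune it for your GPU.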

Limp_Classroom_2645@reddit

thanks, you were right, it just needed some time to compile