Turbo-OCR Update: Layout Model + Multilingual
Posted by Civil-Image5411@reddit | LocalLLaMA | 9 comments
Follow-up to my post 18 days ago about the C++/CUDA OCR server. Two additions:
What's New:
- Layout model: Added PP-StructureV3 for layout detection
- Multilingual: No longer Latin-only. Now supports Chinese, Japanese, Korean, Cyrillic, Arabic, and Latin-script languages.
Same stack: C++, TensorRT FP16, multi-stream, gRPC/HTTP, direct PDF endpoint.
Benchmarks (Linux / RTX 5090 / CUDA 13.2):
- Very text-heavy images: 100+ img/s
- Sparse/Low-text: 1,000+ img/s
- 270p/s on FUNSD Dataset
Source: github.com/aiptimizer/TurboOCR
hainesk@reddit
Requires NVIDIA driver 595+? CUDA 13.2? Is there support for 12.8?
Civil-Image5411@reddit (OP)
You can change docker/Dockerfile.gpu line 11 and pick a base image with CUDA 12.8 from NVIDIA's NGC release notes: https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/
CUDA 12.8 is the first version that supports sm_120 (Blackwell), so the existing CMake architectures list works as-is and that one swap is enough. If you want CUDA below 12.8, more changes are needed, see issue #4.
If you build natively instead of via Docker, you also need to relax the gate in scripts/install_native.sh line 34 and add a CUDA 12.8 to TensorRT mapping row around line 85, same pattern as in the comment on issue #4.
I'll try to get hold of an older GPU and build it for older CUDA versions.
Powerful_Ad8150@reddit
It looks great. I have a DGX Spark. Will it work?
Limp_Classroom_2645@reddit
made it run on a shitty ass 3050 6GB
Civil-Image5411@reddit (OP)
GPU-wise it should work, since Blackwell is supported. But I'm not sure about the CPU side: I haven't tested it on DGX Spark, and the current Docker image is x86 only. It would need an arm64 build.
Civil-Image5411@reddit (OP)
Yes, it should work. It uses CUDA 13, although it won’t be as fast. I benchmarked it on a 5090.
Limp_Classroom_2645@reddit
curl -X POST http://127.0.0.1:8000/ocr/pdf \
  -F "file=@document.pdf"
{"error":"Backend unavailable"}
Civil-Image5411@reddit (OP)
The first start compiles the TRT engines, which can take a few minutes. Try curl http://127.0.0.1:8000/health. If that also fails, wait a minute and retry.
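If you'd rather script the wait than retry by hand, here is a minimal Python sketch of the same idea. It assumes the /health endpoint answers with HTTP 200 once the engines are built, which I haven't confirmed against the server code. The function names, the 10-minute cap, and the 10-second interval are all made up for illustration.

```python
import time
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if GET url answers with HTTP 200, False on any failure.

    Assumption: the server signals readiness with a 200 on /health.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
        return False


def wait_until_healthy(url, max_wait=600.0, interval=10.0,
                       probe=http_probe, sleep=time.sleep):
    """Poll probe(url) until it succeeds or about max_wait seconds have been slept.

    probe and sleep are injectable so the loop can be exercised without a server.
    """
    waited = 0.0
    while True:
        if probe(url):
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

Calling `wait_until_healthy("http://127.0.0.1:8000/health")` before the first POST to /ocr/pdf should avoid the "Backend unavailable" response during engine compilation, assuming the health check behaves as described above.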
Limp_Classroom_2645@reddit
thanks, you were right just needed some time to compile