Using PaddleOCR-VL-1.5 with llama-server for book OCR

Posted by Final-Frosting7742@reddit | LocalLLaMA

I've been running PaddleOCR-VL-1.5 via llama.cpp's server for OCR on book pages. It handles complex layouts, tables, and mixed text/figure pages surprisingly well.

Setup:
- Model: PaddleOCR-VL-1.5-GGUF + mmproj.gguf
- Backend: llama-server (Vulkan on Windows)
- Pipeline: layout detection → region OCR → Markdown with HTML tables
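For anyone wanting to reproduce the setup, a minimal launch command looks roughly like this. The GGUF filenames are placeholders for whichever quantization you downloaded; `-m` and `--mmproj` are llama.cpp's flags for the model and the multimodal projector:

```shell
# Launch llama-server with the model and its multimodal projector.
# Filenames are placeholders; a Vulkan-enabled build is assumed.
llama-server \
  -m PaddleOCR-VL-1.5-Q8_0.gguf \
  --mmproj mmproj.gguf \
  --host 127.0.0.1 --port 8080
```

Once running, the server exposes an OpenAI-compatible chat completions endpoint you can send page images to.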

The pipeline processes an entire folder of page photos end-to-end, so you can digitise a whole book with a single command.
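A per-page request can be sketched like this, using only the standard library. This is not the repo's actual code, just a minimal client against llama-server's OpenAI-compatible endpoint; the prompt text and the default port 8080 are assumptions:

```python
import base64
import json
import urllib.request

# llama-server's OpenAI-compatible endpoint (default port assumed)
SERVER = "http://localhost:8080/v1/chat/completions"


def build_payload(image_bytes: bytes,
                  prompt: str = "OCR this page to Markdown.") -> dict:
    """Build an OpenAI-style chat payload with the page image as a data URI."""
    data_uri = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
        "temperature": 0.0,  # deterministic decoding suits OCR
    }


def ocr_page(path: str) -> str:
    """Send one page photo to the server and return the model's Markdown output."""
    with open(path, "rb") as f:
        payload = build_payload(f.read())
    req = urllib.request.Request(
        SERVER,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Looping `ocr_page` over a sorted folder of page images and concatenating the results gives the single-command book digitisation described above.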

Repo: https://github.com/akmalayari/ocr-book

Has anyone else experimented with vision-language models for OCR?