Using PaddleOCR-VL-1.5 with llama-server for book OCR
Posted by Final-Frosting7742@reddit | LocalLLaMA | 19 comments
I've been running PaddleOCR-VL-1.5 via llama.cpp's server for OCR on book pages. It handles complex layouts, tables, and mixed text/figure pages surprisingly well.
Setup:
- Model: PaddleOCR-VL-1.5-GGUF + mmproj.gguf
- Backend: llama-server (Vulkan on Windows)
- Pipeline: layout detection → region OCR → Markdown with HTML tables
The pipeline can process an entire folder of page photos end to end. You can basically digitise a book with a single command.
Repo: https://github.com/akmalayari/ocr-book
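The folder-level loop can be sketched as a client against llama-server's OpenAI-compatible chat endpoint. This is illustrative, not the repo's actual code: the server URL, the prompt text, and the `*.jpg` glob are all assumptions.

```python
import base64
import json
from pathlib import Path
from urllib import request

SERVER = "http://localhost:8080/v1/chat/completions"  # assumed llama-server address

def page_request(image_path: Path, prompt: str = "OCR this page to Markdown.") -> dict:
    """Build an OpenAI-style chat request with the page image inlined as a data URI."""
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

def ocr_folder(folder: str) -> str:
    """OCR every page image in the folder, in filename order, and join the results."""
    pages = []
    for img in sorted(Path(folder).glob("*.jpg")):
        req = request.Request(
            SERVER,
            data=json.dumps(page_request(img)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            body = json.load(resp)
        pages.append(body["choices"][0]["message"]["content"])
    return "\n\n".join(pages)
```

Sorting by filename is what makes "photograph pages in order, then run one command" work.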
Has anyone else experimented with vision-language models for OCR?
76vangel@reddit
Anyone know how to do handwriting? I have a pile of WW2 soldier/spy diaries I want transcribed.
Final-Frosting7742@reddit (OP)
Honestly, that's something I want to try too. Once I test it I'll get back to you.
76vangel@reddit
That would be amazing, thanks.
Arkenstonish@reddit
You use the usual VL model pipeline, e.g. Qwen 3.5 or 3.6 (with an mmproj in the GGUF case).
Multilingual cursive is usually in good shape even at small quants (e.g. up to Q3, so even 8GB of VRAM is sufficient).
You can also combine classic OCR layout detection (PaddlePaddle) with a big VL model for the recognition step.
Layout-wise: Qwen 3.5 and 3.6 are also trained for grounding tasks, so you can request "bounding box for
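In the layout-then-recognise approach described above, the step between the two models is putting the detected boxes into reading order before cropping and sending each region to the VL model. A minimal sketch of that ordering; `Region` and `reading_order` are illustrative names, not any library's actual API:

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A layout-detector box: (x, y) of the top-left corner plus width/height."""
    x: int
    y: int
    w: int
    h: int
    kind: str  # e.g. "text", "table", "figure"

def reading_order(regions: list[Region], row_tol: int = 20) -> list[Region]:
    """Sort detected regions top-to-bottom, then left-to-right within a row.

    Regions whose tops are within `row_tol` pixels are treated as one row,
    so side-by-side regions keep their left-to-right order.
    """
    rows: list[list[Region]] = []
    for r in sorted(regions, key=lambda r: r.y):
        if rows and abs(rows[-1][0].y - r.y) <= row_tol:
            rows[-1].append(r)
        else:
            rows.append([r])
    ordered: list[Region] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda r: r.x))
    return ordered
```

Each ordered region would then be cropped from the page image and sent to the recognition model one at a time. (A real pipeline would also need column handling for multi-column pages.)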
HareMayor@reddit
What sampling parameters are you using ?
Final-Frosting7742@reddit (OP)
I haven't tried messing with the sampling parameters, but PaddleOCR-VL is very faithful to the text; it probably uses temp=0.0.
I've never seen it hallucinate except on mirrored text, and that's an extremely vicious edge case. Even there, it prefers producing gibberish over guessing a word.
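If you do want to force deterministic decoding, llama-server's OpenAI-compatible endpoint accepts per-request sampling overrides. A sketch of the relevant fields (the repo itself may simply rely on server defaults):

```python
# Per-request sampling overrides for llama-server's OpenAI-compatible endpoint.
payload = {
    "temperature": 0.0,  # greedy decoding: deterministic, no creative guessing
    "top_k": 1,          # redundant at temperature 0, but makes the intent explicit
    "messages": [{"role": "user", "content": "[image + OCR prompt go here]"}],
}
```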
HareMayor@reddit
It keeps generating HTML-formatted text instead of Markdown.
Can you tell me if you give it a specific prompt?
Final-Frosting7742@reddit (OP)
I added full Markdown conversion. The default behaviour is now pure Markdown, with an option to keep HTML tables and graphs. Check it out.
Final-Frosting7742@reddit (OP)
No, the prompt is hardcoded in the PaddleX library. The HTML is actually the expected behaviour for tables and graphs: PaddleOCR-VL natively outputs HTML for those, and the postprocess only strips part of it to keep things concise, so it's still HTML. It wasn't an issue for my use case, but you're right, I should add an option to output pure Markdown.
Thanks for the feedback.
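Converting those HTML tables to Markdown pipe tables can be sketched with the stdlib `html.parser`. This is illustrative, not the repo's actual postprocess, and it ignores `colspan`/`rowspan`, which PaddleOCR-VL's table HTML can contain:

```python
from html.parser import HTMLParser

class TableToMarkdown(HTMLParser):
    """Collect <tr>/<td>/<th> cells from an HTML table."""
    def __init__(self):
        super().__init__()
        self.rows, self.row, self.cell, self.in_cell = [], [], [], False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell, self.cell = True, []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.row.append("".join(self.cell).strip())
            self.in_cell = False
        elif tag == "tr":
            self.rows.append(self.row)
            self.row = []

    def handle_data(self, data):
        if self.in_cell:
            self.cell.append(data)

def html_table_to_markdown(html: str) -> str:
    """Render the first row as the header, the rest as body rows."""
    parser = TableToMarkdown()
    parser.feed(html)
    header, *body = parser.rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)
```

Merged cells are the reason keeping the HTML as an option is sensible: Markdown tables simply can't represent them.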
ganonfirehouse420@reddit
I've actually created a Python script to perform OCR with gemma4-e4b-it. The script should be model-independent and work with any model that can produce proper Markdown formatting. My last try, with glm-ocr, didn't work well: the formatting was always wrong.
Final-Frosting7742@reddit (OP)
The small gemma4 models are interesting since they can handle any file type (text, image, audio). How was the processing speed, though? PaddleOCR-VL is only 0.9B parameters, so it's pretty fast for the task. Running gemma4-e4b-it to digitise an entire book on my hardware would probably take a full day.
ganonfirehouse420@reddit
E4B offloads a lot of the processing to the CPU, so I get decent speed even with an average GPU. Even with an 8GB GPU I can reach around 30 tokens per second. It takes a while longer than a small model, but small models have always failed me at table generation.
ready_to_fuck_yeahh@reddit
Also try z.ai ocr locally, it's just 0.9B
Final-Frosting7742@reddit (OP)
I wasn't aware glm-ocr was this good: 94.6 on OmniDocBench.
Speed? About 20s per page (40s per double-page) on my Ryzen AI 9 HX 370. I might give glm-ocr a try since it seems faster.
ready_to_fuck_yeahh@reddit
Could you share the GitHub link once you set up OCR with GLM, hehehe.
Final-Frosting7742@reddit (OP)
Lol sure. I'll post here with the new setup once it's done.
Mkengine@reddit
There are so many OCR / document understanding models out there, here is my personal OCR list I try to keep up to date:
GOT-OCR:
https://huggingface.co/stepfun-ai/GOT-OCR2_0
granite:
https://huggingface.co/ibm-granite/granite-docling-258M
https://huggingface.co/ibm-granite/granite-4.0-3b-vision
MinerU:
https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
https://huggingface.co/opendatalab/MinerU-Diffusion-V1-0320-2.5B
OCRFlux:
https://huggingface.co/ChatDOC/OCRFlux-3B
MonkeyOCR-pro:
1.2B: https://huggingface.co/echo840/MonkeyOCR-pro-1.2B
3B: https://huggingface.co/echo840/MonkeyOCR-pro-3B
RolmOCR:
https://huggingface.co/reducto/RolmOCR
Nanonets OCR:
https://huggingface.co/nanonets/Nanonets-OCR2-3B
dots OCR:
https://huggingface.co/rednote-hilab/dots.ocr
https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
https://huggingface.co/rednote-hilab/dots.mocr
olmocr 2:
https://huggingface.co/allenai/olmOCR-2-7B-1025
Light-On-OCR:
https://huggingface.co/lightonai/LightOnOCR-2-1B
Chandra:
https://huggingface.co/datalab-to/chandra-ocr-2
Jina vlm:
https://huggingface.co/jinaai/jina-vlm
HunyuanOCR:
https://huggingface.co/tencent/HunyuanOCR
bytedance Dolphin 2:
https://huggingface.co/ByteDance/Dolphin-v2
PaddleOCR-VL:
https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5
Deepseek OCR 2:
https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
GLM OCR:
https://huggingface.co/zai-org/GLM-OCR
Nemotron OCR:
https://huggingface.co/nvidia/nemotron-ocr-v2
Qianfan-OCR:
https://huggingface.co/baidu/Qianfan-OCR
Falcon-OCR:
https://huggingface.co/tiiuae/Falcon-OCR
FireRed-OCR:
https://huggingface.co/FireRedTeam/FireRed-OCR
Typhoon-OCR:
https://huggingface.co/typhoon-ai/typhoon-ocr1.5-2b
Churro-3B:
https://huggingface.co/stanford-oval/churro-3B
Service-Kitchen@reddit
Yes, it's an amazing model, I've heard this is a competitive model too: https://huggingface.co/datalab-to/chandra-ocr-2
For digitising books, the hard part is getting all the pages scanned. There's no at-home solution for that beyond manual toil.
Final-Frosting7742@reddit (OP)
Clearly. You need to take a photo of every page; I just go chapter by chapter. Either way it beats paying €600 for an OCR machine that will probably butcher your graphs and tables.