numind/NuExtract3 · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 6 comments
**NuExtract3** is a unified **4B** vision-language reasoning model for document understanding.
It combines strong **structured information extraction** with high-quality **image-to-Markdown** conversion, making it suitable for extraction pipelines, OCR, and RAG preprocessing across document types such as scans, receipts, forms, invoices, contracts, and tables.
# Overview
* **Structured extraction**: input (text/images) + JSON template + instructions --> JSON output
* **Markdown conversion**: input (text/images) --> Markdown
* **Multimodal inputs**: text, images, or text + images.
* **Multilingual** documents.
* **Reasoning** and non-reasoning inference modes.
* **Template generation** for structured extraction, from either a natural-language description or an input document.
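To make the "input + JSON template + instructions --> JSON output" flow a bit more concrete, here is a minimal sketch of checking that a model response has the same shape as the template you sent. Everything in it is an assumption for illustration: the template fields, the type labels (`"string"`, `"number"`), the `matches_template` helper, and the hard-coded `raw` response are all hypothetical and not taken from the NuExtract3 model card. It does not call the model at all.

```python
import json

# Hypothetical template: keys are the fields to extract. The string values
# are illustrative type labels, not confirmed NuExtract3 template syntax.
template = {
    "vendor": "string",
    "total": "number",
    "items": [{"description": "string", "price": "number"}],
}


def matches_template(template, output):
    """Return True if `output` has the same nested shape as `template`."""
    if isinstance(template, dict):
        return (
            isinstance(output, dict)
            and set(output) == set(template)
            and all(matches_template(template[k], output[k]) for k in template)
        )
    if isinstance(template, list):
        if not template:
            # An empty template list only requires that the output is a list.
            return isinstance(output, list)
        # A one-element template list describes every item of the output list.
        return isinstance(output, list) and all(
            matches_template(template[0], item) for item in output
        )
    # Leaf: accept any scalar, or None for a value the model could not find.
    return output is None or isinstance(output, (str, int, float, bool))


# Stand-in for the JSON text a model might return (made up for this example).
raw = '{"vendor": "ACME", "total": 12.5, "items": [{"description": "Widget", "price": 12.5}]}'
print(matches_template(template, json.loads(raw)))
```

A shape check like this is handy in an extraction pipeline because it catches dropped or hallucinated keys before the JSON reaches downstream code. It only compares structure, not the leaf types.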
GGUF, NVFP4, MLX, vLLM, and other versions are already available:
[https://huggingface.co/models?other=base\_model:quantized:numind/NuExtract3](https://huggingface.co/models?other=base_model:quantized:numind/NuExtract3)