Best document parser

Posted by aiwtl@reddit | LocalLLaMA | 17 comments

I am looking for a SOTA document parser for PDF/DOCX files. I have about 100k pages containing tables, text, and images (with text) that I want to convert to Markdown.

What is the best open-source document parser available right now, one that comes close to Azure Document Intelligence in accuracy?

I have explored

Which one would be best to use in production?