Marker
Marker is a PDF-to-Markdown converter that recognizes tables, OCRs equations, and re-OCRs bad PDF text. Marker has 8000+ stars on GitHub, benchmarks well against similar tools, and is used by hundreds of organizations.
Here's what Marker can do.
Marker identifies tables and converts them to GitHub-flavored Markdown.
Marker will automatically OCR documents that don't have OCR text.
Equations will be identified and OCRed automatically.
Images and figures will be identified and saved along with the Markdown output.
Marker is 4x faster than Nougat, and can be parallelized easily.
Marker will work with any language.
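To make the table feature above concrete, here is a minimal sketch of what a GitHub-flavored Markdown table looks like when built from rows of cells. This is not Marker's code: the function name `to_gfm_table` and the sample data are hypothetical, chosen only to illustrate the output format.

```python
def to_gfm_table(header, rows):
    """Render a header and rows as a GitHub-flavored Markdown table.

    Hypothetical helper for illustration only; not part of Marker.
    """
    def fmt(cells):
        # Escape pipes so cell text cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    # Header row, then the delimiter row that GFM requires, then the data rows.
    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


print(to_gfm_table(["Tool", "Output"], [["Marker", "Markdown"]]))
```

The delimiter row of dashes under the header is what makes a renderer treat the block as a table rather than as plain text.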