Company Updates
December 9, 2025
Folks shopping around for OCR and document intelligence solutions have two primary questions:
1. Which solution is the most accurate overall?
2. How well will it work on my documents in particular?
Benchmarks help answer #1 by narrowing the field (our newest model, Chandra, tops independent benchmarks), but they have their limitations and don't give you qualitative insight.
Documents are incredibly diverse. Some are downright frightening. You still need to figure out how well models work on your layouts, languages, weird scribbles, low-resolution scans, forms, gnarly tables, and that terrible low-lighting photo someone took of an incredibly important document.

We'd like to help you navigate your purchase confidently. To that end, we're announcing two things today:
1. A public benchmark at https://www.datalab.to/benchmark comparing Datalab against leading open models and competing proprietary services, broken down by document class.
2. Custom evals on your own documents. We've already generated these for multiple prospects and aim to let users run them in a self-serve way across leading open models and competing proprietary services. Stay tuned.
The benchmark compares Datalab's /marker API in its accurate, balanced, and fast modes (set the mode parameter to one of these in our /marker endpoint) against leading open models and competing proprietary services.

For the Datalab output, we used the same API our API customers use. We ran all other models on H100s in Modal using vLLM and normalized the output from each model.
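For reference, a Datalab request with the mode parameter looks roughly like the sketch below. The /marker endpoint and the mode values come from above; the base URL, auth header name, and response handling are assumptions here, so check the API docs for the exact contract.

```python
# Hedged sketch of a /marker request with the mode parameter.
# BASE_URL and the X-Api-Key header are assumptions, not the documented contract.
import requests

API_KEY = "YOUR_DATALAB_API_KEY"
BASE_URL = "https://www.datalab.to/api/v1"  # assumption


def convert(pdf_path: str, mode: str = "accurate") -> dict:
    """Submit a document to /marker with mode set to accurate, balanced, or fast."""
    with open(pdf_path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/marker",
            headers={"X-Api-Key": API_KEY},  # assumption: key-based auth
            files={"file": f},
            data={"mode": mode},
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    print(convert("sample.pdf", mode="balanced"))
```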
We used a sample of roughly 8K documents.
We classified the documents in the sample into the categories you see at the top of https://www.datalab.to/benchmark, then scored each model's output within each class. These are the scores you see on the left of each benchmark page.
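As a rough illustration of that per-class rollup (not our actual scoring harness), the sketch below averages per-document scores by class and model; the column names and numbers are placeholders.

```python
# Illustrative per-class aggregation; data and column names are placeholders.
import pandas as pd

# One row per (document, model): the document's class and a 0-1 similarity score
# of the model's output against the reference transcription.
rows = pd.DataFrame(
    [
        {"doc_class": "tables", "model": "model_a", "score": 0.91},
        {"doc_class": "tables", "model": "model_b", "score": 0.82},
        {"doc_class": "forms", "model": "model_a", "score": 0.88},
        {"doc_class": "forms", "model": "model_b", "score": 0.79},
    ]
)

# Average each model's score within each document class to get one number
# per class per model, like the scores shown on the benchmark pages.
per_class = (
    rows.groupby(["doc_class", "model"])["score"]
    .mean()
    .unstack("model")
    .round(3)
)
print(per_class)
```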
Finally, we've curated a set of documents in each class so you can view each document alongside rendered or raw output from every model.
Struggling to evaluate model performance on your documents? Explore our benchmark pages, and if you'd like similar output for your own documents, talk to sales here.