The Datalab SDK: Transform Document Processing from Hours to Minutes

July 9, 2025

We’re excited to introduce the Datalab SDK (beta), a Python library that makes it even easier to work with Datalab’s document intelligence tools.

The SDK wraps our production APIs, which are the easiest way to use Marker and Surya, our leading open-weight document intelligence models.

Sign up for our private beta to get started!

Parse Documents with One Command

Process massive document libraries from the command line without needing to poll or manage rate limits:

datalab marker run library/

library/
├── document.pdf
└── +document.pdf/
      ├── marker.json
      └── marker/
            ├── index.html
            ├── index.md
            └── _page_0_image_0.jpeg

Or just as easily with the SDK: 

from datalab import Datalab

client = Datalab(api_key="...") # or set with DATALAB_API_KEY
result = client.marker.run("document.pdf", output_format="markdown,html,json")
markdown, html, json = result.markdown, result.html, result.json

document = Document(response) # Superpowers
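
If you used the CLI instead, the results sit on disk next to each source file. Here is a minimal sketch, using only the Python standard library, of gathering the Markdown outputs from a processed folder. It assumes the `+<name>/marker/index.md` layout shown in the tree above; `collect_markdown` is a hypothetical helper for illustration, not part of the SDK.

```python
from pathlib import Path


def collect_markdown(library: Path) -> dict[str, str]:
    """Map each source file name to the Markdown text written for it."""
    outputs = {}
    # Assumption: each result lives in a sibling folder named "+<source name>",
    # as in the directory tree shown in this post.
    for out_dir in sorted(library.glob("+*")):
        md_path = out_dir / "marker" / "index.md"
        if out_dir.is_dir() and md_path.is_file():
            source_name = out_dir.name[1:]  # drop the leading "+"
            outputs[source_name] = md_path.read_text(encoding="utf-8")
    return outputs
```

This only looks at the top level of the folder. For nested libraries, swapping `glob` for `rglob` would be a reasonable starting point.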

Navigate and Transform Structured Outputs

Go beyond black-box outputs. Datalab’s structured results let you explore every element of a document — from handwriting and headings to equations and tables.

With the SDK, you can easily answer complex questions like: “Out of a million forms, which ones are missing signatures?”

document = Document(response) # requires JSON output format
# First page containing the witness clause (raises IndexError if none match)
witness_page = [
    page for page in document.pages
    if page.find("IN WITNESS THEREOF", block_types=["Text"])
][0]

if witness_page.find(block_types=["Handwriting"]):
    print("Signed!")
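
To make the idea concrete without the SDK, here is a rough sketch of the same check over plain dictionaries. It assumes the JSON output is a tree of blocks with `block_type`, `html`, and `children` keys, which is our reading of Marker's JSON format and may not match your output exactly. The helper names and the sample data are hypothetical.

```python
from typing import Iterator


def iter_blocks(node: dict) -> Iterator[dict]:
    """Yield a block and all of its descendants, depth first."""
    yield node
    for child in node.get("children") or []:
        yield from iter_blocks(child)


def is_signed(doc: dict, clause: str = "IN WITNESS THEREOF") -> bool:
    """True if a page containing the clause also has a Handwriting block."""
    for page in doc.get("children") or []:
        blocks = list(iter_blocks(page))
        has_clause = any(
            b.get("block_type") == "Text" and clause in (b.get("html") or "")
            for b in blocks
        )
        has_ink = any(b.get("block_type") == "Handwriting" for b in blocks)
        if has_clause and has_ink:
            return True
    return False


# Hand-written sample data for illustration, not real API output.
clause_block = {"block_type": "Text", "html": "<p>IN WITNESS THEREOF, ...</p>"}
forms = {
    "signed.pdf": {"children": [{"block_type": "Page", "children": [
        clause_block, {"block_type": "Handwriting", "html": ""},
    ]}]},
    "unsigned.pdf": {"children": [{"block_type": "Page", "children": [
        clause_block,
    ]}]},
}
missing = [name for name, doc in forms.items() if not is_signed(doc)]
print(missing)  # prints ['unsigned.pdf']
```

The same loop scales from two forms to a million: only the `forms` mapping changes.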

You can even build your own custom renderers.
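
As a sketch of what a custom renderer could look like, the example below turns a block tree into an indented plain-text outline. It works on plain dictionaries and assumes each block carries `block_type`, `html`, and `children` keys; `render_outline` and the sample page are hypothetical and not part of the SDK.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, fine for a sketch


def render_outline(node: dict, depth: int = 0) -> list[str]:
    """Render a block tree as an indented outline, one line per block."""
    text = html.unescape(TAG_RE.sub("", node.get("html") or "")).strip()
    label = node.get("block_type", "Unknown")
    lines = ["  " * depth + (f"{label}: {text}" if text else label)]
    for child in node.get("children") or []:
        lines.extend(render_outline(child, depth + 1))
    return lines


# Hand-written sample data for illustration, not real API output.
page = {"block_type": "Page", "children": [
    {"block_type": "SectionHeader", "html": "<h1>Lease Agreement</h1>"},
    {"block_type": "Text", "html": "<p>Rent is due on the 1st &amp; 15th.</p>"},
]}
print("\n".join(render_outline(page)))
```

A regex is enough for simple HTML fragments like these. For anything with nested or malformed markup, a real HTML parser would be the safer choice.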

Built for Scale and Flexibility

Whether you need high-level document processing or granular API control, Datalab adapts to your workflow:

# Access endpoints directly
client.api.v1.health.fetch()

# Or use up-to-date convenience wrappers
client.marker.fetch(...) # GET /api/v1/marker
client.marker.fetch_async(...) # GET /api/v1/marker
client.marker.post(...) # POST /api/v1/marker
client.marker.post_async(...) # POST /api/v1/marker
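
If you drop down to the lower-level calls, you take on the polling that the higher-level commands handle for you. Here is a minimal sketch of a submit-then-poll loop with exponential backoff. The `fetch` callable stands in for whatever GET call you use, and the `"status": "complete"` field is an assumption about the response shape, so check it against the API reference.

```python
import time
from typing import Callable


def wait_for_result(
    fetch: Callable[[], dict],
    max_polls: int = 10,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the job reports completion, backing off between tries."""
    for attempt in range(max_polls):
        payload = fetch()
        # Assumption: the response has a "status" field that becomes "complete".
        if payload.get("status") == "complete":
            return payload
        # Double the wait after each unfinished poll, up to max_delay seconds.
        sleep(min(max_delay, base_delay * 2 ** attempt))
    raise TimeoutError(f"job not complete after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test, since a test can record the delays instead of actually waiting.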

The Datalab SDK: fast, flexible, and built for scale

  • Reduce Processing Time: Parse documents in minutes instead of hours
  • Get Clean, Structured Content from PDFs: Extract tables, figures, layout, and text without regex hacks
  • Scale Without Overhead: Process libraries of any size without extra infrastructure
  • Developer-First Design: A clean, native interface that fits into any workflow

We're releasing the SDK in beta to give developers early access and a chance to shape its future. If you spot any issues or have requests, reach out at [email protected].

Sign up for our private beta today.

✨ Bonus: Try our public playground ✨

We also released a public playground! Whether you’re evaluating Datalab for a proof of concept or just need a quick Markdown version of a PDF, you can use it directly on our website, with no subscription or login required.

This is the same powerful user interface that we now offer directly in our web application. If you don’t have an account yet, sign up here.