Build Document Processing Pipelines with Workflows

October 23, 2025

We're excited to announce Workflows, now available in beta. With Workflows, you can chain multiple document processing steps (parse, extract, segment, and conditional logic) into reusable templates that you can apply to any file or project.

With Workflows, you can define and execute templates like these:

  • Parse → Extract: run extraction right after parsing your document
  • Segment → Conditional Parse: treat each section with its own parsing rules
  • Parse → Quality Check → Re-Parse → Extract: validate output quality first, and if it doesn’t meet your threshold, re-run parsing in high-accuracy mode before extraction

Without Workflows, pipelines like these mean duct-taping together orchestration code that you then have to maintain. It's tedious work that gets in the way of actually processing your documents.

Workflows take care of that orchestration for you, so you can set it and forget it.

How it works

You define a workflow once as a template. Each step specifies which operation to run (marker_parse, marker_extract, or marker_segment) and which previous steps it depends on. Then you execute that workflow on a single file or on hundreds of them. Workflows automatically chain the operations together and process everything in parallel.
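
Because each step names the steps it depends on, a workflow template is a small directed acyclic graph. The sketch below is not part of the Datalab API; it's a local pre-flight check you could run before POSTing a template, assuming steps carry a `unique_name` and an optional `depends_on` list as in the example below. It catches duplicate names, dangling dependencies, and cycles, and returns one order in which the steps could run. `execution_order` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps: list[dict]) -> list[str]:
    """Return one valid run order for a workflow's steps (hypothetical helper).

    Assumes each step has a "unique_name" and an optional "depends_on" list,
    mirroring the template format in this post. Raises ValueError on
    duplicate names, unknown dependencies, or dependency cycles.
    """
    names = [step["unique_name"] for step in steps]
    if len(names) != len(set(names)):
        raise ValueError("unique_name values must be unique")

    graph = {}
    for step in steps:
        deps = step.get("depends_on", [])
        unknown = [d for d in deps if d not in names]
        if unknown:
            raise ValueError(f"{step['unique_name']} depends on unknown steps: {unknown}")
        # TopologicalSorter expects a mapping of node -> its predecessors
        graph[step["unique_name"]] = set(deps)

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes forming the cycle
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


steps = [
    {"step_key": "marker_parse", "unique_name": "parse"},
    {"step_key": "marker_extract", "unique_name": "extract", "depends_on": ["parse"]},
]
print(execution_order(steps))  # ['parse', 'extract']
```

Steps with no path between them could run concurrently; the returned list is just one valid sequential order.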

Here's a simple example that parses an invoice and extracts structured data:

First, we’ll create a workflow:

import requests

# Create the workflow template
workflow = {
    "steps": [
        {
            "step_key": "marker_parse",
            "unique_name": "parse",
            "settings": {"max_pages": 10}
        },
        {
            "step_key": "marker_extract",
            "unique_name": "extract",
            "depends_on": ["parse"],
            "settings": {
                "page_schema": {
                    "invoice_number": "string",
                    "vendor_name": "string",
                    "total_amount": "number",
                    "line_items": [{
                        "description": "string",
                        "quantity": "number",
                        "unit_price": "number"
                    }]
                }
            }
        }
    ]
}

response = requests.post(
    "https://www.datalab.to/api/v1/workflows",
    headers={"X-API-Key": "your_api_key"},
    json=workflow
)
response.raise_for_status()  # fail loudly on auth or validation errors

workflow_id = response.json()["workflow_id"]

Then, we’ll execute it with one or more files:

# Execute it with your files
execution = requests.post(
    f"https://www.datalab.to/api/v1/workflows/{workflow_id}/execute",
    headers={"X-API-Key": "your_api_key"},
    json={
        "input_config": {
            "type": "multi_file",
            "file_urls": [...]  # Pass one or more file URLs in
        }
    }
)
execution.raise_for_status()

execution_id = execution.json()["execution_id"]

To check your execution status and retrieve results:

workflow_exec_result_url = f"https://www.datalab.to/api/v1/workflows/executions/{execution_id}"
headers = {"X-API-Key": "your_api_key"}

response = requests.get(workflow_exec_result_url, headers=headers)
response.raise_for_status()
result = response.json()

# The response contains either the execution status
# or, once finished, results keyed by each step's unique_name;
# each successful result includes a presigned URL for its output
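
Since an execution may still be running when you first check it, you'll usually poll. Below is a minimal, generic polling sketch with capped exponential backoff. It deliberately takes the fetch call and the "is it finished?" test as arguments, because the exact status field and its values aren't specified in this post; `poll_until_done` and its parameters are hypothetical names.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch: Callable[[], dict[str, Any]],
    is_done: Callable[[dict[str, Any]], bool],
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() until is_done(payload) is true, backing off between tries."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch()
        if is_done(payload):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("workflow execution did not finish in time")
        sleep(interval)
        # Double the wait each round, capped at max_interval
        interval = min(interval * 2, max_interval)
```

With the `requests` calls above, `fetch` could wrap `requests.get(workflow_exec_result_url, headers=headers).json()`, and `is_done` could be something like `lambda r: r.get("status") in {"COMPLETED", "FAILED"}`. Those status names are a guess, so check the API reference for the real ones. The injectable `sleep` parameter just makes the helper easy to test without waiting.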

Workflows also support complex conditional logic: for example, you can re-parse low-quality documents with OCR, route invoices above a threshold to detailed extraction, or skip processing for empty pages.
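
The routing rules above boil down to a few ordered checks. The plain-Python sketch below only illustrates that decision logic; it is not the Workflows conditional syntax (see the full conditional routing example for that), and the `ParsedDoc` fields, step names, and thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedDoc:
    """Hypothetical summary of one parsed document (illustration only)."""
    text: str
    quality_score: float            # assumed 0.0-1.0 score from a quality check
    total_amount: Optional[float]   # e.g. from the invoice schema's total_amount


def next_step(
    doc: ParsedDoc,
    min_quality: float = 0.8,          # hypothetical threshold
    amount_threshold: float = 10_000,  # hypothetical threshold
) -> str:
    """Return the (hypothetical) name of the step a document should go to next."""
    # Empty pages: nothing to extract, so skip further processing
    if not doc.text.strip():
        return "skip"
    # Low-quality parse: re-parse with OCR before extracting anything
    if doc.quality_score < min_quality:
        return "reparse_with_ocr"
    # Large invoices get the more detailed extraction
    if doc.total_amount is not None and doc.total_amount > amount_threshold:
        return "detailed_extract"
    return "standard_extract"


print(next_step(ParsedDoc(text="Invoice #123 ...", quality_score=0.95, total_amount=25_000.0)))
# detailed_extract
```

The order of the checks matters: quality is tested before the amount, so a low-quality parse is re-parsed before its possibly wrong total is trusted for routing.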

We prepared a full conditional routing example here.

What’s next?

There’s a ton more coming, from letting you define your own Collections to making it easy to extend Workflows for complex evaluation tasks. Soon, you’ll be able to take a batch of files, run parse and extract with different combinations of settings (possibly across different providers), and find your optimal processing path.

If any of this excites you, reach out to us and join our closed beta!

Try it out

Workflows are in beta and currently free to try (you still pay for the underlying Marker API requests). Get started by signing up, try out the tutorial here, and email us anytime with questions or feedback.
