This sample shows how Datalab transforms an invoice PDF into structured, machine-readable formats.

Markdown Output

Provides a clean, human-readable version of the invoice that preserves layout, tables, and hierarchy. It suits dashboards, chat interfaces, and internal tools that display parsed documents as text that is easy to review and share.

JSON Output (with Citations)

  • Delivers a structured schema containing key fields such as customer_name, invoice_number, payment_due_terms, and total, each mapped to a precise region of the source document. Every extracted value includes citations: coordinates or bounding boxes pointing back to the exact place in the original PDF where the data was found.
  • These outputs can be generated using Datalab’s automatically inferred schema, which detects common entities, or through a custom schema you define for your specific use case. This lets teams extract exactly what they need, from general-purpose invoice fields to highly domain-specific metadata, while keeping every value traceable to its source.
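
As a rough illustration of how citation-carrying JSON might be consumed downstream, here is a minimal Python sketch. It is not Datalab's actual response format: the field names come from the list above, but the "value" / "citations" / "page" / "bbox" layout, the helper names, and every sample value are assumptions made up for this example.

```python
import json

# Hypothetical extraction result. The field names appear in the text above;
# the nested layout (value, citations, page, bbox) and all values are
# placeholders invented for illustration, not real Datalab output.
SAMPLE_RESULT = """
{
  "customer_name": {
    "value": "Acme Corp",
    "citations": [{"page": 1, "bbox": [72.0, 140.5, 210.0, 156.0]}]
  },
  "invoice_number": {
    "value": "INV-0001",
    "citations": [{"page": 1, "bbox": [400.0, 90.0, 520.0, 104.0]}]
  },
  "payment_due_terms": {
    "value": "Net 30",
    "citations": []
  },
  "total": {
    "value": "1250.00",
    "citations": [{"page": 2, "bbox": [430.0, 610.0, 520.0, 626.0]}]
  }
}
"""


def flatten_fields(result):
    """Return one (field, value, page, bbox) row per citation."""
    rows = []
    for name, field in result.items():
        for citation in field.get("citations", []):
            rows.append(
                (name, field["value"], citation["page"], tuple(citation["bbox"]))
            )
    return rows


def uncited_fields(result):
    """Return names of fields with no citation, e.g. to flag for manual review."""
    return [name for name, field in result.items() if not field.get("citations")]


result = json.loads(SAMPLE_RESULT)

for name, value, page, bbox in flatten_fields(result):
    print(f"{name} = {value!r} (page {page}, bbox {bbox})")

print("Needs review:", uncited_fields(result))
```

Keeping the citation next to each value makes it straightforward to highlight the source region in a viewer or to route any value without a citation to a human reviewer.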

By extracting both versions, you get the best of both worlds: Markdown for interpretability and presentation, JSON for programmatic use and integration.