August 14, 2025
We're announcing two new features in beta today to help you parse your PDFs just the way you want:
Our document intelligence models are state-of-the-art, but they can't read your mind or capture your preferences! And, despite our best efforts, we might never tame the vast wilderness of PDF parsing edge cases (which, incidentally, has driven our founder to near-madness).
We want to give you the ability to get your parse output just right, in any context, and we want it to be easy. That's where Forge Parse and the Marker Prompt API come in. They work hand-in-hand:
Some common use cases include:
These features are in public beta and are available to all paying customers. Give them a spin and send us your document parsing preferences and edge cases! We’re keen to address all of them.
Many research preprints and conference abstracts print line numbers in the left gutter of each page.
They're part of the document and often get parsed into blocks. We have post-processors that try to strip them, but their heuristics don't catch every case.
Now you can just prompt them away.
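To see why heuristics fall short, here is a deliberately simple Python sketch of one. It is hypothetical and is not Marker's actual post-processor: it strips a leading integer from lines whose numbers count up by one for at least three consecutive lines. The function name and the `min_run` threshold are illustrative choices.

```python
import re

# A leading integer followed by whitespace, e.g. "12  We study the problem".
LEADING_NUM = re.compile(r"^\s*(\d{1,4})\s+(.*)$")


def strip_gutter_numbers(lines: list[str], min_run: int = 3) -> list[str]:
    """Hypothetical heuristic: drop leading integers that count up by one
    across at least `min_run` consecutive lines, as gutter line numbers do."""
    matches = [LEADING_NUM.match(line) for line in lines]
    out = list(lines)
    i = 0
    while i < len(lines):
        # Extend a run while each line starts with (previous number + 1).
        j = i
        while j < len(lines) and matches[j] is not None and (
            j == i or int(matches[j].group(1)) == int(matches[j - 1].group(1)) + 1
        ):
            j += 1
        if j - i >= min_run:
            for k in range(i, j):
                out[k] = matches[k].group(2)  # keep only the text after the number
            i = j
        else:
            i += 1
    return out


page = ["12 We study the problem", "13 of parsing PDFs.", "14 Results follow."]
print(strip_gutter_numbers(page))
# ['We study the problem', 'of parsing PDFs.', 'Results follow.']
```

Two easy ways to fool it: a template that numbers only every fifth line never forms a run, so nothing gets stripped, and a table whose first column is a sequential index (1, 2, 3) loses that column. Each fix means another special case, which is exactly the maintenance burden a prompt avoids.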
In addition to rewriting blocks the way you want, we also handle prompts that merge content across page boundaries.
A common request we get is to merge tables across pages. You can use the Marker Prompt API to do that.
Here's an example using Berkshire Hathaway's 2024 annual report, where the prompt also asks Marker to insert currency signs in every cell.
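To make that request concrete, here is a plain-Python sketch of the transformation the prompt describes: joining per-page fragments of one table and prefixing amounts with a currency sign. It is only an illustration of the target output, not how the Marker Prompt API works (there you describe the result in natural language instead of writing rules). The table representation (lists of string cells), the "first row is the header" assumption, and the `AMOUNT` pattern are all assumptions made for this sketch, and the sample values are made up.

```python
import re

# Cells that look like plain amounts: 1,200  850.5  (300)  -42
AMOUNT = re.compile(r"^\(?-?[\d,]+(\.\d+)?\)?$")


def merge_page_tables(fragments: list[list[list[str]]]) -> list[list[str]]:
    """Join per-page fragments of one table, dropping repeated header rows.

    Assumes the first row of the first fragment is the header.
    """
    if not fragments or not fragments[0]:
        return []
    header = fragments[0][0]
    merged = [header]
    for frag in fragments:
        # Continuation pages often repeat the header; skip it if so.
        rows = frag[1:] if frag and frag[0] == header else frag
        merged.extend(rows)
    return merged


def add_currency(rows: list[list[str]], symbol: str = "$") -> list[list[str]]:
    """Prefix every amount-like body cell with a currency symbol."""
    if not rows:
        return []
    header, *body = rows
    return [header] + [
        [symbol + cell if AMOUNT.match(cell) else cell for cell in row] for row in body
    ]


page1 = [["Segment", "Revenue"], ["A", "1,200"]]
page2 = [["Segment", "Revenue"], ["B", "850.5"], ["C", "n/a"]]
print(add_currency(merge_page_tables([page1, page2])))
# [['Segment', 'Revenue'], ['A', '$1,200'], ['B', '$850.5'], ['C', 'n/a']]
```

Even this toy rule hides a judgment call: a column of years would match `AMOUNT` and gain a dollar sign. Stating the intent in a prompt sidesteps writing and maintaining rules like these.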
NYC's annual expense budget is a PDF over 800 pages long, littered with agency budget summary pages that look like they came off a line printer:
These pages mix tables and text, and how you split such a summary into blocks is pretty subjective.
My aim is to parse out a clean financial summary table. I don’t care about anything else.
Marker's output isn't ideal in this instance: it treats the page as one big table and doesn't lay out rows and columns in a way that makes sense. Rows are merged when they shouldn't be, the prose summaries don't span multiple columns, and so on.
We have an escape hatch for cases like this: the use_llm flag, which runs Marker's open-source LLM post-processors to fix common errors.
The output with use_llm is much better (and you can use it via Datalab's API):
But what I really want is a clean table with just the financials and none of the prose summarizing agency responsibilities, etc. You can use the Marker Prompt API to do that:
Ah, much better!
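For readers who want to script this, here is a minimal standard-library sketch that builds (but does not send) a request combining the use_llm flag with a prompt. Only use_llm comes from this post. The endpoint URL, the `X-Api-Key` header, the `output_format` field, and especially `block_correction_prompt` as the name of the prompt field are assumptions to verify against Datalab's API reference before use.

```python
import urllib.request
import uuid
from typing import Optional

# ASSUMPTION: endpoint URL and header name for Datalab's hosted Marker API;
# confirm both in Datalab's API reference.
MARKER_URL = "https://www.datalab.to/api/v1/marker"


def build_marker_request(
    pdf_bytes: bytes, filename: str, api_key: str, prompt: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one PDF."""
    fields = {
        "output_format": "markdown",  # ASSUMPTION: field name and value
        "use_llm": "true",  # the use_llm flag described in this post
    }
    if prompt is not None:
        # ASSUMPTION: hypothetical name for the prompt field; check the
        # Marker Prompt API docs for the real one.
        fields["block_correction_prompt"] = prompt

    boundary = uuid.uuid4().hex
    # One form part per text field.
    parts = [
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode()
        for name, value in fields.items()
    ]
    # The PDF itself goes in a file part.
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    parts.append(file_head + pdf_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary

    return urllib.request.Request(
        MARKER_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "X-Api-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

For the budget example, you would pass the PDF bytes along with a prompt such as "Return only the financial summary table; drop the prose about agency responsibilities." Sending it is a single `urllib.request.urlopen(req)` call. The response shape, including whether you need to poll for the finished result, is worth checking in the API reference rather than assuming.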
Forge Parse is the best way to get started with the Marker Prompt API.
Sign up for an account, then either subscribe or provide your card details to get free credits and start parsing.
If your prompt isn’t working as well as you’d like, read these Marker Prompt API tips.
Other links you might want to check out:
And if you have questions, please holler at us by email.