December 19, 2025
Many PDFs contain clickable hyperlinks: table of contents entries that jump to sections, cross-references between pages, or external URLs to websites. When you convert these PDFs to markdown or HTML, those links are typically lost.
We’ve added a new feature that preserves hyperlinks during OCR, so your converted documents maintain their navigation structure.
The link extraction feature:
For example, a table of contents entry that links to a block on page 5 (1-indexed) in the PDF becomes an <a href="#block-4-1"> (0-indexed) link pointing to the actual content block in your HTML output.
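The index shift in that example can be sketched in a few lines of Python. This is only an illustration: it assumes anchors always have the shape block-<page>-<block> with a 0-indexed page number, as in the example above, and the helper name parse_block_anchor is hypothetical, not part of the API.

```python
import re

# Assumed anchor shape: "block-<page>-<block>", where <page> is 0-indexed.
# Only the single example "#block-4-1" (page 5) is given in the post.
BLOCK_ANCHOR = re.compile(r"^#?block-(\d+)-(\d+)$")


def parse_block_anchor(anchor):
    """Return (pdf_page_number, block_index) for an anchor like '#block-4-1'.

    The page number is converted back to the 1-indexed value a PDF viewer
    shows; the block index is returned unchanged.
    """
    match = BLOCK_ANCHOR.match(anchor)
    if match is None:
        raise ValueError(f"not a block anchor: {anchor!r}")
    page_index = int(match.group(1))
    block_index = int(match.group(2))
    return page_index + 1, block_index


print(parse_block_anchor("#block-4-1"))  # (5, 1)
```

A helper like this can be handy when you want to show readers the original PDF page number next to a converted link.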
Table of Contents Navigation: Corporate documents, technical manuals, and reports often have clickable tables of contents. With link extraction, readers can click through your converted HTML just like the original PDF.
Cross-References: Legal documents and contracts frequently reference other sections (“see Section 4.2”). These cross-references stay clickable.
Citation Links: Academic papers and reports with hyperlinked citations maintain their references to external sources.
Form Instructions: Government forms often link to instruction pages or external resources, and these links are preserved.
Add extras: "extract_links" to your API request:
import requests

response = requests.post(
    "https://www.datalab.to/api/v1/marker",
    headers={"X-Api-Key": "YOUR_API_KEY"},
    files={"file": open("document.pdf", "rb")},
    data={
        "output_format": "html",
        "extras": "extract_links"
    }
)
result = response.json()
# result["html"] contains clickable links
The response includes:
- html: Your converted document with <a href="..."> tags for links and <a id="..."> anchors for link targets.
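If you want to inspect the links in the returned HTML, one option is to walk the <a> tags with Python's standard-library HTML parser. The sketch below is a rough illustration, not part of the API: LinkCollector and summarize_links are hypothetical names, the sample string stands in for result["html"], and it assumes internal links are href values starting with "#" whose targets are <a id="..."> anchors, as described above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets and anchor ids from <a> tags."""

    def __init__(self):
        super().__init__()
        self.internal = []    # href targets that start with "#", without the "#"
        self.external = []    # every other href value
        self.anchors = set()  # id values found on <a> tags

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            if href.startswith("#"):
                self.internal.append(href[1:])
            else:
                self.external.append(href)
        anchor_id = attributes.get("id")
        if anchor_id:
            self.anchors.add(anchor_id)


def summarize_links(html):
    """Return internal targets, external URLs, and internal targets with no anchor."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    dangling = [t for t in collector.internal if t not in collector.anchors]
    return {
        "internal": collector.internal,
        "external": collector.external,
        "dangling": dangling,
    }


# Stand-in for result["html"]; the markup is made up for this example.
sample = (
    '<p>See <a href="#block-3-4">definitions</a> and '
    '<a href="https://example.com/docs">our documentation</a>.</p>'
    '<a id="block-3-4"></a><table></table>'
)
summary = summarize_links(sample)
print(summary)
```

An empty "dangling" list means every internal link in the document has a matching anchor to jump to.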
Below, we show you what extracted links look like in the HTML output.
Internal links (page jumps):
<p>
  For specifications related to electrical, automation and
  <a href="#block-3-5">SHE</a>
  (Safety, Health & Environmental), the machine shall endorse...
</p>
Anchor targets (where internal links point):
<a id="block-3-4"></a>
<table>
  <thead>
    <tr>
      <th>Key</th>
      <th>Definition</th>
    </tr>
  </thead>
  ...
</table>
External links (URLs):
<p>
  For more information, visit
  <a href="https://example.com/docs">our documentation</a>.
</p>
Add extras: "extract_links" to your requests.
If you have documents with complex linking structures and want help optimizing extraction, reach out at [email protected].