RevisionDojo (YC F24): Achieving 10x cost reduction and seamless scalability with Datalab

July 1, 2025

RevisionDojo (YC F24) is a rapidly growing edtech platform that helps students and educators manage, process, and personalize educational content. By providing transcription and content extraction services, RevisionDojo streamlines study preparation and content review for its 310k+ user base. Since launching in 2024, the company has seen exponential growth and is pioneering a format of learning focused on quality, fun, and personalization.

Confronting Infrastructure Pain Points

Honza Kocourek, RevisionDojo’s Founding Engineer, is responsible for keeping the platform fast and reliable for reviewing and revising content from PDFs. Early on, he recognized that the team's infrastructure and processes would eventually become a bottleneck. The team initially relied on a self-hosted, screenshot-based system that required manual capture and upload of PDF pages, which presented several challenges:

  • High Costs: Manual screenshot-based processing was resource-intensive and led to high infrastructure spend.
  • Scalability Issues: After a successful LinkedIn launch, a sudden spike in usage caused system crashes, revealing the limitations of their early infrastructure.
  • Limited Layout & Image Detection: Other tools they evaluated failed to capture visual elements and struggled with non-standard and multi-column formats.
  • Limited Model Adaptability: Popular models like Meta’s LLaMA-2 were trained on structured research papers and struggled with real-world documents submitted by their users.

Academic content from PDFs is used to generate personalized formats for studying and learning.

Unlocking Scale and Speed with Datalab

RevisionDojo adopted Datalab’s API-based PDF processing solution, resulting in:

  • 10x Cost Reduction: Switching from screenshot-based processing to full PDF processing with Datalab cut costs tenfold, making it possible to keep offering the product at a low price point.
  • Seamless Scaling: To quote Honza, “It was amazing because within 30 minutes we had a deployment ready to go with all of the services updated, all of the links changed, and it was smooth sailing from then on. We haven’t touched the implementation since.”
  • High Throughput: The average PDF processing time is now around 10 seconds. “Before we begin checking, they’re transcribed and ready for us to use.”
  • Superior Content Handling: Datalab’s advanced image extraction and layout detection outperformed competitors, especially with real-world PDFs.

Infrastructure Planning for Expansion

RevisionDojo plans to expand beyond IB coursework to other curricula and explore new service models. To support this growth and its rapid user adoption, the company will continue its partnership with Datalab, eventually migrating to an on-premises solution. “It has allowed everything and everyone to scale in ways we hadn’t imagined. The fact that we can use Datalab without worrying about servers or maintenance is delightful.”