Product Updates
1 min
December 2, 2025
TL;DR: We trained a latency-optimized small model that is 2-3x faster than Chandra with minimal performance degradation.
We recently shipped a few updates to Chandra and our API (Making Chandra 3x faster and Chandra 1.1). Now, we’re excited to announce Chandra Small, a latency-optimized model available exclusively in the Datalab API.
Shortly after our launch of Chandra, we trained and deployed Chandra Small. In our testing, Chandra Small is 2-3x faster than Chandra with minimal performance degradation. We also trained Chandra Small with quantization-aware training (QAT), so the model can be quantized at inference time to further reduce latency.
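The core idea behind QAT is to simulate quantization during training: a quantize-dequantize step is inserted into the forward pass so the model learns weights that are robust to rounding error. The post doesn't describe Chandra Small's setup, so the snippet below is only a minimal sketch of symmetric fake quantization, the building block typically inserted by QAT frameworks; the function name and bit width are illustrative assumptions.

```python
def fake_quantize(x, num_bits=8):
    """Quantize-dequantize a list of floats to simulate low-precision inference.

    In QAT this op sits in the forward pass during training; gradients are
    typically passed through the rounding step via a straight-through estimator.
    This is an illustrative sketch, not Chandra Small's actual scheme.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    # Per-tensor symmetric scale; fall back to 1.0 for an all-zero tensor.
    scale = max(abs(v) for v in x) / qmax or 1.0
    # Round onto the integer grid, clamp to the int8 range, map back to float.
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in x]

weights = [0.5, -1.0, 0.25]
print(fake_quantize(weights))  # values snapped to the int8 grid
```

At inference time the learned weights are then stored and executed in the low-precision format directly, which is where the latency win comes from.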

We also found that we could reduce the number of tokens needed for many pages, cutting latency by a further 30%. With Chandra Small, users can expect 2-4 pages/s on an H100.
Give Chandra Small a try in the API (via Fast mode).
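For a sense of what selecting Fast mode might look like from client code, here is a minimal sketch that only assembles the request pieces. The endpoint path, header name, and `mode` field below are illustrative assumptions, not the documented Datalab API; consult the API reference for the real parameters.

```python
def build_fast_mode_request(pdf_path: str) -> dict:
    """Assemble a hypothetical OCR request that selects Fast mode (Chandra Small).

    Every field name here is an assumption for illustration only.
    """
    return {
        "endpoint": "https://www.datalab.to/api/v1/ocr",  # assumed path
        "headers": {"X-Api-Key": "YOUR_API_KEY"},         # assumed auth header
        "data": {"mode": "fast"},                         # assumed Fast-mode flag
        "files": {"file": pdf_path},
    }

req = build_fast_mode_request("report.pdf")
print(req["data"])
```

The assembled dict could then be passed to any HTTP client (e.g. `requests.post`) once the real endpoint and field names are confirmed against the docs.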
We’re excited to keep pushing the frontier to make Chandra as accurate and fast as possible! Reach out to [email protected] for more information or for access to a self-hosted version of Chandra Small!