# Chandra
Chandra is an OCR model from [Datalab](https://www.datalab.to/) for complex documents: handwriting, tables, math equations, and messy forms. It supports 40+ languages and produces layout-aware output.
## Install
```bash
uv pip install chandra-ocr
```
## Quick Start
**CLI:**
```bash
chandra input.pdf ./output --method hf    # local inference (Hugging Face)
chandra input.pdf ./output --method vllm  # production (vLLM server)
chandra_app                               # interactive web UI
```
**Python:**
```python
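from chandra.model import InferenceManager
from chandra.input import load_pdf_images

# Use the local Hugging Face backend (same as --method hf on the CLI)
manager = InferenceManager(method="hf")

# Render each PDF page to an image, then run OCR on all pages
images = load_pdf_images("document.pdf")
results = manager.generate(images)

# Each result carries the recognized text for one page
print(results[0].markdown)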
```
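
To keep the output of every page rather than printing only the first, the results can be written to disk. The sketch below assumes only what the snippet above shows, namely that each result exposes a `.markdown` string. The helper name `save_markdown_pages` and the `page_NNNN.md` naming scheme are hypothetical and not part of Chandra. The demo uses stand-in objects so that it runs without the model installed.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_markdown_pages(results, out_dir):
    """Write each result's .markdown to out_dir/page_NNNN.md and return the paths.

    Hypothetical helper: it relies only on a `.markdown` string attribute.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for page_number, result in enumerate(results, start=1):
        page_file = out_path / f"page_{page_number:04d}.md"
        page_file.write_text(result.markdown, encoding="utf-8")
        written.append(page_file)
    return written


if __name__ == "__main__":
    # Stand-in objects that mimic the `.markdown` attribute of real results
    demo_results = [
        SimpleNamespace(markdown="# Page one"),
        SimpleNamespace(markdown="Page two text"),
    ]
    with tempfile.TemporaryDirectory() as tmp_dir:
        for path in save_markdown_pages(demo_results, tmp_dir):
            print(path.name)
```

With real output, the call would be `save_markdown_pages(results, "./output")` after `manager.generate(...)`.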
## Capabilities
- **Handwriting** — cursive, messy print, doctor notes
- **Tables** — merged cells, financial filings, invoices
- **Math** — inline and block equations as LaTeX
- **Forms** — checkboxes, radio buttons, field values
- **Complex layouts** — multi-column, newspapers, textbooks
## Output Formats
Chandra can emit Markdown, HTML (with bounding boxes), or JSON with layout metadata.
## License
Apache 2.0 (code). Model weights: free for research, personal use, and startups under $2M in funding or revenue.
---
See also: [[Whisper]] (speech-to-text)