Document AI - OCR Processor
The Mistral Document AI API includes a Document OCR (Optical Character Recognition) processor, powered by our latest OCR model, mistral-ocr-latest, which lets you extract text and structured content from PDF documents and images.

Before You Start
Key Features
- Extracts text content while maintaining document structure and hierarchy
- Preserves formatting like headers, paragraphs, lists and tables
- Returns results in markdown format for easy parsing and rendering
- Handles complex layouts including multi-column text and mixed content
- Processes documents at scale with high accuracy
- Supports multiple document formats including:
  - image_url: png, jpeg/jpg, avif and more
  - document_url: pdf, pptx, docx and more
The OCR processor returns the extracted text content, image bounding boxes (bboxes), and metadata about the document structure, making it easy to work with the recognized content programmatically.
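As a rough illustration of working with the returned content programmatically, the sketch below joins the per-page markdown and collects image bounding boxes. It assumes the response has been converted to a plain dictionary with a `pages` list whose entries carry `index`, `markdown`, and `images` (each image with an `id` and `top_left_x`, `top_left_y`, `bottom_right_x`, `bottom_right_y`). These field names, the helper names `combine_markdown` and `collect_bboxes`, and the sample data are assumptions made for illustration, so check them against the response you actually receive.

```python
from typing import Any


def combine_markdown(response: dict[str, Any]) -> str:
    """Join the markdown of every page, sorted by page index (assumed fields)."""
    pages = sorted(response.get("pages", []), key=lambda p: p.get("index", 0))
    return "\n\n".join(page.get("markdown", "") for page in pages)


def collect_bboxes(response: dict[str, Any]) -> list[tuple[int, str, tuple]]:
    """Return (page index, image id, (x0, y0, x1, y1)) for each extracted image."""
    boxes = []
    for page in response.get("pages", []):
        for image in page.get("images", []):
            bbox = (
                image.get("top_left_x"),
                image.get("top_left_y"),
                image.get("bottom_right_x"),
                image.get("bottom_right_y"),
            )
            boxes.append((page.get("index", 0), image.get("id", ""), bbox))
    return boxes


# Hypothetical response data, hand-written for this example only.
sample_response = {
    "pages": [
        {"index": 1, "markdown": "Second page.", "images": []},
        {
            "index": 0,
            "markdown": "# Title",
            "images": [
                {
                    "id": "img-0.jpeg",
                    "top_left_x": 10,
                    "top_left_y": 20,
                    "bottom_right_x": 110,
                    "bottom_right_y": 220,
                }
            ],
        },
    ]
}

full_text = combine_markdown(sample_response)
image_boxes = collect_bboxes(sample_response)
```

Working on a plain dictionary keeps the sketch independent of any particular client version; if the client returns an object rather than a dictionary, it would need to be converted first.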
OCR with Images and PDFs
OCR your Documents
We provide different methods to OCR your documents. You can OCR either a PDF or an image.
PDFs
For PDFs, you can pass a publicly available URL, pass a base64-encoded PDF, or upload the PDF to our cloud.
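For the base64 option, a minimal sketch of building the document payload might look like the following. It assumes the API accepts a `data:application/pdf;base64,...` data URI in the `document_url` field; the helper name `pdf_bytes_to_document` is hypothetical and the bytes are a stand-in for a real file.

```python
import base64


def pdf_bytes_to_document(pdf_bytes: bytes) -> dict[str, str]:
    """Wrap raw PDF bytes as a base64 data URI inside a document payload."""
    encoded = base64.b64encode(pdf_bytes).decode("utf-8")
    return {
        "type": "document_url",
        # Assumption: the API accepts a data URI here in place of a public URL.
        "document_url": f"data:application/pdf;base64,{encoded}",
    }


# Tiny stand-in bytes; in practice, read them with open(path, "rb").read().
document = pdf_bytes_to_document(b"%PDF-1.4 example")
```

The resulting dictionary would then be passed as the `document` argument of `client.ocr.process` instead of a payload with a public URL.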
When passing a URL, be sure it is public and accessible by our API.
import os
from mistralai import Mistral

# Read the API key from the environment and create the client
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)

# Run OCR on a PDF available at a public URL
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    },
    include_image_base64=True
)

Images
To perform OCR on an image, you can either pass a URL to the image or pass a base64-encoded image directly.
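For the base64 route, one possible way to build the image payload is sketched below. It assumes the API accepts a `data:<mime type>;base64,...` data URI in the `image_url` field; the helper name `image_bytes_to_document` is hypothetical, and the MIME type is guessed from the file name with the standard library.

```python
import base64
import mimetypes


def image_bytes_to_document(image_bytes: bytes, filename: str) -> dict[str, str]:
    """Wrap raw image bytes as a base64 data URI inside a document payload."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        # Assumption: the API accepts a data URI here in place of a public URL.
        "image_url": f"data:{mime_type};base64,{encoded}",
    }


# Stand-in bytes; in practice, read them with open(path, "rb").read().
image_document = image_bytes_to_document(b"\x89PNG stand-in", "receipt.png")
```

As with PDFs, the resulting dictionary would be passed as the `document` argument of `client.ocr.process`.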
When passing a URL, you can perform OCR on any publicly available image as long as a direct URL is available.
import os
from mistralai import Mistral

# Read the API key from the environment and create the client
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)

# Run OCR on an image available at a direct public URL
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "image_url",
        "image_url": "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/receipt.png"
    },
    include_image_base64=True
)

Cookbooks
For more information and guides on how to make use of OCR, we have the following cookbooks: