OCR Tools
AllenAI
olmOCR – Open-Source OCR for Accurate Document Conversion
olmOCR is an open-source tool for converting PDFs to text with high accuracy, preserving reading order and supporting tables, equations, and handwriting.
https://olmocr.allenai.org/blog

stepfun-ai/GOT-OCR2_0 · Hugging Face
https://huggingface.co/stepfun-ai/GOT-OCR2_0
OCR attention head
Unlike general retrieval heads, OCR attention heads are specialized for recognizing text in images.
https://arxiv.org/pdf/2505.15865
Large Dataset
nvidia/Llama-Nemotron-VLM-Dataset-v1 · Datasets at Hugging Face
https://huggingface.co/datasets/nvidia/Llama-Nemotron-VLM-Dataset-v1

Seonglae Cho