home›Document Processing›

NuExtract3: How an Open-Weight Model Revolutionizes Document Data Extraction

Discover NuExtract3, the Apache-2.0 licensed model from Numind that transforms messy, visually structured documents into clean, machine-readable Markdown or JSON, running locally on your hardware.

May 26, 2026

#Dev Tools #LLM #OCR #Open Source #Privacy

Explore NuExtract3, an open-weight, local-first model built on Qwen3.5-4B that efficiently extracts structured data from invoices, forms, and reports. Learn how it outperforms traditional OCR with robust table handling and offers immediate developer utility through diverse quantization formats for consumer hardware.

A Universal Document Decoder

NuExtract3 is an open-weight model that reads visually structured documents — think scanned invoices, PDF forms, receipts, or multi‑column reports — and converts them into clean, machine‑readable formats. It takes in an image or a screenshot and can output either Markdown (with tables described in HTML) or JSON that follows a template you supply. Released under the Apache‑2.0 license by Numind, it succeeds the earlier NuMarkdown model and targets anyone who needs to extract structured data from messy, layout‑heavy pages. As a “local‑first” tool, it can run on your own hardware, avoiding cloud costs and privacy concerns. Its design goal is straightforward: replace brittle, closed‑source OCR pipelines with a single model that understands both text and layout.

The Genome of NuExtract3

Under the hood, NuExtract3 is built on Qwen3.5‑4B, a 4‑billion‑parameter vision‑language model. Training took just three days on a single node of eight NVIDIA H100 GPUs, with a deliberate focus on maximizing the context length so that long documents can be processed. For Markdown conversion, the team recommends page‑by‑page processing to keep speed high and enable parallelization. The model accepts both text prompts and visual inputs — PDF pages, screenshots, forms — and can generate outputs in two shapes: Markdown that may embed HTML table code, or structured JSON following a user‑defined schema. The 4‑billion‑parameter size strikes a balance between capability and efficiency, letting the model run even on consumer hardware when quantized versions are used.

Close-up of a weathered, unremarkable gray stone held in a palm under soft, overcast light, its cracked surface opening slightly to reveal a luminous golden core of intricate, crystalline lattice structures streaming upward like silent data — subtle glimmers of code-like geometry, Markdown characters and tiny JSON braces coalescing from dust motes. Moody, cinematic, shallow depth of field, the transformation is quiet and unflashy, evoking practical magic hidden in plain sight.

“The Exact Kind of Boring Model Release”

A community member described NuExtract3 as “the exact kind of boring model release that ends up being useful.” That remark captures its quiet ambition. There is no flashy demo page; instead, there are immediate, practical assets: safetensors weights, GGUF quantizations galore (GPTQ, W8A8, FP8, Q4, Q6 and more), and even MLX weights for Apple Silicon. With a floor of just 4 GB VRAM, the smallest quantized versions bring document extraction to modest laptops. Day‑one availability of these formats drew appreciation because it lets developers plug the model directly into local pipelines with tools like vLLM, SGLang or llama.cpp. Boring, perhaps — but for anyone who has battled complex extraction tasks, this is the kind of quiet release that quietly becomes indispensable.

Tables That Don’t Crumble

Tables in scanned documents are notoriously fragile: a single missing pipe character in Markdown can collapse an entire structure. NuExtract3 sidesteps that problem elegantly by using HTML‑inside‑Markdown for tables. This approach preserves every merged cell, every multi‑line header, and every intricate alignment exactly as it appears on the page. One tester wrote that it was the first model they tried that handled complex table extraction out of the box without any post‑processing fixes — outperforming dedicated OCR engines like Paddle and GLM. The HTML table acts as a sturdy scaffold; instead of trying to flatten a table into a sparse grid, the model captures the true layout and lets downstream tools render it faithfully. For pipelines that feed into databases or knowledge bases, this fidelity saves hours of manual repair.

Questions the Community Is Asking

Enthusiasm has sparked a wave of practical questions that remain open. Can it handle multi‑column layouts, sidebars, footnotes, and handwriting? How does it perform on academic papers and digital newspapers? Does it hallucinate values for missing JSON keys, or does it reliably return null? Chinese OCR on burned‑in video subtitles and scanned forms with typed‑plus‑handwritten annotations are known pain points that have not yet been answered publicly. Comparisons with dedicated tools like MinerU or Docling, and the potential to replace web‑page scraper libraries like trafilatura, were also raised. Several users saw immediate business use: one imagined a service that converts physical forms into digital databases, selling the feature to companies like ClickUp or Monday.com. The conversation reveals a community eager to map the model’s boundaries and turn it into a building block for real‑world workflows.

How to Run It Yourself

Deploying NuExtract3 is designed to be low‑friction. Weights come in safetensors format, as well as a broad selection of GGUF quantizations and MLX weights. The minimum VRAM requirement is 4 GB thanks to aggressive quantization, making it feasible on entry‑level GPUs. Tested inference engines include vLLM, SGLang, and llama.cpp; using --load-format safetensors with vLLM speeds up loading of multi‑shard checkpoints by 4–7×. One quirk: if vLLM struggles with the Qwen3.5 VLM weight prefix, stripping the model.language_model.* prefix from the safetensors file or removing the mrope_section_size key from config.json resolves the issue. Official Ollama support is absent at launch — the maintainers cite reservations about Ollama’s chat template engine — though community interest is high and a future port seems likely.

What’s Next

Numind has submitted a paper on NuExtract3 to a peer‑reviewed venue; it’s not yet on arXiv. In the meantime, you can explore the model immediately through several official channels: a blog post detailing the release, the Hugging Face model card, a collection of related resources, and a free online demo that requires no sign‑up. A Discord server exists for deeper discussion. The combination of an open license, low hardware barrier, and robust table handling positions NuExtract3 as a serious candidate for anyone building document understanding pipelines — from researchers to SaaS founders. As the community stress‑tests it against edge cases, the answers to those open questions will show just how far this “boring” model can go.

A Universal Document Decoder

The Genome of NuExtract3

“The Exact Kind of Boring Model Release”

Tables That Don’t Crumble

Questions the Community Is Asking

How to Run It Yourself

What’s Next

GTIG AI Threat Tracker: Adversaries Weaponize AI for Cyber Attacks

Fast Byte Latent Transformer: Efficient Byte-Level Generation via Diffusion and Speculation

Interaction Models: Real-Time Human-AI Collaboration at Scale

2026 Agentic Coding Trends: The Era of AI Collaboration

Grok Skills: Reusable Instruction Sets for Task Automation