A Universal Document Decoder
NuExtract3 is an open-weight model that reads visually structured documents â think scanned invoices, PDF forms, receipts, or multiâcolumn reports â and converts them into clean, machineâreadable formats. It takes in an image or a screenshot and can output either Markdown (with tables described in HTML) or JSON that follows a template you supply. Released under the Apacheâ2.0 license by Numind, it succeeds the earlier NuMarkdown model and targets anyone who needs to extract structured data from messy, layoutâheavy pages. As a âlocalâfirstâ tool, it can run on your own hardware, avoiding cloud costs and privacy concerns. Its design goal is straightforward: replace brittle, closedâsource OCR pipelines with a single model that understands both text and layout.
The Genome of NuExtract3
Under the hood, NuExtract3 is built on Qwen3.5â4B, a 4âbillionâparameter visionâlanguage model. Training took just three days on a single node of eight NVIDIA H100 GPUs, with a deliberate focus on maximizing the context length so that long documents can be processed. For Markdown conversion, the team recommends pageâbyâpage processing to keep speed high and enable parallelization. The model accepts both text prompts and visual inputs â PDF pages, screenshots, forms â and can generate outputs in two shapes: Markdown that may embed HTML table code, or structured JSON following a userâdefined schema. The 4âbillionâparameter size strikes a balance between capability and efficiency, letting the model run even on consumer hardware when quantized versions are used.

âThe Exact Kind of Boring Model Releaseâ
A community member described NuExtract3 as âthe exact kind of boring model release that ends up being useful.â That remark captures its quiet ambition. There is no flashy demo page; instead, there are immediate, practical assets: safetensors weights, GGUF quantizations galore (GPTQ, W8A8, FP8, Q4, Q6 and more), and even MLX weights for Apple Silicon. With a floor of just 4 GB VRAM, the smallest quantized versions bring document extraction to modest laptops. Dayâone availability of these formats drew appreciation because it lets developers plug the model directly into local pipelines with tools like vLLM, SGLang or llama.cpp. Boring, perhaps â but for anyone who has battled complex extraction tasks, this is the kind of quiet release that quietly becomes indispensable.
Tables That Donât Crumble
Tables in scanned documents are notoriously fragile: a single missing pipe character in Markdown can collapse an entire structure. NuExtract3 sidesteps that problem elegantly by using HTMLâinsideâMarkdown for tables. This approach preserves every merged cell, every multiâline header, and every intricate alignment exactly as it appears on the page. One tester wrote that it was the first model they tried that handled complex table extraction out of the box without any postâprocessing fixes â outperforming dedicated OCR engines like Paddle and GLM. The HTML table acts as a sturdy scaffold; instead of trying to flatten a table into a sparse grid, the model captures the true layout and lets downstream tools render it faithfully. For pipelines that feed into databases or knowledge bases, this fidelity saves hours of manual repair.
Questions the Community Is Asking
Enthusiasm has sparked a wave of practical questions that remain open. Can it handle multiâcolumn layouts, sidebars, footnotes, and handwriting? How does it perform on academic papers and digital newspapers? Does it hallucinate values for missing JSON keys, or does it reliably return null? Chinese OCR on burnedâin video subtitles and scanned forms with typedâplusâhandwritten annotations are known pain points that have not yet been answered publicly. Comparisons with dedicated tools like MinerU or Docling, and the potential to replace webâpage scraper libraries like trafilatura, were also raised. Several users saw immediate business use: one imagined a service that converts physical forms into digital databases, selling the feature to companies like ClickUp or Monday.com. The conversation reveals a community eager to map the modelâs boundaries and turn it into a building block for realâworld workflows.
How to Run It Yourself
Deploying NuExtract3 is designed to be lowâfriction.
Weights come in safetensors format, as well as a broad selection of GGUF quantizations and MLX weights.
The minimum VRAM requirement is 4 GB thanks to aggressive quantization, making it feasible on entryâlevel GPUs.
Tested inference engines include vLLM, SGLang, and llama.cpp; using --load-format safetensors with vLLM speeds up loading of multiâshard checkpoints by 4â7Ă.
One quirk: if vLLM struggles with the Qwen3.5 VLM weight prefix, stripping the model.language_model.* prefix from the safetensors file or removing the mrope_section_size key from config.json resolves the issue.
Official Ollama support is absent at launch â the maintainers cite reservations about Ollamaâs chat template engine â though community interest is high and a future port seems likely.
Whatâs Next
Numind has submitted a paper on NuExtract3 to a peerâreviewed venue; itâs not yet on arXiv. In the meantime, you can explore the model immediately through several official channels: a blog post detailing the release, the Hugging Face model card, a collection of related resources, and a free online demo that requires no signâup. A Discord server exists for deeper discussion. The combination of an open license, low hardware barrier, and robust table handling positions NuExtract3 as a serious candidate for anyone building document understanding pipelines â from researchers to SaaS founders. As the community stressâtests it against edge cases, the answers to those open questions will show just how far this âboringâ model can go.





