github.com

opendatalab/MinerU

opendatalab/MinerU is a high-accuracy PDF / Office document parser, listed by RepoRadar under Apache-2.0 with a commercial-use threshold of 100M MAU / $20M MRR. It holds Gold tier and a 'try now' verdict. Its strongest signal is workflow potential, scored 9.7 out of 10.

Score: 8.6
Popularity: 69086.0
Risk: conditional
Tier: Gold

Score breakdown

Usefulness: 9.0
Novelty: 7.0
Momentum: 9.0
Maturity: 9.2
Open-source/build: 8.4
Evidence: 7.2
Workflow potential: 9.7
Setup ease: 8.8

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

MinerU is useful for AI engineering teams that need a high-accuracy PDF / Office document parser to feed RAG pipelines, agent memory layers, or LLM context windows. It is licensed under Apache-2.0 with a commercial-use threshold of 100M MAU / $20M MRR, well above typical open-source commercial deployments, and it combines a VLM + OCR dual engine, 109-language OCR, and cross-page table merging that the alternatives get wrong.

Who should use it

- AI engineering teams that need a high-accuracy PDF / Office document parser to feed RAG pipelines, agent memory layers, or LLM context windows. MinerU offers a VLM + OCR dual engine, 109-language OCR, and cross-page table merging that the alternatives get wrong, under Apache-2.0 with a commercial-use threshold of 100M MAU / $20M MRR (well above typical open-source commercial deployments).
- AI product builders who want one MCP server for document parsing instead of separate PDF, DOCX, and PPTX integrations. MinerU is the unified document-parsing MCP server that Cursor, Claude Desktop, and Windsurf can connect to via the stdio MCP transport.
- RAG framework users (LangChain / LlamaIndex / RAGFlow / RAG-Anything / Flowise / Dify / FastGPT) who want a native integration with a document parser that returns structured Markdown / JSON instead of raw text. MinerU's native integrations are the documented path.
- Engineering teams that need to deploy on Chinese AI chips (Ascend, Cambricon, Enflame, MetaX, Moore Threads, Kunlunxin, Iluvatar, Hygon, Biren, T-Head). MinerU ships 10+ domestic AI-chip backends in the inference layer, the broadest domestic-chip support of any document parser.
- Organizations that need a private / fully offline deployment. The `pipeline` mode runs on CPU or GPU with no external API calls and no hallucination, the `vlm-engine` mode supports vLLM / LMDeploy / mlx, and the `hybrid-engine` mode combines the two.
- Users who want a desktop client or online hosted version for one-off document parsing without deploying the full stack. The mineru.net hosted version and the desktop client are the no-install path.
- Readers who want arXiv-paper-backed research (MinerU technical report arXiv 2409.18839, MinerU2.5 report arXiv 2509.22186, MinerU2.5 Pro report arXiv 2604.04771). The project's accuracy claims are backed by three peer-reviewed-style technical reports and a live HuggingFace / ModelScope demo, a verification surface that is unusually complete for a project at this scale.
- Engineering teams evaluating document parsers: run MinerU on a fixed PDF set, compare accuracy / latency / cost against the alternatives, and use the technical reports as the baseline. The technical reports are the eval surface, and the project's accuracy numbers are reproducible from the open-source code.

Who should skip it

Skip opendatalab/MinerU if the source repository or demo is inactive, unmaintained, or no longer matches the description shown here.

About this signal

opendatalab/MinerU is tracked by RepoRadar as a high-accuracy PDF / Office document parser under Apache-2.0 with a commercial-use threshold of 100M MAU / $20M MRR. It was first seen on 2026-06-25 and last updated on 2026-06-25. The current verdict is 'try now' with a Gold tier and easy setup difficulty. The standout signals for opendatalab/MinerU are workflow potential (9.7) and maturity (9.2), while novelty (7.0) trails; that balance shapes where it fits best. This page summarizes the evidence RepoRadar has captured from source metadata. The score, tier, risk label, and verdict on this page are never influenced by sponsorship, ads, or tips: they reflect only the usefulness, popularity, novelty, momentum, maturity, and evidence signals described in the RepoRadar methodology.

How this item is evaluated

RepoRadar assigned opendatalab/MinerU a composite score of 8.6 out of 10, placing it in the Gold tier. This score combines weighted sub-signals: usefulness (35%), novelty (18%), momentum (14%), maturity (10%), open-source/build quality (7%), evidence quality (6%), workflow potential (6%), and setup ease (4%). Popularity is tracked separately at 69086.0 and never affects the composite score or tier. The risk label of 'conditional' reflects inherent user-impacting hazards, not generic novelty. Items with no risk flag may still require normal code review before production use.
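The weighting described above can be checked with a short calculation. The sketch below assumes the composite is a plain weighted sum of the sub-scores listed on this page; the function and variable names are illustrative, not part of any RepoRadar API. Under that assumption the sum comes to about 8.54, slightly below the displayed 8.6, so the published figure may involve rounding or adjustments that this page does not describe.

```python
# Sub-scores and weights as listed on this page.
# Assumption: the composite is a plain weighted sum of these values.
SUB_SCORES = {
    "usefulness": 9.0,
    "novelty": 7.0,
    "momentum": 9.0,
    "maturity": 9.2,
    "open_source_build": 8.4,
    "evidence": 7.2,
    "workflow_potential": 9.7,
    "setup_ease": 8.8,
}

WEIGHTS = {
    "usefulness": 0.35,
    "novelty": 0.18,
    "momentum": 0.14,
    "maturity": 0.10,
    "open_source_build": 0.07,
    "evidence": 0.06,
    "workflow_potential": 0.06,
    "setup_ease": 0.04,
}


def composite_score(scores, weights):
    """Return the weighted sum of sub-scores; weights must sum to 1."""
    total_weight = sum(weights.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(scores[name] * weights[name] for name in weights)


composite = composite_score(SUB_SCORES, WEIGHTS)
print(f"{composite:.3f}")  # prints 8.544
```

Popularity is deliberately absent from both dictionaries, matching the statement that it never affects the composite score or tier.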

Putting this into practice? Read How to evaluate an AI tool before you adopt it for the checklist behind this score.

Risk explanation

**Apache-2.0 with a commercial-use threshold of 100M MAU / $20M MRR (consolidated across affiliates).** The LICENSE.md requires a separate commercial license from the MinerU Team if the user and their affiliates exceed either 100M monthly active users or $20M monthly revenue. The threshold is documented explicitly and is well above typical open-source commercial deployments, but organizations approaching that scale must contact the MinerU Team before continuing deployment. The license also requires attribution ('If you provide online services to third parties based on MinerU, you must clearly and prominently indicate, in the relevant product or service interface or in publicly available documentation, that MinerU is used'). For online services that skip the attribution, the license terminates automatically.

**The hosted version at mineru.net is a paid service.** The README links to mineru.net as a zero-install / desktop-client path. That hosted version is a separate commercial product, not the Apache-2.0 self-hosted version. Self-hosted deployment under Apache-2.0 is the canonical path for the open-source codebase.

**The `vlm-engine` and `hybrid-engine` modes require a VLM model.** The `pipeline` mode runs on CPU or GPU with no external dependencies, but the higher-accuracy `vlm-engine` and `hybrid-engine` modes need a VLM model (vLLM / LMDeploy / mlx). The model size and hardware requirements are documented per mode, and the user should confirm that the deployment environment matches before turning on the higher-accuracy modes.
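The commercial-use threshold from the license risk above can be expressed as a small check. This is a minimal sketch of the rule as summarized on this page, not a reading of the license text itself. The `Entity` class and `needs_commercial_license` function are hypothetical names, and summing monthly active users across affiliates is an assumption about what "consolidated" means (overlapping users would be counted twice).

```python
from dataclasses import dataclass

# Thresholds as stated on this page; the constant names are hypothetical.
MAU_THRESHOLD = 100_000_000
MONTHLY_REVENUE_THRESHOLD_USD = 20_000_000


@dataclass
class Entity:
    """One company in the group: the user itself or an affiliate."""
    name: str
    monthly_active_users: int
    monthly_revenue_usd: float


def needs_commercial_license(entities):
    """Return True if consolidated totals exceed either threshold.

    Assumption: consolidation is a simple sum across the user and
    its affiliates.
    """
    total_mau = sum(e.monthly_active_users for e in entities)
    total_revenue = sum(e.monthly_revenue_usd for e in entities)
    return (
        total_mau > MAU_THRESHOLD
        or total_revenue > MONTHLY_REVENUE_THRESHOLD_USD
    )


# Example with made-up figures for a parent company and one affiliate.
group = [
    Entity("parent", monthly_active_users=60_000_000, monthly_revenue_usd=5_000_000),
    Entity("affiliate", monthly_active_users=45_000_000, monthly_revenue_usd=2_000_000),
]
print(needs_commercial_license(group))  # prints True (105M MAU combined)
```

Either condition alone triggers the requirement, which is why the check uses `or`: a group under the revenue limit can still cross the line on users, as in the example.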

Evidence links

Closest alternatives / related signals

opendatalab, minerU, document-parser, pdf-parser, docx-parser, pptx-parser, xlsx-parser, image-parser