Score breakdown
Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.
Why it matters
Useful for AI engineering teams that need a high-accuracy PDF / Office document parser to feed RAG pipelines, agent memory layers, or LLM context windows. MinerU is a high-accuracy document parser released under Apache-2.0 (with a commercial-use threshold of 100M MAU / $20M MRR, well above typical open-source commercial deployments). It combines a VLM + OCR dual engine, OCR in 109 languages, and cross-page table merging that …
Who should use it
Who should skip it
Skip opendatalab/MinerU if the source repository or demo is inactive, unmaintained, or no longer matches the description shown here.
About this signal
opendatalab/MinerU is tracked by RepoRadar as a high-accuracy PDF / Office document parser in the Apache-2.0 (with commercial-use threshold at 100M MAU / $20M MRR) section. It was first seen on 2026-06-25 and last updated on 2026-06-25. The current verdict is 'try now', with a Gold tier and an easy setup difficulty. The standout signals for opendatalab/MinerU are workflow potential (9.7) and maturity (9.2), while novelty (7.0) trails; that balance shapes where it fits best. This page summarizes the evidence RepoRadar has captured from source metadata. The score, tier, risk label, and verdict on this page are never influenced by sponsorship, ads, or tips: they reflect only the usefulness, popularity, novelty, momentum, maturity, and evidence signals described in the RepoRadar methodology.
How this item is evaluated
RepoRadar assigned opendatalab/MinerU a composite score of 8.6 out of 10, placing it in the Gold tier. The score is a weighted combination of sub-signals: usefulness (35%), novelty (18%), momentum (14%), maturity (10%), open-source/build quality (7%), evidence quality (6%), workflow potential (6%), and setup ease (4%). Popularity is tracked separately (current value: 69,086) and never affects the composite score or tier. The 'conditional' risk label reflects inherent user-impacting hazards, not generic novelty. Items with no risk flag may still require normal code review before production use.
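The composite can be read as a weighted sum of the listed sub-signals. The sketch below is a minimal illustration, assuming each sub-signal is already on a 0-10 scale and the composite is a plain weighted average rounded to one decimal; the key names (`usefulness`, `build_quality`, and so on) and the `composite_score` helper are hypothetical, and RepoRadar's actual normalization may differ. Only three of MinerU's sub-signals are published on this page, so the example does not try to reproduce its 8.6.

```python
# Weights from the RepoRadar methodology description; popularity is not among them.
# Key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "usefulness": 0.35,
    "novelty": 0.18,
    "momentum": 0.14,
    "maturity": 0.10,
    "build_quality": 0.07,       # open-source/build quality
    "evidence_quality": 0.06,
    "workflow_potential": 0.06,
    "setup_ease": 0.04,
}

# Sanity check: the published percentages add up to 100%.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights should sum to 100%"


def composite_score(signals: dict[str, float]) -> float:
    """Weighted sum of 0-10 sub-signals, rounded to one decimal.

    Assumes a plain weighted average; popularity is deliberately not an input.
    """
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing sub-signals: {sorted(missing)}")
    return round(sum(WEIGHTS[name] * signals[name] for name in WEIGHTS), 1)
```

For instance, an item scoring 8.0 on every sub-signal gets a composite of 8.0, and because popularity never appears as an input, a large popularity figure cannot move the score or tier.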
Putting this into practice? Read "How to evaluate an AI tool before you adopt it" for the checklist behind this score.
Risk explanation
**Apache-2.0 with a commercial-use threshold of 100M MAU / $20M MRR (consolidated across affiliates).** LICENSE.md requires a separate commercial license from the MinerU Team if the user and their affiliates together exceed either 100M monthly active users or $20M in monthly revenue. The threshold is documented explicitly and sits well above typical open-source commercial deployments, but organizations approaching that scale must contact the MinerU Team before continuing deployment. The license also requires attribution ('If you provide online services to third parties based on MinerU, you must clearly and prominently indicate, in the relevant product or service interface or in publicly available documentation, that MinerU is used'); for online services that omit this attribution, the license terminates automatically.
**The hosted version at mineru.net is a paid service.** The README links to mineru.net as a zero-install / desktop-client path; that hosted version is a separate commercial product, not the Apache-2.0 self-hosted version. Self-hosted deployment under Apache-2.0 is the canonical path for the open-source codebase.
**The `vlm-engine` and `hybrid-engine` modes require a VLM model.** The `pipeline` mode runs on CPU or GPU with no external dependencies, but the higher-accuracy `vlm-engine` and `hybrid-engine` modes need a VLM model served through vLLM, LMDeploy, or mlx. Model size and hardware requirements are documented per mode, so confirm that the deployment environment meets them before enabling the higher-accuracy modes.
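The threshold in the license summary reduces to an either/or check over the consolidated group. This is a rough planning aid under stated assumptions, not a legal reading of LICENSE.md: it assumes "exceed" means strictly greater than, and that consolidation across affiliates means summing each entity's MAU and monthly revenue (which overcounts users shared between affiliates). The `Entity` type and `needs_commercial_license` helper are hypothetical.

```python
from dataclasses import dataclass

# Thresholds as summarized on this page; confirm against MinerU's LICENSE.md.
MAU_THRESHOLD = 100_000_000                   # 100M monthly active users
MONTHLY_REVENUE_THRESHOLD_USD = 20_000_000    # $20M monthly revenue


@dataclass
class Entity:
    """One company in the group (the user itself or an affiliate). Hypothetical type."""
    name: str
    monthly_active_users: int
    monthly_revenue_usd: float


def needs_commercial_license(entities: list[Entity]) -> bool:
    """True if the consolidated group exceeds EITHER threshold.

    Assumption: consolidation = simple sum across entities; shared users
    are not de-duplicated, so this errs on the side of flagging.
    """
    total_mau = sum(e.monthly_active_users for e in entities)
    total_revenue = sum(e.monthly_revenue_usd for e in entities)
    return total_mau > MAU_THRESHOLD or total_revenue > MONTHLY_REVENUE_THRESHOLD_USD
```

Because the check is an OR, a group with modest user counts can still cross the line on revenue alone, and two affiliates that are each under 100M MAU can cross it together.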
