PaperGuru-AI/PaperGuru-Benchmark

Score: 7.8

Popularity: 1213.0

Risk: low

Tier: Silver

Score breakdown

Usefulness: 7.0

Novelty: 9.0

Momentum: 7.0

Maturity: 7.8

Open-source/build: 8.4

Evidence: 7.2

Workflow potential: 8.2

Setup ease: 4.2

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for researchers and engineering teams working on long-horizon LLM agents who keep hitting the failure modes common to RAG and agent frameworks. PaperGuru is the MIT-licensed benchmark and reproduction package from a team with 10 peer-reviewed acceptances. It formalises Lifecycle-Aware Memory (LAM) as the missing fourth AI-system primitive alongside compute, models, and retrieval; for benchma

Who should use it

- Researchers and engineering teams working on long-horizon LLM agents who hit the same failure modes every RAG / agent framework falls into: PaperGuru is the MIT-licensed benchmark + reproduction-package from a team with 10 peer-reviewed acceptances that formalises Lifecycle-Aware Memory (LAM) as the missing fourth AI-system primitive alongside compute, models, and retrieval
- Benchmark-track work where reproducing the LAM axioms on a custom domain (a payments service, a docs site, a long-horizon software-engineering session) is the validation surface (the maintainer's pitch is that PaperBench is the canonical PaperBench-class benchmark for memory primitives)
- Engineering teams who want to evaluate whether their own memory layer is lifecycle-aware (the maintainer ships a 4-axiom formalisation in §3 of `paper/PaperGuru-CCM.pdf` that the team's memory layer can be checked against)
- Research groups adopting PaperGuru as a baseline that 'a small cheap model with a smart router + LAM delivers frontier-quality research at ~1/10th the cost' (the DRACO-style claim, validated by the maintainer's own benchmark suite)
- Users who want a 23-paper reproduction-package in `PaperBench/submissions/` to compare their agent's reproduction success against the published numbers
- Survey-generation teams who want a 20-survey reference set in `SurveyBench/` to evaluate their own survey-generator's content + composite richness scores
- Academic authors who want a peer-reviewed track-record signal (10 acceptances across FSE 2026, ICML 2026, TOSEM, AEI, ICoGB)

Who should skip it

Skip PaperGuru-AI/PaperGuru-Benchmark for now if your priority is a tool you can use today without configuring a build pipeline or development environment.

About this signal

PaperGuru-AI/PaperGuru-Benchmark is tracked by RepoRadar as a lifecycle-aware memory benchmark in the MIT Lifecycle-Aware Memory benchmark from the Pa section. It was first seen on 2026-06-25 and last updated on 2026-06-25. The current verdict is 'worth watch', with a Silver tier and a setup difficulty rated hard. PaperGuru-AI/PaperGuru-Benchmark leads on novelty (9.0) and open-source/build quality (8.4); its lowest signal is setup ease (4.2), so factor that in before investing setup time. This page summarizes the evidence RepoRadar has captured from source metadata. The score, tier, risk label, and verdict on this page are never influenced by sponsorship, ads, or tips; they reflect only the usefulness, popularity, novelty, momentum, maturity, and evidence signals described in the RepoRadar methodology.

How this item is evaluated

RepoRadar assigned PaperGuru-AI/PaperGuru-Benchmark a composite score of 7.8 out of 10, placing it in the Silver tier. This score combines weighted sub-signals: usefulness (35%), novelty (18%), momentum (14%), maturity (10%), open-source/build quality (7%), evidence quality (6%), workflow potential (6%), and setup ease (4%). Popularity is tracked separately at 1213.0 and never affects the composite score or tier. The risk label of 'low' reflects inherent user-impacting hazards, not generic novelty. Items with no risk flag may still require normal code review before production use.
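As a rough arithmetic check, the stated weights can be applied to the sub-signal values listed in the score breakdown. The sketch below assumes the composite is a plain weighted sum, which this page does not confirm; the dictionary and function names are illustrative only. Under that assumption the result is about 7.51 rather than the published 7.8, so the published figure likely reflects a step this page does not spell out.

```python
# Sub-signal values copied from the score breakdown on this page.
signals = {
    "usefulness": 7.0,
    "novelty": 9.0,
    "momentum": 7.0,
    "maturity": 7.8,
    "open_source_build": 8.4,
    "evidence": 7.2,
    "workflow_potential": 8.2,
    "setup_ease": 4.2,
}

# Weights as stated in the evaluation paragraph, expressed as fractions of 1.
weights = {
    "usefulness": 0.35,
    "novelty": 0.18,
    "momentum": 0.14,
    "maturity": 0.10,
    "open_source_build": 0.07,
    "evidence": 0.06,
    "workflow_potential": 0.06,
    "setup_ease": 0.04,
}


def weighted_sum(values, factors):
    """Plain weighted sum: an ASSUMED formula, not RepoRadar's confirmed method."""
    return sum(values[name] * factors[name] for name in factors)


weight_total = sum(weights.values())
composite = weighted_sum(signals, weights)

print(f"weights sum to {weight_total:.2f}")
print(f"plain weighted sum: {composite:.2f}")
```

Popularity is deliberately left out of both dictionaries, matching the statement that it never affects the composite score or tier.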

Risk explanation

**23 PaperBench reproduction submissions are derivative works of their source papers; redistributors must consult each submission's README.** The LICENSE is MIT for the framework + survey generator, but the 23 reproduction submissions in `PaperBench/submissions/` are derivative works of their respective source papers and inherit the license of each source paper. The maintainer is explicit: 'Consult the README inside each submission before redistributing.' A team adopting the bundle should map each submission's license before redistributing it. The LICENSE note is a documentation requirement, not a license grant.

**Published numbers are the maintainer's own measurements on the maintainer's own benchmark suite.** The 65.95% mean reproduction on PaperBench, the 20/23 above the 41% human ML-PhD bar, the 94.66% content score on SurveyBench, and the 43.76% composite richness were all measured by the maintainer on the maintainer's own benchmark suite. Adopters who want to cite the work as a baseline should reproduce the PaperBench + SurveyBench numbers on their own compute budget before claiming the LAM mechanism delivers the published lift on their domain.

**No community ports yet signals a single-team codebase, which is normal for a research repo; the track record is the academic footprint, not community ports.** 1213 stars, 84 forks, and a last push on 2026-06-08 are the signature of an active research release with a credible peer-reviewed track record (10 acceptances across FSE 2026 / ICML 2026 / TOSEM / AEI / ICoGB) but no community ports yet. Adopters who want a more battle-tested long-memory framework should evaluate closed-source alternatives (Claude Projects, ChatGPT Memory, Mem0) for production workloads and use PaperGuru as the open-source baseline / research canvas.
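The license-mapping step for the submissions can be started with a small inventory script. This is a minimal sketch that assumes each submission is a direct subdirectory of `PaperBench/submissions/` and that its README file name begins with "README"; neither detail is confirmed by this page, and the function name `audit_submissions` is hypothetical. It only reports which submissions have a README to read; the license terms themselves still need a manual review.

```python
from pathlib import Path


def audit_submissions(root):
    """Split submission directories into those with and without a README.

    ASSUMPTION: one subdirectory per submission, README at its top level.
    """
    with_readme, without_readme = [], []
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Any top-level file whose name starts with "readme" counts (case-insensitive).
        has_readme = any(
            f.is_file() and f.name.lower().startswith("readme")
            for f in sub.iterdir()
        )
        (with_readme if has_readme else without_readme).append(sub.name)
    return with_readme, without_readme


if __name__ == "__main__":
    root = Path("PaperBench/submissions")  # path as named on this page
    if root.is_dir():
        ok, missing = audit_submissions(root)
        print(f"{len(ok)} submissions have a README to review")
        for name in missing:
            print(f"no README found, check manually: {name}")
    else:
        print(f"{root} not found; run this from the repository root")
```

Run from a local clone, the output gives a checklist of READMEs to read before any redistribution decision.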

Evidence links

github.com

Closest alternatives / related signals

paperguru, paperguru-benchmark, paperguru-ai, lifecycle-aware-memory, lam, memory-primitive, capital-chunk-memory, ccm