hanxiao/searchbox - RepoRadar

Score: 7.5

Popularity: 39.0

Risk: low

Tier: Silver

Score breakdown

Usefulness: 7.0

Novelty: 8.0

Momentum: 6.0

Maturity: 6.3

Open-source/build: 8.4

Evidence: 7.2

Workflow potential: 8.6

Setup ease: 6.4

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for AI research and engineering teams evaluating test-time-compute (TTC) scaling for closed-corpus QA and search, particularly teams that want to test the 'is grep really all you need' question against a dense retriever in a controlled environment. searchbox is an MIT-licensed, airgapped closed-corpus QA loop by Han Xiao (founder of Jina AI) that runs a local Qwen3.6-35B-A3B in a minimal Pi harness.

Who should use it

- AI research and engineering teams evaluating test-time-compute (TTC) scaling for closed-corpus QA and search, particularly those that want to test the 'is grep really all you need' question against a dense retriever in a controlled environment
- AI research groups that want to study agentic-search model preferences (which tool does the agent reach for: grep, embeddings, rerankers, or bash?) in a controlled environment
- RAG research groups that want to evaluate whether scaling test-time compute via token budget forcing gives better answers on hard questions (force-budget OFF, the default, is the single-turn probe; force-budget ON is the full TTC eval)
- Engineering teams that need an airgapped QA loop (the design is intentionally airgapped: the model is locked to the dataroom and the harness never lets it cheat with web information)
- Organizations that want a public evaluation surface (the live demo at hanxiao.io/searchbox: submit a prompt, drop in a `.zip` dataroom, set a turn budget, watch the run)
- Teams that want a built-in default corpus (the README ships `jina-corpus.zip` as the default dataroom, so the first eval pass is one command)
- Research groups that want to read the underlying design questions (the README's three research questions are natural starting points for follow-up research)
- Engineering teams that want to integrate with the companion ecosystem (hanxiao/dataroom crawls a corpus into a zip; hanxiao/knowledge-graph-extractor extracts entity relations and walks the longest path to find non-trivial questions; together they form a closed-corpus research stack)
- AI safety researchers evaluating agentic-search model preferences in a closed environment (the airgapped design prevents web-cheating and forces the model to use the dataroom only, which suits honest TTC scaling experiments)
- Engineering teams evaluating closed-corpus QA accuracy (`run_meta.json` records stop reason, turns, per-turn token breakdown, tool calls, and config, which is what a team needs to compare configurations)
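To make the configuration-comparison use case concrete, here is a minimal sketch of how a run record might be collapsed into a few comparable numbers. Only the categories (stop reason, turns, per-turn token breakdown, tool calls, config) come from the description above; every key name and value in `example_meta`, and the `summarize_run` helper itself, are hypothetical and would need to be adapted to the actual `run_meta.json` layout.

```python
from collections import Counter

# Hypothetical record: the real run_meta.json key names are not documented on
# this page. All keys and values below are illustrative assumptions.
example_meta = {
    "stop_reason": "answered",
    "config": {"force_budget": False},
    "turns": [
        {"tokens": {"input": 1200, "output": 300},
         "tool_calls": [{"name": "grep"}, {"name": "bash"}]},
        {"tokens": {"input": 1800, "output": 450},
         "tool_calls": [{"name": "grep"}]},
    ],
}


def summarize_run(meta):
    """Collapse one run record into numbers that can be compared across configs."""
    turns = meta.get("turns", [])
    # Sum every token category across all turns.
    total_tokens = sum(sum(t.get("tokens", {}).values()) for t in turns)
    # Count how often each tool was called, to see which tool the agent prefers.
    tool_counts = Counter(
        call["name"] for t in turns for call in t.get("tool_calls", [])
    )
    return {
        "stop_reason": meta.get("stop_reason"),
        "num_turns": len(turns),
        "total_tokens": total_tokens,
        "tool_counts": dict(tool_counts),
        "config": meta.get("config", {}),
    }


summary = summarize_run(example_meta)
print(summary)
```

Running the same helper over records from force-budget OFF and force-budget ON runs would give a side-by-side view of token spend and tool preference, assuming the real files can be mapped onto this shape.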

Who should skip it

Consider hanxiao/searchbox lower priority if you already have a working solution in this category.

About this signal

hanxiao/searchbox is tracked by RepoRadar as an MIT-licensed, airgapped closed-corpus QA loop by Han Xiao. It was first seen on 2026-06-25 and last updated on 2026-06-25. The current verdict is 'try now' with a Silver tier and moderate setup difficulty. Across RepoRadar's eight signals, hanxiao/searchbox is strongest on workflow potential (8.6) and open-source/build quality (8.4) and weakest on momentum (6.0), a profile worth weighing against your own priorities. This page summarizes the evidence RepoRadar has gathered from captured source metadata. The score, tier, risk label, and verdict on this page are never influenced by sponsorship, ads, or tips; they reflect only the signals described in the RepoRadar methodology.

How this item is evaluated

RepoRadar assigned hanxiao/searchbox a composite score of 7.5 out of 10, placing it in the Silver tier. This score combines weighted sub-signals: usefulness (35%), novelty (18%), momentum (14%), maturity (10%), open-source/build quality (7%), evidence quality (6%), workflow potential (6%), and setup ease (4%). Popularity is tracked separately at 39.0 and never affects the composite score or tier. The risk label of 'low' reflects inherent user-impacting hazards, not generic novelty. Items with no risk flag may still require normal code review before production use.
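The weighting described above can be checked with a short calculation. This is a sketch that assumes the composite is a plain weighted sum of the sub-scores listed on this page; the function name `weighted_composite` is illustrative, and any rounding or adjustment RepoRadar applies beyond the stated weights is not modeled.

```python
# Sub-scores and weights exactly as listed on this page.
scores = {
    "usefulness": 7.0,
    "novelty": 8.0,
    "momentum": 6.0,
    "maturity": 6.3,
    "open_source_build": 8.4,
    "evidence": 7.2,
    "workflow_potential": 8.6,
    "setup_ease": 6.4,
}
weights = {
    "usefulness": 0.35,
    "novelty": 0.18,
    "momentum": 0.14,
    "maturity": 0.10,
    "open_source_build": 0.07,
    "evidence": 0.06,
    "workflow_potential": 0.06,
    "setup_ease": 0.04,
}


def weighted_composite(scores, weights):
    """Plain weighted sum (assumption: no extra bonuses or penalties)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same signals")
    return sum(scores[name] * weights[name] for name in scores)


composite = weighted_composite(scores, weights)
print(f"weights sum to {sum(weights.values()):.2f}")
print(f"plain weighted sum: {composite:.2f}")
```

The weights sum to 100%, and the plain weighted sum comes to roughly 7.15, a little below the published 7.5. The published composite therefore appears to include a step that this page does not describe.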

Risk explanation

**39 stars and an early research testbed: the project's framing is research, not production.** Searchbox is at 39 stars with last push 2026-06-25 and is positioned as a research testbed for TTC scaling, with the three research questions (model preferences, is grep all you need, TTC scaling) as the framing. Treat the project as a research surface, not a production closed-corpus QA framework. The README is explicit that the harness is a testbed; the live demo at hanxiao.io/searchbox is the public evaluation surface, not a production SLA-backed service.

**Airgapped design is intentional but limits the eval to a closed corpus.** The model is locked to the dataroom and the harness never lets it cheat with web information. This is the right setup for honest TTC scaling experiments, but it means the eval surface is limited to closed-corpus QA, not open-domain retrieval or live web search. For open-domain retrieval, point the team's existing RAG stack at searchbox's eval methodology (the `run_meta.json` shape is the durable contribution, not the closed-corpus harness itself).

**Qwen3.6-35B-A3B is a specific model choice; the eval results are model-conditional.** The local model is `Qwen3.6-35B-A3B`, and the eval results (model preferences, is grep all you need, TTC scaling) are conditional on this model. For a different model choice, run the team's preferred local model and reproduce the eval on the team's hardware. The research question framing is model-agnostic; the concrete eval results are not.

Evidence links

github.com

Closest alternatives / related signals

searchbox, hanxiao, hanxiao-searchbox, chroma, chroma-creator, chromadb, closed-corpus, closed-corpus-qa