tongxuluo/gamecraft-bench

Score: 7.7

Popularity: 7.8

Risk: low

Tier: Silver

Score breakdown

Usefulness: 7.0

Novelty: 9.0

Momentum: 7.0

Maturity: 5.8

Open-source/build: 8.4

Evidence: 7.2

Workflow potential: 9.2

Setup ease: 6.4

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

An Apache-2.0-licensed, end-to-end agent benchmark that asks whether AI agents can build playable games in a real game engine. It lets AI research teams, agent benchmark authors, and game-AI engineers reproduce the same scoring on any agent or model release without re-deriving the test harness.

Who should use it

AI research teams who need a reproducible end-to-end agent benchmark for long-horizon creative tasks on a real game engine

Agent benchmark authors who want code and data they can re-run on any agent or model release without re-deriving the harness

Game-AI engineers who need to measure whether agent loops can actually ship a playable game, not just a code snippet

Open-source contributors who want an Apache-2.0 alternative to closed-source agent benchmark suites

Who should skip it

Skip if the source link, docs, or setup requirements do not match your workflow.

Risk explanation

It is a research benchmark that requires a real game engine environment to run. Before publishing comparative results, review the engine requirements, confirm the scoring harness matches the agent or model release you want to test, and verify that the reproducible data is current.

Evidence links

github.com

Closest alternatives / related signals

agent-benchmark, long-horizon-agents, game-ai, reproducible-research, open-source, apache-2.0