
guoqingbao/xinfer

guoqingbao/xinfer is an MIT-licensed, pure-Rust LLM inference engine with zero Python or PyTorch dependencies. It exposes native Flash Attention, FlashInfer kernels, CUDA Graphs, continuous batching, prefix caching, and prefill/decode (PD) disaggregation behind a portable C ABI and a small C++/Rust runtime.
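The prefix-caching feature listed above can be illustrated with a generic sketch. This is not taken from xinfer's source: the block size, the chained SHA-256 block hashing, and the `PrefixCache`/`Lookup` names are all hypothetical. Each complete block of token IDs is hashed together with the hash of the previous block, so one hash identifies the entire prefix up to that point, and a new request can reuse KV blocks for as many leading blocks as are already cached.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// blockSize is the number of tokens per KV-cache block (hypothetical value).
const blockSize = 4

// PrefixCache maps a chained prefix hash to a cached KV block ID.
// Generic illustration only; it stores IDs, not real KV tensors.
type PrefixCache struct {
	blocks map[[32]byte]int
	nextID int
}

func NewPrefixCache() *PrefixCache {
	return &PrefixCache{blocks: make(map[[32]byte]int)}
}

// blockHashes returns one hash per complete block. Each hash covers the
// previous block's hash plus this block's tokens, so it identifies the
// whole prefix. Trailing tokens that do not fill a block are ignored.
func blockHashes(tokens []uint32) [][32]byte {
	var hashes [][32]byte
	var prev [32]byte
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		buf := make([]byte, 32+4*blockSize)
		copy(buf, prev[:])
		for i, t := range tokens[start : start+blockSize] {
			binary.LittleEndian.PutUint32(buf[32+4*i:], t)
		}
		prev = sha256.Sum256(buf)
		hashes = append(hashes, prev)
	}
	return hashes
}

// Lookup returns how many leading tokens can reuse cached blocks and
// registers every new complete block for later requests.
func (c *PrefixCache) Lookup(tokens []uint32) int {
	reused := 0
	for i, h := range blockHashes(tokens) {
		if _, ok := c.blocks[h]; ok {
			// Count hits only while they are contiguous from the start.
			if reused == i*blockSize {
				reused = (i + 1) * blockSize
			}
			continue
		}
		// Miss: "allocate" a new block ID so later requests can reuse it.
		c.blocks[h] = c.nextID
		c.nextID++
	}
	return reused
}

func main() {
	cache := NewPrefixCache()
	system := []uint32{1, 2, 3, 4, 5, 6, 7, 8} // shared system-prompt tokens
	a := append(append([]uint32{}, system...), 9, 10, 11, 12)
	b := append(append([]uint32{}, system...), 20, 21, 22, 23)
	fmt.Println(cache.Lookup(a)) // prints 0: cold cache
	fmt.Println(cache.Lookup(b)) // prints 8: two shared blocks reused
}
```

A production engine would also hold the KV tensors, reference-count shared blocks, and evict them under memory pressure; this sketch only shows how chained block hashes make a shared prefix detectable.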

Score: 8.6
Popularity: 7.4
Risk: conditional
Tier: Gold
Score breakdown
Usefulness: 9.0
Novelty: 9.0
Momentum: 7.0
Maturity: 6.8
Open-source/build: 8.4
Evidence: 7.2
Workflow potential: 9.7
Setup ease: 6.4

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for platform and infra engineers shipping self-hosted LLM serving on constrained, Python-free, or edge environments where pulling in PyTorch and the Python toolchain is operationally painful.

Who should use it

Platform and infra engineers running self-hosted LLM inference
Rust/C++/Go teams that need to embed LLM serving without Python
Edge or constrained environments where PyTorch is too heavy
Engineers standardizing on a portable C ABI for LLM runtimes

Who should skip it

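Skip it if a Python-based serving stack already works well for you, or if your hardware is not NVIDIA/CUDA-based, since headline features such as FlashInfer kernels and CUDA Graphs target CUDA GPUs. Also check that the docs and setup requirements match your workflow before adopting it.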

Risk explanation

It loads and executes model weights and GPU kernels on your hardware, so verify the trust chain of any downloaded checkpoint, pin model hashes, and sandbox the runtime before exposing it on shared infrastructure.


Closest alternatives / related signals

Tags: llm-inference, rust, gpu, pytorch-free, serving, open-source