Item detail

huawei-csl/KVarN

KVarN (huawei-csl/KVarN) is an Apache-2.0 native vLLM attention backend for KV-cache quantization, engineered so that quantization buys you capacity *and* throughput, not capacity at the cost of speed. On Qwen3-32B (AIME25, 16K-context burst, TP=2) it matches FP16 accuracy while delivering ~4x the KV-cache capacity and ~1.3x the throughput of FP16, with the vLLM TurboQuant blog's quoted 40-52% thro

Score: 8.5
Popularity: 413.0
Risk: low
Tier: Gold
Score breakdown
Usefulness: 8.9
Novelty: 10.0
Momentum: 10.0
Maturity: 9.1
Open-source/build: 7.4
Evidence: 7.2
Workflow potential: 9.2
Setup ease: 6.5

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for ML/AI infrastructure teams running vLLM in production who need longer context (multi-agent memory, large RAG contexts, 100K+ token agents) without giving up throughput or accuracy; agent builders whose coding agents or RAG pipelines blow past the FP16 KV-cache budget and need a one-flag drop-in to recover; vLLM operators who evaluated TurboQuant but found the 40-52% throughput drop disq
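
To make the "FP16 KV-cache budget" concrete, here is a back-of-the-envelope sizing sketch. It is generic arithmetic, not KVarN's code: the model shape, the 20 GiB budget, and the 4-bit element width are all illustrative assumptions (4 bits is simply what a ~4x capacity gain over FP16 would imply if scale/zero-point metadata is ignored), and the helper names are hypothetical.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bits_per_element):
    """Bytes of KV cache one token occupies across all layers."""
    # Factor 2: each layer stores one key vector and one value vector per KV head.
    elements = 2 * num_layers * num_kv_heads * head_dim
    return elements * bits_per_element / 8


def max_cached_tokens(budget_bytes, **model):
    """How many tokens fit in a given KV-cache memory budget."""
    return int(budget_bytes // kv_bytes_per_token(**model))


# Assumed model shape, for illustration only (not taken from the KVarN repo).
shape = dict(num_layers=64, num_kv_heads=8, head_dim=128)
budget = 20 * 1024**3  # assumed: 20 GiB of accelerator memory left for KV cache

fp16_tokens = max_cached_tokens(budget, bits_per_element=16, **shape)
q4_tokens = max_cached_tokens(budget, bits_per_element=4, **shape)  # assumed 4-bit cache

print(f"FP16 : {fp16_tokens:,} tokens")
print(f"4-bit: {q4_tokens:,} tokens ({q4_tokens / fp16_tokens:.1f}x)")
```

Real quantized caches also store per-group scales or similar metadata, so the achieved ratio depends on the actual format and can differ from this idealized figure.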

Who should use it

Builders, Power users

Who should skip it

Skip if the source link, docs, or setup requirements do not match your workflow.

Risk explanation

Risk label needs manual review.

Evidence links

Closest alternatives / related signals