kvcache-ai/Mooncake

Mooncake is an Apache-2.0 licensed, KVCache-centric disaggregated architecture for LLM serving, open-sourced by Moonshot AI as the platform that powers Kimi. It separates prefill and decode across nodes, pools GPU memory via a distributed KVCache store, and ships both a Transfer Engine (RDMA, TCP, and NVMe-oF) and a higher-level Mooncake Store. Now integrated into vLLM, SGLang, TensorRT-LLM, and the PyTorch eco

Score: 8.2
Popularity: 88.0
Risk: conditional
Tier: Gold
Score breakdown
Usefulness: 8.0
Novelty: 8.0
Momentum: 9.0
Maturity: 7.9
Open-source/build: 8.4
Evidence: 7.2
Workflow potential: 9.3
Setup ease: 4.2

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for inference platform teams running vLLM, SGLang, or TensorRT-LLM at scale who want to share KV cache across instances, disaggregate prefill and decode, or move weights over RDMA during distributed training. Pull the v0.x release, follow the vLLM Mooncake Connector docs, and benchmark TTFT (time to first token) and ITL (inter-token latency) on your own traffic before adopting it in production.
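The benchmarking step can be sketched as follows. This is a minimal illustration, not part of Mooncake or vLLM: the function name `ttft_and_itl` and its inputs are hypothetical, and it assumes you have already recorded a send timestamp and per-token arrival timestamps from your own streaming client. It takes TTFT as the delay until the first output token and ITL as the mean gap between consecutive tokens.

```python
from statistics import mean


def ttft_and_itl(request_sent_at, token_times):
    """Return (TTFT, mean ITL) in seconds for one streamed response.

    request_sent_at: timestamp at which the request was sent.
    token_times: arrival timestamps of each output token, in order.
    Both are hypothetical inputs collected by your own client.
    """
    if not token_times:
        raise ValueError("need at least one output token")
    # TTFT: time from sending the request to receiving the first token.
    ttft = token_times[0] - request_sent_at
    # ITL: average gap between consecutive output tokens.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0
    return ttft, itl


if __name__ == "__main__":
    ttft, itl = ttft_and_itl(0.0, [0.5, 0.75, 1.0])
    print(f"TTFT={ttft:.3f}s ITL={itl:.3f}s")
```

Comparing these two numbers with and without the connector, on the same traffic sample, gives a like-for-like view of whether disaggregation helps your workload.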

Who should use it

Inference platform teams running vLLM, SGLang, or TensorRT-LLM who want to share KV cache across instances
Teams who want to disaggregate prefill and decode across GPU nodes instead of having both phases compete for the same SMs
Distributed RL training runs that need RDMA-fast weight transfer (SGLang reports a 7x speedup for 1T-parameter weight updates using Mooncake)
Moonshot / Kimi-aligned teams who want the platform that powers Kimi in production
Researchers studying KVCache reuse or prefill/decode (PD) disaggregation who want a real production reference implementation

Who should skip it

Skip for now if you need a low-setup, non-technical tool today.

Risk explanation

Production deployments need RDMA-capable networking (RoCE or InfiniBand) for best results. This is an advanced platform, not a drop-in replacement for a single-node vLLM setup.
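A quick pre-flight check for the networking requirement might look like this. It is a rough sketch that assumes a Linux host, where the kernel lists RDMA devices (both InfiniBand and RoCE) under `/sys/class/infiniband`. The helper name `list_rdma_devices` is hypothetical and not part of Mooncake. An empty result only means no RDMA device is registered on this host; a non-empty one does not prove the fabric is configured correctly.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return sorted RDMA device names exposed by the Linux kernel, or [] if none.

    sysfs_root is a parameter only so the function can be pointed at a
    different directory when testing.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        # Directory missing: no RDMA driver loaded, or not a Linux host.
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices found:", ", ".join(devices))
    else:
        print("No RDMA devices found on this host")
```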

Evidence links

Closest alternatives / related signals

inference, kvcache, disaggregation, prefill-decode, llm-serving, moonshot, kimi, vllm