defilantech/LLMKube

LLMKube is an Apache-2.0 Kubernetes operator for self-hosted LLM inference. It manages llama.cpp, vLLM, TGI, and mlx-server deployments, supports autoscaling, and is positioned for air-gapped clusters, targeting teams that want more than a single-node local model setup.

Score: 8.2
Popularity: 13.0
Risk: conditional
Tier: Silver

Score breakdown

Usefulness: 8.0
Novelty: 7.0
Momentum: 5.0
Maturity: 6.2
Open-source/build: 8.4
Evidence: 7.2
Workflow potential: 8.6
Setup ease: 4.2

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for platform teams that need repeatable self-hosted inference across different backends or hardware. Pilot it in a test namespace first, then measure whether the operator model is simpler than the custom charts, scripts, or one-off manifests you already maintain.

Who should use it

platform engineers, self-hosting teams, Kubernetes operators, companies building private AI infrastructure

Who should skip it

Skip if the source link, docs, or setup requirements do not match your workflow.

Risk explanation

It orchestrates model-serving workloads inside your cluster and can pull model artifacts across multiple backends, so review cluster permissions, storage, network policy, and model-license constraints before production use.

Evidence links

Closest alternatives / related signals

kubernetes, llm-serving, self-hosted, vllm, llama-cpp