Item detail

lyogavin/airllm

AirLLM (lyogavin/airllm) is an open-source Python library, licensed under Apache-2.0, that runs inference with 70B-class large language models on a single 4GB GPU (and Llama 3.1 405B on 8GB of VRAM) without quantization, distillation, or pruning. It does so by streaming model layers through a unified-memory-style scheduler. It offers first-class support for Llama 3 / 3.1, Qwen 2.5, Mixtral, ChatGLM, Baichuan, Mistral, and InternLM and a
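
The 4GB and 8GB figures can be sanity-checked with a rough per-layer estimate. The sketch below is not taken from AirLLM. It assumes 16-bit weights (2 bytes per parameter, since no quantization is applied), parameters spread evenly across layers, and layer counts of 80 for a 70B Llama-style model and 126 for Llama 3.1 405B. The helper name `per_layer_gib` is hypothetical.

```python
def per_layer_gib(total_params: float, num_layers: int, bytes_per_param: int = 2) -> float:
    """Approximate size in GiB of one transformer layer.

    Assumes parameters are spread evenly across layers and ignores
    embeddings, activations, and the KV cache.
    """
    return total_params * bytes_per_param / num_layers / 2**30


# Assumed layer counts: 80 for a 70B Llama-style model, 126 for Llama 3.1 405B.
llama_70b_layer = per_layer_gib(70e9, 80)
llama_405b_layer = per_layer_gib(405e9, 126)

print(f"70B:  about {llama_70b_layer:.1f} GiB per layer")
print(f"405B: about {llama_405b_layer:.1f} GiB per layer")
```

Under these assumptions a single layer comes to roughly 1.6 GiB for the 70B model and roughly 6.0 GiB for the 405B model, which is why holding one layer at a time can fit where the full model (about 140 GB and 810 GB of weights respectively) cannot.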

Score: 8.3
Popularity: 20871.0
Risk: low
Tier: Gold

Score breakdown

Usefulness: 8.7
Novelty: 10.0
Momentum: 10.0
Maturity: 9.0
Open-source/build: 7.4
Evidence: 7.2
Workflow potential: 9.0
Setup ease: 6.5

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for AI engineers, researchers, and local-AI tinkerers who want to run inference with 70B- or 405B-class LLMs on a single consumer GPU (4GB for 70B, 8GB for Llama 3.1 405B) without quantization, distillation, or pruning. AirLLM (lyogavin/airllm) ships a layer-streaming scheduler that loads one transformer layer at a time and overlaps layer prefetch with compute, which means a developer wi
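
The idea of loading one layer at a time while prefetching the next can be illustrated with a toy sketch. This is not AirLLM's implementation or API: `load_layer` and `stream_forward` are hypothetical names, the "layers" are simple scaling functions standing in for weights read from disk, and a background thread stands in for the prefetch that overlaps with compute.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


def load_layer(index: int) -> Layer:
    """Hypothetical loader: stands in for reading one layer's weights from disk."""
    scale = float(index + 1)
    return lambda xs: [x * scale for x in xs]


def stream_forward(inputs: List[float], num_layers: int) -> List[float]:
    """Run a forward pass while holding at most two layers in memory:
    the one being computed and the one being prefetched."""
    activations = list(inputs)
    if num_layers == 0:
        return activations
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_layer, 0)  # start loading the first layer
        for i in range(num_layers):
            layer = pending.result()  # wait until the current layer is loaded
            if i + 1 < num_layers:
                # Prefetch the next layer in the background during compute.
                pending = pool.submit(load_layer, i + 1)
            activations = layer(activations)  # compute with the current layer
            del layer  # drop the reference so its memory can be reclaimed
    return activations


result = stream_forward([1.0, 2.0], num_layers=3)
print(result)
```

With three toy layers that scale by 1, 2, and 3, the input `[1.0, 2.0]` comes out as `[6.0, 12.0]`. In a real system the loader would move weights from disk to the GPU, and the benefit depends on how much of the load time can be hidden behind compute.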

Who should use it

Builders, Power users

Who should skip it

Skip if the source link, docs, or setup requirements do not match your workflow.

Risk explanation

Risk label needs manual review.

Evidence links

Closest alternatives / related signals