Item detail

Tilde Research Wall Attention

Open-source attention variant with per-channel, per-timestep multiplicative decay, adapted from linear-RNN diagonal forget gates. Models trained on 4K context generalize without further training to 200K+ tokens. RoPE-free. Ships with reference Triton kernels.

Score7.0
Popularity35.0
Riskmedium
TierSilver
Score breakdown
Usefulness7.0
Novelty5.8
Momentum3.5
Maturity5.9
Open-source/build7.4
Evidence7.2
Workflow potential7.0
Setup ease6.5

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for researchers and teams training their own long-context open-weight models, especially if RoPE-based length generalization has been a bottleneck. Treat the 4K-to-200K claim as a single-data-point until independently reproduced.

Who should use it

BuildersPower users

Who should skip it

Skip or sandbox it if you cannot review permissions, data access, and failure modes before use.

Risk explanation

Medium risk: use sandboxing, least privilege, and explicit review before connecting sensitive data or accounts.

Evidence links

Closest alternatives / related signals