Analyst memo

Infrastructure · 1 source

Moonshot AI Open-Sources FlashKDA, a Faster Kernel for Kimi Delta Attention

Moonshot AI has open-sourced FlashKDA, a CUDA kernel for its Kimi Delta Attention mechanism that speeds up prefill on NVIDIA H20 GPUs.

Published May 1, 2026, 4:22 AM · Updated May 1, 2026, 4:22 AM

What happened

Moonshot AI has released FlashKDA, an open-source CUDA kernel for the Kimi Delta Attention mechanism, reporting prefill speedups of 1.72× to 2.22× on NVIDIA H20 GPUs.
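To put the reported range in latency terms: a speedup of s means the same prefill work takes 1/s of the baseline time. The memo does not state which baseline the figures are measured against, so the sketch below only converts the two quoted numbers; the function name is hypothetical.

```python
def latency_fraction(speedup: float) -> float:
    """Fraction of the (unspecified) baseline prefill time left at a given speedup."""
    return 1.0 / speedup


# The two endpoints of the reported 1.72x to 2.22x range.
for s in (1.72, 2.22):
    frac = latency_fraction(s)
    print(f"{s:.2f}x speedup -> {frac:.1%} of baseline time, {1 - frac:.1%} saved")
```

Read this way, the reported range corresponds to cutting prefill time by roughly 42% to 55% relative to whatever baseline was used.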

Why it matters

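FlashKDA accelerates Kimi Delta Attention, a linear attention mechanism. Linear attention keeps a fixed-size recurrent state instead of a cache that grows with every token, so its cost grows linearly rather than quadratically with sequence length. This makes it attractive for scaling models to long contexts at lower computational cost, and a faster kernel improves efficiency further during long-sequence generation.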
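The fixed-size-state argument can be made concrete with a small recurrence. The following is a minimal pure-Python sketch of a gated delta-rule linear attention step. It assumes an update of the form S_t = (I − β_t k_t k_tᵀ) Diag(α_t) S_{t−1} + β_t k_t v_tᵀ with a per-channel decay gate α_t and output o_t = S_tᵀ q_t. This is an illustrative assumption about the mechanism's general shape, not FlashKDA's code, and the function names are hypothetical. The point is that the state stays d_k × d_v however many tokens are processed.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def gated_delta_step(state: Matrix, q: Vector, k: Vector, v: Vector,
                     alpha: Vector, beta: float) -> Vector:
    """Hypothetical single step: updates the d_k x d_v `state` in place, returns o_t."""
    d_k, d_v = len(state), len(state[0])
    # 1. Decay: scale row i of the state by the per-channel gate alpha[i].
    for i in range(d_k):
        for j in range(d_v):
            state[i][j] *= alpha[i]
    # 2. Delta rule: read the value currently stored under key k, then move
    #    the state toward the new value v with step size beta.
    pred = [sum(k[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
    for i in range(d_k):
        for j in range(d_v):
            state[i][j] += beta * k[i] * (v[j] - pred[j])
    # 3. Read out with the query: o_t = S_t^T q_t.
    return [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]


def linear_attention(qs: List[Vector], ks: List[Vector], vs: List[Vector],
                     alphas: List[Vector], betas: List[float],
                     d_v: int) -> List[Vector]:
    """Processes a sequence token by token; memory stays O(d_k * d_v)."""
    state = zeros(len(ks[0]), d_v)
    return [gated_delta_step(state, q, k, v, a, b)
            for q, k, v, a, b in zip(qs, ks, vs, alphas, betas)]


if __name__ == "__main__":
    key = [1.0, 0.0]
    # Writing twice under the same unit-norm key with beta = 1 overwrites
    # the stored value instead of accumulating it (the "delta" behaviour).
    outs = linear_attention([key, key], [key, key], [[3.0, 4.0], [5.0, 6.0]],
                            [[1.0, 1.0], [1.0, 1.0]], [1.0, 1.0], d_v=2)
    print(outs)
```

This loop only illustrates the math. GPU kernels for this family typically reformulate the recurrence into a chunked, parallel form so that most of the work runs as dense matrix multiplications, which is where a specialized kernel earns its speedup.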

Who is affected

AI developers and researchers running models that use Kimi Delta Attention on NVIDIA GPUs are the most likely to benefit, particularly those building high-throughput inference systems.

Risks / uncertainty

Because the kernel requires specific hardware and software versions and supports only a fixed head dimension, these prerequisites may limit adoption.