vLLM v0.21.0 Release: The "Omni-Architecture" Update

Version 0.21.0 focuses on performance improvements for next-generation multimodal models, in particular the Gemma 4 and Qwen 3.6 model families.

🚀 Google Gemma 4 Optimizations

Gemma 4 now has a native implementation rather than relying on the generic GemmaForCausalLM implementation.

🚀 Alibaba Qwen 3.6 Enhancements

Qwen 3.6 (including the MoE and Coder variants) receives several specialized kernels.

🛠️ General Core Changes

For the full list of 350+ PRs, visit the official vLLM GitHub repository.