🚀 Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens!

Posted by ResearchCrafty1804@reddit | LocalLLaMA | 72 comments

🔧 Powered by:
• Dual Chunk Attention (DCA): a length extrapolation method that splits long sequences into manageable chunks while preserving global coherence.
• MInference: sparse attention that cuts overhead by focusing on key token interactions.

💡 These innovations boost both generation quality and inference speed, delivering up to 3× faster performance on near-1M token sequences.

✅ Fully compatible with vLLM and SGLang for efficient deployment.

📄 See the updated model cards for how to enable this feature.

Hugging Face:
https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507
https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507
https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507
https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507

ModelScope:
https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507
https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507
https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507
https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Thinking-2507
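
For anyone wondering what "splits long sequences into manageable chunks" could look like in practice, here is a rough toy sketch of just the chunking step. It is not Qwen's implementation and it is not the full DCA method, which also has to handle attention between different chunks. It only shows that if positions are counted within each chunk, the position indices stay below the chunk size no matter how long the sequence gets. The function names and the chunk size of 32768 are made up for illustration.

```python
def chunk_local_positions(seq_len: int, chunk_size: int) -> list[tuple[int, int]]:
    """Map each absolute token index to (chunk_id, position_within_chunk).

    Toy illustration only: names and behavior are assumptions, not Qwen's code.
    """
    if seq_len < 0 or chunk_size <= 0:
        raise ValueError("seq_len must be >= 0 and chunk_size must be > 0")
    # divmod(i, chunk_size) returns (i // chunk_size, i % chunk_size)
    return [divmod(i, chunk_size) for i in range(seq_len)]


def max_local_position(seq_len: int, chunk_size: int) -> int:
    """Largest within-chunk position index that occurs in the sequence."""
    if seq_len < 0 or chunk_size <= 0:
        raise ValueError("seq_len must be >= 0 and chunk_size must be > 0")
    return max((i % chunk_size for i in range(seq_len)), default=0)


if __name__ == "__main__":
    # Ten tokens split into chunks of four: positions restart at 0 in each chunk.
    for absolute, (chunk_id, local) in enumerate(chunk_local_positions(10, 4)):
        print(f"token {absolute}: chunk {chunk_id}, local position {local}")

    # Hypothetical chunk size; the real value depends on the model config.
    print(max_local_position(1_000_000, 32768))
```

The point of the sketch is that the model never sees a within-chunk position larger than `chunk_size - 1`, even for a 1M-token input. How the real method keeps chunks coherent with each other is covered in the model cards and the DCA paper, not here.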