Xiaomi MiMo - MiMo-7B-RL

Posted by AaronFeng47@reddit | LocalLLaMA

[https://huggingface.co/XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL)

**Short Summary by Qwen3-30B-A3B:**

This work introduces *MiMo-7B*, a series of reasoning-focused language models trained from scratch, demonstrating that small models can achieve exceptional mathematical and code reasoning capabilities, even outperforming larger 32B models. Key innovations include:

* **Pre-training optimizations**: Enhanced data pipelines, multi-dimensional filtering, and a three-stage data mixture (25T tokens) with *Multiple-Token Prediction* for improved reasoning.
* **Post-training techniques**: Curated 130K math/code problems with rule-based rewards, a difficulty-driven code reward for sparse tasks, and data re-sampling to stabilize RL training.
* **RL infrastructure**: A *Seamless Rollout Engine* accelerates training/validation by 2.29×/1.96×, paired with robust inference support.

MiMo-7B-RL matches OpenAI’s o1-mini on reasoning tasks, with all models (base, SFT, RL) open-sourced to advance the community’s development of powerful reasoning LLMs.

https://preview.redd.it/rhbeynh1awxe1.png?width=714&format=png&auto=webp&s=78ac27cfa4b73b3fcc1cb591f7a1a7b314700ec2
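For anyone wondering what "rule-based rewards" and a "difficulty-driven code reward" might look like in practice, here is a rough sketch. To be clear, this is not Xiaomi's implementation: the function names, the `\boxed{...}` answer convention, and the difficulty-weighting scheme are all my own assumptions, meant only to illustrate the general idea of scoring rollouts with deterministic rules instead of a learned reward model.

```python
import re
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None if absent.

    Assumption: answers are written as \\boxed{answer} with no nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def math_reward(response: str, reference: str) -> float:
    """Hypothetical rule-based math reward: 1.0 on exact match, else 0.0."""
    answer = extract_boxed(response)
    if answer is not None and answer == reference.strip():
        return 1.0
    return 0.0


def code_reward(passed: List[bool], difficulty: List[int]) -> float:
    """Hypothetical difficulty-weighted code reward.

    Instead of an all-or-nothing score (sparse when problems are hard),
    each test case contributes its difficulty weight if it passes, so
    partial solutions still receive some signal.
    """
    total = sum(difficulty)
    if total == 0:
        return 0.0
    earned = sum(d for ok, d in zip(passed, difficulty) if ok)
    return earned / total


if __name__ == "__main__":
    print(math_reward(r"The answer is \boxed{42}", "42"))
    print(code_reward([True, False, True], [1, 2, 3]))
```

The point of the weighted version is that a solution passing only the easier tests still gets a nonzero reward, which is one plausible way to deal with sparse rewards on hard code problems. The actual scheme in the paper may differ.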