MiniMax M2.7: is it Dense or MoE?
Posted by Different_Stuff_9344@reddit | LocalLLaMA
https://huggingface.co/MiniMaxAI/MiniMax-M2.7
Does anyone know whether this model is dense or MoE?
I checked the hf.co model card and their official post, but I didn't see the parameter counts anywhere in the text (hf.co lists 229b), and there's no active-parameter count given, so does that mean it's dense?
nabeelkh5@reddit
Is anyone working on a REAP version of it?
DeepOrangeSky@reddit
If it were 229b dense, it would probably be so strong that it could figure out how to time travel. But it would be so slow that prefill would take centuries. Then again, it would know how to time travel by the end of it, so it could just go back in time and give you the answer, so it would end up being pretty fast, actually.
Excellent_Produce146@reddit
https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/config.json#L3
MiniMaxM2ForCausalLM - still a MoE architecture, like 2.5 and 2.1.
https://huggingface.co/docs/transformers/model_doc/minimax_m2
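If you want to check this kind of thing yourself, here's a minimal sketch that pulls the config.json from the Hub and prints the architecture plus a few common MoE-related fields. The expert-count key names are assumptions (they vary between architectures), not confirmed fields for MiniMax-M2.7:

```python
import json
from urllib.request import urlopen

# Fetch the raw config.json for the model from the Hugging Face Hub.
url = "https://huggingface.co/MiniMaxAI/MiniMax-M2.7/resolve/main/config.json"
with urlopen(url) as resp:
    config = json.load(resp)

# This is the field the config.json link above points at.
print(config.get("architectures"))  # e.g. ["MiniMaxM2ForCausalLM"]

# Common MoE config keys; which ones exist depends on the architecture,
# so these are guesses, not guaranteed MiniMax-M2.7 field names.
for key in ("num_local_experts", "num_experts",
            "num_experts_per_tok", "moe_intermediate_size"):
    if key in config:
        print(f"{key} = {config[key]}")
```

If any of the expert-count keys show up, it's MoE; a dense model's config won't have them.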
Different_Stuff_9344@reddit (OP)
thanks!
Few_Painter_5588@reddit
MoE. 230B parameters, 10B active
jacek2023@reddit
MiniMax is MoE
PassionIll6170@reddit
MoE