MiniMax M2.7: is it Dense or MoE?
Posted by Different_Stuff_9344@reddit | LocalLLaMA
https://huggingface.co/MiniMaxAI/MiniMax-M2.7
Does anyone know whether this model is dense or MoE?
I checked the hf.co model card and their official post, but I didn't see the parameter counts anywhere in the text (hf.co lists 229b), and there's no active-parameter count given, so does that mean it's dense?
nabeelkh5@reddit
Is anyone working on a REAP version of it?
DeepOrangeSky@reddit
If it were 229b dense, it would probably be so strong that it could figure out how to time travel. But it would be so slow that prefill would take centuries. Then again, it would know how to time travel by the end of it, so it could just go back in time and give you the answer, so it would end up being pretty fast, actually.
Excellent_Produce146@reddit
https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/config.json#L3
MiniMaxM2ForCausalLM - still a MoE architecture, like 2.5 and 2.1.
https://huggingface.co/docs/transformers/model_doc/minimax_m2
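If you want to check this kind of thing yourself, here's a minimal sketch that pulls the config.json from the Hub and prints the architecture plus a few common MoE-related fields. The expert-count key names are assumptions (they vary between architectures), not confirmed fields for MiniMax-M2.7:

```python
import json
from urllib.request import urlopen

# Fetch the raw config.json for the model from the Hugging Face Hub.
url = "https://huggingface.co/MiniMaxAI/MiniMax-M2.7/resolve/main/config.json"
with urlopen(url) as resp:
    config = json.load(resp)

# This is the field the config.json link above points at.
print(config.get("architectures"))  # e.g. ["MiniMaxM2ForCausalLM"]

# Common MoE config keys; which ones exist depends on the architecture,
# so these are guesses, not guaranteed MiniMax-M2.7 field names.
for key in ("num_local_experts", "num_experts",
            "num_experts_per_tok", "moe_intermediate_size"):
    if key in config:
        print(f"{key} = {config[key]}")
```

If any of the expert-count keys show up, it's MoE; a dense model's config won't have them.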
Different_Stuff_9344@reddit (OP)
thanks!
Few_Painter_5588@reddit
MoE. 230B parameters, 10B active
jacek2023@reddit
MiniMax is MoE
PassionIll6170@reddit
MoE