Are dynamic MoE models possible?

Posted by CurrentNew1039@reddit | LocalLLaMA

Is it possible for an MoE model to decide how many billion parameters to activate per token, depending on the task? E.g. with Qwen 3.6 35B A3B: if a task is harder, it could activate 10B per token; if it's easy, it could stay at 3B active.
I know there is a speed caveat: it will slow down if it exceeds my computer's compute.
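
To make the per-token idea concrete, here is a minimal sketch of one way it could work: instead of always picking a fixed number of experts, the router keeps adding experts until their combined probability passes a threshold. Everything here (`pick_experts`, `mass`, `min_k`, `max_k`, the 8-expert toy logits) is made up for illustration. It is not how Qwen or any particular model routes tokens, and a model would have to be trained with this kind of routing for it to work well.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of router scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_experts(router_logits, mass=0.6, min_k=1, max_k=8):
    # Hypothetical adaptive routing: take experts in order of router
    # probability until they cover `mass` of the total probability,
    # using at least `min_k` and at most `max_k` experts.
    probs = softmax(router_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, covered = [], 0.0
    for i in order:
        if len(chosen) >= max_k:
            break
        if len(chosen) >= min_k and covered >= mass:
            break
        chosen.append(i)
        covered += probs[i]
    return chosen

# Toy router outputs for two tokens (8 experts each).
confident = [6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # router is sure: "easy" token
unsure = [1.0, 0.9, 0.8, 0.7, 0.0, 0.0, 0.0, 0.0]     # router is split: "hard" token

print(len(pick_experts(confident)), "expert(s) for the confident token")
print(len(pick_experts(unsure)), "expert(s) for the unsure token")
```

In this toy version, "harder" just means the router is less certain, so more experts (and more active parameters) get used for that token. `max_k` acts as the speed cap.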

But what if we could control how many parameters are active ourselves? With a 35B model with dynamic MoE, I could make it behave like a dense model by activating all parameters, or make it sparser by reducing the active parameters.
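
As a rough back-of-the-envelope sketch of the manual knob: if the active parameter count is modelled as a shared part (attention, embeddings, etc.) plus `k` times the size of one expert, you can estimate what different `k` values would cost. The expert count (128) and default `k` (8) below are assumptions picked for illustration, not the real config of any Qwen model, and the model lumps all layers together.

```python
def solve_split(total_b, active_b, num_experts, default_k):
    # Solve  shared + num_experts * per_expert = total_b
    #        shared + default_k   * per_expert = active_b
    # for the shared and per-expert sizes (in billions of parameters).
    per_expert = (total_b - active_b) / (num_experts - default_k)
    shared = active_b - default_k * per_expert
    return shared, per_expert

def active_params(k, shared, per_expert):
    # Estimated active parameters (billions) when k experts run per token.
    return shared + k * per_expert

NUM_EXPERTS = 128  # hypothetical, not taken from a real model card
DEFAULT_K = 8      # hypothetical, not taken from a real model card

shared, per_expert = solve_split(35.0, 3.0, NUM_EXPERTS, DEFAULT_K)

for k in (8, 34, 128):
    print(f"k={k:3d} -> ~{active_params(k, shared, per_expert):.2f}B active")
```

Under these assumptions, roughly 34 experts per token would land near 10B active, and `k = 128` runs every expert, i.e. all 35B. Note that `k` only changes compute per token here; all expert weights still have to be stored somewhere regardless of how many are used.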

It's just a theory I thought of. It would help larger-parameter models run on all devices by manually adjusting the active parameter count, which would be awesome.