Optimizing MiniMax 2.7 - Experts vs Layers for best VRAM/RAM utilization

Posted by CBHawk@reddit | LocalLLaMA

I'm curious whether there is a rule of thumb for how best to load MiniMax across different VRAM/RAM configurations. Is there a way to estimate how many experts versus layers to offload for people running 16GB, 24GB, 32GB, or 48GB of VRAM? For example, with 24GB of VRAM, can you get performance gains by activating only 1 expert and then offloading x layers?
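To make the question concrete, here is a rough sketch of the kind of estimate being asked about. It assumes one possible split, which may not be the best one: the non-expert weights of every layer (attention, norms, router) stay in VRAM, and whatever VRAM is left is filled with whole layers' worth of expert weights, with the rest going to system RAM. The function name `plan_offload` and all sizes below are hypothetical placeholders, not measured values for MiniMax; real numbers would have to come from the actual model file and quantization.

```python
def plan_offload(vram_gb, n_layers, shared_gb_per_layer,
                 experts_gb_per_layer, overhead_gb):
    """Estimate how many layers can keep their expert weights in VRAM.

    All sizes are in GB and are inputs supplied by the caller; nothing
    here is specific to any particular model.
    Returns None if even the non-expert weights do not fit.
    """
    # Non-expert weights for every layer, plus KV cache / buffers.
    fixed_gb = n_layers * shared_gb_per_layer + overhead_gb
    free_gb = vram_gb - fixed_gb
    if free_gb < 0:
        return None
    # Fill the remaining VRAM with whole layers of expert weights.
    return min(n_layers, int(free_gb // experts_gb_per_layer))


# Hypothetical placeholder sizes, chosen only to show the arithmetic.
for vram in (16, 24, 32, 48):
    kept = plan_offload(vram, n_layers=60, shared_gb_per_layer=0.125,
                        experts_gb_per_layer=2.0, overhead_gb=4.0)
    print(f"{vram} GB VRAM: expert weights of {kept} layers fit on the GPU")
```

This only does the capacity arithmetic. It says nothing about speed, which also depends on RAM bandwidth and on how the runtime schedules the experts left on the CPU.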

Please forgive my ignorance if I'm thinking about this the wrong way.