Considering two Sparks for local coding

Posted by chikengunya@reddit | LocalLLaMA | 28 comments

I'm currently running a 4x RTX 3090 system (96 GB VRAM, DDR4-2133 RAM) and have tested opencode and pi.dev with Qwen3.5-122B-A10B (AWQ) at up to 200k context for web-app coding (HTML/JS/Python). I'm now seriously considering picking up two Sparks paired with MiniMax M2.7 for local inference.

Two units are needed to keep prompt processing at acceptable speeds. Output tokens/sec stays the same regardless (~15 tok/s at ~100k context, based on what I've seen here). The combined 2 × 128 GB = 256 GB VRAM leaves headroom for future models (the next MiniMax version, Qwen3.6-122B).
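For anyone wanting to sanity-check the headroom claim, here's a rough memory sketch: quantized weights plus FP16 KV cache against the 256 GB budget. All the model parameters below (layer count, KV heads, head dim, total params) are hypothetical placeholders, not published MiniMax M2.7 specs — swap in the real values from the model config.

```python
# Back-of-envelope memory check: quantized weights + KV cache vs. 256 GB.

def weights_gib(params_billion, bits_per_weight):
    """Quantized weight footprint, ignoring small overheads (embeddings, scales)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

def kv_cache_gib(layers, kv_heads, head_dim, context_tokens, bytes_per_elem=2):
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1024**3

# Hypothetical example: a ~230B-param MoE at 4-bit, FP16 KV cache for 120k
# tokens with GQA (60 layers, 8 KV heads, head_dim 128) -- placeholder numbers.
w = weights_gib(230, 4)
kv = kv_cache_gib(60, 8, 128, 120_000)
print(f"weights ~{w:.0f} GiB, KV cache ~{kv:.0f} GiB, total ~{w + kv:.0f} GiB")
```

With those placeholder numbers the total lands around 135 GiB, comfortably inside 256 GB — but the real config matters, so treat this as a template rather than a verdict.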

Idle power draw: ~50 W per Spark measured at the wall. My 4x 3090 rig idles at ~130 W (all cards power-limited to 275 W, 22 W idle per card in nvidia-smi; under full load with the 122B model it peaks at ~750 W).
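Since the rigs spend most of their time idle, the idle numbers dominate the yearly energy bill. A quick comparison, assuming 24/7 idle (obviously an upper bound — real usage mixes in load hours):

```python
# Annual idle energy: watts -> kWh/year, assuming the machine idles 24/7.
def idle_kwh_per_year(watts):
    return watts * 24 * 365 / 1000

print(idle_kwh_per_year(2 * 50))  # two Sparks at ~50 W each -> 876 kWh/yr
print(idle_kwh_per_year(130))     # 4x 3090 rig at ~130 W   -> ~1139 kWh/yr
```

Multiply by your local electricity rate to see the actual difference; at typical rates it's tens of dollars/euros per year, not a decisive factor either way.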

I need context up to ~120k tokens for coding sessions. Based on the numbers above, two Sparks with MiniMax M2.7 should deliver acceptable speeds in that range, which would be enough for me.
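The number that actually decides whether 120k context is workable is time-to-first-token on a cold prompt. A quick sketch — the prefill speeds below are made-up illustration values, not measured Spark numbers:

```python
# Rough time-to-first-token for a cold 120k-token prompt at various
# prompt-processing speeds. The pp rates are assumptions, not benchmarks.
def ttft_seconds(context_tokens, pp_tok_per_s):
    return context_tokens / pp_tok_per_s

for pp in (500, 1000, 2000):  # hypothetical prefill speeds, tok/s
    mins = ttft_seconds(120_000, pp) / 60
    print(f"{pp:>5} tok/s prefill -> {mins:.1f} min for 120k tokens")
```

In practice agentic coding tools resend large overlapping prompts, so prefix caching changes this a lot — but the cold-prompt case is the worst case worth knowing before buying.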

I can't properly benchmark MiniMax M2.7 on my current setup: 96 GB VRAM isn't enough to load it comfortably, and the slow DDR4-2133 RAM makes prompt processing a bottleneck anyway.

I'm curious what your experience is. How much better is MiniMax M2.7 than Qwen3.5-122B-A10B (AWQ) for real-world coding tasks (HTML/JS/Python)? Thanks in advance.