Intel Lunar Lake 258V (32GB) vs Qwen 3.6 35B-A3B: Pushing the limits of the MoP (memory-on-package) architecture.
Posted by PLCinsa@reddit | LocalLLaMA | 4 comments
Hardware: Intel Core Ultra 7 258V, 32GB Unified Memory.
Model: Qwen 3.6 35B A3B (Quant: Q3_K_S) via LM Studio.
Symptoms: Coil whine (audible buzz), TDR events (screen flickering from driver timeout/recovery resets), and thermal errors after extended reasoning sessions.
Issues: At 10k context, the model starts generating gibberish. Even after switching back to Gemma 4 26B, the stability issues persist until a full power cycle.
Question: Has anyone found a way to stabilize the iGPU (Arc 140V) for MoE models with high context, or is this a physical limitation of the 32GB shared memory?
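Since LM Studio wraps llama.cpp, one way to isolate the variable is to reproduce the failure from the llama.cpp CLI, where the backend and context size are explicit. A minimal sketch, assuming a local llama.cpp checkout; the model filename is a placeholder for your Q3_K_S quant:

```shell
# Build llama.cpp with the Vulkan backend (SYCL is a separate -DGGML_SYCL=ON option).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Reproduce at the failing context size.
# -c 10240 : context window around where the gibberish starts
# -ngl 99  : offload all layers to the Arc 140V iGPU
./build/bin/llama-cli -m qwen-35b-a3b-q3_k_s.gguf -c 10240 -ngl 99 \
  -p "Summarize the history of RISC." -n 512
```

If the bare CLI run is stable where LM Studio was not, the problem is likelier in the bundled runtime version than in the silicon or the shared memory.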
Zidrewndacht@reddit
SYCL was terribly buggy when I tried it on Iris Xe (different architecture, sure, but I wouldn't count on this being strictly an "Arc 140V issue"). Have you tried Vulkan? At least on Iris Xe, Vulkan was stable and actually faster.
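Before blaming one backend or the other, it's worth confirming which one the runtime actually loaded. A quick check, assuming a recent llama.cpp build (it prints its selected devices at startup, and newer builds accept `--list-devices`; verify against your build's `--help`):

```shell
# List the compute devices/backends this llama.cpp build can see.
./build/bin/llama-cli --list-devices

# Independently confirm the iGPU is exposed to Vulkan at all
# (vulkaninfo ships with the Vulkan SDK / mesa-vulkan tools).
vulkaninfo --summary
```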
PLCinsa@reddit (OP)
Plot twist! I just double-checked my settings and realized I was actually running the Vulkan backend (v2.13) this whole time. My apologies for the confusion earlier!
DerDave@reddit
What OS, what driver?
PLCinsa@reddit (OP)
Sure, here are the details to help narrow this down:
Symptoms: Physical buzzing/coil whine from the SoC area, followed by "belebuble" (gibberish) text output.
Is it possible that the SYCL backend is mismanaging VRAM allocation on Lunar Lake's MoP architecture at this context size?