Intel Lunar Lake 258V (32GB) vs Qwen 3.6 35B-A3B: Pushing the limits of the MoP (memory-on-package) architecture.
Posted by PLCinsa@reddit | LocalLLaMA | 4 comments
Hardware: Intel Core Ultra 7 258V, 32GB Unified Memory.
Model: Qwen 3.6 35B A3B (Quant: Q3_K_S) via LM Studio.
Symptoms: Coil whine (audible buzz), TDR events (screen flickering from driver timeout/recovery resets), and thermal errors after extended reasoning sessions.
Issues: At 10k context, the model starts generating gibberish. Even after switching back to Gemma 4 26B, the stability issues persist until a full power cycle.
Question: Has anyone found a way to stabilize the iGPU (Arc 140V) for MoE models with high context, or is this a physical limitation of the 32GB shared memory?
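Since LM Studio wraps llama.cpp, one way to isolate the variable is to reproduce the failure from the llama.cpp CLI, where the backend and context size are explicit. A minimal sketch, assuming a local llama.cpp checkout; the model filename is a placeholder for your Q3_K_S quant:

```shell
# Build llama.cpp with the Vulkan backend (SYCL is a separate -DGGML_SYCL=ON option).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Reproduce at the failing context size.
# -c 10240 : context window around where the gibberish starts
# -ngl 99  : offload all layers to the Arc 140V iGPU
./build/bin/llama-cli -m qwen-35b-a3b-q3_k_s.gguf -c 10240 -ngl 99 \
  -p "Summarize the history of RISC." -n 512
```

If the bare CLI run is stable where LM Studio was not, the problem is likelier in the bundled runtime version than in the silicon or the shared memory.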
Zidrewndacht@reddit
SYCL was terribly buggy when I tried it on Iris Xe (different architecture, sure, but I wouldn't count on this being strictly an "Arc 140V issue"). Have you tried Vulkan? At least on Iris Xe, Vulkan was stable and actually faster.
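Before blaming one backend or the other, it's worth confirming which one the runtime actually loaded. A quick check, assuming a recent llama.cpp build (it prints its selected devices at startup, and newer builds accept `--list-devices`; verify against your build's `--help`):

```shell
# List the compute devices/backends this llama.cpp build can see.
./build/bin/llama-cli --list-devices

# Independently confirm the iGPU is exposed to Vulkan at all
# (vulkaninfo ships with the Vulkan SDK / mesa-vulkan tools).
vulkaninfo --summary
```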
PLCinsa@reddit (OP)
Plot twist! I just double-checked my settings and realized I was actually running the Vulkan backend (v2.13) this whole time. My apologies for the confusion earlier!
DerDave@reddit
What OS, what driver?
PLCinsa@reddit (OP)
Sure, here are the details to help narrow this down:
Symptoms: Physical buzzing/coil whine from the SoC area, followed by "belebuble" (gibberish) text output.
Is it possible that the SYCL backend is mismanaging VRAM allocation on Lunar Lake's MoP architecture at this context size?