Qwen3.5 35B is surely still one of the best local models (punching above its weight) - More Details
Posted by dreamai87@reddit | LocalLLaMA | View on Reddit | 47 comments
Last time I posted about how this model performed at creating a webapp from a provided research paper. I got so much love - it was great to see that people appreciated the post and, of course, the potential of this MoE model.
I am sharing details on how I used this model to create the webapp just by prompting and guiding it step by step. Later I converted my guidance steps into skills using the same qwen-code CLI with this model, which helped me add more examples.
Here is the GitHub repo where I have added the research-webapp-skill, which you can all use to validate the potential of this model on different papers.
I have added examples in the repo research-webapp-skill/examples at main · statisticalplumber/research-webapp-skill
Below is the command that I use to run this model on a 16GB VRAM RTX 5080 Laptop:
:: Set the model path
set MODEL_PATH=C:\Users\test\.lmstudio\models\unsloth\Qwen3.5-35B-A3B-GGUF\Qwen3.5-35B-A3B-UD-Q4_K_L.gguf
echo Starting Llama Server...
echo Model: %MODEL_PATH%
llama-server.exe -m "%MODEL_PATH%" --chat-template-kwargs "{\"enable_thinking\": false}" --jinja -fit on -c 90000 -b 4096 -b 1024 --reasoning off --presence-penalty 1.5 --repeat-penalty 1.0 --temp 0.6 --top-p 0.95 --min-p 0.0 --top-k 20 --context-shift --keep 1024
if %ERRORLEVEL% NEQ 0 (
echo.
echo [ERROR] Llama server exited with error code %ERRORLEVEL%
pause
)
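For anyone unfamiliar with the sampler flags in that command (--temp, --top-k, --top-p, --min-p), here is a rough, illustrative sketch of the filtering chain they configure. This is a simplification for intuition only - llama.cpp's actual sampler ordering and implementation differ:

```python
import math

def filter_logits(logits, top_k=20, top_p=0.95, min_p=0.0, temp=0.6):
    """Toy sketch of a top-k / top-p / min-p sampler chain.
    Returns renormalized probabilities over the surviving token ids."""
    # temperature scaling, then softmax
    scaled = [l / temp for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    # top-k: keep only the k most likely tokens
    order = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    # top-p: keep the smallest prefix whose cumulative mass reaches top_p
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # min-p: drop tokens below min_p * (max prob); a no-op when min_p == 0
    cutoff = min_p * probs[kept[0]]
    kept = [i for i in kept if probs[i] >= cutoff]
    z = sum(probs[i] for i in kept)
    return {i: probs[i] / z for i in kept}
```

With min-p at 0.0 (as in the command above), only top-k and top-p actually prune the distribution.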
I have tried Gemma 4 26B MoE; it's not able to build the app, whereas Qwen keeps hold of the context even at 70-80K. I tried the latest Jinja template for Gemma 4 and the latest models from Unsloth, but it still can't pull off this task.
Again, I might be doing something wrong somewhere, as I like that model too - I use it with the llama-server native UI for other tasks.
Thanks
No_Split_5652@reddit
Please, can you guys help me with this project: https://github.com/ChrisX101010/training-arena 🙏❤️ (see https://github.com/abubakarsiddik31/axiom-wiki for reference). I would appreciate it.
pauloeavf@reddit
!RemindMe 2 weeks
xeeff@reddit
mind testing a specific quant (https://huggingface.co/byteshape/Qwen3.5-35B-A3B-GGUF/blob/main/Qwen3.5-35B-A3B-Q3_K_S-2.89bpw.gguf) for me and seeing how it performs in your benchmark? fits nicely within my 16gb vram and 128k context (turbo3 KV cache), and i'm wondering if it's as capable as higher quants
would appreciate you getting back to me :)
dreamai87@reddit (OP)
Sure man, will do and update you as well.
xeeff@reddit
remindme! 3d
enrique-byteshape@reddit
We would love to see this too!
xeeff@reddit
didn't expect to see you here, hi :p
27b would be crazy 🙏
enrique-byteshape@reddit
Maybe we are, maybe we aren't already working on it :)
xeeff@reddit
qwen3.6 35b a3b released an hour ago. what a shame it would be if your to-do list increased by one ;)
enrique-byteshape@reddit
We sometimes think the Qwen team are working against us :(
xeeff@reddit
i doubt there'll be as good of a model as qwen3.6 for quite a while, so you have time to catch up. we believe in you 🙏
Defilan@reddit
Been running this on dual 5060 Ti's and yeah it punches way above its weight for a 3B active model. How are you fitting 90K context on 16GB VRAM though? That seems super tight with Q4_K_L.
dreamai87@reddit (OP)
-fit on manages it - the experts stay on the GPU along with the cache, and the rest of the layers go to the CPU.
Defilan@reddit
Ahhh got it. That's a good way to go about it with the hybrid offloading for MoE. Cool stuff!
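The hybrid offload described above can be pictured with a toy placement planner: attention weights and KV cache stay on the GPU, and expert FFN weights (the bulk of an MoE) spill to CPU RAM once the VRAM budget runs out. This is only a sketch of the idea with made-up sizes, not llama.cpp's actual fitting logic:

```python
def plan_offload(layers, vram_budget_gb):
    """Toy MoE offload plan. `layers` is a list of dicts with
    'attn_gb' (attention weights, always on GPU here) and
    'experts_gb' (expert FFN weights, spilled to CPU when VRAM runs out)."""
    attn_total = sum(l["attn_gb"] for l in layers)
    budget = vram_budget_gb - attn_total  # VRAM left for experts
    gpu_experts, cpu_experts = [], []
    for i, layer in enumerate(layers):
        if layer["experts_gb"] <= budget:
            budget -= layer["experts_gb"]
            gpu_experts.append(i)   # this layer's experts fit on the GPU
        else:
            cpu_experts.append(i)   # this layer's experts run from CPU RAM
    return gpu_experts, cpu_experts
```

Since only a few experts are active per token (3B active out of 35B here), the CPU-resident experts hurt throughput far less than offloading dense layers would.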
henk717@reddit
For me the 27B is my favorite model currently, way better than the 35B, and unlike Gemma it writes long when I ask it to. It's just a model that gets me. If only the 3.6 weren't a hybrid model and fixed the looping issue. The hybridness of it is the only quirk that makes it trickier to use.
dreamai87@reddit (OP)
Sure, it should be, since the 27B is a dense model. I tried the IQ3_XXS model today - it's good.
DanielusGamer26@reddit
Why not use the 27B UD IQ3_XXS? I run it on an RTX 5060 Ti and it seems more intelligent even at 3-bit.
I run it with this command:
`--threads 9 --ctx-size 64385 -fa 1 --jinja -ctk q8_0 -ctv q8_0 -np 1` + all the others parameters like temp, min p etc.
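The -ctk q8_0 -ctv q8_0 flags in that command quantize the KV cache to roughly 1 byte per element instead of 2 for f16, which is what makes a 64K context fit on a 16GB card. A back-of-the-envelope sizing sketch (the architecture numbers in the test are made up for illustration, not Qwen's real config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx, bytes_per_el):
    """Approximate KV cache size: one K and one V vector per layer,
    per KV head, per context position. Ignores quantization block
    overhead (q8_0 is slightly more than 1 byte/element in practice)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_el
```

So switching the cache from f16 (2 bytes/element) to q8_0 (~1 byte/element) roughly halves KV memory, at a small quality cost.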
dreamai87@reddit (OP)
https://i.redd.it/eehqw4upsivg1.gif
Here is the output from the Qwen 27B quant you suggested - it's good, holding up well.
dreamai87@reddit (OP)
I don’t know, it’s just a habit of using models at or above 4-bit. Will give it a try. Let me know how it works with the skill I shared.
reddoca@reddit
!RemindMe 2 weeks
Life-Screen-9923@reddit
IMHO, the "context_shift" option does not work for Qwen3.5 models.
dreamai87@reddit (OP)
To be honest, I have that setting as a common one I use for other models. I just had it there.
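For context, --context-shift (with --keep 1024 as in the command in the post) handles context overflow roughly like this simplified sketch - keep the first n_keep tokens (typically the system prompt) and drop the oldest tokens after them. The real llama.cpp implementation shifts the KV cache in place rather than re-tokenizing:

```python
def context_shift(tokens, n_ctx, n_keep):
    """Simplified context shift: when the window overflows, preserve the
    first n_keep tokens and evict the oldest tokens after that prefix."""
    if len(tokens) <= n_ctx:
        return tokens  # still fits, nothing to evict
    overflow = len(tokens) - n_ctx
    return tokens[:n_keep] + tokens[n_keep + overflow:]
```

Whether this interacts badly with a specific model's attention scheme (as suggested above for Qwen3.5) is a separate question from the mechanism itself.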
Most-Trainer-8876@reddit
How does it compare with Gemma 4 26B A4B?
dreamai87@reddit (OP)
I mentioned it at the bottom of the post. It's a good model but not good at calling multiple tools.
Havage@reddit
As someone applying AI to research specifically - Thank you! Going to play with this in the morning!
dreamai87@reddit (OP)
Please check the repo where I have posted all the webapp examples that I created using the skill.
saito_zt81@reddit
Same here. It works really fast on my 3090 Ti, ~100 tps. I tried Gemma 4 26B; it's a little slower, but tool calling is unusable and fills the context window with failures.
dreamai87@reddit (OP)
I agree - Gemma is good but really fails at calling the tools.
admajic@reddit
Why do you have -b twice? Also, 4096 uses a lot of VRAM, which could be why you can't get other models to load.
dreamai87@reddit (OP)
Sorry, the 1024 should be -ub: -b 4096 -ub 1024
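For readers wondering how -b and -ub relate: as I understand it, -b (batch) caps how many tokens are scheduled per decode call, while -ub (micro-batch) caps how many actually go through the compute graph at once, which is the VRAM-heavy number. A rough sketch of that splitting, under that assumption:

```python
def microbatches(prompt_tokens, n_batch=4096, n_ubatch=1024):
    """Split a prompt into logical batches of n_batch tokens, each
    further split into micro-batches of n_ubatch tokens that would be
    processed in one forward pass."""
    for b in range(0, len(prompt_tokens), n_batch):
        batch = prompt_tokens[b:b + n_batch]
        for u in range(0, len(batch), n_ubatch):
            yield batch[u:u + n_ubatch]
```

So a 5000-token prompt with -b 4096 -ub 1024 would be processed as four 1024-token passes plus the remainders, and lowering -ub is the cheaper way to save VRAM.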
ResponsibleTruck4717@reddit
How good is it compare to the 9b?
dreamai87@reddit (OP)
The 9B makes more mistakes compared to this model. But the 9B is good too.
ResponsibleTruck4717@reddit
Really? I thought the 9B outperformed the 35B A3B.
Going to test it now :)
External_Dentist1928@reddit
So you use that skill within the Qwen Coder CLI?
dreamai87@reddit (OP)
Yes, the qwen-code CLI with this model and the shared skill.
qubridInc@reddit
Qwen3.5 35B is still insanely good for local use - it handles long context and real tasks way better than most models its size.
dreamai87@reddit (OP)
Yes, agreed - it's always on task and follows the prompt a lot better.
Mir4can@reddit
Your server settings are a bit mixed. Normally Qwen suggests one of these two sets:
temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
Is it intentional?
dreamai87@reddit (OP)
I have reasoning disabled, and I have noticed this model is good even with those parameters mixed when thinking is disabled.
BannedGoNext@reddit
Makes sense. Qwen 3 Coder next has no reasoning, and it's a beast at programming.
Imaginary-Unit-3267@reddit
35B is my daily driver for most tasks because of your previous post! Thank you!
dreamai87@reddit (OP)
Thank you dear 🙏
silenceimpaired@reddit
Isn’t the 35B dense? That would be far better than the MoE. Did you compare against the dense Gemma 31B?
floconildo@reddit
35B is MoE
iphoneverge@reddit
That looks impressive. Thanks for sharing all this info. How quick is it on your laptop with 16GB VRAM? Also if you had to compare, what commercial LLM model would you say this is closest to in terms of capability and speed? Thanks.
dreamai87@reddit (OP)
It’s 700 to 800 t/s prompt processing and 58 t/s token generation.