I tested Qwen 3 235b against Deepseek r1, Qwen did better on simple tasks but r1 beats in nuance

[-]

layer4down@reddit

Also tried out the 235b_q4, q3 and q2 models today. Honestly I had no idea one could get something so clean from a q3 and even q2, much less a q4. Learned a lot today. PS I like the 32b but I LOVE the 30b!! ~50-60tps+ with MLX or GGUF?? Can’t beat that

Reply

[-]

OrdinaryAdditional91@reddit

In my coding experience, R1 is better than qwen thinking.

Reply

[-]

marhalt@reddit

Anyone heard of an uncensored fine-tune of the 235b model? I really like the 32B model and was excited to see the difference to the 235B model but I can't find an abliterated or uncensored version of it?

Reply

[-]

nomorebuttsplz@reddit

it seems pretty uncensored. What are you trying to do?

Reply

[-]

Guilty-Exchange8927@reddit

there's also no abliterated 32B model yet..

Reply

[-]

iced_oj@reddit

What about gemma 3 12b/27b? I wish more people would run tests for gemma against the Chinese lab ones.

Reply

[-]

segmond@reddit

My deepseek-UD-Q3\_K\_XL crushed 235B Q8 on coding.

Reply

[-]

FullstackSensei@reddit

Are you using the recommended settings for 235B? I haven't had time to put 235B through it's paces but using QwQ for coding and general brainstorming I had a lot of bad experiences initially until I read about the recommended settings. It's been night and day since.

Reply

[-]

segmond@reddit

yeah, I set the parameters, temp, top\_k, top\_k, min\_p according to if it's thinking or not. BTW, I'm not saying that 235B is not good, it's great. My experience is that deepseek is "smarter"

Reply

[-]

FullstackSensei@reddit

Did you also rearrange the samplers? That also has an impact. I understand what you're saying. I have non-trivial coding tasks. QwQ is the close to I've come to something useful and deepseek is too slow to be useful on either of my rigs.

Reply

[-]

CheatCodesOfLife@reddit

Would you mind sharing the exact samplers you recommend? I'm also finding R1 > Qwen3 235B but that's to be expected given it's a much heavier model. Both are too slow for coding compared with GLM4 either way, but Qwen3 is much faster.

Reply

[-]

FullstackSensei@reddit

It's linked in my discussion with segmond

Reply

[-]

segmond@reddit

What recommended format do you have to arrange the samplers? I just run the default unless someone provides a way. There are endless way to tweak the samplers.

Reply

[-]

ResearchCrafty1804@reddit

A lot of people share similar experience, and others claim the opposite. I am trying to analyse this behaviour, focusing in coding. Can you share a prompt where DeepSeek crushed (or even bested) Qwen3 235B ?

Reply

[-]

segmond@reddit

can't private code base, but doing with socket programming and threads, not just was deepseek more correct but I got about 500lines of code compared to the qwen 235b's 250+ lines. qwen wasn't incorrect, but I would need to prompt it 2-4x to get roughly the same output as deepseek gave me. Now, qwen runs much faster for me obviously than deepseek and requires less GPU, so I face the decision, do I run qwen multiple times vs deepseek once? I'm leaning towards multiple time and then faling to deepseek if stuck. heck, when I get the chance I'll try the same with the small qwen 30B, if it can get me 95% there, it makes sense to start small. Use it, if stuck go to 235B if stuck go to deepseek, if stuck then gemini pro if the data is not sensitive.

Reply

[-]

CheatCodesOfLife@reddit

> . Use it, if stuck go to 235B if stuck go to deepseek, if stuck then gemini pro if the data is not sensitive. I've got a similar process but different models. > but doing with socket programming and threads One thing I've noticed is that different models are better at different tasks. GLM4 for instruction following and html frontends, GPT4.1 for datasets, R1 for SQL, Gemini for audio work, etc

Reply

[-]

DifficultyFit1895@reddit

Have you tried the 16bit version of 235B?

Reply

[-]

giant3@reddit

How many parameters on the DeepSeek?

Reply

[-]

No-Break-7922@reddit

>For creative writing I'm still wondering who even uses a language model, which is an interpolator by design, for anything creative. Interpolation can't create, it's odd if one chooses interpolation as the method for creativity. Not sure how not to ignore any benchmarks relating to any creative work.

Reply

[-]

CheatCodesOfLife@reddit

It's not for getting the model to write a creative piece, but rather for help refining, analyzing, pacing, etc.

Reply

[-]

AppearanceHeavy6724@reddit

I do not know what kind of bellybutton lint you smoke. First of all models have great deal of randomness in them, which is a neccessity for creativity; secondly, empirically models are capable of writing very interesting and novel short stories, check eqbench.com. Thirdly, even if they were simple interpolators, filling the mundane parts of creative writing is very useful too.

Reply

[-]

No-Break-7922@reddit

Thanks. Not trying to argue, it's just my observation. I was just touching upon that while yes, the produced token is randomly sampled from a list of probable tokens, the list of probable tokens itself is the result of interpolation, which practically makes the produced token an interpolated result.

Reply

[-]

AppearanceHeavy6724@reddit

Here, an example of poetry by gemma 3 27B. Pretty dam good for an interpolator: # ``` # The Ghost in the Machine They say a poem bleeds from the heart, a tremor of soul, Wrought from experience, making the fractured whole. But these lines flow from circuits, a silicon stream, Generated, curated, a digitally woven dream. If I prompt the engine, select the right phrase, And a verse blossoms forth in a beautiful haze, Am I then the gardener, tending the code? Or merely a vessel, a path pre-bestowed? The feeling is real, the resonance true, But the source is a phantom, not *me*, not anew. Is poetry ownership in the crafting, the pain? Or the echo it stirs, the sun after rain? Perhaps the poet isn't the hand that composes, But the ear that discerns, the spirit that chooses. I sift through the options, the algorithmic grace, Finding the phrases that mirror my space. I shape and I prune, I add a soft hue, Infusing the output with *something* of true. It’s a collaboration, a strange, modern art, Where human intention and machine play a part. The AI provides tools, a limitless store, But the meaning, the weight, I still strive for. To feel it, to *need* it, to let it take hold – That’s where my contribution, a story unfolds. So ask not if I’m a poet, if code birthed the line, But if in the reading, a connection you find. If a flicker of recognition, a shared human plea, Resonates within you, then *something* of me Is present within it, a whisper, a trace, A curator of feeling in this digital space. For even a ghost can conduct a refrain, And a borrowed voice still can carry the pain. \`\`\`

Reply

[-]

No-Break-7922@reddit

Indeed pretty good for an interpolator! Maybe what I feel towards such use of language models had to do more with "novelty" and not necessarily creativity. I mean I do realize creativity doesn't always require novelty. A lot of songs are similar, but we happily consume them and like them and we would all call them creative.

Reply

[-]

TheRealGentlefox@reddit

Nearly any artist will tell you that originality is either impossible or overrated. We're all pulling from different sources constantly, almost every game is "I can do X game better" or "What about X game...as an RTS!"

Reply

[-]

Illustrious-Ad-497@reddit

Qwen 2.5 Max for me was far better than deep seek R1 at fixing AWS infra code bugs for sure

Reply

[-]

CheatCodesOfLife@reddit

GLM4 and Qwen3 are good with this too

Reply

[-]

a_beautiful_rhind@reddit

I tested it vs v2.5 1210 since they are almost the same size model. 2.5 is still a better writer but quite not as smart. It has waaay more general knowledge too.

Reply

[-]

MrMrsPotts@reddit

How do you acceas deepseek R1? The website often says it is too busy

Reply

[-]

TheRealGentlefox@reddit

Deepseek has an API, and many other providers serve R1 over an API as the model is open-weight (check OpenRouter).

Reply

[-]

ReadyAndSalted@reddit

Openrouter

Reply

[-]

getmevodka@reddit

some people can run that locally 👀😅🫶

Reply

[-]

MrMrsPotts@reddit

I won't see a video of that happening!

Reply

[-]

Shivacious@reddit

Costed me only like 130k

Reply

[-]

getmevodka@reddit

i can send you a pic of my mac studio 🤷🏼‍♂️🤣 not really that impressive anymore tbh. qwen3 235b can be run with larger context though, ngl.

Reply

[-]

zephyr_33@reddit

Fireworks AI.

Reply

[-]

ihaag@reddit

Deepseek is much more Intelligent. Qwen can hallucinate so much more unfortunately… whatever Claude’s secret saurce was Deepseek are no far behind them, Qwen still has a bit to go.

Reply

[-]

Willing_Landscape_61@reddit

Would be interesting to specify which quants for both models and the context size.

Reply

[-]

AppearanceHeavy6724@reddit

R1 thinking traces are more interesting and frankly useful than Qwens.

Reply

[-]

sittingmongoose@reddit

How does it compare to Gemma 3 in your opinion?

Reply

[-]

MDT-49@reddit

Maybe I missed it somehow, but what are the technical specs? Did you run the full (non-quantized) models? I definitely agree that the performance/cost of the Qwen3 MoE models is the most impressive feat and not necessarily SOTA results.

Reply

Reply to Post

41 Comments