Coders are getting better and better
Posted by 808phone@reddit | LocalLLaMA | View on Reddit | 91 comments
Just checking, what are people using for their local LLM? I'm currently trying Qwen2.5 Coder 7B and it seems to be really fast and pretty accurate so far. This is on a Mac using LM Studio. Thanks
Some_Endian_FP17@reddit
Supernova something that runs on Qwen 2.5 14B. It honestly is the best coding assistant I've used, online or offline, because it's so focused on coding. ChatGPT rambles on and is shackled by too many safeguards.
tspwd@reddit
Better than Claude 3.5?
aitookmyj0b@reddit
No. In the context of coding, the gap between Claude 3.5 and open source is quite large. Not in the same league.
f2466321@reddit
Probably isn't the case if you can use Mistral Large 2, but it takes 3-4 3090s to run it and it will still be 3x slower than Claude.
aitookmyj0b@reddit
In my experience Claude is leagues ahead of everything, including the huge models.
f2466321@reddit
Probably, but I encourage you to try Mistral Large 2, it's insane, for sure on par with 4o.
KedMcJenna@reddit
I just gave it a try based on your comment, and wow, yes. It solved on the 2nd attempt a tricky problem with a spaghettified React component that neither Claude nor ChatGPT had made much headway with. I grabbed an API key. A free billion tokens a month? Am I reading that right?
aitookmyj0b@reddit
Sure I'll give it a try
OfficialHashPanda@reddit
Maybe for some tasks, Mistral Large 2 can rival Claude 3.5 Sonnet, but for most of my coding use cases, Sonnet unfortunately does much better. I also found DeepSeek Coder V2 to be somewhat better than Mistral Large 2 specifically for coding and a lot faster, though it takes more VRAM to run locally.
Healthy-Nebula-3603@reddit
https://livecodebench.github.io/leaderboard.html
Even Qwen 2.5 32B is crushing Mistral Large 2...
Healthy-Nebula-3603@reddit
https://livecodebench.github.io/leaderboard.html
Qwen 2.5 is better
Orolol@reddit
On LiveBench, Sonnet 3.5 absolutely crushes Mistral Large.
tspwd@reddit
I was hoping this wasn’t the case any more. Thanks for clarifying!
Healthy-Nebula-3603@reddit
https://livecodebench.github.io/leaderboard.html
Qwen 32B seems to be at the level of Sonnet 3.5 (new)... DeepSeek is far worse.
PitchSuch@reddit
If you have the hardware to run the full DeepSeek 2.5 model, it isn't very far from Claude 3.5 Sonnet.
tspwd@reddit
Do you think it could run on an M4 Pro 128GB?
Inspireyd@reddit
There are people who claim that it is actually outperforming Claude.
shaman-warrior@reddit
And do they give a specific example? I would be super curious
nightman@reddit
There will probably be no comprehensive examples as it's simply not true (maybe in some corner case)
808phone@reddit (OP)
Yeah, it's good. I'm testing it now but it answered a number of programming questions a lot better than the stripped down Qwen.
Ystrem@reddit
Hi, can I run it on a GPU with only 8GB VRAM somehow? Thx
TerminatedProccess@reddit
Go look at the huggingface link in the conversation. Then click on Files and you will see a whole list of models that are designed to work under different memory conditions.
llIlIIllIlllIIIlIIll@reddit
Even 4o, Claude, o1?
MusicTait@reddit
wow great.. so how do you run it? Copy and paste, or is there a way to integrate it in, say, VS Code?
Some_Endian_FP17@reddit
Continue.dev and run the model as an OpenAI-compatible endpoint.
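If you haven't wired that up before: LM Studio (and llama.cpp's server, Ollama, etc.) exposes an OpenAI-compatible API, so any OpenAI client can talk to it. A minimal sketch with the openai Python package, assuming LM Studio's default port and a made-up local model id (match both to whatever your server reports):

```python
# Minimal sketch: query a local OpenAI-compatible server (LM Studio,
# llama.cpp server, Ollama, ...). The port and model id are assumptions;
# use whatever your server actually exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen2.5-coder-7b-instruct",  # hypothetical local model id
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```

Continue.dev then just points at the same endpoint from inside VS Code.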
Pineapple_King@reddit
What Supernova? Do you have a link or the name of the manufacturer?
giblesnot@reddit
https://blog.arcee.ai/introducing-arcee-supernova-medius-a-14b-model-that-rivals-a-70b-2/
808phone@reddit (OP)
Loading now!
shaman-warrior@reddit
Whut
Pineapple_King@reddit
ohh! Thank you!
Some_Endian_FP17@reddit
SuperNova Medius, you can get the GGUF files at https://huggingface.co/bartowski/SuperNova-Medius-GGUF
Many thanks to Bartowski.
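If you'd rather script the download than click through the Files tab, here's a sketch with huggingface_hub (the quant filename is a made-up example; pick one from the repo's file list that fits your RAM):

```python
# Sketch: fetch a single quant from the repo with huggingface_hub.
# The filename here is a hypothetical example; check the repo's file list.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/SuperNova-Medius-GGUF",
    filename="SuperNova-Medius-Q4_K_M.gguf",
)
print(path)  # local cache path of the downloaded GGUF
```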
No_Afternoon_4260@reddit
It is Apache 2.0
remghoost7@reddit
Any clue if it supports FIM...?
It doesn't seem like it on the main repo page...
iyzL0Ken0bi@reddit
I appreciate the input here. I'm going to check out this Supernova. I've been working on a convoy defense FPS game in Unreal 5 and I need a hand with some of the scripting. Thanks
808phone@reddit (OP)
I'm going to try the 14B but the 7B was already good for the tasks I gave it.
softwareguy74@reddit
I too am curious about this. I currently exclusively use Claude Sonnet 3.5 and it's amazing. Can I expect a local LLM to match this to some degree?
808phone@reddit (OP)
Yes, it can match it to some degree. It works for a lot of things. I would only use it for private data. Otherwise, if you are paying $20/month for the commercial stuff, just keep using it, but local LLMs are really getting much better.
Yud07@reddit
Qwen2.5 32B with a 4k context window at IQ4_XS is just about right for 16 GB VRAM, with a little spillover of layers into CPU/RAM.
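If you're reproducing that split with llama-cpp-python, it looks roughly like this (the filename and layer count are guesses; raise n_gpu_layers until VRAM is nearly full):

```python
# Sketch of partial GPU offload: most layers on the 16 GB card, the
# remainder spilling into CPU/RAM. Filename and layer count are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-32B-Instruct-IQ4_XS.gguf",  # hypothetical filename
    n_ctx=4096,       # the 4k context window mentioned above
    n_gpu_layers=55,  # tune until VRAM is almost full; the rest runs on CPU
)
out = llm("Explain Python decorators in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```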
BurgerQuester@reddit
What Mac do you run this on?
808phone@reddit (OP)
I'm running an M1 Max, 64GB/32-core.
BurgerQuester@reddit
Ah great! I've got that Mac too.
I haven’t run a model locally yet though, need to look into this.
Thank you
808phone@reddit (OP)
I never used all 64GB, and finally I have a use for it.
BurgerQuester@reddit
What is the performance like?
me1000@reddit
Qwen 2.5 32B is outperforming Claude for me on a lot of tasks I've been throwing at it the last couple weeks. It's a hell of a model, and it's not even their coding specific model.
Healthy-Nebula-3603@reddit
https://livecodebench.github.io/leaderboard.html
Yes, Qwen 2.5 32B and 72B are monsters.
talk_nerdy_to_m3@reddit
I have never tried a local LLM coder, but I have a hard time believing that anything can come close to Claude. They are way ahead of even GPT-4o in my experience. I would be shocked if Qwen is really that good, but I will give it a try! What are you using for a UI to chat with it, system prompt, temp, etc.?
Qual_@reddit
Qwen 32B is okayish, but unusable within an IDE, as it's not capable of fill-in-the-middle. Qwen 7B Coder is capable of fill-in-the-middle, but it's kind of dog shit as soon as you need more than completing truncated functions or auto-completing the arguments in a function call. Nothing comes close to GPT-4o and the new Claude Sonnet. I really don't know what they are coding with Qwen to be satisfied enough.
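For anyone wondering what fill-in-the-middle actually is: the editor sends the code before and after your cursor, and the model generates what goes between them, via special tokens in a raw (non-chat) completion. A rough sketch of the prompt shape for Qwen's coder models; the token names are from their docs, so double check them against your model's tokenizer config:

```python
# Sketch of a fill-in-the-middle prompt for a Qwen coder model.
# The token names are assumptions taken from Qwen's docs; verify them
# against the tokenizer config before relying on this.
prefix = "def fib(n):\n    "
suffix = "\n    return a\n"

fim_prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
# Whatever the model generates after <|fim_middle|> is the code that
# belongs between prefix and suffix, which is exactly what inline
# autocomplete needs.
```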
3-4pm@reddit
There's a lot of pro-qwen propaganda here that doesn't match reality.
Qual_@reddit
Yes, but to be honest, it's good, and even surprisingly good for its 7B size. It's kind of on par with Copilot back in the day with GPT 3.5. The issue is that when you actively use Copilot with GPT-4o or the new Sonnet 3.5 (either with Copilot or Cursor), with the in-file changes etc., it's just nowhere near the capabilities of closed models yet. No matter how you twist the benchmarks or whatever. It's a cool model, and I'm glad to be able to rely on it if I lost my internet connection, but let's be real for a moment.
Anjz@reddit
I think that's the key though. A year ago most smaller models were shitty. With Qwen 32B Coder coming out soon, I think people don't understand the gravity of having an amazing coding model running on a local 3090/4090. With the price of APIs, running multi-agentic, reiterative "create a full-stack app" tooling like bolt.new against paid endpoints makes less sense. Of course, zero-shot, the big LLMs will always win.
Qual_@reddit
Of course, I love local models, but I'm still using the "big" ones for prod/serious stuff.
I don't mind messing with Gemma 2 27B, so it can do something the whole night on hundreds of thousands of lines, but the choice gets harder the cheaper the "big ones" get.
For example, Gemini Flash is almost free, probably cheaper than the electricity cost of running the equivalent models myself.
I'm just skeptical when I read "better than Claude for specific cases". There is no way.
Emotional-Pilot-9898@reddit
I agree here. Nothing comes close to Claude. Wish it weren't the case. For me, Qwen models work well for other tasks. Decent at coding, but not better than Claude.
Python developer here. Claude has better Linux recommendations as well.
808phone@reddit (OP)
Claude has been great for me, but in the end, ChatGPT always seems to get the answer right when Claude or Gemini fails.
sedition666@reddit
You should definitely try some recommended models out. It is a lot closer than you would imagine.
me1000@reddit
LM Studio (tbh, all the local clients are bad, but it works fine for my needs). MLX Q4. Temp is 0.5 and my system prompt is:
Notably, I often give Claude instructions to stop using bullet points and write prose, and it still really likes to use bullet points.
I was also surprised with how well Qwen was performing. Sonnet 3.5 has been my daily model since it came out.
MusicTait@reddit
nice one!
zero_proof_fork@reddit
Have you tried connecting it to Cline? This is where Claude is shining for me. It's not so much the model, it's the model combined with an IDE extension that grabs large amounts of code context over multiple files, no copying and pasting between different windows.
MasterDragon_@reddit
Can you share what hardware you are using to run it locally at reasonable speed?
me1000@reddit
M3 Max MacBook Pro, 128GB of RAM. About 18 tokens per second.
kuroninh0@reddit
Dear god, how much did it cost? I was thinking of buying an M1 Max 32GB.
me1000@reddit
It was $5k. It’s primarily a work machine, but given the option to max out the RAM I did so I could run local models.
My M4 Max is on the way! :D
kuroninh0@reddit
The M4 Max will be released only next year, no?
me1000@reddit
No, they announced it last week. Expected delivery date is Friday.
https://www.apple.com/newsroom/2024/10/new-macbook-pro-features-m4-family-of-chips-and-apple-intelligence/
kuroninh0@reddit
wow man that's huge! congratz!
MasterDragon_@reddit
Thanks.
Pedalnomica@reddit
I'm running the 72B (at 8-bit) and Claude 3.5 Sonnet definitely has a better shot at getting complicated stuff right. I basically just use the 7B coder or Claude depending.
me1000@reddit
I haven’t been using the 72B much because it’s a bit too big for my machine, but I can run it, it’s just slow. And funny enough the 32B was doing a little better at coding than the 72B (both Q4).
Pedalnomica@reddit
Maybe I should try the 32B
me1000@reddit
You should double check me, but IIRC the 32B model actually had a much higher score on the coding benchmarks than the 72B. Which makes me think they trained the smaller model on more coding data.
MaskedDelta@reddit
It could be because the larger model is being run at lower precision on the user's machine, impacting performance negatively. It's amazing what these models can do when their quality is not diluted to run at scale.
me1000@reddit
I'm talking about the published benchmark numbers. I don't run benchmarks on my machine.
cantgetthistowork@reddit
Have you tried comparing it with nemotron?
Weary_Long3409@reddit
Yeah, Qwen 2.5 32B is a GPT-4o-mini killer for me. Hope there's a full-fledged 32B coder.
badgerfish2021@reddit
waiting for that one as well, the blog post said there would be one but nothing yet...
glowcialist@reddit
One of the main developers was asked about Qwen2.5 Coder 32b a few days ago and just responded "Not today", kind of implying soon. I have my fingers crossed for a release like 24 hours from now, but I'm probably wrong.
DeltaSqueezer@reddit
'not today' sounds more like 'f-- off and stop bothering me' ;)
femio@reddit
Like what tasks?
me1000@reddit
It's better at following instructions when I ask it to write paragraphs and not bullet points. But I'm mostly asking it C and C++ coding questions.
ForsookComparison@reddit
Mistral-Nemo 12B is my sweet spot right now between performance and quality. Pretty acceptable speeds using CPU inference on DDR4
nuclear_semicolon@reddit
I have been using this model locally for a while now, and it has been working wonders
visualdata@reddit
For coding I mostly use Claude 3.5, it's really worth the price. But Qwen comes close.
PutMyDickOnYourHead@reddit
I run DeepSeek Coder 33B with Continue. Canceled my GitHub Copilot subscription the second I got it working.
Anjz@reddit
A year ago most smaller models were super shitty in general. With Qwen 32B Coder coming out soon, I think people don't understand the gravity of having an amazing coding model run on a local 3090/4090. They think, "Oh, Claude is so much better at one shot." But with the price of APIs, running multi-agentic, reiterative "create a full-stack app" tooling like bolt.new against paid endpoints makes less sense. Of course, zero-shot, the big LLMs will always win. I just think it's a giant leap for AI: not relying on expensive APIs, and having reiterative "swarm" software that would eventually give better output than one-shot expensive models.
Natural-Sentence-601@reddit
It's not just coding, either. Anthropic's Claude Sonnet 3.5 engages fully in conversations on "how best to proceed" about architecture, library reuse, UML-like design, and frameworks, all while providing demonstration code snippets. Because it doesn't have a plugin for VS Code or GitHub Copilot, you have to copy-paste into VS Code, but it is just awesome.
hashms0a@reddit
Hail Qwen 🫡
ThaisaGuilford@reddit
Stop there chinese spy
hashms0a@reddit
😂😂
Embarrassed-Way-1350@reddit
I use CodeGeeX4, it has great performance in Python, which is what I use it for.
fasti-au@reddit
Qwen and DeepSeek are both great choices.
KingGongzilla@reddit
How do these small local coding models compare to GitHub Copilot in terms of quality?
epigen01@reddit
Same, my go-to coders are Qwen2.5-Coder & Codestral. Qwen2.5 is noticeably faster, albeit sometimes too verbose, while Codestral is concise & clean but with longer runtimes.