Solidity
Posted by swingbear@reddit | LocalLLaMA | 20 comments
Hey all!
I have spent the last few evenings building a modern Solidity LM, with SOTA CoT/tool-calling runs in the later training stages.
Question: what are you all using for Solidity or smart contract development? I find the current SOTA models don’t have much focus on that niche language, especially around vulnerabilities and economic attacks.
Any local models out there that are half decent or should I just continue with my side project until it’s done?
swingbear@reddit (OP)
Update: I’m about 50% through my first attempt: https://huggingface.co/samscrack/Qwopus3.6-27B-solidity-audit-stage2
o0genesis0o@reddit
Would it really be better than just getting a normal SOTA coding model to create the code, and then using a suite of auditing techniques to refine the output and detect issues? The other day I saw a paper on arXiv that trains a heterogeneous graph neural network to help LLMs detect and understand issues in Solidity code. Hooking these sorts of tools up to double-check the output of a SOTA coding model could be a more efficient solution.
Since Solidity code (technically, the smart contract bytecode) is by design publicly available on Ethereum, the privacy angle of local models is out the window as well. Given that one doesn't need to build many smart contracts, and quality matters above all, this is where I would rely on the biggest, baddest cloud model and pay for the tokens.
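The generate-then-verify pipeline o0genesis0o describes can be sketched with toy regex checks standing in for a real analyzer (e.g. Slither or a trained GNN detector); the check names and sample contract below are purely illustrative, not a real audit:

```python
import re

# Toy static checks standing in for a real analyzer; a production
# pipeline would invoke Slither or a learned detector here instead.
CHECKS = [
    ("tx-origin-auth", re.compile(r"\btx\.origin\b"),
     "tx.origin used for authorization (phishable)"),
    ("unchecked-send", re.compile(r"\.send\("),
     "return value of send() may be unchecked"),
]

def audit(source: str) -> list[dict]:
    """Run every pattern check over LLM-generated Solidity source."""
    findings = []
    for name, pattern, message in CHECKS:
        for i, line in enumerate(source.splitlines(), start=1):
            if pattern.search(line):
                findings.append({"check": name, "line": i, "msg": message})
    return findings

# Pretend this came from a SOTA coding model.
generated = """\
contract Wallet {
    function withdraw() public {
        require(tx.origin == owner);
        payable(msg.sender).send(balance);
    }
}"""

report = audit(generated)
```

The findings would then be fed back to the model as a refinement prompt, closing the generate/verify loop.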
swingbear@reddit (OP)
I think the issue stems from SOTA models not having a focus on Solidity data during training. I have just finished my first Solidity LM iterations and it’s outperformed Opus on SolEval.
wren6991@reddit
Richard Sites, one of the DEC Alpha architects, wrote a great article in the 90s called "It's the Memory, Stupid!" -- you can read it here: http://cva.stanford.edu/classes/cs99s/papers/architects_look_to_future.pdf
This is one of those classic papers that becomes more relevant to computer architecture and LLMs as time goes by. There's an easy trap people fall into where they read about FPGAs and assume they'll be good for compute tasks. When you find yourself falling into this trap, just say to yourself: "it's the memory, stupid!"
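For local LLM inference the "it's the memory" point is easy to make concrete: single-stream decode speed is roughly bounded by memory bandwidth divided by the bytes streamed per token. A back-of-envelope sketch (the bandwidth and model-size numbers are illustrative, not measured):

```python
def decode_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on single-stream decode speed: each generated
    token must stream the active weights through memory once."""
    return bandwidth_gb_s / weights_gb

# Illustrative numbers: a 27B model at ~4.5 bits/weight is roughly
# 15 GB of weights; a consumer GPU might offer ~1000 GB/s, while a
# dual-channel DDR5 desktop sits nearer ~100 GB/s.
gpu_ceiling = decode_tokens_per_sec(1000, 15)  # ~66 tok/s at best
cpu_ceiling = decode_tokens_per_sec(100, 15)   # ~6.6 tok/s at best
```

Compute rarely enters that equation for batch-1 decoding, which is exactly why "FPGAs for inference" intuitions fail.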
rm-rf-rm@reddit
Just curious, do you code in Solidity for your job, or? I've genuinely seen zero applications (trading or any related ETH infra stuff doesn't count) used by real users at any meaningful scale.
HumanDrone8721@reddit
Nope, you'll have to learn fine-tuning and supervised learning and "specialize" a model for your use case if you want to get an edge. The cloud crowd scraped (and pirated) whatever they could find on the Internet to train their behemoths, but in niche situations like yours, where there is no data and no specialized supervision, they are as dumb as your 27B model. Just test a few to get the best available free one.
DinoAmino@reddit
Seems everyone forgets about RAG and considers fine-tuning first. RAG takes far less time and resources to set up and get working. If you've already done a lot of successful fine-tuning and have GPU power available, I can see it, but LoRA adapters are not enough to learn a language - whether coding or spoken.
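The RAG alternative can be sketched in a few lines: retrieve the most relevant vulnerability write-ups and prepend them to the audit prompt. Keyword overlap stands in here for a real embedding store, and the knowledge-base entries are illustrative placeholders:

```python
# Toy RAG retrieval over a Solidity vulnerability knowledge base; a real
# setup would use embeddings and a vector store instead of word overlap.
DOCS = [
    "Reentrancy: external calls before state updates let callers re-enter.",
    "Integer overflow: pre-0.8 Solidity arithmetic wraps silently.",
    "Oracle manipulation: spot prices from one DEX pool can be skewed.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank docs by shared lowercase words with the query."""
    q = set(query.lower().split())
    scored = sorted(DOCS, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, code: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Known issues:\n{context}\n\nAudit this contract:\n{code}"

prompt = build_prompt("reentrancy external calls", "contract C { ... }")
```

The appeal is that updating the knowledge base is a file edit, not a training run.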
swingbear@reddit (OP)
Yeah, I have tried the SOTA models and they're no good for this; they can produce Solidity, but it's often janky.
I’m training Qwen 3.6 27B right now. It seems to be such a sandbagged area of AI. Every other use case has tons of finetunes; Solidity… nada. I’ll finish up, bench it, and if it’s any good I’ll release it on HF.
HumanDrone8721@reddit
This is the way. While it's nice to have at your fingertips a monster that knows about medieval art and 19th-century Russian ballet dancers as well as the latest coding patterns, expert-trained small models running locally are the way to go. This is why ALL the cloud bros keep costs low but use your prompts and data to refine their stuff, even though none of them admit it - just like they didn't admit to the pirated stuff.
The current open-weight models are actually good enough for domain usage, especially with extra tuning that cannot be found outside some expert circles.
swingbear@reddit (OP)
Yeah, I have become rather obsessed with local finetuning; it’s satisfying when your 27B on-prem model gives a better answer than a 1T-param Goliath haha.
But I was just taken aback by how little attention has been given to small Solidity models. Normally there are 1000s on Hugging Face.
It’s either way harder than I’m expecting (but I can’t see how), or people don’t like to share them because of the direct advantage.
HumanDrone8721@reddit
It's both: having a model with proven results that gives useful answers in a highly paid niche domain is a good commercial and venture opportunity, and the existing experts in a niche domain would like to remain the few, absolutely necessary experts in that domain and not train their replacements any time soon. So even if they did build something, they use it for their own projects and don't disclose it publicly.
So this can be an opportunity or a threat, a SWOT analysis is necessary ;).
swingbear@reddit (OP)
Well, I’m just gonna dump mine publicly lol. I’ll add a buy-me-a-coffee link at the bottom; the API calls for Opus data collection are no joke haha.
swingbear@reddit (OP)
I mean damn, even the datasets on HF are old or useless.
ortegaalfredo@reddit
I work as a Solidity auditor as my day job, and SOTA models (even local ones like DeepSeek) are very good at auditing smart contracts, even in somewhat obscure languages like Clarity. They excel at Solidity, but they won't find obscure economic attacks that depend on design. You can, however, basically teach the model to look for those by providing examples, and the model generally understands them better than a human would.
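"Teaching the model by providing examples" is essentially few-shot prompting. A minimal sketch of assembling such a prompt; the attack patterns are real, well-known DeFi exploit classes, but the wording and function names are illustrative:

```python
# Few-shot prompt assembly: pair known economic-attack patterns with the
# contract under audit so the model knows what class of issue to hunt for.
EXAMPLES = [
    {
        "pattern": "flash-loan price manipulation",
        "summary": "Attacker flash-loans into a pool, skews the spot price "
                   "the protocol reads as an oracle, then drains funds.",
    },
    {
        "pattern": "first-depositor share inflation",
        "summary": "Attacker donates to an empty vault to inflate share "
                   "price and round later depositors' shares down to zero.",
    },
]

def audit_prompt(contract_source: str) -> str:
    shots = "\n\n".join(
        f"Attack pattern: {e['pattern']}\nHow it works: {e['summary']}"
        for e in EXAMPLES
    )
    return (
        "You are auditing for economic/design-level attacks, not just bugs.\n"
        f"Worked examples:\n\n{shots}\n\n"
        f"Contract under audit:\n{contract_source}\n"
        "List any economically exploitable designs."
    )

prompt = audit_prompt("contract Vault { ... }")
```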
SkyFeistyLlama8@reddit
Have fun using a non-deterministic machine to create unchangeable code!
swingbear@reddit (OP)
So I agree and disagree: for static codebase audits, yes, they can find logical issues and code-hygiene problems. But when I create scenarios where a bad actor mounts an economic attack (specifically DeFi), it falls short. And for some reason it struggles a bunch with gas optimisation.
ortegaalfredo@reddit
It also depends a lot on the harness (agent) you're using. I get very different results from copy/pasting code into the web interface or asking Claude Code to find things than from using a specialized agent.
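The harness difference boils down to the model getting tool results (file contents, analyzer output) fed back in a loop rather than a one-shot paste. A minimal sketch of that loop; `call_model` is a stub standing in for any local or cloud LLM API, and the tool set and finding text are made up:

```python
# Minimal shape of a specialized audit harness: an agent loop that feeds
# tool results back to the model until it produces a final answer.
def call_model(messages: list[dict]) -> dict:
    # Stub: a real harness would call an LLM here. This fake model asks
    # for the file first, then answers once it has seen the tool result.
    if len(messages) == 1:
        return {"tool": "read_file", "arg": "Vault.sol"}
    return {"answer": "No reentrancy guard on withdraw()."}

TOOLS = {"read_file": lambda path: f"// contents of {path}"}

def run_audit(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["arg"])
        messages.append({"role": "tool", "content": result})
    return "max steps reached"

finding = run_audit("Audit Vault.sol for reentrancy")
```

A web chat box skips the tool-feedback half of this loop entirely, which is plausibly why the results diverge so much.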
swingbear@reddit (OP)
Yeah, harnesses are mandatory. I have had some decent success training 3.6 27B: https://huggingface.co/samscrack/Qwen3.6-27B-Opus-CoT-S1-Hermes-S2-SFT
darkens89@reddit
Can Claude or GPT really not manage this?
sic7k@reddit
Not sure tbh