Local AI for small biz owner
Posted by binyang@reddit | LocalLLaMA | 20 comments
My friend runs a restaurant and his wife runs a small coffee shop; both businesses are small. They sometimes ask me to review contracts, an area I know nothing about.
This weekend I found a PC that no one uses anymore, but it seems OK for setting up a small local model to help them proofread contracts. Which model can I use? If the only need is to read some documents for a small business, does the model really need the latest knowledge?
The hardware specs:
Intel i5-9500
32GB RAM
256GB SSD
Nvidia 1660 Ti 6GB
Agreeable-Market-692@reddit
They don't need an LLM, they need an attorney. If they're in the USA, it's quite common for bar associations (the professional associations for attorneys and law firms) to provide a legal hotline where callers can ask questions and possibly get referrals to young attorneys trying to get experience.
The average law student who has passed their classes and the bar exam (thus enabling them to give legal advice) is always going to outperform a local LLM some non-LLM engineer set up on a PC.
Please do not put them in a situation that would risk their hard work and their family's financial future.
DO NOT DO THIS. DO NOT DO THIS. DO NOT DO THIS. DON'T DO IT. Find an attorney.
I am not an attorney, I do not own a law firm, I don't even personally know any lawyers. I am a 40-year-old lifelong natural language processing and AI nerd with 20 years of experience as a developer and 28 years of Linux and open-source use and advocacy. I build agents for a living, I build RAG apps for a living, I build things that engineers and technicians depend on daily and that, if they catastrophically failed, would ignite millions of dollars. Please do not do this. They must use a real attorney. It's great to want to help and it's great to be frugal; I am often frugal myself.
I am not saying they will get tricked into giving up their firstborn to Rumpelstiltskin, and I use and experiment with small models every day, but this is not an idea to follow through on.
Build them marketing agents if you want to help out, show them n8n automations, but don't pretend a pocket calculator can be a lawyer. This is a complex topic that constantly experiences nuanced changes and sometimes big changes. It requires engineers and legal experts to build.
If you do this you are subjecting them to undue risk, it could be the least helpful thing anyone has ever done for someone they apparently liked!
I am sorry if this sounds rude, I need you to understand that you should not under any circumstances do this.
cromagnone@reddit
This, 100%. Good tool use involves knowing when not to use a particular tool. LLMs are no different to impact wrenches or chainsaws.
kspviswaphd@reddit
Try using llama.cpp to split the model across RAM and VRAM, and start with 8B models, maybe one fine-tuned on your specific data. You can use Unsloth in Colab to perform that fine-tuning. I don't necessarily agree that you need to give up and get paid models. Instead, you can try it and see how much benefit you can get from the available hardware. A paid model is not always the solution.
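Not from the comment itself, just a sketch of what the llama.cpp side of this could look like. The model path and flag values are assumptions; the layer count would need tuning to whatever actually fits in the 6 GB 1660 Ti:

```shell
# Hypothetical invocation: serve an 8B Q4 GGUF with llama.cpp, splitting
# layers between the 6 GB GPU and system RAM. Adjust -ngl to taste.
./llama-server \
  -m models/your-8b-model-q4_k_m.gguf \
  -ngl 20 \
  -c 8192 \
  --port 8080
```

`-ngl` sets how many layers go to the GPU, `-c` sets the context window (a typical contract plus the reply should fit in 8192 tokens).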
binyang@reddit (OP)
From what I can tell, I know very little about legal stuff, and they know nothing. So pretty much they need help understanding the contracts, like translating legal writing into something a normal human can understand.
cromagnone@reddit
For a business incurring liability, this is why lawyers exist.
MushroomCharacter411@reddit
I don't necessarily think they need paid models either, but they still need better hardware. My hardware is slightly better than theirs in every way except the CPU (which is almost a wash) and I wouldn't dream of using this as a production system. It's just impractically slow.
StardockEngineer@reddit
Do not, absolutely do not do this. You don't know enough to know you shouldn't. That's OK, until it's something like this. Then it's really not OK.
They should be asking an attorney. Only then can they maybe use a frontier, hosted model to help flag some questions they might want to ask said attorney.
No-Design1780@reddit
Tbh, I don’t think a small local model would be sufficient for the task. The smaller models get, the more incoherent they become. You also have to consider the KV cache, which will likely overflow given the long context you’re providing. Even if you did get the model loaded without overfilling the prefix with a large contract file, the throughput might be too slow.
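To put a rough number on that cache pressure: a back-of-the-envelope estimate, assuming a hypothetical Llama-3-8B-like shape (32 layers, 8 KV heads, head dim 128, fp16 cache) — real models and quantized caches will differ:

```python
# Rough KV-cache size for a long contract sitting in the prompt.
# Model shape is an assumption (Llama-3-8B-like), not a measurement.
n_layers, n_kv_heads, head_dim, bytes_per_val = 32, 8, 128, 2

def kv_cache_bytes(n_tokens: int) -> int:
    # 2x for the separate K and V tensors in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_val * n_tokens

print(kv_cache_bytes(1))            # prints: 131072  (128 KiB per token)
print(kv_cache_bytes(8192) / 2**30) # prints: 1.0     (GiB for an 8k-token contract)
```

So an 8k-token contract costs about a gigabyte of cache on top of the weights — a big chunk of a 6 GB card.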
Dry_Yam_4597@reddit
And leak data to ChatGPT? That's poor advice. To summarize and explain documents, a 3090 and Qwen/DeepSeek will do.
redragtop99@reddit
100%. I wouldn’t even trust ChatGPT. You could ask it for laws and then double-check its answers, but when working with legal stuff it can get the jurisdiction (which is extremely critical) mixed up. LLMs are not humans; they don’t think “we need the laws of this state and this area,” they just give you laws they’re trained on and swap the location to whatever is asked. For instance, LLMs will recite case law that is totally made up.
They’re good to use for overall contract and legal strategy, but depending on them will get you sued!
cchung261@reddit
This is the right answer.
No-Consequence-1779@reddit
There are legal models. Expecting someone with zero legal training to understand and find what they are looking for in contracts is not realistic. LLM or not.
And let’s say he does find what they are looking for - …. Nothing.
joelW777@reddit
Use Qwen3 VL 30B A3B in GGUF format, quantized to Q4, in LM Studio. Offload the KV cache to the GPU, but no layers. Use the Unsloth version. It's the only model that is almost as smart as a 32B model (enough for this purpose) while using only 3 billion active parameters, so answers are much quicker.
huzbum@reddit
This is what I would try. Download LM Studio and use that to download the model.
Doesn’t need to be fast, just has to translate. 5 tokens a second is enough for that purpose, and I doubt it would be that slow.
joelW777@reddit
With a dense 32B you'd probably get way less than 5 tokens per second, but with 30B A3B you could see around 15. The slowest part is prompt processing anyway, and with llama.cpp (KoboldCpp, Ollama, LM Studio...) that can be accelerated a lot with cuBLAS. For more speedup you'd need a format that can't be offloaded to CPU well, like EXL3, AWQ, GPTQ, etc.
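The dense-vs-MoE gap falls out of simple memory-bandwidth arithmetic. A rough roofline sketch — the bandwidth figure and bits-per-weight below are assumptions, not benchmarks of this box:

```python
# Decode speed is roughly memory-bandwidth bound:
# tok/s ≈ bandwidth / bytes of weights read per generated token.
# All numbers here are rough assumptions for illustration.

def rough_tok_per_s(active_params_b: float, bits: float, bandwidth_gb_s: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bits / 8  # weights touched per token
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Dense 32B at ~Q4 streamed from ~40 GB/s system RAM:
print(rough_tok_per_s(32, 4.5, 40))  # ~2 tok/s
# MoE with 3B active parameters at ~Q4 from the same RAM:
print(rough_tok_per_s(3, 4.5, 40))   # ~24 tok/s
```

Same hardware, roughly 10x the decode speed, because only the active experts are read per token.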
MushroomCharacter411@reddit
Sorry to say, but that system is not even sufficient for screwing around with AI models at home, although it would be somewhat capable (although still really slow) with a much larger SSD. AI models are *big.* Although I'm sure there are other people who have made it work, I wouldn't even attempt to set up AI models for anything "serious" with less than a 1 TB SSD. Also, the 1660Ti is going to be a bottleneck. I've got a 12 GB RTX 3060 and find that it's rather inadequate for any serious work with an LLM. My system isn't *too* different (i5-8500, 48 GB of RAM) and I can do a little with Qwen 3 in a 4-bit quantization but that's at a somewhat painful 2 tokens per second -- like I said, adequate for screwing around at home, but not even close to enough for a production system. And Llama 3.1 or DeepSeek-R1:70b? The best I can manage is about 0.5 tokens per second, meaning every reply takes 5 or 6 minutes.
You aren't going to get enough speed without a GPU upgrade. And if you're sinking $1000+ into a GPU, you might as well back it up with a better CPU and more RAM, and a lot more storage because even those Qwen 3 "toy models" weigh in at about 14 GB each. Llama 3.1 Instruct and DeepSeek-R1:70b are 41 GB each. I have just five models at my disposal and they add up to 127 GB which would be half of your 256 GB SSD all by themselves.
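Those file sizes follow directly from parameter count times bits per weight. A quick sanity-check calculation (the ~4.5 bits-per-weight figure for a mid-range Q4 quant is an approximation):

```python
# Approximate on-disk size of a quantized GGUF file:
# params * bits-per-weight / 8. Ignores metadata/embedding overhead,
# and 4.5 bits/weight for a Q4-class quant is a rough assumption.

def gguf_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

print(gguf_gb(30, 4.5))  # ~17 GB: a 30B model at a Q4-class quant
print(gguf_gb(70, 4.5))  # ~39 GB: close to the ~41 GB 70B files mentioned above
print(gguf_gb(4, 4.5))   # ~2.3 GB: a 4B model that fits in 6 GB of VRAM
```

Which is why a 256 GB SSD disappears after just a handful of models.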
takuarc@reddit
If you haven’t noticed, we are still in the phase where everyone is adding more GPUs (hence all these mega-sized data centers) to their LLMs to squeeze that little bit of incremental improvement out of them. We are still years away from anyone competing over the smallest models. Hang tight.
teleprint-me@reddit
This is a bad idea, even if you had the hardware to do it.
If you have legal accreditation, using embedding models to query for document similarity would be a better play, but it's still error-prone.
LLMs may feel magical, but they're not a panacea. You will need to fill in the gaps yourself.
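The retrieval step being described boils down to ranking clause embeddings by cosine similarity to a query embedding. A minimal sketch — a real setup would get the vectors from an embedding model, so the tiny 3-d vectors and clause names here are toy stand-ins:

```python
import math

# Toy version of embedding-based clause lookup: rank stored clause
# vectors by cosine similarity to a query vector. The vectors below
# are hand-made stand-ins for real embedding-model output.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

clauses = {
    "termination": [0.9, 0.1, 0.0],
    "payment terms": [0.1, 0.8, 0.2],
    "liability cap": [0.2, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # e.g. the embedding of "how can this contract end?"

ranked = sorted(clauses, key=lambda k: cosine(query, clauses[k]), reverse=True)
print(ranked[0])  # prints: termination
```

This finds *similar* text, not *correct* legal interpretation — which is exactly the "still error-prone" caveat above.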
abnormal_human@reddit
You could probably convince something like Qwen3 4B to run on that box in VRAM decently well.
That said, proofreading contracts is frontier model territory and I would not consider it a laptop-scale local AI problem.