Offline AI and Data
Posted by EnviousLemur69@reddit | preppers | 17 comments
I’ve realized lately that in case SHTF, another good prep for a self-sustainment situation would be a diverse dataset and offline AI models. They could prove useful in a number of situations, assuming you have your own power source and storage. I don’t know why this only just occurred to me. Anyone have experience prepping this way?
Hot-Profession4091@reddit
To do this in a way that doesn’t give you hallucinations, you need to build a RAG (retrieval-augmented generation) system. Basically, you’d want to build a large corpus of documents you want to search, then build a search engine over it. When you prompt the LLM, you first perform a search of the corpus to find the most relevant documents, then feed those into the LLM as context along with your original prompt. This helps ensure the LLM gives you accurate and factual information.
If that sounds like a lot of work, that’s because it is.
(I build AIs for a living.)
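The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal illustration, not the commenter's actual system: it uses a toy TF-IDF-style keyword scorer in place of a real search engine, and the sample documents are made up for the example.

```python
import math
from collections import Counter

def tokenize(text):
    return [w.lower().strip(".,!?") for w in text.split()]

def retrieve(corpus, query, k=2):
    """Rank documents by TF-IDF-weighted overlap with the query terms."""
    n = len(corpus)
    doc_tokens = [tokenize(d) for d in corpus]
    df = Counter()                      # document frequency per term
    for toks in doc_tokens:
        df.update(set(toks))
    query_terms = set(tokenize(query))
    def score(toks):
        tf = Counter(toks)
        # rare terms (low df) count for more, via the idf factor
        return sum(tf[t] * math.log(n / df[t]) for t in query_terms if t in tf)
    ranked = sorted(range(n), key=lambda i: score(doc_tokens[i]), reverse=True)
    return [corpus[i] for i in ranked[:k]]

docs = [
    "Treating dehydration: oral rehydration salts, small frequent sips.",
    "Sharpening an axe with a file and whetstone.",
    "Water purification by boiling for at least one minute.",
]

question = "how do I purify water"
context = retrieve(docs, question, k=1)
# The retrieved documents go into the prompt ahead of the question,
# so the LLM answers from your corpus instead of from memory.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

In a real build you'd replace the toy scorer with a proper index (BM25 or embedding search), but the shape is the same: search first, then stuff the hits into the prompt.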
incruente@reddit
Super. How much would you want to build a decently useful prepper-oriented AI that could be run locally on affordable hardware?
Hot-Profession4091@reddit
More than any of you can afford.
Honestly, building the thing isn’t that hard and probably wouldn’t cost more than $20-30k. The expensive part is building the document database. That would be a never-ending task of curation.
incruente@reddit
So building it costs more than "any of you can afford", but it's also more like 20-30K? Got it.
Hot-Profession4091@reddit
I don’t suspect you have $30k laying around to pay my consulting rates to build this.
incruente@reddit
You can suspect all you want. Have a nice day.
Hot-Profession4091@reddit
Yeah. That’s what I thought.
Main-Engineering4445@reddit
Tons of open source models. Llama 3.1 is open source. You’ll need some GPUs to run them. You’ll want the 8B parameter models. Anything larger just wouldn’t be feasible to run unless you’ve got some serious horsepower (I’ve heard 10 4090s for the 70B).
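A quick back-of-envelope check of why 8B is the sweet spot for consumer GPUs: VRAM needed just to hold the weights scales with parameter count times bits per weight. This is a rough sketch; the bit widths are common quantization levels, and real memory use is higher once you add the KV cache and runtime overhead.

```python
def weights_gib(params_billion, bits_per_weight):
    """Approximate GiB of memory needed just for the model weights."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# Llama 3.1 8B at common precisions (weights only):
for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit quant")]:
    print(f"{label}: ~{weights_gib(8, bits):.1f} GiB")

# Same math for the 70B model shows why it needs multiple GPUs:
print(f"70B fp16: ~{weights_gib(70, 16):.0f} GiB")
```

An 8B model 4-bit quantized fits comfortably in a single 24 GB card, while 70B at fp16 needs well over 100 GiB of weights alone, which is why people resort to multi-GPU rigs for it.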
I’m a “datahoarder”. I’ve got just shy of 200TB right now. I’ll be upgrading to a petabyte server in the next year or so. I’m happy to answer any specific questions you have.
EnviousLemur69@reddit (OP)
What I had imagined was smaller models trained on certain datasets, like medical or gardening or mechanical.
Do you think there’s a balance between capability and power consumption? Either way, having access to extensive data for handling problems outside your immediate skill set, without hoarding physical books, seems useful.
Main-Engineering4445@reddit
I have a huge library of books on all manner of topics from construction and electric codes, to sheet music, to chemistry, astrophysics, and beyond. Libgen is really handy for getting digital books.
As far as running models, I wouldn’t count on them as your only source of information. They have a nasty habit of hallucinating and boldly stating facts that are wrong. That might change, but at this point I wouldn’t count on them in an emergency or SHTF situation for anything beyond idea generation.
As far as power consumption I couldn’t really comment. I would imagine 5-10 4090s could easily draw 2-4 kilowatts of power under load. I can’t imagine it being useful enough to take resources away from other areas you’d need power for.
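That guess can be sanity-checked with simple arithmetic. The figures below are assumptions for illustration (roughly 450 W TDP per RTX 4090 and ~200 W for the host system), not measurements:

```python
def rig_draw_watts(num_gpus, gpu_watts=450, host_watts=200):
    """Estimated worst-case draw of a multi-GPU inference rig under load."""
    return num_gpus * gpu_watts + host_watts

for n in (5, 10):
    kw = rig_draw_watts(n) / 1000
    print(f"{n} GPUs: ~{kw:.1f} kW under load, ~{kw:.1f} kWh per hour of use")
```

At those assumptions, a 5-card rig lands around 2.5 kW and a 10-card rig near 5 kW, i.e. several kWh of stored energy per hour of inference, which is the point being made about competing for power with everything else you need.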
EnviousLemur69@reddit (OP)
Thanks for the feedback. I agree that AI isn’t quite there yet to be a necessity, but maybe soon enough. Either way, datahoarding seems quite valuable.
Main-Engineering4445@reddit
Lots of benefits in everyday life too. All the movies and TV we could ever want, always locally available. No more streaming platform games where they pass rights around so you have to buy 5 different subscriptions.
preppers-ModTeam@reddit
Your post or comment was removed because you submitted it twice, or the topic you are posting about has been already posted by a different user, discussed recently, or is posted about frequently.
HibariNoScope69@reddit
AI is kind of garbage
TheSensiblePrepper@reddit
Most "Consumer Grade" AI systems are around 13 years old in human years. So yeah, they aren't that great. Just wait and it will get better/worse.
kkinnison@reddit
The only “diverse dataset” I need in case of SHTF is books.
Any “offline AI model” is crap and worthless. They can fake knowledge, and to someone who is ignorant on an issue they sound good, but they can be wrong and you wouldn’t know it. They’re not even good for faking a cybersex date, even if they can pass a Turing test.
incruente@reddit
Data? Check out r/datahoarder
AI? There are local models, but they usually require a pretty beefy computer, and for my money they’re currently too prone to lying to be anything more than idle conversational partners.