Creating a fine-tuned Survival Prepper AI
Posted by EdinPrepper@reddit | preppers | 43 comments
The potential of AI for preparedness is one of my more niche interests. I've got offline models that produce relatively good results when sense-checked and when you write a decent prompt for them. So far it's interesting and occasionally makes good suggestions, but I'm wondering if it can become more.
I'm considering adapting an AI specifically to preparedness by fine-tuning it on preparedness data sources. I'd probably base it on a fine-tuned Llama 3 (if you've never played with it, try it; Mistral is also really good, but Llama 3 seems fantastic).
My goal would be a model you can run on a MacBook that can give you survival advice, and that you can discuss and troubleshoot your preps and plans with.
I'm wondering if anyone has suggestions for good sources of training data to train it on, e.g. any particularly good books and resources. I've obviously got some such books myself, but I'm keen to hear what people think would make good training data.
I suspect that after a good few days of fine-tuning on such data the results might prove interesting. Llama 3 is already pretty impressive to start with.
Local-Quote5505@reddit
I need an assistant that can give me recipes and methods for finding and extracting components for survival during the fall of civilization: gunpowder recipes, how to get saltpeter, how to cook simple penicillin, what to stock up on (acids, alkalis, components).
Ancient_Suit_4170@reddit
I've just created one lol :D. I've trained it on a lot of survival situations. It's been trained on stuff like the Navy SEAL survival handbook, the SAS Survival Handbook, lots of other military survival guides, extreme weather survival, primitive survival skills, the Survival Medicine Handbook, etc. Quite fun actually; it just passed a hard survival quiz with no problem.
I'm prepped for the future o7
No-Cash-9530@reddit
Is the original post still of interest or have you found what you wanted?
I am a language model developer, experimenting with something that may fit the bill. Small enough to be easily powered by a 30 watt computer, and capable enough to give you a good idea of what you need in any areas of knowledge it has been well trained on.
In theory, I could release this on Hugging Face as an open-source model with some expansion pack add-ons over time.
Doing it well would require some support, though: lots of testing, feedback, and examples that people want to see. Some help gathering data as things progress could go a long way.
Would this be of interest to anybody here?
time2listen@reddit
Sorry for reviving this thread from the dead. I was also considering something similar; I think a SHTF model would be nearly invaluable in a bad situation.
Haters in the thread seem to overestimate the amount of knowledge they have in their heads. I'd like to see them remember exact medication dosages off the top of their head, or even medication interactions. Or say you are learning a new skill like welding or electrical engineering and don't have access to books or to someone who does. Or better yet, an uncensored model that can teach you how to make some freedom devices. I can easily see a model like this being as valuable to a team or society as backups of all the helpful books would be.
I think something like Mixtral Dolphin would be a good place to start. I am currently looking into the feasibility myself. Hopefully massive models will become more accessible on attainable hardware soon, as the smaller models are just not that great overall. Like others have said, I don't think fine-tuning will yield great results. At my work we deal with enormous legal documents and get decent results parsing them and coaxing the model into returning only sensible results. Something like this could be valuable even if the response time is quite slow.
Were you able to make any progress on your project?
ScyldScefing_503@reddit
You should read "the future"
Hot-Profession4091@reddit
So, I build ML models and AI systems for a living. Don't waste your time fine-tuning; fine-tuning is for tone. What you want is to ensure you're getting accurate information, and for that you need retrieval augmentation. So start collecting all of your PDFs, and running your favorite videos and podcasts through transcription software, to create a vector database for retrieval.
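A minimal sketch of the "collect and chunk" step, assuming you've already extracted plain text from your PDFs and transcripts (a toy example; real pipelines then feed each chunk through an embedding model into a vector store such as FAISS or Chroma):

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-window chunks ready for embedding.

    The overlap keeps information that straddles a chunk boundary
    retrievable from at least one chunk.
    """
    words = text.split()
    chunks = []
    step = size - overlap
    for start in range(0, max(len(words) - overlap, 1), step):
        chunk = " ".join(words[start:start + size])
        if chunk:
            chunks.append(chunk)
    return chunks
```

Chunk sizes of a few hundred words with some overlap are a common starting point; tune both to your documents.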
EdinPrepper@reddit (OP)
Just spotted this, thank you. RAG was phase 2, but I might just skip the FT step then!
Hot-Profession4091@reddit
Yeah, don’t fine-tune unless you really need to. RAG first.
EdinPrepper@reddit (OP)
Thanks for this!
That-Newspaper-9999@reddit
Lol
AI isn't just LLMs....
EdinPrepper@reddit (OP)
Of course it isn't. I'm not expecting to use a diffusion model for this purpose, though, and I can't see GANs being very useful for it either. Convolutional neural networks could be useful if you want to train a model to pick up suspicious behavior on your CCTV, I suppose.
Don't think anyone said it was just LLMs. Waaay more types of models. That said, for this application that is the type of model I'd be thinking of using (specifically generative pretrained transformer models).
TheMatroid@reddit
I think the only way to do something like this is to use RAG to ensure you're only getting responses grounded in trusted information from a vector DB. Even with lots of training on the subject material, you're still going to have the problem of the model confidently hallucinating. In survival situations that could be the difference between life and death.
A_Dragon@reddit
Are vector DBs usually online or is it the most common practice to have all of this available locally? If so, how does one create a vector DB?
TheMatroid@reddit
I've never heard of anyone storing them locally. All the usual suspects for cloud hosting have vector DB products; Azure has some popular options. I'm not an expert on vectorization of the data itself, but there are tools for it. Essentially you're converting the text into numerical embedding vectors and storing it that way.
If you want to learn more, there are lots of good articles online; just search "LLM RAG vector database".
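For illustration, here's a toy sketch of the "text to vectors" idea using a hashed bag of words. Real systems use a trained embedding model (e.g. a sentence-transformer) instead, but the store-and-compare mechanics are the same:

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashing embedding: each word bumps one dimension, then the
    vector is length-normalized. A real embedding model captures meaning;
    this only captures word overlap, but the retrieval mechanics match."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit vectors is just their dot product."""
    return sum(x * y for x, y in zip(a, b))
```

A vector DB stores one such vector per chunk and, at query time, returns the chunks whose vectors have the highest cosine similarity to the query's vector.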
A_Dragon@reddit
Certainly if you’re using this for prepping purposes you’d want to find a way to at least host it on some kind of local network otherwise it’s pretty useless.
TheMatroid@reddit
I see no reason why you can't run it locally, seeing as that's all the cloud is; someone else using their local resources and letting you access it remotely.
You'd definitely need a lot of space, and even more to provide redundancy in case you lose a drive.
A_Dragon@reddit
Which is kind of the point of LLMs in the first place, a lot of knowledge, only a little space.
I wish there were more efficient ways of training models to be more accurate.
TheMatroid@reddit
Accuracy is an inherent problem with LLMs. At their core, all they are is a most-likely-next-token prediction machine. That works great for understanding speech patterns, grammar, etc., but domain knowledge requires far too much context to be captured that way.
EdinPrepper@reddit (OP)
Absolutely probabilistic at their core. Throwing more and more compute and larger data sets at it does seem to improve things somewhat (thanks to Meta and Mistral for doing that for us and releasing the results, as the bills for doing so are huge).
That said, they produce plenty that's good as well... it's just that they can and do hallucinate and confidently say things that are incorrect.
I do think they're incredibly powerful for brainstorming and idea generation.
Combined with RAG and a large database, it would be impressive.
You can self-host your own data sources for RAG. You can also get the model to store useful responses to a text document for later retrieval, if helpful.
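Storing useful responses to a text document can be as simple as appending them with a timestamp (a minimal sketch; the file name is arbitrary):

```python
from datetime import datetime, timezone

def save_response(question: str, answer: str, path: str = "rag_log.txt") -> None:
    """Append a timestamped Q&A pair to a plain-text log for later retrieval."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"--- {stamp}\nQ: {question}\nA: {answer}\n")
```

The resulting text file can itself be chunked and added back into the RAG library.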
actualsysadmin@reddit
I think it's kind of funny that you want to run it on a MacBook when Windows dominates market share.
That's like someone buying an AK-47 over an AR-15 in America: it's uncommon. You should focus on running it with something OS-agnostic and giving it an HTTPS front end.
Valuable_Option7843@reddit
In this case it’s because new Macs have shared video and system memory (in a good way), so you can run huge LLMs. That's not possible on a PC notebook, where you are limited to the VRAM of the graphics card.
actualsysadmin@reddit
Only cheap laptops have low VRAM. Here is an example of a high-VRAM laptop:
https://www.dell.com/en-us/shop/dell-laptops/precision-7780-workstation/spd/precision-17-7780-laptop?_gl=1b76yi0_up*MQ..&gclid=CjwKCAjwuJ2xBhA3EiwAMVjkVIzkv4q7Qiyo_VlVuJrUZB-AmeQaXyY7BEE06hAzixxU2lg1QDWX_RoCjtkQAvD_BwE&gclsrc=aw.ds
Valuable_Option7843@reddit
That’s 16GB. You can spec a MacBook up to 128GB of VRAM. Anyway, I’m just clarifying why OP might be specifying Apple. Not fighting the holy war here. https://www.apple.com/macbook-pro/
EdinPrepper@reddit (OP)
Exactly. Apple silicon Macs are actually very good for such applications. You could also buy a very beefed-up gaming laptop. I bought mine because I used Linux for years, so the terminal in macOS speaks to me, and it was actually amazing value for money for running AI models locally. I've already got Llama 3 running locally: blazingly fast and a very high-quality model.
I grew up with PCs, love them to bits, and have a massive Alienware gaming rig desktop (which, by the way, my portable MacBook can give a run for its money in these areas).
By all means get a beefy gaming PC with an Nvidia RTX GPU if you prefer.
GroundbreakingYam633@reddit
Well, as a computer scientist myself: AI is overhyped, and you'll never find meaningful data to train models for this topic. Prepping is just too individual, scenario-based, and country-based.
Also, pre-existing written material and video tutorials reflecting facts and opinions are already enough source material for anybody to get going.
EdinPrepper@reddit (OP)
I've actually got a significant library I'm planning on using. I was actually seeking suggestions for anything excellent that I had missed.
I've trained models before and have tested numerous applications myself, narrow and less so.
Correct_Recover9243@reddit
How is that in any way as good, let alone better than having a catalog of searchable documents?
I personally love when LLMs don’t know the answer to something so they just start making shit up.
EdinPrepper@reddit (OP)
Because you can't go through such a vast collection of documents and find the relevant things anything like that fast. LLMs do hallucinate and confidently present things as fact when they're not... but they're also powerful tools for brainstorming, thinking outside the box, or searching mind-blowing amounts of data quickly (retrieval augmented generation - look it up).
taipan821@reddit
It would essentially be region-locked. I honestly feel you would be better off with a region-specific reference library and a group of friends, rather than an AI that depends on you continuing to teach it.
EdinPrepper@reddit (OP)
Why would it be region-locked? You can modularly fit your own local information into the library a RAG model draws upon, but the actual prepping skills, survival learning, etc., and many of the PDFs it could draw upon, will be region-independent.
alriclofgar@reddit
I don’t need an AI to give me bad survival advice, I’ve already got YouTube.
These models are good at sounding credible, but they’re no substitute for expertise. Get yourself a library card and read some good reference books.
TheMatroid@reddit
Not necessarily. There's a method called retrieval augmented generation where the LLM essentially operates as an interpreter and presenter for a knowledge base of trusted data.
This way, the user's question is converted to a prompt, which gets matched against a database of trusted domain knowledge; the LLM then takes that information and presents it to the user in a digestible way.
There's a good article explaining the basics on Stack Overflow.
I have first-hand experience using this method, and it's a night-and-day reduction in model hallucinations compared to fine-tuning an existing model on domain knowledge.
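The "takes that information and presents it" step boils down to prompt assembly: stuff the retrieved passages into the prompt and instruct the model to stick to them. A minimal sketch (the exact instruction wording is an assumption, not a fixed recipe):

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt from retrieved passages. Telling the
    model to answer only from the sources, and to admit when they don't
    cover the question, is what cuts down on hallucination."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The assembled string is then passed to whatever local model you're running; numbering the sources also lets the model cite which passage it used.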
EdinPrepper@reddit (OP)
Exactly. My thoughts are: fine-tune Llama 3 on a survival data library, then use that model with a retrieval augmented generation approach pulling from a huge library. I suspect that will be a potent combination, using LlamaIndex/LlamaHub to let the model access PDFs etc.
GilbertGilbert13@reddit
I think you need friends
EdinPrepper@reddit (OP)
Have plenty - I've also got my eyes wide open to the ways our world is changing rapidly. AI tools are incredible, and having working, useful AI tools in a SHTF environment is a great advantage. How quickly can you search your whole library of prepping information to find the answers you need? A RAG model will usually beat you hands down; doubly so if it's fine-tuned on prepper data, I suspect.
As an example of the things it augments your ability to do: I've built sensors, and created scripts that check online sources at regular intervals for information of potential concern and warn me via my smart home setup if certain conditions are met. They monitor RSS feeds regularly for certain news and send bespoke warnings both as push notifications and through the smart home. There are certain events that should trigger a bug-out for us; I've got scripts running as a cron job with an LLM doing the heavy lifting to detect those conditions...
Much of that relies on the grid... but the tool will be equally powerful in grid-down environments.
The world is changing fast and if you haven't had your finger on that pulse you're seriously missing a trick.
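The RSS-monitoring piece described above can be sketched with just the standard library; the keyword list here is a hypothetical example of trigger conditions, and a real setup would add the LLM classification and notification steps on top:

```python
import xml.etree.ElementTree as ET

# Hypothetical trigger terms; a real setup would tailor these to your plans.
ALERT_KEYWORDS = {"evacuation", "outbreak", "grid failure"}

def scan_feed(rss_xml: str, keywords=ALERT_KEYWORDS) -> list[str]:
    """Return titles of RSS items whose title or description mentions
    any alert keyword (case-insensitive)."""
    root = ET.fromstring(rss_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title") or ""
        desc = item.findtext("description") or ""
        text = f"{title} {desc}".lower()
        if any(k in text for k in keywords):
            hits.append(title)
    return hits
```

A cron job would fetch each feed (e.g. with `urllib.request`), pass the XML to `scan_feed`, and hand any hits to the LLM or notification system.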
IGetNakedAtParties@reddit
I think the approach is to have a generic lightweight model such as Llama, plus text resources it can reference on the same device. The sources you give it can be used as context for the answer, rather than training a model, which is very resource-intensive. Training also requires lots of quality data, but there are only so many books or captioned videos available, so the trained model would never be very good.
EdinPrepper@reddit (OP)
I intend to do both. You can fine-tune on the data and also give it data to refer to.
Consistent-Zone-9615@reddit
Wow, you are gonna trust AI for survival advice? Maybe try to make some prepper friends, try creating a prepper group, or joining one, I'm in a couple of prepper groups on discord, try finding one that suits you.
EdinPrepper@reddit (OP)
Think you underestimate how powerful a tool a well fine-tuned and well-prompted language model can be. It obviously isn't a substitute for having a network (I'm actively trying to build that in my local area). But a model trained on volumes of info that would be hard for a human to absorb could provide a valuable source of know-how in areas you're less familiar with, and would be especially useful in a comms-down sort of situation.
It does need a bit of common sense exercised in case of hallucination, but these models are getting better all the time and can run completely offline. Until we've tried to train one we really won't know... but initial testing with well-engineered prompts actually shows them to be more capable than you'd think, even before fine-tuning. Think I'd be the first to fine-tune one on prepper data sets, so it's quite an interesting experiment for me.
There are also situations like: I'm the only prepper in my family. I've tried to upskill the rest of the family, but if something happens to me they've at least got guidance.
EffinBob@reddit
But.. but... isn't AI going to destroy us all?!?! I keep hearing that, so it must be true.
A_Dragon@reddit
I really did not expect highly educated responses about AI in this sub.
There_Are_No_Gods@reddit
Hmm, now where have I seen this type of plan before...
lostscause@reddit
this is how skynet started