Training Llama3.2:3b on my WhatsApp chats with wife
Posted by jayjay_1996@reddit | LocalLLaMA | View on Reddit | 115 comments
Hi all,
So my wife and I have been dating since 2018. ALL our chats are on WhatsApp.
I am an LLM noob, but I wanted to export it all as a txt and then feed it into an LLM so I could ask questions like:
- who has said I love you more?
- who apologises more?
- what was discussed during our Japan trip?
- how many times did we fight in July 2023?
- who is more sarcastic in 2025?
- list all the people we’ve talked about
Etc
So far, the idea was to chunk the chats, store them in a vector DB, and then use Llama to interact with it. But the results have been quite horrible. Temp 0.1 to 0.5, k=3 to 25, and I broke the chat into chunks of 4000 with overlap 100.
Any better ideas out there? Would love to hear! And if it works I could share the ingestion script! 🙇
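For reference, the ingestion so far is roughly this shape (a minimal sketch assuming sentence-transformers and chromadb; the actual script may differ):

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chatdb")
col = client.get_or_create_collection("whatsapp")

text = open("chat.txt", encoding="utf-8").read()

# fixed-size character chunks with overlap (what I'm doing now)
size, overlap = 4000, 100
chunks = [text[i:i + size] for i in range(0, len(text), size - overlap)]

col.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=model.encode(chunks).tolist(),
)
```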
Sea_Platform8134@reddit
Use a knowledge graph (Neo4j)
stubrich@reddit
I've had good results using GPT4All for querying text, including very large documents. It can ingest most document types in less than a minute (depending upon the document size)
dreamai87@reddit
You'd be better off finetuning an LLM on your chats and making a role-play chat to interact with, to see how it manages the personality. Fun
pokatomnik@reddit
Did you think about a bot that can answer her questions like “do you love me?”
KingMitsubishi@reddit
Would make a nice Black Mirror episode.
Thedudely1@reddit
Plot twist is your wife ends up with the ai instead of you
SatisfactionSad7769@reddit
Are you able to open source the data? Usually we need to have a better understanding of the data and THEN decide how to tackle the problems. 😂😂😂 JK.
DrivewayGrappler@reddit
I did it with 16 years of my wife's and my texts. Well, I didn't train a model: I used Postgres and went through it with another LLM. Created tags for every day of exchanged messages, then vectorized each message along with a rolling window of 8 or 10 messages, plus daily sentiment analysis. Now I can vector search it, make SQL queries and build charts via the LLM, or just use the front end I made for it.
I say I love you more.
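The rolling-window part looks roughly like this (a sketch assuming pgvector and sentence-transformers; table and column names are made up):

```python
import psycopg2
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
conn = psycopg2.connect("dbname=chats")  # assumes the pgvector extension
cur = conn.cursor()

messages = [  # (sender, timestamp, text) parsed from the export
    ("me", "2018-03-01 09:12", "good morning!"),
    ("her", "2018-03-01 09:15", "morning :) love you"),
]
WINDOW = 8  # embed each message together with the 8 before it

for i, (sender, ts, text) in enumerate(messages):
    ctx = " ".join(t for _, _, t in messages[max(0, i - WINDOW):i + 1])
    emb = model.encode(ctx).tolist()
    cur.execute(
        "INSERT INTO message_vectors (sender, ts, body, embedding)"
        " VALUES (%s, %s, %s, %s)",
        (sender, ts, text, str(emb)),
    )
conn.commit()
```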
rohitkeshri4705@reddit
Whattttt
SysATI@reddit
Why don't you use NotebookLM instead?
TUBlender@reddit
RAG will be useless for every one of your example questions, except the Japan trip one.
Only the top 'k' best matching text chunks will be used to answer your question, with 'k' usually between 3-10.
Quantity questions cannot be answered correctly because of this.
Soggy-Camera1270@reddit
Agree. I feel like ingesting these into a SQL database and querying stats on certain terms would probably be more useful.
Kale@reddit
DuckDB and pandas in Python would be my tools of choice.
If I'm being lazy, I pickle the pandas data frame directly.
amphion101@reddit
Pickled Pandas sounds fun.
Kale@reddit
I always "import pickle as p"
A file open for writing is always "fo". I have so many "p.dump(df,fo)" in my lazy code it's not funny. And also "df=p.load(fi)".
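The whole lazy pipeline is roughly this (a sketch; the regex depends on your WhatsApp export locale):

```python
import pickle as p
import re

import duckdb
import pandas as pd

# parse export lines like "12/03/2023, 21:05 - Alice: good night"
pat = re.compile(r"^(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}) - ([^:]+): (.*)$")
rows = []
with open("chat.txt", encoding="utf-8") as fi:
    for line in fi:
        m = pat.match(line.strip())
        if m:
            rows.append(m.groups())
df = pd.DataFrame(rows, columns=["date", "time", "sender", "message"])

# e.g. who has said "I love you" more
print(duckdb.sql(
    "SELECT sender, count(*) AS n FROM df"
    " WHERE message ILIKE '%i love you%'"
    " GROUP BY sender ORDER BY n DESC"
))

with open("chat.pkl", "wb") as fo:
    p.dump(df, fo)  # the lazy pickle
```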
shemer77@reddit
Yeah, a purely LLM solution for this won't work. You probably need a mix of statistical analysis with an LLM.
Special_Bobcat_1797@reddit
Agreed on this. Maybe use a more agentic model which is good at SQL.
_raydeStar@reddit
Even feeding it into a RAG isn't going to work. LLMs are notoriously bad at counting.
sleepy_roger@reddit
Eh, I disagree. I see some saying "RAG is useless for counting", but that's just bad implementation, definitely not a limitation of RAG itself.
If you chunk lazily by tokens and use a fixed k of course it will fail.
With metadata-rich chunking (sender, timestamp, message_id) and a hybrid search strategy (semantic with metadata filters), RAG can absolutely handle quantity based questions.
A smart agent can break down "Who said 'I love you' more?" into sub queries, count matches per person, then compare.
I did this myself using the King James Bible in a RAG setup. It can count how many times "Daily Bread" appears in the New Testament because retrieval is designed to get all relevant chunks, not just the top X hits.
RAG definitely isn't useless, it just needs the correct implementation.
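For example, with Chroma you can store one chunk per message with its metadata and then do exact, filterable counting instead of top-k retrieval (a rough sketch; field names are made up, and real code would normalise case):

```python
import chromadb

client = chromadb.Client()
col = client.get_or_create_collection("chat")

# one chunk per message, with metadata the retriever can filter on
col.add(
    ids=["m1", "m2", "m3"],
    documents=["i love you", "love you too!", "sorry about earlier"],
    metadatas=[
        {"sender": "me", "ts": "2023-07-01"},
        {"sender": "her", "ts": "2023-07-01"},
        {"sender": "me", "ts": "2023-07-02"},
    ],
)

# "who said I love you more" -> exact matches per sender, not top-k
for who in ("me", "her"):
    hits = col.get(
        where={"sender": who},
        where_document={"$contains": "love you"},
    )
    print(who, len(hits["ids"]))
```

An agent can emit these sub-queries itself and then compare the counts.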
NoElephant7872@reddit
All of that can be done easily with a Python script. I also tried it with Llama and DeepSeek, and I only had problems handling the large amount of text I had. I can't do much more, but I would like to learn something about AI/ML.
Efficient_Bus9350@reddit
This, I worked with a RAG system that had knowledge of a number of simplified views and it was able to then query those views for more accurate information.
Original_Finding2212@reddit
r/technicallycorrect
HasGreatVocabulary@reddit
I strongly suggest not doing this, over-analysis is not always worthwhile
indicava@reddit
WhatsApp is SQLite underneath? At least it was.
In any case, I would throw it into a relational DB and use some text2sql agent to grab results.
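Something in this direction (a sketch with sqlite3; the schema and the 'sorry' LIKE filter are just the kind of SQL a text2sql agent would generate):

```python
import sqlite3

conn = sqlite3.connect("chat.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS messages (ts TEXT, sender TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("2023-07-02 21:14", "me", "sorry, I overreacted"),
        ("2023-07-02 21:16", "her", "i love you"),
    ],
)

# "who apologises more?" as a text2sql agent might phrase it
for sender, n in conn.execute(
    "SELECT sender, COUNT(*) FROM messages"
    " WHERE body LIKE '%sorry%' GROUP BY sender"
):
    print(sender, n)
```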
finah1995@reddit
This is the Way.
Special_Bobcat_1797@reddit
This is the way op
Snoo_28140@reddit
This man is gonna win every single argument from now on 🤣
FluoroquinolonesKill@reddit
One does not “win” arguments with a wife, young man.
finah1995@reddit
Well said.
Snoo_28140@reddit
🤣 wise words
QuantumCatalyzt@reddit
One might win by ending up sleeping on the street
redlightsaber@reddit
I was more thinking he must really be sick of his wife, as he's decided to take the nuclear route to relationships.
HappyFaithlessness70@reddit
:)
LumpyAd7854@reddit
Until his wife finds out about his relational database analysis, then the whole relation might be at risk!
Ok_Cow1976@reddit
Why would you want to know the answers? I mean, these questions have standard answers. Like the 1st one, it is you. The 2nd one, it's you again!
tmvr@reddit
Yes, you know it, I know it, and I guess a bunch of other people reading this know it. I'd also wager that OP knows it as well, but his subconscious can still deny it until the hard numbers are there :)
tmvr@reddit
Yeah, I don't think having this information will lead to anything. When you are in the phase that these are discussed, numbers are meaningless. If your aim is to salvage something then go to couple's counseling.
YouAreRight007@reddit
I would instead train a model using her questions and your responses.
Would be a fun exercise creating a WhatsApp husband bot.
dude792@reddit
Not every problem is solvable by technology and logic. Search for attorneys in your region before presenting the facts. You might need one for the disputes over divorce. Maybe your next Japan trip will be half the cost.
Nevertheless it's interesting to do. Proceed at your own risk :)
Good luck. If we don't hear from you after that project, we know what happened :D
mlabonne@reddit
Check out this 1.2B RAG model, it'll be a lot faster and higher quality than Llama 3.2 3B for this task: https://huggingface.co/LiquidAI/LFM2-1.2B-RAG
TechnoByte_@reddit
You'll need some LLM with a massive context size, RAG won't work for your questions.
Some options:
https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-1M (7B, 1 million tokens)
https://huggingface.co/MiniMaxAI/MiniMax-Text-01 (456B, 4 million tokens)
https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct (109B, 10 million tokens)
Realistically, Qwen2.5-7B-Instruct-1M is the only one you'll be able to actually run locally, but to reach that context size without insane amounts of RAM, you'll need quantized cache (
--cache-type-k q4_0 --cache-type-v q4_0
in llama.cpp)
LinkSea8324@reddit
Bad advice, because Qwen2.5 is actually the only one of them that works at this length, but it's outdated. Qwen3 got its own 1M patch, but the latest vLLM version doesn't support sparse or dual chunk attention anymore.
PontiacGTX@reddit
This is the actual answer: you need to use function calling with a deterministic approach. Give it a time frame, iterate through each period, and make it count as it goes.
Fuzzy_Independent241@reddit
Use Google's NotebookLM, that's what it does and it's free.
Busy_Leopard4539@reddit
Lemmatize, then do clustering + factor analysis. Did that on my side a year ago lol. LLM/RAG are pretty useless here.
Special_Bobcat_1797@reddit
Wow, I don't understand this at all... can you please shed some more light, kind soul
givingupeveryd4y@reddit
4/6 items on your list tell me I wouldn't like to be you or your wife rn :p
Special_Bobcat_1797@reddit
😂
rudythetechie@reddit
you can stitch this with zapier or n8n and openai... pull from notion slack drive into one doc then summarize weekly... add erpdotai if you want unified client data without juggling ten apis
Special_Bobcat_1797@reddit
Can you help me understand how embedding and retrievers help increasing rag efficiency ?
I know I can ask chat gpt, but I just want your thoughts here
yayosha@reddit
If you want the absolute fastest way: Split the data into chunks of 10k tokens, feed them to the LLM one by one, ask for a summary and have a premade prompt with your questions you want answered.
Compile results on your own or input all into the LLM again to summarize over the summaries.
Also read through the input yourself; don't just assume the LLM will notice everything that might be interesting to you.
It will not work magically, and you have to go through the pain of some manual labor... The more you do, the better your results will be tho :)
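A sketch of that map-reduce loop (assuming a local Ollama server; any OpenAI-compatible endpoint works the same way):

```python
import requests

def ask(prompt: str) -> str:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2:3b", "prompt": prompt, "stream": False},
    )
    return r.json()["response"]

QUESTIONS = "Who apologised? Any fights? Which people were mentioned?"

text = open("chat.txt", encoding="utf-8").read()
size = 30_000  # very roughly ~10k tokens of chat per slice
chunks = [text[i:i + size] for i in range(0, len(text), size)]

# map: summarise each slice with the premade question prompt
summaries = [
    ask(f"Summarise this chat slice, then answer: {QUESTIONS}\n\n{c}")
    for c in chunks
]
# reduce: summarise over the summaries
print(ask("Combine these notes into one answer:\n\n" + "\n---\n".join(summaries)))
```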
BadBoy17Ge@reddit
I tried it once, a long time back.
This is how I did it.
People say RAG is not required, and that's true to some extent.
But here's what I did:
Separate each session using timestamps and use the LLM to name them.
Then, when a message is made or sent to you,
you pick a couple of things that match, put them in context along with the message sent in that session, and ask the LLM to provide structured output.
This makes the messages more convincing.
And you can do this with the go whatsapp library.
I think I might still have this with me, but I didn't complete it though.
Substantial-Gas-5735@reddit
You need much more than simple vector-search-based RAG; as others pointed out, it cannot answer "how many" kinds of questions.
Try graph RAG
TheDreamWoken@reddit
You'd be better off doing statistical analysis,
unless you want to train an LLM on style and writing so it sounds like you and your wife.
michaelsoft__binbows@reddit
Y'all go around typing in a chat when you fight? What kind of fights are those lol
-Django@reddit
I agree that this is more statistical analysis, but I fine-tuned an LLM on a group chat with my buddies and got some fun results for simulating conversations. I chunked the chats by day and experimented with different formats, e.g. "Bob: blablabla\nAlice: yoyoyo", and used Unsloth + Colab to train the model. You don't need to fine-tune an LLM to answer the questions you have, but a fine-tuned LLM will be a fun toy for simulating chat threads.
73tada@reddit
Regarding formatting data for training: manually chunking the data has been key, followed by using Unsloth's Meta Synthetic Data Kit for Q&A generation on those chunks.
For working with "emotional data" you may want to look into old school (~2019) NLP semantic processing for the sarcasm classification.
RAG seemed to be a good plan; however, Unsloth's docs say "LoRA is better". I don't know about that, I'm still undecided.
What I did learn recently is that LoRA doesn't "add" information so much as it replaces or overrides existing data within the model. On the other hand, RAG doesn't affect the model beyond the current context. RAG is kind of like a database query in the moment; it's forgotten once you start a new chat.
The net result of the last paragraph is that the current answer is "we need longer context" to hold all the data in memory for the entire conversation. Neither RAG nor LoRA does this.
In the end, for both RAG and LoRA, all we are getting back is AI-formatted information a database could've given us, and the database query would have fewer hallucinations. The AI-formatted information is great for rephrasing, not so great for accuracy.
randomqhacker@reddit
Better ideas: divorce?
FlyingDogCatcher@reddit
RIP your marriage
emptinoss@reddit
!remindme 2 days
meccaleccahimeccahi@reddit
You could do this with Python in a second vs waiting for AI to get it wrong.
Awwtifishal@reddit
I don't think training a model is useful for that purpose, and RAG doesn't cut it either, unless you build the RAG database in a way that it contains the answers to the questions. That's not hard to do for someone who codes, but as far as I know there's no out-of-the-box solution for it.
DerekMorr@reddit
Did she consent?
SaltyRemainer@reddit
There are probably better small models than llama3.2:3b now. Qwen3 1.7B is incredible in my testing. Nevertheless, I love this idea.
PontiacGTX@reddit
Function calling is kinda broken for Qwen; at least the format for function calling uses markdown <><> rather than JSON.
kkiran@reddit
Isn't this more statistical analysis and/or RAG? I would love to learn how this turns out and if it can get you the results you are looking for!
DifficultyFit1895@reddit
I’ve been thinking some combination of LLM, RAG+Knowledge graphs plus agentic use of python NLP / statistical packages.
kkiran@reddit
Wow, you have the full spectrum covered! Please post your findings. This use case you are looking for can be applied to some industrial use cases as well. Print money while you are at it!
DifficultyFit1895@reddit
Well, all I’ve been doing is thinking about it so far. This is mostly a hobby for me, and I have limited time to pursue it with family and professional responsibilities. I’m in a field where this is tangentially related, so we wouldn’t be developing these solutions but would be customers of them at the enterprise level.
Maegom@reddit
Some of this can be done with simple data analysis. As for the more complex ones, I feel like you can put the data in an Excel sheet and loop through the chat chunks and classify each row based on what it contains and what you're looking for, then just count or filter the result and you'll get your stat.
Local_Philosopher_49@reddit
Check out mem0; I think their products (they can be locally deployed) might work for you. They extract entities and relationships from conversations (a simplified explanation of mem0). Their paper was a great read; the mem0g variant with a graph DB as a backend might work better with temporal relationships.
omegaindebt@reddit
LLMs will not be good at this. Instead, try either statistical methods with keyword matching, or synonym matching with small models like BERT.
Or you can go the route of enriching your entire chat through the LLM and having it output the chats in a JSON format with meta tags such as time, date, etc., plus LLM-generated tags like mood, keywords, tone, etc. The tags will have to be designed while keeping in mind what kind of questions you want to ask.
fingertipoffun@reddit
For "love you" counts, it doesn't require an LLM, and LLMs can't count.
Apologies needs a tagging run, i.e. the LLM tagging all messages containing an apology to the other, and then counting them.
"What was discussed" might work with RAG.
Number of fights is another tagging run plus summing.
"List all the people we talked about" is a run looking for people discussed, then a uniq to remove duplicates.
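A tagging run is just a loop (rough sketch; assumes a local Ollama server, and the yes/no parsing is deliberately naive):

```python
from collections import Counter

import requests

def tag_is_apology(msg: str) -> bool:
    # one classification call per message against a local model
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2:3b",
            "prompt": "Answer only yes or no. Is this message an apology?\n" + msg,
            "stream": False,
        },
    )
    return r.json()["response"].strip().lower().startswith("yes")

counts = Counter()
for sender, body in [("me", "sorry, my bad"), ("her", "it's ok")]:
    if tag_is_apology(body):
        counts[sender] += 1
print(counts)  # who apologises more
```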
Shoddy-Tutor9563@reddit
OP never appeared in comments. It's a bad sign
Shoddy-Tutor9563@reddit
Chunking by tokens is a bad idea. Chunk by conversations. Figure out for yourself the best value for the time gap between messages that splits the dataset into distinct, reasonable conversations.
RAG alone will be miserable here, as people say. You need something better than that. For example, load your conversations into a relational database so you can post-process them and extract additional information. And be prepared that in order to answer another "who was more in a " you'll need to extract that for every connected conversation, and only then will you be able to know it.
But it's all doable
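Splitting by time gap is only a few lines (a sketch; the 3-hour gap is just a starting value to tune):

```python
from datetime import datetime, timedelta

GAP = timedelta(hours=3)  # tune this: the silence that separates conversations

def split_conversations(messages):
    # messages: list of (datetime, sender, text) tuples, sorted by time
    convos, current = [], []
    for msg in messages:
        if current and msg[0] - current[-1][0] > GAP:
            convos.append(current)
            current = []
        current.append(msg)
    if current:
        convos.append(current)
    return convos

msgs = [
    (datetime(2023, 7, 2, 21, 14), "me", "sorry, I overreacted"),
    (datetime(2023, 7, 2, 21, 16), "her", "it's ok"),
    (datetime(2023, 7, 3, 8, 2), "me", "good morning"),
]
print(len(split_conversations(msgs)))  # -> 2 conversations
```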
thisoilguy@reddit
You could build a nice word map to visualise this; better and much simpler.
tangawanga@reddit
Just upload the entire convo to a gpt
Ok-Palpitation-905@reddit
Can't you just feed the full text to gpt-oss, which has a huge context, along with your question? I suspect that may work.
FullOf_Bad_Ideas@reddit
I trained Yi 34B 200k on my chats a few years ago. LoRA.
It was fun, not what you're aiming for here. It emulated me or her in the convo instead. You could do it with llama 3.2 3b too, and you both could play with it to get some introspection on how the other side perceives you. I also found it surprisingly insightful to re-read my own chats from the perspective of the receiver.
For your use case, you need more context available to the model at once. Tokenize the txt and see how much it is. Try some long-context models like Jamba 3B Reasoning 256K, Jamba Mini 256K, or Seed OSS 36B 512K and stuff it all in context. I don't know you, so I don't know how many tokens of chats you have, but I'm guessing there's a good chance it will fit. If not, split by years, I guess.
bralynn2222@reddit
This is a very large endeavor, to be honest. Firstly, you're going to at the very least need to perform continued pre-training to add this knowledge to its data pool; fine-tuning only teaches it to use the data it already has access to. So for it to work properly via simple tuning, you would have to, like others have suggested, connect it to a RAG database and then fine-tune it on how to process/perform data analysis on the given data. That's your simplest route. Now, if you want to go down the pre-training route, you would have to first perform the pre-training itself, adding the data to the model, and then create a fine-tuning dataset targeted at the model performing data analysis over its entire knowledge pool, which is a practical AI challenge.
inmadisonforabit@reddit
Why would you train an LLM on this? It doesn't make sense.
No_Dig_7017@reddit
Hmm, maybe it's cheating, but what about feeding it to Gemini? Its 1M context length should likely be enough for all your chat history.
Hot-Elk-8720@reddit
Well, that's a sure way to solve some of your relationship problems and remember the anniversary and pain points...
planetwords@reddit
Jesus, how dystopian. lol
The_GSingh@reddit
I would just chunk it.
Have your questions ready, pass your chats in day by day (or hour by hour, or message by message, depending on how it goes), and have the LLM increment each answer.
For "who said xyz more", just have it keep a count, and so on. You could probably also do this better without an LLM, but where's the fun in that.
Btw, LLMs suck at this without structure and without taking it step by step.
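The incrementing loop, as a sketch (ask_llm is a hypothetical stand-in; plug in any local model call, e.g. the Ollama request sketched elsewhere in this thread):

```python
import json

def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical: any local model call goes here

chats_by_day = {  # date -> that day's transcript
    "2023-07-01": "me: i love you\nher: love you too",
    "2023-07-02": "me: i love you",
}

counts = {"me": 0, "her": 0}  # running tally carried from day to day
for day, transcript in chats_by_day.items():
    reply = ask_llm(
        "Current tallies: " + json.dumps(counts)
        + "\nRead this day's chat and return updated tallies as JSON,"
        " counting who said 'I love you':\n" + transcript
    )
    counts = json.loads(reply)  # structured output keeps the model honest
print(counts)
```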
mr_birkenblatt@reddit
To answer those questions, turn the script around: ask the LLM to create metadata for you that can answer the question, then query the metadata. Or, even simpler, ask your questions and give the LLM question+message pairs, then tally everything up.
bachree@reddit
I would first evaluate each conversation with an LLM and generate tags based on preset questions, such as "is there a fight", and enrich the convos with those tags. Then ingest the convos into a vector DB with the tags as metadata. For the learning experience, creating a graph schema with Neo4j is an option if you want to get fancy. Then the LLM can query the database both on similarity and on the tags.
No_Brilliant_1371@reddit
!RemindMe 2 days
In any case, I would suggest storing each message like this:
"time"
"sender"
"receiver"
"message"
"llm description" - for each message ask llm to describe the message to be able to be found for semantic search, you will need local llm and multithreading because otherwise it's gonna take forever..
Or LangGraph, but I haven't worked much with it.
Nabukov@reddit
I would build an agentic workflow.
Think Claude Code/Cursor for your chat history. Yes, RAG it, but as a tool for semantic search that the agent uses.
You ask it to count some type of interaction -> the agent searches through the history -> builds up the list, just like a coding agent looks up relevant files in a codebase.
You might need a bigger 8B LLM, or a finetuned 3B, for this.
The heavy lifting would be the scaffolding and the tools required.
Some cases like "who's more sarcastic" would require frontier LLMs.
Good luck!
M4K4T4K@reddit
I'm sure you already know what you're doing (as in, your decisions). I hope you're secure enough to open that box.
-dysangel-@reddit
obviously not if he feels the need to count who's said "I love you" more..
some_user_2021@reddit
Psychoanalyze my wife using our conversations, give me the best arguments to win my discussion.
What are my wife's weaknesses so I can assert control over her?
Can you tell from our conversations if my wife has been unfaithful?
SpaceChook@reddit
Also why am I better and righter, please provide examples and
Medium_Chemist_4032@reddit
The actual most probable outcome is simply that he will discover that said wife isn't that much romantically into him. Also, re:
> weaknesses so I can assert control over her
It's probably already done, by wife over OP.
Just a hunch. Probably am totally wrong
Some-Ice-4455@reddit
Not sure if it's viable for your application, just a thought, and it would involve some work, but would it be easier for the model to understand if you turned those into markdown?
anandfire_hot_man2@reddit
!Remind me 4 days
ed_ww@reddit
Maybe a path is to distill the conversation by running a pipeline that goes through chunks of conversation and tags them with different variables, such as sender, receiver, dates, key learnings, sentiment of the messages, summary of the chunk, key topics covered, and any other variable you find relevant, each chunk with an id, etc. Feed this to the LLM; evaluate the max tokens of each chunk and set that as the chunk size, with some 10-20% overlap. Then evaluate the best top-k for your use cases (the questions you had). I'm not so sure you will get exact quantitative results without having to dump the whole contents of the distillation into context, but it could take you in the right direction.
aiplusautomation@reddit
RAG won't work because it retrieves semantic chunks up to a limit; it won't retrieve ALL docs and then calculate quantities. Fine-tuning won't work because, while the data is added to training, the result still wouldn't be quantitative.
You need a knowledge graph. That way you can match conversation entities with specific labels and those can be counted.
Check out Zep
yangop@reddit
!remindme 3 days
texasdude11@reddit
What you need is a custom implementation of temporally aware knowledge graphs.
Kind of like what Graphiti has implemented, but tuned for your use case, which is why I advocate a purpose-built implementation.
MDT-49@reddit
Use RAG with an embedding & reranker model to find the chunks based on the query (question) and semantic meaning (sarcasm, apologetic, etc.). Then use statistical tooling to count them and assign them to the right person.
If that's done, you can download an uncensored RP model to keep you company after your divorce.
Skhadloya@reddit
Too much data. Use a strong LLM to summarise it structurally, week by week, and store the summaries with metadata; then RAG should work (with metadata filtering).
Plane_Ad9568@reddit
Next step: let the LLM copy your style and respond on your behalf! Win-win: more video game time and a happy wife.
Hyiazakite@reddit
A simple RAG pipeline will not work, and neither will fine-tuning. I'm just brainstorming now. You could store each message as a vector to find matches using vector search, sure, or you could use BM25 without the need for embedding. Make sure you extract as much metadata as possible into separate fields like date, time, etc. Some precomputed metadata using NER for keyword extraction? Sentiment analysis using BERT? Then you'd need a custom pipeline that builds different queries depending on what you ask, which I'm guessing you could do with an agentic workflow and a custom MCP server.
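For the NER/sentiment part, off-the-shelf transformers pipelines get you surprisingly far (a sketch; the default models are placeholders, not recommendations):

```python
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
ner = pipeline("ner", aggregation_strategy="simple")

msg = "Dave was so annoying at the Tokyo airport, but I'm sorry I snapped."
print(sentiment(msg))                 # [{'label': 'NEGATIVE', 'score': ...}]
print([e["word"] for e in ner(msg)])  # entities: 'Dave', 'Tokyo'
```

Precompute these per message, store them as metadata fields, and the query pipeline can filter on them.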
_qeternity_@reddit
What has led you to believe that an LLM could do this??
SpecificWay1954@reddit
Hey, just make sure your data doesn't get leaked.
vinilios@reddit
they're on WhatsApp, doesn't that mean they are already leaked somehow?
SpecificWay1954@reddit
It's going to be double leaked 😆
PsychohistorySeldon@reddit
You have 2 types of questions:
- Qualitative ones: what was discussed, list all people we talked about
- Quantitative: who apologizes more / who has said I love you more
For the latter, you'll have to set up a proper pipeline. Either you structure the data with pre-determined attributes beforehand and store it as such, or you keep it unstructured but vectorized and use OLAP, or an abstraction layer that: a) builds the query, b) extracts the semantically meaningful data from that query, c) performs the actual math/analysis on the data.
MakerBlock@reddit
!remindme 5 days
One-Mud-1556@reddit
Last year, I worked on a project that involved processing a large number of PDFs. I used PrivateGPT to summarize them and generate reports. It allows you to configure various local models, and it worked well for my needs.
I’m not sure if there are any newer projects now, but at the time, it was simple to use and let me focus on analyzing the data rather than building the processing tools myself.
Currently, I’m using Gemini because of its large context window, and since I don’t need to process a large collection of PDFs anymore, I no longer use PrivateGPT. Still, it was very useful for that project.
Own_Ambassador_8358@reddit
Start with the easiest solution. Split the conversation per day/week/month and pass it to the LLM as you normally would; there's no need to learn anything or train. Then just summarize the output.
Borkato@reddit
This is actually an incredible idea and I’d love to hear the answers you get! !remindme 2 days
I can see rag working, but aside from that I don’t have many ideas haha
roadwaywarrior@reddit
That probably only represents a subset of data, 12 weeks times the number of years
RemindMeBot@reddit
I will be messaging you in 2 days on 2025-10-14 16:59:10 UTC to remind you of this link