Is there a guide from 0 to full running RAG?
Posted by thegreatcerebral@reddit | LocalLLaMA | View on Reddit | 10 comments
I'm looking for a guide to go from having nothing to having a fully running RAG, all self-hosted, on-prem. My understanding is that I could point it at my KB and all the files that have been developed, like internal training material, how-to documents, etc., and then ask questions and it will use those when responding. Is that correct?
If I am not understanding correctly then please correct me, but my understanding is that this is what RAG does: you point it at your information and it uses that for its responses.
Pacyfist01@reddit
RAG is actually two neural networks working on a single problem.
1) Something called a "Vector Database" that stores the data (like every paragraph inside a book)
2) A standard issue LLM that can reply in human readable form.
The actual prompt sent to the LLM is roughly:
Hi, I have just found this data inside my database: {data here}. Can you answer the question the user asked? {question}
You can use GPT4All to do this without any coding.
https://digitaconnect.com/local-rag-with-gpt4all-local-docs/
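The two pieces described above can be sketched in a few lines of plain Python. Everything here is a toy illustration: the word-overlap "embedding" is a stand-in for a real embedding model, the list of paragraphs is invented example data, and the final string is just the prompt template from the comment, not any library's actual API.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real systems use a trained embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1) The "vector database": every paragraph stored alongside its vector.
paragraphs = [
    "To reset a password, open the admin panel and click Users.",
    "Backups run nightly at 2am and are kept for 30 days.",
]
store = [(embed(p), p) for p in paragraphs]

def answer(question):
    # Retrieve the best-matching paragraph from the store...
    q = embed(question)
    data = max(store, key=lambda item: cosine(q, item[0]))[1]
    # ...then build the prompt that would be handed to the LLM (step 2).
    return (f"Hi, I have just found this data inside my database: {data} "
            f"Can you answer the question the user asked? {question}")

print(answer("How do I reset a password?"))
```

A real setup swaps `embed` for an embedding model and the list for a proper vector database, but the retrieve-then-prompt shape stays the same.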
ForceBru@reddit
Vector databases aren't neural networks, so there's only one neural network: the LLM
__JockY__@reddit
This is incorrect, the embeddings for the vector DB are created by a model, so there are indeed two models: embedding and LLM.
wurst_mann@reddit
The vectors are created by an embedding model.
ForceBru@reddit
Right, so two neural nets: the embedding model and the LLM. AFAIK one can also take embeddings from the LLM itself, so you end up with one model, but those embeddings are very high-dimensional and need more storage.
Pacyfist01@reddit
Technically, embeddings are usually produced by a BERT network (Bidirectional Encoder Representations from Transformers), specifically SBERT (Sentence-BERT). They take text as input and produce a vector as output. They are trained to produce similar vectors when the inputs are semantically similar.
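A minimal sketch of the property described above, using hand-made 3-d vectors in place of real SBERT outputs (which are typically 384- or 768-dimensional); the sentence-to-vector mapping here is invented purely for illustration:

```python
import math

# Pretend SBERT outputs: the two password-related sentences were
# deliberately given nearby vectors, the backup sentence a distant one.
fake_embeddings = {
    "How do I reset my password?":    [0.9, 0.1, 0.0],
    "I forgot my login credentials.": [0.8, 0.3, 0.1],
    "Backups run nightly at 2am.":    [0.1, 0.2, 0.9],
}

def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

q = fake_embeddings["How do I reset my password?"]
related = cosine(q, fake_embeddings["I forgot my login credentials."])
unrelated = cosine(q, fake_embeddings["Backups run nightly at 2am."])
print(related > unrelated)  # the vector DB would rank the related text first
```

This nearest-by-cosine ranking is exactly what a vector database does at query time, just over millions of stored vectors.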
wurst_mann@reddit
And since regular LLMs are much bigger than embedding models, they are also much slower, which matters if you have to process a lot of data.
privacyparachute@reddit
You could use www.papeg.ai for this. Just drag some documents into the UI, and then click the document-search icon in the bottom left.
Co0lboii@reddit
This was just posted: https://www.reddit.com/r/LocalLLaMA/comments/1g4aicc/autorag_huggingface_space_release_optimize_rag/
asankhs@reddit
Something like this from LangChain should be a good starting point: https://python.langchain.com/docs/tutorials/rag/