Is there a guide from 0 to full running RAG?
Posted by thegreatcerebral@reddit | LocalLLaMA | View on Reddit | 10 comments
I'm looking for a guide to go from having nothing to having a fully running RAG, all self-hosted, on-prem. My understanding is that I could point it at my KB and all the files that have been developed, like internal training material, how-to documents, etc., and then ask questions and it will use those when responding. Is that correct?
If I am not understanding correctly then please correct me, but my understanding is that this is what RAG does: you point it at your information and it uses that for its responses.
Pacyfist01@reddit
RAG is actually two neural networks working on a single problem.
1) Something called a "Vector Database" that stores the data (like every paragraph inside a book)
2) A standard issue LLM that can reply in human readable form.
The actual prompt sent to the LLM is roughly:
Hi, I have just found this data inside my database: {data here}. Can you answer the question the user asked? {question}
You can use GPT4All to do this without any coding.
https://digitaconnect.com/local-rag-with-gpt4all-local-docs/
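The two pieces described above can be sketched in a few lines of plain Python. Everything here is a toy illustration: the word-overlap "embedding" is a stand-in for a real embedding model, the list of paragraphs is invented example data, and the final string is just the prompt template from the comment, not any library's actual API.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real systems use a trained embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1) The "vector database": every paragraph stored alongside its vector.
paragraphs = [
    "To reset a password, open the admin panel and click Users.",
    "Backups run nightly at 2am and are kept for 30 days.",
]
store = [(embed(p), p) for p in paragraphs]

def answer(question):
    # Retrieve the best-matching paragraph from the store...
    q = embed(question)
    data = max(store, key=lambda item: cosine(q, item[0]))[1]
    # ...then build the prompt that would be handed to the LLM (step 2).
    return (f"Hi, I have just found this data inside my database: {data} "
            f"Can you answer the question the user asked? {question}")

print(answer("How do I reset a password?"))
```

A real setup swaps `embed` for an embedding model and the list for a proper vector database, but the retrieve-then-prompt shape stays the same.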
ForceBru@reddit
Vector databases aren't neural networks, so there's only one neural network: the LLM
__JockY__@reddit
This is incorrect, the embeddings for the vector DB are created by a model, so there are indeed two models: embedding and LLM.
wurst_mann@reddit
The vectors are created by an embedding model.
ForceBru@reddit
Right, so two neural nets: the embedding model and the LLM. AFAIK one can also take embeddings from the LLM itself, so you end up with one model, but those embeddings are very high-dimensional and need more storage.
Pacyfist01@reddit
Technically, embeddings are usually produced by a BERT network (Bidirectional Encoder Representations from Transformers), specifically SBERT (Sentence-BERT). They take text as input and produce a vector as output. They are trained to produce similar vectors when the inputs are semantically similar.
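A minimal sketch of the property described above, using hand-made 3-d vectors in place of real SBERT outputs (which are typically 384- or 768-dimensional); the sentence-to-vector mapping here is invented purely for illustration:

```python
import math

# Pretend SBERT outputs: the two password-related sentences were
# deliberately given nearby vectors, the backup sentence a distant one.
fake_embeddings = {
    "How do I reset my password?":    [0.9, 0.1, 0.0],
    "I forgot my login credentials.": [0.8, 0.3, 0.1],
    "Backups run nightly at 2am.":    [0.1, 0.2, 0.9],
}

def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

q = fake_embeddings["How do I reset my password?"]
related = cosine(q, fake_embeddings["I forgot my login credentials."])
unrelated = cosine(q, fake_embeddings["Backups run nightly at 2am."])
print(related > unrelated)  # the vector DB would rank the related text first
```

This nearest-by-cosine ranking is exactly what a vector database does at query time, just over millions of stored vectors.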
wurst_mann@reddit
And since regular LLMs are much bigger than embedding models, they are also much slower, which matters if you have to process a lot of data.
privacyparachute@reddit
You could use www.papeg.ai for this. Just drag some documents into the UI, and then click the document-search icon in the bottom left.
Co0lboii@reddit
This was just posted: https://www.reddit.com/r/LocalLLaMA/comments/1g4aicc/autorag_huggingface_space_release_optimize_rag/
asankhs@reddit
Something like this from LangChain should be a good starting point: https://python.langchain.com/docs/tutorials/rag/