Best platform-agnostic tools/frameworks to vectorize large wikis (not Wikipedia) for RAG?
Posted by Mgeek35@reddit | LocalLLaMA | 6 comments
Hi folks,
I'm working at an LLM company on a specialized business use case. Since most LLMs weren't trained on our business data, we are scraping our internal wikis and trying to build a vector database out of them to use in our RAG. We want this database to be usable regardless of the RAG framework. One problem I found with tools like LlamaIndex (please correct me if I'm wrong) is that they store the data in framework-specific objects, which aren't really usable or transferable outside LlamaIndex.
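One framework-agnostic pattern, sketched below, is to keep the canonical store as plain JSONL records and load those into whatever vector DB or RAG framework you end up using. The field names (id, text, metadata, embedding) are an assumption here, not a standard:

```python
import json

def save_chunks(path, chunks):
    """Write chunks to JSONL, one record per line.

    chunks: iterable of dicts like
    {"id": "page-42#0", "text": "chunk text", "metadata": {"url": "https://wiki.example/page"}, "embedding": [0.1, 0.2]}
    """
    with open(path, "w", encoding="utf-8") as f:
        for chunk in chunks:
            f.write(json.dumps(chunk, ensure_ascii=False) + "\n")

def load_chunks(path):
    """Read the portable JSONL store back into plain Python dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Plain JSON survives any framework change; each backend's loader then just maps these fields onto its own object model.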
andreasntr@reddit
MongoDB vector indexes are quite flexible, plus you can store and query metadata to filter out objects that don't match given conditions at inference time. pgvector does the same but with a relational approach.
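A minimal sketch of that metadata-filtering pattern with pgvector (the table name, jsonb column, and 1024-dim vector size are assumptions; it needs the psycopg and pgvector Python packages and a Postgres with the vector extension available):

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector
from psycopg.types.json import Jsonb

conn = psycopg.connect("dbname=wiki user=postgres", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # adapts numpy arrays to/from vector columns

# Hypothetical schema: chunk text + jsonb metadata + embedding.
conn.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id bigserial PRIMARY KEY,
        text text,
        meta jsonb,
        embedding vector(1024)
    )
""")

# Insert one chunk with its metadata and (dummy) embedding.
conn.execute(
    "INSERT INTO chunks (text, meta, embedding) VALUES (%s, %s, %s)",
    ("Some wiki paragraph", Jsonb({"space": "engineering"}), np.random.rand(1024)),
)

# Query by cosine distance (<=>), filtering on metadata at inference time.
rows = conn.execute(
    """SELECT text FROM chunks
       WHERE meta->>'space' = %s
       ORDER BY embedding <=> %s
       LIMIT 5""",
    ("engineering", np.random.rand(1024)),
).fetchall()
```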
Mgeek35@reddit (OP)
Thanks, people. I really appreciate the ideas.
rbgo404@reddit
You can use any vector database and shape the stored objects to your needs.
For example, with Weaviate you can define the input fields however you like, and store metadata alongside the vector.
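Something like this, assuming the older v3-style Weaviate Python client and a local instance (the class and property names are made up; the newer v4 client uses a collections-based API instead):

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# Define a class with your own fields; "WikiChunk" is a hypothetical name.
client.schema.create_class({
    "class": "WikiChunk",
    "vectorizer": "none",  # we supply our own precomputed vectors
    "properties": [
        {"name": "text", "dataType": ["text"]},
        {"name": "source_url", "dataType": ["text"]},
        {"name": "section", "dataType": ["text"]},
    ],
})

# Insert an object: metadata fields plus its embedding.
client.data_object.create(
    data_object={
        "text": "Some wiki paragraph",
        "source_url": "https://wiki.example/page",
        "section": "Intro",
    },
    class_name="WikiChunk",
    vector=[0.05] * 1024,  # your real embedding here
)
```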
Accomplished_Map2130@reddit
Store it in a vector DB like Qdrant.
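A minimal Qdrant sketch with the official Python client (collection name, payload fields, and the 1024-dim size are assumptions; ":memory:" runs an in-process instance for testing, swap in a server URL for real use):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # or QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="wiki",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

# Upsert one chunk; payload carries the text and any metadata.
client.upsert(
    collection_name="wiki",
    points=[
        PointStruct(
            id=1,
            vector=[0.05] * 1024,  # your real embedding here
            payload={"text": "Some wiki paragraph", "url": "https://wiki.example/page"},
        )
    ],
)

hits = client.search(collection_name="wiki", query_vector=[0.05] * 1024, limit=3)
```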
vasileer@reddit
Postgres (e.g. Supabase) with pg_vector for semantic search, and the bge-m3 model (8K-token context window) to create the embeddings.
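Getting bge-m3 dense embeddings via the FlagEmbedding package looks roughly like this (the model downloads from BAAI/bge-m3 on first use; use_fp16 speeds up GPU inference):

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

texts = ["First wiki chunk.", "Second wiki chunk."]

# max_length can go up to 8192 tokens; dense vectors come back under "dense_vecs".
out = model.encode(texts, max_length=8192)
embeddings = out["dense_vecs"]  # numpy array, shape (len(texts), 1024)
```

Those 1024-dim vectors are what you'd write into the pg_vector column.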
complains_constantly@reddit
Yup. And do it in batches of 256 with the model, since FlagEmbedding supports that. I get about 750 embeddings per second on an RTX 6000 Ada.
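A rough way to measure that kind of throughput yourself, assuming the same BGEM3FlagModel as above (the dummy corpus and numbers here are illustrative; results depend heavily on GPU and chunk length):

```python
import time
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
texts = ["a reasonably long wiki paragraph " * 20] * 2048  # dummy corpus

start = time.perf_counter()
# FlagEmbedding batches internally; batch_size=256 per the comment above.
vecs = model.encode(texts, batch_size=256, max_length=8192)["dense_vecs"]
elapsed = time.perf_counter() - start
print(f"{len(texts) / elapsed:.0f} embeddings/sec")
```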