Improving RAG Results with OpenWebUI - Looking for Advice on Custom Pipelines & Better Embeddings

Posted by b5761@reddit | LocalLLaMA

I’m currently working on improving the RAG performance in OpenWebUI and would appreciate advice from others who have built custom pipelines or optimized embeddings.

Setup / Environment:

  - Frontend: OpenWebUI
  - Model: GPT-OSS-120b on an external GPU server, connected via API token
  - Embedding model: bge-m3
  - Text extraction: Apache Tika
  - Documents: mainly internal German-language PDFs, uploaded directly into the OpenWebUI knowledge base

Observed Issues:

  1. The RAG pipeline sometimes pulls the wrong PDF context for a query – responses reference unrelated documents.
  2. Repeating the same question multiple times yields different answers, some of which are incorrect.
  3. The first few responses after starting a chat are often relevant, but context quality degrades over time.
  4. I suspect the embedding model isn’t optimal for German, or that preprocessing is inconsistent (see the sanity-check sketch after this list).
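
To test that last suspicion, a quick, self-contained sanity check: embed a German query and a few candidate chunks with bge-m3 via sentence-transformers and inspect the cosine ranking. The query and chunks below are hypothetical placeholders; if chunks from the wrong document routinely score close to the right one, the embedding model (or the chunking) is a likely culprit.

```python
# Minimal sanity check: does bge-m3 rank the relevant German chunk first?
# Assumes sentence-transformers is installed; texts are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-m3")

query = "Wie beantrage ich Urlaub?"  # hypothetical internal-policy question
chunks = [
    "Urlaubsanträge sind mindestens zwei Wochen im Voraus einzureichen.",
    "Die Reisekostenabrechnung erfolgt über das Finanzportal.",
    "Der Serverraum ist nur für autorisiertes Personal zugänglich.",
]

q_emb = model.encode(query, normalize_embeddings=True)
c_emb = model.encode(chunks, normalize_embeddings=True)
scores = util.cos_sim(q_emb, c_emb)[0]

# Print chunks from best to worst match.
for score, chunk in sorted(zip(scores.tolist(), chunks), reverse=True):
    print(f"{score:.3f}  {chunk[:60]}")
```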

I’m looking for practical advice on a few points:

  1. How to build a custom embedding pipeline outside of OpenWebUI, with better control over chunking, text cleaning, and metadata handling (a sketch follows below).
  2. Which German-optimized embedding models from Hugging Face or the MTEB leaderboard outperform bge-m3 in semantic retrieval?
  3. Frameworks or methods for fine-tuning an embedding model on QA pairs or document context, for example with SentenceTransformers or InstructorXL. How does this training work in practice? (See the fine-tuning sketch below.)
  4. Whether it’s more effective to switch to an external vector database such as Qdrant for embedding storage and retrieval, instead of relying on OpenWebUI’s built-in knowledge base (see the Qdrant sketch below).
  5. Does a fine-tuned model or a customized PDF pipeline work better in practice? Are there tutorials for this, and can either approach be integrated with OpenWebUI?
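
On point 1, here is a minimal sketch of what an external extraction-and-chunking step could look like, assuming the tika-python client pointed at the existing Apache Tika server. The file name, chunk size, and overlap are placeholders; a token-aware splitter could replace the character window.

```python
# Sketch of a standalone extraction + chunking step (assumes `pip install tika`
# and a reachable Apache Tika server; paths and sizes are placeholders).
from tika import parser

def clean(text: str) -> str:
    # Strip per-line whitespace and drop empty lines left over from PDF extraction.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)

def chunk(text: str, size: int = 800, overlap: int = 200):
    # Character-based sliding window with overlap between consecutive chunks.
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        yield text[start : start + size]

parsed = parser.from_file("internes_dokument.pdf")  # hypothetical file
text = clean(parsed.get("content") or "")

# Attach per-chunk metadata so retrieval hits can be traced back to a source.
chunks = [
    {"text": c, "source": "internes_dokument.pdf", "chunk_id": i}
    for i, c in enumerate(chunk(text))
]
print(f"{len(chunks)} chunks with metadata ready for embedding")
```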
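On point 3, fine-tuning with SentenceTransformers typically means contrastive training on (question, relevant passage) pairs, e.g. with MultipleNegativesRankingLoss, where the other passages in a batch serve as negatives. A minimal sketch using the classic `fit()` API and two hypothetical German QA pairs; real training needs hundreds or thousands of pairs, the two here only show the data shape.

```python
# Minimal SentenceTransformers fine-tuning sketch; QA pairs are hypothetical
# placeholders for your own labeled (question, passage) data.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("BAAI/bge-m3")

train_examples = [
    InputExample(texts=["Wie beantrage ich Urlaub?",
                        "Urlaubsanträge sind zwei Wochen im Voraus einzureichen."]),
    InputExample(texts=["Wer darf den Serverraum betreten?",
                        "Der Serverraum ist nur für autorisiertes Personal zugänglich."]),
]

loader = DataLoader(train_examples, shuffle=True, batch_size=2)
# In-batch negatives: each question is pulled toward its own passage and
# pushed away from the other passages in the same batch.
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
model.save("bge-m3-german-ft")  # hypothetical output path
```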
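On point 4, storing and querying embeddings in Qdrant is straightforward with qdrant-client. A sketch assuming a local Qdrant instance on the default port and the 1024-dimensional dense vectors that bge-m3 produces; the collection name and documents are placeholders. (Newer client versions also offer `query_points`; `search` still works.)

```python
# Sketch: push chunk embeddings into Qdrant and query them.
# Assumes `pip install qdrant-client` and Qdrant running on localhost:6333.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
client = QdrantClient(url="http://localhost:6333")

chunks = [  # hypothetical chunks with metadata payloads
    {"text": "Urlaubsanträge sind zwei Wochen im Voraus einzureichen.", "source": "hr.pdf"},
    {"text": "Die Reisekostenabrechnung erfolgt über das Finanzportal.", "source": "finanzen.pdf"},
]

client.recreate_collection(
    collection_name="german_docs",  # placeholder name
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),  # bge-m3 dense size
)

vectors = model.encode([c["text"] for c in chunks], normalize_embeddings=True)
client.upsert(
    collection_name="german_docs",
    points=[PointStruct(id=i, vector=v.tolist(), payload=c)
            for i, (v, c) in enumerate(zip(vectors, chunks))],
)

hits = client.search(
    collection_name="german_docs",
    query_vector=model.encode("Wie beantrage ich Urlaub?",
                              normalize_embeddings=True).tolist(),
    limit=2,
)
for hit in hits:
    print(f"{hit.score:.3f}  {hit.payload['source']}")
```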

Thanks for your help!