I spent 2 years building privacy-first local AI. My conclusion: Ingestion is the bottleneck, not the Model. (Showcase: Ollama + Docling RAG Kit)

Posted by ChapterEquivalent188@reddit | LocalLLaMA

Hi r/LocalLLaMA,

I’ve been working on strictly local, data-privacy-compliant AI solutions for about two years now. Dealing with sensitive data meant that cloud APIs were never an option—it had to be air-gapped or on-prem.

The biggest lesson I learned:

We spend 90% of our time debating model quantization, VRAM, and context windows. But in real-world implementations, the project usually fails long before the prompt hits the LLM. It fails at Ingestion.

Especially in environments like Germany, where "Digitalization" just meant "scanning paper into PDFs" for the last decade, we are sitting on mountains of "Digital Paper"—files that look digital but are structurally dead (visual layouts, no semantic meaning).

The Solution:

I built a self-hosted starter kit that focuses heavily on fixing the Input Layer before worrying about the model.

The Stack:

- Ollama for serving the model locally
- Docling for layout-aware document parsing
- docker-compose to wire it all together

What this Kit is:

It’s a docker-compose setup for anyone who needs a "Google Code Wiki"-style system but cannot let their data leave the building. It’s opinionated (Ingestion-First), strips out complex async worker queues for simplicity, and runs on a standard 16GB machine.
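To make the shape of such a stack concrete, here is a minimal compose sketch. This is NOT the kit's actual file — service names, the app image, and volume layout are my assumptions; only the `ollama/ollama` image and its default port 11434 are standard:

```yaml
# Hypothetical minimal docker-compose sketch -- not the kit's real file.
services:
  ollama:
    image: ollama/ollama          # official Ollama image, serves on 11434
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama   # persist pulled models across restarts
  app:
    build: .                      # hypothetical ingestion + RAG app (Docling inside)
    depends_on:
      - ollama
    environment:
      - OLLAMA_HOST=http://ollama:11434
volumes:
  ollama_models:
```

The point of the single-file setup is exactly what the post says: no async worker queues, just two containers on one box.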

Repo: https://github.com/2dogsandanerd/Knowledge-Base-Self-Hosting-Kit

I’ve decided to start open-sourcing my internal toolset because I genuinely fear we are heading towards a massive wave of failed AI integrations.

We are currently seeing companies and devs rushing into RAG, but hitting a wall because they overlook the strict quality requirements for retrieval. They don't realize that "electronic paper" (PDFs) is not Digitalization. It's just dead data on a screen.

Unless we fix the ingestion layer and stop treating "File Upload" as a solved problem, these integrations will fail to deliver value. This kit is my attempt to provide a baseline for doing it right—locally and privately.
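To make "locally and privately" concrete: once the ingestion layer has produced clean chunks and retrieval has picked the relevant ones, answering is a plain HTTP call to the local Ollama `/api/chat` endpoint. A sketch of assembling that request — the model tag and prompt wording are my assumptions, not the kit's:

```python
import json

def build_rag_request(question: str, chunks: list[str]) -> dict:
    """Assemble an Ollama /api/chat payload that grounds the answer
    in retrieved chunks. Model tag and prompts are illustrative."""
    context = "\n\n".join(chunks)
    return {
        "model": "llama3.1:8b",  # assumed tag; use whatever model you've pulled
        "stream": False,
        "messages": [
            {"role": "system",
             "content": "Answer strictly from the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }

payload = build_rag_request(
    "What is the notice period?",
    ["Section 4: The notice period is 30 days."],
)
# POST this to http://localhost:11434/api/chat on the local Ollama instance;
# nothing ever leaves the machine.
print(json.dumps(payload, indent=2))
```

Note that if ingestion produced garbage chunks, no amount of prompt engineering in this payload will save the answer — which is the whole argument.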

I’d love to hear your thoughts on the "Ingestion First" approach. For me, switching from simple text-splitting to layout-aware parsing was the game changer for retrieval accuracy.
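To illustrate what that switch buys you, here is a toy comparison (my own illustration, not the kit's implementation): fixed-size windows cut across section boundaries, while even a trivial heading-aware splitter keeps each section intact, so retrieval returns a self-contained answer unit.

```python
# Toy contrast: naive fixed-size splitting vs. layout-aware chunking.
# The heading-aware splitter is a minimal sketch, not production code.

def naive_split(text: str, size: int = 80) -> list[str]:
    """Cut text into fixed-size windows, ignoring document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def layout_aware_split(markdown: str) -> list[str]:
    """Start a new chunk at every heading so each chunk is one section."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = """# Termination
Notice period is 30 days.
# Liability
Liability is capped at the contract value."""

print(naive_split(doc))          # windows can cut mid-sentence, mid-section
print(layout_aware_split(doc))   # one coherent chunk per section
```

With real documents you would get the markdown from a layout-aware parser like Docling rather than hand-writing it, and split on the structure it recovers; the principle is the same.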

Thanks!