PipesHub - Open Source Enterprise Search Platform (Generative-AI Powered)
Posted by Effective-Ad2060@reddit | LocalLLaMA | View on Reddit | 11 comments
Hey everyone!
I’m excited to share something we’ve been building for the past few months – PipesHub, a fully open-source Enterprise Search Platform.
In short, PipesHub is your customizable, scalable, enterprise-grade RAG platform for everything from intelligent search to building agentic apps — all powered by your own models and data.
We also connect with tools like Google Workspace, Slack, Notion, and more, so your team can quickly find answers grounded in your company’s internal knowledge.
You can also run it locally and use any AI model out of the box, including Ollama.
We’re looking for early feedback, so if this sounds useful (or if you’re just curious), we’d love for you to check it out and tell us what you think!
nvrcode@reddit
Would this tool be easy to integrate into OpenWebUI?
Effective-Ad2060@reddit (OP)
If you try our UI, trust me, you will not like the OpenWebUI user interface (our citation system works seamlessly across all file types).
Having said that, we are releasing an MCP server and OpenAI-compatible APIs in an upcoming release.
CaptTechno@reddit
How do I use a local LLM? Ollama and OpenAI-compatible APIs don't seem to be supported.
Effective-Ad2060@reddit (OP)
Both Ollama and OpenAI-compatible APIs are supported. You should be able to see both of them in the AI model providers list.
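For anyone who wants to sanity-check a local model before wiring it into PipesHub: Ollama exposes an OpenAI-compatible endpoint at /v1, so any OpenAI-compatible client can talk to it. A minimal sketch (this is not PipesHub code; the model name "llama3" is just whatever you've pulled locally):

```python
from openai import OpenAI

# Ollama serves an OpenAI-compatible API at /v1 on its default port.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client library, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3",  # assumption: replace with a model you've pulled locally
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(response.choices[0].message.content)
```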
CaptTechno@reddit
In the documentation it says "coming soon".
Effective-Ad2060@reddit (OP)
This part of the documentation is outdated. I'll fix it very soon.
optimisticalish@reddit
A couple of things I don't see mentioned. 1) How many documents can it ingest and is there a practical limit? 2) Can it mingle its search results with those from the open Web - e.g. you feed it a list of 3,000 website URLs, it goes and downloads those sites and ingests them as well?
Effective-Ad2060@reddit (OP)
Thanks for the questions!
optimisticalish@reddit
Thanks. The problem with crawling is that many websites (e.g. academic journals with several hundred PDFs) forbid crawlers that are not the Googlebot. Downloading the entire site locally, by an agent that looks to the site like a regular browser, then ingesting, would be the better option in such cases. I'm not talking about vast ecommerce sites - just relatively small ones (e.g. an open-access academic journal with 20 issues published).
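A rough sketch of the approach described above, as a standalone illustration (this is not a PipesHub feature, and the URLs and paths are placeholders): fetch each page with a browser-like User-Agent, save it locally, and then hand the downloaded files to ingestion.

```python
import pathlib
import requests

# A browser-like User-Agent, so sites that reject generic crawlers still respond.
# Check a site's terms of use before downloading it in bulk.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    )
}

def download(urls, out_dir="downloads"):
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    for url in urls:
        resp = requests.get(url, headers=BROWSER_HEADERS, timeout=30)
        resp.raise_for_status()
        # Derive a filename from the URL path; fall back to index.html.
        name = url.rstrip("/").rsplit("/", 1)[-1] or "index.html"
        (out / name).write_bytes(resp.content)

download(["https://example.org/issue1.pdf"])  # placeholder URL
```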
Chromix_@reddit
This doesn't seem to be built in an extensible (easily customizable) way.
When, for example, you want to add a new embedding or LLM provider, this requires editing retrieval_service.py, ai_models_named_constants.py, and possibly other files. For an extensible product I would have expected a self-registering architecture, where the user can provide new embedding or LLM providers that import a utility class to register themselves, quickly and easily via class name, for example. That class name can then be specified via config. That way the user can keep customizations side by side with the product, without having to maintain a local fork with merges.
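For illustration, a minimal Python sketch of the kind of self-registration described here. Every name in it (EmbeddingProvider, MyEmbedder, "my_embedder") is hypothetical, not part of PipesHub's actual codebase:

```python
from abc import ABC, abstractmethod

class EmbeddingProvider(ABC):
    # Maps config names to provider classes.
    registry: dict[str, type["EmbeddingProvider"]] = {}

    # __init_subclass__ runs whenever a subclass is defined, so merely
    # importing a user's module is enough to register its provider.
    def __init_subclass__(cls, *, name: str, **kwargs):
        super().__init_subclass__(**kwargs)
        EmbeddingProvider.registry[name] = cls

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]: ...

# A user-supplied provider, living side by side with the product:
class MyEmbedder(EmbeddingProvider, name="my_embedder"):
    def embed(self, texts):
        return [[float(len(t))] for t in texts]  # toy embedding

# Core code resolves the provider purely from configuration:
provider_cls = EmbeddingProvider.registry["my_embedder"]
print(provider_cls().embed(["hello"]))  # [[5.0]]
```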
Effective-Ad2060@reddit (OP)
Thanks for pointing this out — you make a great point.
Right now, most LLMs that support the OpenAI API spec and embedding models like SentenceTransformers work out of the box. But you're right — adding a custom provider isn't as smooth as it could be.
We’ll definitely think about adding support for a more extensible setup where users can register their own providers. It should be relatively straightforward to support something like this.
If this is something you're interested in, we’d love your input or even a small PR to get it started!