Good people of the wool, how about Deep Research?
Posted by RedParaglider@reddit | LocalLLaMA | View on Reddit | 12 comments
One thing I absolutely love about the paid platforms is the deep research system. Is there a good one on local?
I have SearXNG set up, and it's OK. It doesn't seem to pull back many Google results, but the results it does pull back are fine.
I'm more interested in the system, though. It's obvious that it has a multi-agent system to summarize, and maybe levels of agents to summarize those agents' findings. Is there a great system to handle this sort of stuff locally right now?
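For context on the SearXNG side: a local instance can be queried programmatically through its JSON search endpoint, provided the `json` format is enabled in the instance's `settings.yml`. A minimal sketch (the base URL and keys like `"title"`/`"url"`/`"content"` reflect SearXNG's usual response shape, but check your own instance):

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG JSON search URL. Requires the json format
    to be enabled in the instance's settings.yml."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def searxng_search(base_url: str, query: str) -> list[dict]:
    """Query a local SearXNG instance and return its result list.
    Result dicts typically carry "title", "url", and "content" keys."""
    with urllib.request.urlopen(build_search_url(base_url, query)) as resp:
        return json.load(resp).get("results", [])
```

Calling `searxng_search("http://localhost:8080", "some query")` then gives you the raw material a summarizer agent would consume.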
SlowestGenji@reddit
Flagged this one to try and adapt: https://github.com/iusztinpaul/designing-real-world-ai-agents-workshop. But like another poster said, working through the search/access issues is probably the tricky part.
oldschooldaw@reddit
I am a big fan of this one.
https://github.com/LearningCircuit/local-deep-research
I have it set up running off a 3060, and I think it does a great job when I run out of GPT deep research queries.
AD7GD@reddit
A random sample from the last deep research query I did:
So the main problem, as I see it, is that paying for ChatGPT or Claude is an order of magnitude cheaper than paying for API-based search/retrieval to power your own deep research. Everything is increasingly locking down due to the volume of AI queries, so if you don't pay someone else to do it, you are in a constant battle to keep your search/retrieval tools working. I remember one of the first "deep research at home" projects I downloaded. I was confused about why it only hit one search engine despite support for multiple. Turns out, the rest had been commented out one at a time as they quit working.
AdventurousFly4909@reddit
This is LOCALllama
ai_guy_nerd@reddit
Local deep research usually comes down to how you handle the loop between the search tool and the summarizer. If you want something structured, CrewAI or AutoGen are the go-to frameworks for defining those "levels" of agents you mentioned. They let you set up a researcher agent to gather the raw data and a manager agent to critique and refine the summary.
The real trick is the search quality. SearXNG is a good start, but if you can hook into an API like Bright Data or Brave Search, the results improve drastically. For orchestrating the whole thing on a VPS or local box, OpenClaw is another interesting way to handle the execution layer.
The bottleneck is usually the context window when you start aggregating multiple pages of research. Using a proper RAG pipeline or just a very large context model like Gemini 1.5 Pro usually solves the "too much info" problem.
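Stripped of the frameworks, the researcher/manager pattern described above is just a critique loop. A minimal sketch with stubbed-out search and model calls (every name here is hypothetical; CrewAI and AutoGen wrap this shape in agent/role abstractions, but the control flow is the same):

```python
from typing import Callable, Optional


def research_loop(
    search: Callable[[str], list[str]],      # e.g. a SearXNG wrapper
    summarize: Callable[[list[str]], str],   # "researcher" model call
    critique: Callable[[str], Optional[str]],  # "manager": None = accept
    query: str,
    max_rounds: int = 3,
) -> str:
    """Gather sources, draft a summary, then let a manager agent
    request refinements until it accepts or the rounds run out."""
    sources = search(query)
    draft = summarize(sources)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:            # manager accepts the draft
            return draft
        sources += search(feedback)     # follow up on the critique
        draft = summarize(sources)
    return draft
```

The context-window bottleneck mentioned above bites inside `summarize`: once `sources` outgrows the model's window, you either chunk-and-RAG the sources or reach for a long-context model.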
Borkato@reddit
Thanks Claude
frozenYogurtLover2@reddit
DeerFlow by ByteDance is the closest I can think of.
RedParaglider@reddit (OP)
One updoot for you dude. That's a cool looking project. My thought was to have something do deep research and build me a knowledge base inside my projects at night.
https://github.com/bytedance/deer-flow
APFrisco@reddit
I do like the idea of having a local LLM work on something like this overnight; tokens/sec metrics aren’t as important overnight, and anyways I’ve always felt like coming back to a deep research prompt after a while feels like opening a present haha.
DataPhreak@reddit
Been using DeepWiki. It doesn't get everything right, but it's good enough for what your use case sounds like: https://deepwiki.com/DataBassGit/AgentForge/ Nice flowcharts. Looks like Mermaid. Good Table of Contents.
Genebra_Checklist@reddit
I'm building a graphRAG with books and articles. A good local deep search would be amazing
KvAk_AKPlaysYT@reddit
So I built something... interesting... ya needed DEEP research right?
https://github.com/Aaryan-Kapoor/24hr-research-agent