Good people of the wool, how about Deep Research?
Posted by RedParaglider@reddit | LocalLLaMA | View on Reddit | 12 comments
One thing I absolutely love about the paid platforms is the deep research system. Is there a good one on local?
I have SearXNG set up, and it's OK. It doesn't seem to pull back many Google results, but the results it does pull back are fine.
I'm more interested in the system, though. It's obvious that it has a multi-agent system to summarize, and maybe levels of agents to summarize those agents' findings. Is there a great system to handle this sort of stuff locally right now?
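For context on the SearXNG side: a local instance can be queried programmatically through its JSON search endpoint, provided the `json` format is enabled in the instance's `settings.yml`. A minimal sketch (the base URL and keys like `"title"`/`"url"`/`"content"` reflect SearXNG's usual response shape, but check your own instance):

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG JSON search URL. Requires the json format
    to be enabled in the instance's settings.yml."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def searxng_search(base_url: str, query: str) -> list[dict]:
    """Query a local SearXNG instance and return its result list.
    Result dicts typically carry "title", "url", and "content" keys."""
    with urllib.request.urlopen(build_search_url(base_url, query)) as resp:
        return json.load(resp).get("results", [])
```

Calling `searxng_search("http://localhost:8080", "some query")` then gives you the raw material a summarizer agent would consume.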
SlowestGenji@reddit
Flagged this one to try and adapt: https://github.com/iusztinpaul/designing-real-world-ai-agents-workshop. But like another poster said, working through the search/access issues is probably the tricky part.
oldschooldaw@reddit
I am a big fan of this one.
https://github.com/LearningCircuit/local-deep-research
I have it set up running off a 3060, and I think it does a great job when I run out of GPT deep research queries.
AD7GD@reddit
A random sample from the last deep research query I did:
So the main problem, as I see it, is that paying for ChatGPT or Claude is an order of magnitude cheaper than paying for API-based search/retrieval to power your own deep research. Everything is increasingly locking down due to the volume of AI queries, so if you don't pay someone else to do it, you are in a constant battle to keep your search/retrieval tools working. I remember one of the first "deep research at home" projects I downloaded. I was confused about why it only hit one search engine despite support for multiple. Turns out, the rest had been commented out one at a time as they quit working.
AdventurousFly4909@reddit
This is LOCALllama
ai_guy_nerd@reddit
Local deep research usually comes down to how you handle the loop between the search tool and the summarizer. If you want something structured, CrewAI or AutoGen are the go-to frameworks for defining those "levels" of agents you mentioned. They let you set up a researcher agent to gather the raw data and a manager agent to critique and refine the summary.
The real trick is the search quality. SearXNG is a good start, but if you can hook into an API like Bright Data or Brave Search, the results improve drastically. For orchestrating the whole thing on a VPS or local box, OpenClaw is another interesting way to handle the execution layer.
The bottleneck is usually the context window when you start aggregating multiple pages of research. Using a proper RAG pipeline or just a very large context model like Gemini 1.5 Pro usually solves the "too much info" problem.
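Stripped of the frameworks, the researcher/manager pattern described above is just a critique loop. A minimal sketch with stubbed-out search and model calls (every name here is hypothetical; CrewAI and AutoGen wrap this shape in agent/role abstractions, but the control flow is the same):

```python
from typing import Callable, Optional


def research_loop(
    search: Callable[[str], list[str]],      # e.g. a SearXNG wrapper
    summarize: Callable[[list[str]], str],   # "researcher" model call
    critique: Callable[[str], Optional[str]],  # "manager": None = accept
    query: str,
    max_rounds: int = 3,
) -> str:
    """Gather sources, draft a summary, then let a manager agent
    request refinements until it accepts or the rounds run out."""
    sources = search(query)
    draft = summarize(sources)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:            # manager accepts the draft
            return draft
        sources += search(feedback)     # follow up on the critique
        draft = summarize(sources)
    return draft
```

The context-window bottleneck mentioned above bites inside `summarize`: once `sources` outgrows the model's window, you either chunk-and-RAG the sources or reach for a long-context model.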
Borkato@reddit
Thanks Claude
frozenYogurtLover2@reddit
DeerFlow by ByteDance is the closest I can think of.
RedParaglider@reddit (OP)
One updoot for you dude. That's a cool looking project. My thought was to have something do deep research and build me a knowledge base inside my projects at night.
https://github.com/bytedance/deer-flow
APFrisco@reddit
I do like the idea of having a local LLM work on something like this overnight; tokens/sec metrics aren’t as important overnight, and anyways I’ve always felt like coming back to a deep research prompt after a while feels like opening a present haha.
DataPhreak@reddit
Been using DeepWiki. It doesn't get everything right, but it's good enough for what your use case sounds like: https://deepwiki.com/DataBassGit/AgentForge/ Nice flowcharts. Looks like Mermaid. Good Table of Contents.
Genebra_Checklist@reddit
I'm building a graphRAG with books and articles. A good local deep search would be amazing
KvAk_AKPlaysYT@reddit
So I built something... interesting... ya needed DEEP research right?
https://github.com/Aaryan-Kapoor/24hr-research-agent