CloseAI's DeepResearch is insanely good... do we have open source replacements?
Posted by TimAndTimi@reddit | LocalLLaMA | 47 comments
IDK if such a thing exists outside OpenAI. If so, please let me know.
I'm actually feeling okay with the crazy subscription fee for now, because deep research is genuinely useful for reading a ton of online resources in depth (vastly superior to 4o's ordinary online search).
Still, it would be nice to run it with open-source weights.
spookperson@reddit
I've been asking my coworkers to give me queries to run through Perplexity Deep Research, gpt-researcher (https://github.com/assafelovic/gpt-researcher), and HF's Open Deep Research for feedback/comparison. I use Fireworks R1 as the research strategist. The conclusion so far is that none of them are as high quality as OpenAI's, but OpenAI is not 10x as good as Perplexity (given the $20/month plan vs $200/month).
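If anyone wants to reproduce the gpt-researcher leg of that comparison, the documented flow is only a few lines. A minimal sketch (the query is illustrative, and swapping in a non-OpenAI model like Fireworks R1 happens through gpt-researcher's provider config, which isn't shown here):
```python
# Minimal gpt-researcher run. Assumes the documented GPTResearcher API;
# by default it expects API keys (e.g. OPENAI_API_KEY, TAVILY_API_KEY)
# in the environment, and alternate providers are configured the same way.
import asyncio
from gpt_researcher import GPTResearcher

async def main():
    researcher = GPTResearcher(
        query="Compare open-source deep-research agents",  # example query
        report_type="research_report",
    )
    await researcher.conduct_research()       # browse and collect sources
    report = await researcher.write_report()  # synthesize into a report
    print(report)

asyncio.run(main())
```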
Icy_Confection6188@reddit
Trase v0.3 is #1 on GAIA.
Has anyone tried this agent?
Kerim45455@reddit
It's not possible to find anything close to it, because Deep Research is powered by o3. We don't know how good a model o3 is yet, but it should be far superior to o3-mini.
Singularity-42@reddit
I know Google and Perplexity have similar deep research tools. Possibly others. How do these stack up to OpenAI's? I'd like to at least check it out, but $200/mo is steeeeep.
TimAndTimi@reddit (OP)
If I want THE best and biggest model with unlimited access, while having the deep research framework ready to use.... paying 200 USD a month seems the cheapest way, at least as of Feb 2025....
I like solutions like Ollama plus some fancy UI. It's good for playing around and watching your GPU server burn its brain to squeeze out tokens. But I do realize I have better uses for the GPU server assigned to me than keeping a 70B/671B model resident in VRAM.
TimAndTimi@reddit (OP)
Just a random observation, irrelevant to the topic.
I feel like o3-mini and o3-mini-high perform vastly differently on GPT Pro vs GPT Plus: the Pro version seems able to one-shot a lot of my code problems, but the Plus version cannot.
gartstell@reddit
Since you're on the topic: can specific resources (articles, books, etc.) be added to Deep Research, like in Google NotebookLM? Or is it limited to what it finds in open access?
TimAndTimi@reddit (OP)
Do you mean adding resources manually by uploading? The simple answer is yes.
However, for whatever reason, o1 pro does not accept documents yet. Other models can work with uploaded content while doing deep research at the same time.
But I found the model can actually visit a lot of resources by itself, so now I'm more likely to just drop in the arXiv link of a paper and let it figure out how to visit the resource, as well as check the paper's code repo automatically.
ttkciar@reddit
Is it better than just using RAG with a curated database? Database lookups are a lot faster than web searches, and there's a lot of crap information on the internet.
I use RAG with a database populated with Wikipedia content, and it does a pretty good job.
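The pattern itself is simple. A minimal sketch (not my exact setup; the embedding model, toy passages, and prompt format are all illustrative):
```python
# Toy RAG sketch: embed passages, retrieve by cosine similarity,
# prepend to the prompt. A real setup chunks a full Wikipedia dump
# and uses a proper vector store instead of an in-memory array.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

passages = [  # stand-ins for chunked Wikipedia articles
    "The mitochondrion is the powerhouse of the cell.",
    "Paris is the capital and most populous city of France.",
    "Python is a high-level programming language.",
]
passage_vecs = embedder.encode(passages, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = passage_vecs @ q  # cosine similarity (vectors are normalized)
    return [passages[i] for i in np.argsort(scores)[::-1][:k]]

question = "What is the capital of France?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"  # goes to the LLM
print(prompt)
```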
TimAndTimi@reddit (OP)
Assuming I could curate this database from vastly diverse sources... but in reality I have neither the time nor the compute power to run this service entirely locally.
OpenAI's deep research actually works very well within my scope of usage, e.g., read the code of a paper and then try to explain the code based on the open-access paper.
vonzache@reddit
"I use RAG with a database populated with Wikipedia content, and it does a pretty good job."
How have you technically done this? I.e., are there ready-made projects for this, and do you use only one language's Wikipedia or multiple languages' Wikipedias as content?
Orolol@reddit
Not all human knowledge is in Wikipedia.
ttkciar@reddit
What Wikipedia has tends to be high quality, though, whereas the internet is full of lies, slop, and low-effort content.
I would rather have a quality, incomplete database than a huge database of shit, but you do you.
vonzache@reddit
No, but a good RAG database that was dynamically updated would work as a commonly curated memory for AI.
ttkciar@reddit
I described my project recently here, but there's no need to use my special-snowflake project. Googling ["wikipedia" "retrieval augmented generation" site:github.com] brought up a few working systems, of which this looks the most promising:
https://github.com/jzbjyb/FLARE
My project only uses the English Wikipedia, but FLARE looks like it should be easy enough to add as many different Wikipedia dumps as you like.
blendorgat@reddit
It is drastically better, yes. Deep research has worked great for me on topics that Wikipedia does not even mention.
ReasonablePossum_@reddit
Wikipedia really sucks on anything remotely connected to a party with power, though. It's basically only good for the base sciences. Anything else is biased.
Koksny@reddit
Perplexity.
They are using R1 for their DR system.
Brave-History-6502@reddit
It’s good but not even close to OpenAI’s performance unfortunately
Koksny@reddit
True
Charuru@reddit
Doesn’t matter what the price is if it’s useless
my_name_isnt_clever@reddit
It's far from useless; I use it many times a day now. And Perplexity's Pro plan gives access to multiple other closed models, basic image gen, and monthly API credits.
Charuru@reddit
Fair enough. It doesn't produce anything like the reports I get out of OAIDR daily, but I understand it may be fine for other use cases.
my_name_isnt_clever@reddit
What's the actual difference between them? I haven't used OAI's.
Charuru@reddit
It produces 20-50k-word research papers with fewer errors, unlike the 5k-word responses from Perplexity that have something like 30% errors.
mosthumbleuserever@reddit
I use it and I find it quite useful for plenty of use cases.
Koksny@reddit
It's good enough for me and my job; I don't see a reason to pay 100x more for something 10% better.
Neomadra2@reddit
What does 10% better mean for you? If Perplexity hallucinates in 11% of all paragraphs and OAI deep research in only 1%, it's like night and day. The former would be literally unusable, because you'd need to cross-check everything.
Koksny@reddit
It means it's capable of solving 90% of the problems I have, with enough accuracy to actually solve them. And I'm happy to pay a peanut a year instead of $2400 to get the remaining 10% of my problems solved.
my_name_isnt_clever@reddit
What's the actual difference though? From someone who will never pay OpenAI that much money but uses Perplexity's daily.
mosthumbleuserever@reddit
I have been looking for some information somewhere to confirm what model they are using for their deep research and I have not seen them disclose it anywhere. How do you know it's R1?
Koksny@reddit
https://x.com/AravSrinivas/status/1886497667024609683 from Perplex CEO.
mosthumbleuserever@reddit
Interesting this did not come up in my search. Thank you! Maybe I should've used deep research.
Koksny@reddit
...I've googled it. Old habits, I guess.
Calcidiol@reddit
RemindMe! 2 days
RemindMeBot@reddit
I will be messaging you in 2 days on 2025-02-22 23:24:28 UTC to remind you of this link
Low_Reputation_122@reddit
Claude is much better than Sam’s stupid AI
KonradFreeman@reddit
This is my guide on using Open Deep Research:
https://danielkliewer.com/2025/02/05/open-deep-research
You could use smolagents CodeAgent class like they did in this research:
https://huggingface.co/blog/open-deep-research
This is the repo:
https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research
This is how I converted it to use Ollama for some reason:
https://danielkliewer.com/2025/02/05/ollama-smolagents-open-deep-research
You can use any model you want with it.
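The core of the Ollama conversion is only a few lines. A minimal sketch, assuming smolagents' LiteLLMModel wrapper and a local Ollama server (the model id and endpoint are illustrative; the real example in the repo wires up more tools):
```python
# Minimal smolagents CodeAgent against a local Ollama model.
# Needs: pip install "smolagents[litellm]" and a running Ollama server.
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/llama3.1",    # any model served by Ollama
    api_base="http://localhost:11434",  # default Ollama endpoint
)

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # web search, no extra API key needed
    model=model,
)

print(agent.run("Summarize the open-source deep-research agents available today."))
```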
anthonybustamante@reddit
Does this not require Firecrawl or any other API? I wonder how it performs the research. Thanks for sharing.
KonradFreeman@reddit
It uses DuckDuckGoSearchTool: https://python.langchain.com/docs/integrations/tools/ddg/
The main aspect is that it uses the CodeAgent class, which uses code rather than JSON to express its actions, which leads to much more efficient use of context.
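To make that concrete, here's a hypothetical single "action" in the code style (the tool names mirror smolagents' base tools, but the stubs and snippet are illustrative, not the library's actual signatures):
```python
# Why code-actions save context: the model emits one Python snippet that
# chains several tool calls, instead of one JSON tool call (and a full
# model round-trip) per step. Tools are stubbed here; in the agent they
# are injected by the framework.
def web_search(query: str) -> list[str]:
    return ["https://example.com/a", "https://example.com/b"]  # stub

def visit_webpage(url: str) -> str:
    return f"(contents of {url})"  # stub

# A single action the model might write:
urls = web_search("open-source deep research agents")
notes = [visit_webpage(u) for u in urls[:2]]
answer = "\n\n".join(notes)  # intermediate state stays in variables
print(answer)
```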
Charuru@reddit
Does this work better/easier if I give it a predefined bunch of files?
KonradFreeman@reddit
I imagine it would for certain use cases; I haven't done so myself, so I don't know for sure.
anthonybustamante@reddit
Thanks for sharing! I’m gonna play around with this..
LLMtwink@reddit
There are quite a few replications, the most common one probably being Open Deep Research. None are nearly as good as the real thing, but they might prove useful nonetheless.
spectracide_@reddit
https://youtu.be/4M7RIbQZ_-w
pornstorm66@reddit
Google deep research?
CodigoTrueno@reddit
Yes, it does, and it's insanely good. https://docs.gptr.dev/docs/gpt-researcher/getting-started/getting-started