CloseAI's DeepResearch is insanely good... do we have open source replacements?
Posted by TimAndTimi@reddit | LocalLLaMA | 47 comments
IDK if such a thing exists outside OpenAI. If so, please let me know.
I'm actually feeling okay with the crazy subscription fee for now, because deep research is genuinely useful for reading a ton of online resources in depth (vastly superior to 4o's ordinary online search).
Still, it would be nice to run it with open-source weights.
spookperson@reddit
I've been asking my coworkers to give me queries to run through Perplexity Deep Research, gpt-researcher (https://github.com/assafelovic/gpt-researcher), and HF's Open Deep Research for feedback/comparison. I use Fireworks R1 as the research strategist. The conclusion so far is that none of them are as high quality as OpenAI's, but OpenAI is not 10x as good as Perplexity (given the $20/month plan vs $200/month).
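If anyone wants to reproduce the gpt-researcher leg of that comparison, the documented flow is only a few lines. A minimal sketch (the query is illustrative, and swapping in a non-OpenAI model like Fireworks R1 happens through gpt-researcher's provider config, which isn't shown here):
```python
# Minimal gpt-researcher run. Assumes the documented GPTResearcher API;
# by default it expects API keys (e.g. OPENAI_API_KEY, TAVILY_API_KEY)
# in the environment, and alternate providers are configured the same way.
import asyncio
from gpt_researcher import GPTResearcher

async def main():
    researcher = GPTResearcher(
        query="Compare open-source deep-research agents",  # example query
        report_type="research_report",
    )
    await researcher.conduct_research()       # browse and collect sources
    report = await researcher.write_report()  # synthesize into a report
    print(report)

asyncio.run(main())
```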
Icy_Confection6188@reddit
Trase v0.3 is #1 on GAIA.
Has anyone tried this agent?
Kerim45455@reddit
It's not possible to find anything close to it, because Deep Research is powered by o3. We don't know how good a model o3 is yet, but it should be far superior to o3-mini.
Singularity-42@reddit
I know Google and Perplexity have similar deep research tools. Possibly others. How do these stack up to OpenAI's? I'd like to at least check it out, but $200/mo is steeeeep.
TimAndTimi@reddit (OP)
If I want THE best and biggest model with unlimited access, while having the deep research framework ready to use.... paying 200 USD a month seems the cheapest way, at least as of Feb 2025....
I like solutions like Ollama plus some fancy UI. It's good for playing around and watching your GPU server burn its brain to squeeze out tokens. But I do realize I have better uses for the GPU server assigned to me than keeping a 70B/671B model resident in VRAM.
TimAndTimi@reddit (OP)
Just a random observation, irrelevant to the topic.
I feel like o3-mini and o3-mini-high perform vastly differently on GPT Pro vs GPT Plus: the Pro version seems able to one-shot a lot of my code problems, but the Plus version cannot.
gartstell@reddit
Since you're on the topic: can specific resources (articles, books, etc.) be added to Deep Research, like in Google NotebookLM? Or is it limited to what it finds in open access?
TimAndTimi@reddit (OP)
Do you mean adding resources manually by uploading? The simple answer is yes.
However, for whatever reason, o1 pro does not accept documents yet. Other models can work with uploaded content while doing deep research at the same time.
But I found the model can actually visit a lot of resources by itself, so now I'm more likely to just drop in the arXiv link of a paper and let it figure out how to visit the resource, as well as check the paper's code repo automatically.
ttkciar@reddit
Is it better than just using RAG with a curated database? Database lookups are a lot faster than web searches, and there's a lot of crap information on the internet.
I use RAG with a database populated with Wikipedia content, and it does a pretty good job.
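The pattern itself is simple. A minimal sketch (not my exact setup; the embedding model, toy passages, and prompt format are all illustrative):
```python
# Toy RAG sketch: embed passages, retrieve by cosine similarity,
# prepend to the prompt. A real setup chunks a full Wikipedia dump
# and uses a proper vector store instead of an in-memory array.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

passages = [  # stand-ins for chunked Wikipedia articles
    "The mitochondrion is the powerhouse of the cell.",
    "Paris is the capital and most populous city of France.",
    "Python is a high-level programming language.",
]
passage_vecs = embedder.encode(passages, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = passage_vecs @ q  # cosine similarity (vectors are normalized)
    return [passages[i] for i in np.argsort(scores)[::-1][:k]]

question = "What is the capital of France?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"  # goes to the LLM
print(prompt)
```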
TimAndTimi@reddit (OP)
Assuming I could curate this database from vastly diverse sources... but in reality I have neither the time nor the compute power to run this service entirely locally.
OpenAI's deep research actually works very well within my scope of usage, e.g., read the code of a paper and then try to explain the code based on the open-access paper.
vonzache@reddit
"I use RAG with a database populated with Wikipedia content, and it does a pretty good job."
How have you technically done this? I.e., are there ready-made projects for this, and do you use only one language's Wikipedia or multiple languages' Wikipedias as content?
Orolol@reddit
Not all human knowledge is in Wikipedia.
ttkciar@reddit
What Wikipedia has tends to be high quality, though, whereas the internet is full of lies, slop, and low-effort content.
I would rather have a quality, incomplete database than a huge database of shit, but you do you.
vonzache@reddit
No, but a good RAG database that was dynamically updated would work as a commonly curated memory for AI.
ttkciar@reddit
I described my project recently here, but there's no need to use my special-snowflake project. Googling ["wikipedia" "retrieval augmented generation" site:github.com] brought up a few working systems, of which this looks the most promising:
https://github.com/jzbjyb/FLARE
My project only uses the English Wikipedia, but FLARE looks like it should be easy enough to add as many different Wikipedia dumps as you like.
blendorgat@reddit
It is drastically better, yes. Deep research has worked great for me on topics that Wikipedia does not even mention.
ReasonablePossum_@reddit
Wikipedia really sucks on anything remotely connected to a party with power, though. It's basically only good for the base sciences. Anything else is biased.
Koksny@reddit
Perplexity.
They are using R1 for their DR system.
Brave-History-6502@reddit
It’s good but not even close to OpenAI’s performance unfortunately
Koksny@reddit
True
Charuru@reddit
Doesn’t matter what the price is if it’s useless
my_name_isnt_clever@reddit
It's far from useless; I use it many times a day now. And Perplexity's Pro plan gives access to multiple other closed models, basic image gen, and monthly API credits.
Charuru@reddit
Fair enough. It doesn't produce anything like the reports I get out of OAIDR daily, but I understand it may be fine for other use cases.
my_name_isnt_clever@reddit
What's the actual difference between them? I haven't used OAI's.
Charuru@reddit
It produces 20-50k-word research papers with fewer errors, unlike the 5k-word responses from Perplexity that have something like 30% errors.
mosthumbleuserever@reddit
I use it and I find it quite useful for plenty of use cases.
Koksny@reddit
It's good enough for me and my job; I don't see a reason to pay 100x more for something 10% better.
Neomadra2@reddit
What does 10% better mean for you? If Perplexity hallucinates in 11% of all paragraphs and OAI deep research in only 1%, it's like night and day. The former would be literally unusable, because you'd need to cross-check everything.
Koksny@reddit
It means it's capable of solving 90% of the problems I have, with enough accuracy to actually solve them. And I'm happy to pay a peanut a year instead of $2400 to get the remaining 10% of my problems solved.
my_name_isnt_clever@reddit
What's the actual difference though? From someone who will never pay OpenAI that much money but uses Perplexity's daily.
mosthumbleuserever@reddit
I have been looking for some information somewhere to confirm what model they are using for their deep research and I have not seen them disclose it anywhere. How do you know it's R1?
Koksny@reddit
https://x.com/AravSrinivas/status/1886497667024609683 from Perplex CEO.
mosthumbleuserever@reddit
Interesting this did not come up in my search. Thank you! Maybe I should've used deep research.
Koksny@reddit
...I've googled it. Old habits, I guess.
Calcidiol@reddit
RemindMe! 2 days
RemindMeBot@reddit
I will be messaging you in 2 days on 2025-02-22 23:24:28 UTC to remind you of this link
Low_Reputation_122@reddit
Claude is much better than Sam’s stupid AI
KonradFreeman@reddit
This is my guide on using Open Deep Research:
https://danielkliewer.com/2025/02/05/open-deep-research
You could use smolagents CodeAgent class like they did in this research:
https://huggingface.co/blog/open-deep-research
This is the repo:
https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research
This is how I converted it to use Ollama for some reason:
https://danielkliewer.com/2025/02/05/ollama-smolagents-open-deep-research
You can use any model you want with it.
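The core of the Ollama conversion is only a few lines. A minimal sketch, assuming smolagents' LiteLLMModel wrapper and a local Ollama server (the model id and endpoint are illustrative; the real example in the repo wires up more tools):
```python
# Minimal smolagents CodeAgent against a local Ollama model.
# Needs: pip install "smolagents[litellm]" and a running Ollama server.
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/llama3.1",    # any model served by Ollama
    api_base="http://localhost:11434",  # default Ollama endpoint
)

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # web search, no extra API key needed
    model=model,
)

print(agent.run("Summarize the open-source deep-research agents available today."))
```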
anthonybustamante@reddit
Does this not require Firecrawl or any other API? I wonder how it performs the research. Thanks for sharing.
KonradFreeman@reddit
It uses DuckDuckGoSearchTool: https://python.langchain.com/docs/integrations/tools/ddg/
The main aspect is that it uses the CodeAgent class, which uses code rather than JSON to express its actions, which leads to much more efficient use of context.
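To make that concrete, here's a hypothetical single "action" in the code style (the tool names mirror smolagents' base tools, but the stubs and snippet are illustrative, not the library's actual signatures):
```python
# Why code-actions save context: the model emits one Python snippet that
# chains several tool calls, instead of one JSON tool call (and a full
# model round-trip) per step. Tools are stubbed here; in the agent they
# are injected by the framework.
def web_search(query: str) -> list[str]:
    return ["https://example.com/a", "https://example.com/b"]  # stub

def visit_webpage(url: str) -> str:
    return f"(contents of {url})"  # stub

# A single action the model might write:
urls = web_search("open-source deep research agents")
notes = [visit_webpage(u) for u in urls[:2]]
answer = "\n\n".join(notes)  # intermediate state stays in variables
print(answer)
```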
Charuru@reddit
Does this work better/easier if I give it a predefined bunch of files?
KonradFreeman@reddit
I imagine it would for certain use cases; I haven't done so myself, so I don't know for sure.
anthonybustamante@reddit
Thanks for sharing! I’m gonna play around with this..
LLMtwink@reddit
There are quite a few replications, the most common one probably being Open Deep Research. None are nearly as good as the real thing, but they might prove useful nonetheless.
spectracide_@reddit
https://youtu.be/4M7RIbQZ_-w
pornstorm66@reddit
Google deep research?
CodigoTrueno@reddit
Yes, it does, and it's insanely good. https://docs.gptr.dev/docs/gpt-researcher/getting-started/getting-started