We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local

Posted by ComplexIt@reddit | LocalLLaMA | View on Reddit | 64 comments

LDR maintainer here. Thanks to the strong support of the r/LocalLLaMA community, LDR has come a long way. I haven't reported in a while because I didn't feel ready for another prominent post in one of the leading outlets of local LLM research.

But I think the LDR community is finally there again, so it is time to report.

Setup

Benchmarks (fully local LLM with web search)

| Model | SimpleQA | xbench-DeepSearch |
|---|---|---|
| Qwen3.6-27B | 95.7% (287/300) | 77.0% (77/100) |
| Qwen3.5-9B | 91.2% (182/200) | 59.0% (59/100) |
| gpt-oss-20B | 85.4% (295/346) | — |

The sample size is small, and the benchmarks were not rerun multiple times (these are single-shot results), but you can see from the other rows that this is unlikely to be just chance. Full leaderboard: https://huggingface.co/datasets/local-deep-research/ldr-benchmarks
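To make "sample size is small" concrete, here is a back-of-envelope sketch (not part of LDR) that computes a 95% Wilson score interval for the 287/300 SimpleQA result, assuming independent questions:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

lo, hi = wilson_interval(287, 300)
print(f"95% CI: {lo:.1%} - {hi:.1%}")  # roughly 92.7% - 97.5%
```

So even at n=300 the interval is a few points wide, which is why the consistency across rows matters more than any single number.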

Important framing — these are agent + search scores, not closed-book

For context, these results are similar to Perplexity Deep Research (93.9%), Tavily (93.3%), etc. [Tavily forces the LLM to answer only from retrieved docs (a pure retrieval test); Perplexity Deep Research is an end-to-end agent and discloses no grader or sample size.]

Even if our results were only 90%, that would already be a great success.

I can also confirm from daily use that these results feel consistent with what I see on the random queries I run for everyday questions.

Caveats:

The thing that surprised me:

Results seem to track tool-calling quality more than raw size for local deep research. The langgraph_agent strategy hammers the model with multi-iteration tool calls, parallel subagent decomposition, and structured output — exactly the axis where the newer Qwen generations have improved most. Hypothesis only; if anyone wants to design an ablation we'd love the data.
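To illustrate the kind of loop being stressed here, below is a minimal, hypothetical sketch of a multi-iteration tool-call agent. It is not LDR's langgraph_agent strategy (which adds parallel subagent decomposition and structured output); `call_model` and `search_web` are stand-in stubs for the LLM and the search tool:

```python
# Hypothetical sketch of a multi-iteration tool-calling loop.
# `call_model` and `search_web` are stubs, not real LDR APIs.

def search_web(query: str) -> str:
    """Stub search tool; a real agent would call a search backend here."""
    return f"[snippet for: {query}]"

TOOLS = {"search_web": search_web}

def call_model(messages: list[dict]) -> dict:
    """Stubbed model: issues one search call, then answers from the snippet."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_web", "args": {"query": messages[0]["content"]}}
    return {"answer": messages[-1]["content"]}

def run_agent(question: str, max_iterations: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iterations):
        step = call_model(messages)
        if "answer" in step:  # model signals it is done
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])  # dispatch the tool call
        messages.append({"role": "tool", "content": result})
    return "max iterations reached"

print(run_agent("capital of France?"))
```

The point of the sketch: every iteration the model must emit a well-formed tool call or a final answer, so a model that is better at structured tool calling compounds its advantage over many iterations.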

Some cool LDR features that I want to additionally highlight:

Repo: https://github.com/LearningCircuit/local-deep-research

Happy to share strategy configs and to help reproduce the Qwen runs.

Thanks to all the academic and other open-source foundational work that made this repo possible.