Looking for a local AI tool that can extract any info from high-quality sources (papers + reputable publications) with real citations
Posted by Inflation_Artistic@reddit | LocalLLaMA | View on Reddit | 12 comments
I’m trying to set up a fully local AI workflow (English/Chinese) that can dig through both scientific papers and reputable publications: Bloomberg, The Economist, industry analyses, tech reports, etc.
The main goal:
I want it to automatically extract any specific information I request: not just statistics, but any data, such as:
- numbers
- experimental details
- comparisons
- anything else I ask for
And the most important requirement:
The tool must always give real citations (article, link, page, paragraph) so I can verify every piece of data. No hallucinated facts.
Ideally, the tool should:
- run 100% locally
- search deeply and for long periods
- support Chinese + English
- extract structured or unstructured data depending on the query
- keep exact source references for everything
- work on an RTX 3060 12GB
Basically, I’m looking for a local “AI-powered research engine” that can dig through a large collection of credible sources and give me trustworthy, citation-backed answers to complex queries.
Has anyone built something like this?
What tools, models, or workflows would you recommend for a 12GB GPU?
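A minimal sketch of the citation-tracking requirement, assuming source metadata is stored alongside every chunk at ingest time (the names and record shape here are hypothetical, not any particular tool's API; a real pipeline would retrieve with embeddings rather than keyword overlap):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str      # article or paper title
    url: str
    page: int
    paragraph: int

def search(chunks, query):
    """Naive keyword-overlap retrieval; stands in for an embedding search."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    # Every hit carries its citation, so each extracted fact stays verifiable.
    return [
        {"text": c.text,
         "citation": f"{c.source} ({c.url}), p.{c.page} ¶{c.paragraph}"}
        for _, c in scored
    ]

corpus = [
    Chunk("GDP grew 5.2 percent in Q3", "Example Report", "https://example.com/r", 4, 2),
    Chunk("The model was trained for 10 epochs", "Example Paper", "https://example.com/p", 7, 1),
]
hits = search(corpus, "GDP growth percent")
```

The key design point is that the citation is attached at ingest, not generated by the model afterwards, which is what keeps it from being hallucinated.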
ekaj@reddit
Yes and no. I have built something like what you want, but it’s not easily usable by non-technical people yet. It also sounds like you want a deep-research solution on top of that. The biggest limiters are your VRAM and relying only on local models for answer generation.
Also, you will have to build a custom ETL for whatever data you’re ingesting, since the solution you describe needs structured/unstructured ingest across a variety of media formats (no matter which tool you go with). You could rip out the media-ingestion module and the RAG pipeline from my project and use those as starter pieces to save some time building.
https://github.com/rmusser01/tldw_server
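The custom-ETL point above amounts to a dispatch layer that normalizes every format into one record shape. A sketch under that assumption (the parser functions are illustrative stubs, not tldw_server's actual modules):

```python
from pathlib import Path

def parse_pdf(path):
    # Stub: a real ETL would call a PDF library here and return one record per page.
    return [{"text": f"page text from {path}", "page": 1}]

def parse_html(path):
    # Stub for scraped articles; page numbers don't apply to web sources.
    return [{"text": f"article text from {path}", "page": None}]

PARSERS = {".pdf": parse_pdf, ".html": parse_html}

def ingest(path):
    """Route a file to its parser and attach source metadata to every record."""
    parser = PARSERS.get(Path(path).suffix.lower())
    if parser is None:
        raise ValueError(f"no parser for {path}")
    return [{"source": str(path), **rec} for rec in parser(path)]

records = ingest("report.pdf")
```

Adding a new media format is then just adding one parser function to the table, which is why a pluggable ingestion module is worth ripping out and reusing.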
mahmood454@reddit
Can I have a local AI tool just for extracting text out of images?
Just the text, no deep thinking or anything.
I have an i7-8550U with 8GB RAM and no graphics card, so is it possible?
ekaj@reddit
Yeah, you want OCR: https://blog.ngxson.com/using-ocr-models-with-llama-cpp
No-Consequence-1779@reddit
Yes. The deep research (downloading a buncha stuff) is the easiest part.
Melodic_Coffee_833@reddit
I have been building RAG for two years and would like to share:
The most intense part is ingesting massive document sets and building their indexes in three layers (dense, sparse, graph); running locally buys you nothing here.
Search itself is near-instant: rank by similarity for the top X, rerank the top Y, and even if you deep-dive over multiple iterations you're talking 30s max.
The real business case for local is when you have government-secret sources you don't want to vectorize, even in a multi-tenant environment.
The setup you're describing would cost 50x what Elastic Cloud would do for you.
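The rank-then-rerank flow described above, as a pure-Python toy (a real stack would use BM25 or embeddings for stage 1 and a cross-encoder for stage 2; both scoring functions here are deliberately simplified stand-ins):

```python
from collections import Counter

def sparse_score(query, doc):
    """Toy sparse score: query-term frequency in the doc (stand-in for BM25)."""
    tf = Counter(doc.lower().split())
    return sum(tf[t] for t in query.lower().split())

def rerank_score(query, doc):
    """Toy reranker: fraction of query terms covered (stand-in for a cross-encoder)."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def search(query, docs, top_x=10, top_y=3):
    # Stage 1: cheap sparse ranking over the whole corpus, keep top X.
    stage1 = sorted(docs, key=lambda d: sparse_score(query, d), reverse=True)[:top_x]
    # Stage 2: expensive reranking over the survivors only, keep top Y.
    return sorted(stage1, key=lambda d: rerank_score(query, d), reverse=True)[:top_y]

docs = [
    "inflation rose sharply in october",
    "the cat sat on the mat",
    "october inflation data surprised markets",
]
top = search("october inflation", docs, top_x=2, top_y=1)
```

The cost structure is the point: the expensive scorer only ever sees X candidates, which is why the whole loop stays in the tens of seconds even over a large corpus.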
thatguyinline@reddit
Lightrag.
exaknight21@reddit
I posted a few days ago about this very use case. You want to use qwen3-2b-VL for this, strictly because accuracy was my original key concern too.
Coincidentally, I used a 3060 12GB too.
The github: https://github.com/ikantkode/qwen3-2b
beppled@reddit
You can try the Jan series of models if you're thinking of things like MCP tools and browser use; they'll fit on your GPU perfectly ...
But coming to the main part of your question ... honestly, from what I've experienced, you'd be better off using Claude's Research feature or even Perplexity (if you could snag a free year somewhere). I just wanna save you some frustration 🥹
Local models are great, but they are task- and domain-specific ... Jan may be great at using tools, but I've seen it hallucinate left and right. Gemma 3 12B is great, but bad at tools.
Inflation_Artistic@reddit (OP)
It's not that I don't want to spend money on a subscription; I just don't think it's what I need. In my case, I need to read through as much data as possible and get the most out of it. I'll probably have to process hundreds of files, and ordinary deep-research tools can only handle a few dozen at most.
Permtato@reddit
I'm not affiliated in any way, but I used this a fair bit last year and found it pretty good, with both local models and external ones.
kotaemon
There are probably similar repos that are more recently updated / have fewer open issues, but it should do what you're looking for if you hook it up with Ollama for local model support.
For Chinese + English on 12GB of VRAM, you could comfortably run one of the DeepSeek-R1 distills, like DeepSeek-R1-Distill-Qwen-1.5B.