Local query autocomplete with "classical" ML, no LLM needed
Posted by Scared-Tip7914@reddit | LocalLLaMA | 8 comments
Hey guys! I know this is not fully LLM related (it's still local though :D), mods feel free to delete this if you think it's off topic, but I just wanted to share something I experimented with: local autocomplete without the use of LLMs or a full Elasticsearch setup.
My main area is RAG, and we realised there is a bit of a gap in search box autocomplete functionality: you shouldn't have to spend a bunch of time generating sample questions that users might ask just to autocomplete their queries. So I created this tool where you take the same pdf, docx or txt files that you use for the underlying RAG and throw them into this thing. It creates a local db, and as users type it shows them suggestions based on the text in the docs themselves, so the suggestions list is actually relevant and might guide them somewhere useful.
It uses some of the linguistic algos that predate LLMs, specifically Kneser-Ney scoring and the OG fuzzy match, so it's language agnostic, with the caveat that it doesn't support logographic languages like Chinese and Japanese (for now).
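To give a rough feel for the idea (this is NOT the package's actual code, just a toy sketch): `difflib` handles the fuzzy match, and plain bigram counts stand in for Kneser-Ney, which on top of this would discount counts and back off to continuation probabilities.

```python
from collections import Counter, defaultdict
import difflib

def build_bigrams(text):
    # Naive whitespace tokenization; a real pipeline would normalize punctuation.
    words = text.lower().split()
    continuations = defaultdict(Counter)
    for w1, w2 in zip(words, words[1:]):
        continuations[w1][w2] += 1
    return continuations

def suggest(prefix, continuations, vocab, n=3):
    # Fuzzy-match the last typed word against the doc vocabulary ("OG fuzzy match"),
    # then rank likely next words by raw bigram count (a crude stand-in for
    # Kneser-Ney scoring, which also discounts and interpolates with lower orders).
    last = prefix.lower().split()[-1]
    matches = difflib.get_close_matches(last, vocab, n=1, cutoff=0.6)
    if not matches:
        return []
    return [w for w, _ in continuations[matches[0]].most_common(n)]

corpus = "the quick brown fox jumps over the lazy dog the quick brown cat"
cont = build_bigrams(corpus)
vocab = list(cont.keys())
print(suggest("the quik", cont, vocab))  # typo "quik" still maps to "quick"
```

The real thing obviously needs proper smoothing so that unseen continuations still get sensible scores, which is exactly the problem Kneser-Ney solves.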
Check the thing out here on pypi: https://pypi.org/project/query-autocomplete/
And the repo: https://github.com/MarcellM01/query-autocomplete
ALSO if you think the idea is pure garbage or there are easier ways to do this I am also open to that lol, because I have no desire to replicate/maintain something that's already solved.
DarkVoid42@reddit
this is awesome
Scared-Tip7914@reddit (OP)
Thanks, appreciate it!!
MoodDelicious3920@reddit
I didn't exactly understand your project, but what I actually do is give my agent a semantic and keyword search tool where it can write its own query based on the user query. So: user uploads documents --> user asks a query --> that query isn't used directly for searching; instead the model regenerates a detailed query --> calls the search tool.
Scared-Tip7914@reddit (OP)
Thanks! Yeah, I might have been a bit unclear with the explanation, this thing is basically for getting a Google-search-style dropdown of potential queries as the user types their question.
MoodDelicious3920@reddit
Sounds good, but I mean RAG is for searching something right? Rather than seeing suggestions on what to search... and fuzzy matching results will automatically reveal the related content right... maybe I am not able to understand.
Scared-Tip7914@reddit (OP)
Noo you got it right, it's more of a convenience feature than anything else. Absolutely not necessary, especially when the users know what to search for, or will eventually find their way to the info in multi-turn convos. It's to make the product look more “polished” if that makes any sense, so users already feel like there is some interactivity even before the first question is fired off.
MoodDelicious3920@reddit
Yea it will look good fs.. btw how are you processing pdfs? I first convert each page to text using OCR, then concatenate to get a .txt file.
Scared-Tip7914@reddit (OP)
Thank you, I really appreciate it! This thing here just extracts text directly from the pdf if there is any, but in my RAG flows I found docling to be the best option, it has an OCR + layout parser flow that preserves headings, tables etc. Then I convert the chunks to markdown so they are more LLM friendly.
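Roughly what I mean by the markdown conversion, in toy form (the chunk dicts here are made up for illustration, docling's actual output objects look different):

```python
def chunks_to_markdown(chunks):
    # Illustrative only: turn parsed chunks into markdown so they are more
    # LLM-friendly. The "type" labels here are a stand-in for what a layout
    # parser like docling would actually give you.
    lines = []
    for chunk in chunks:
        if chunk["type"] == "heading":
            lines.append("## " + chunk["text"])
        else:
            lines.append(chunk["text"])
    return "\n\n".join(lines)

chunks = [
    {"type": "heading", "text": "Installation"},
    {"type": "text", "text": "Run pip install query-autocomplete."},
]
print(chunks_to_markdown(chunks))
```

The point is just that headings survive as headings instead of being flattened into one wall of text, which makes the downstream chunks much easier for the model to ground on.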