LLMs as a way to browse the web
Posted by mayodoctur@reddit | LocalLLaMA | View on Reddit | 20 comments
This is a current hot topic that's being explored, and I'd like to explore it for my final year project: using LLMs to browse the web and scrape data. For example, "show me 5 reddit posts about xyz" or "tell me the news from China in the last 2 days". The system scrapes the web for this data and relays it back to the user.
For my final year project as an undergraduate student, I'd like to do something like this, but before I spend the next 6 months trying it out, what are some limitations or struggles I might face? Is this even complicated enough to serve as my final year project?
What would be the scope for this type of project?
Narrotapp@reddit
uhm.. like perplexity.ai?
mayodoctur@reddit (OP)
oh wow just discovered this, this was exactly what I was thinking of lol. I might still pursue it as a learning opportunity
ParaboloidalCrest@reddit
Check out this attempt to create an open source alternative to Perplexity https://github.com/ItzCrazyKns/Perplexica
Longjumping_Ad5434@reddit
Or this https://docs.openwebui.com/tutorials/features/web_search/
Yapper_Zipper@reddit
This is just my thought, take it with a pinch of salt. AI-assisted searching is something being done quite a lot. You have OpenAI, Perplexity, Bing, Gemini, and all of these search engines and research groups have built something similar. Like others suggested, implementing it is going to be fairly straightforward.
What would be more interesting is how the model collects data and how you assess the model's output quality. In most search engines there is no "trust" factor. You get the results, and you don't even bother to confirm their validity. I know it's a common problem, and even without AI the traditional search algorithms can fail. (E.g. at some point everyone in the world thought Mr. Bean had died, but in fact he was alive and racing his heart out.)
A more interesting idea you can build upon is: how can you assess the quality of the model's output? Can you come up with an algorithm that takes the original source of truth and the model's response and gives some percentage that says, "Hey, the model is 99% sure that Mr. Bean is alive"? Blindly trusting AI will lead to disaster at some point.
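As a very rough sketch of this cross-source agreement idea: check a claim against several retrieved sources and report the fraction that support it. Everything here is made up for illustration (the keyword-matching heuristic especially); a real system would use an entailment or NLI model rather than substring checks.

```python
def agreement_score(claim_keywords, sources):
    """Fraction of sources whose text contains every keyword of the claim.

    A deliberately naive proxy for "how sure are we" -- real systems
    would use an entailment model, not keyword matching.
    """
    if not sources:
        return 0.0
    supporting = 0
    for text in sources:
        text_lower = text.lower()
        if all(kw.lower() in text_lower for kw in claim_keywords):
            supporting += 1
    return supporting / len(sources)

# Hypothetical retrieved snippets for the Mr. Bean example above.
sources = [
    "Rowan Atkinson (Mr Bean) is alive and still racing cars.",
    "Death hoax debunked: Mr Bean actor confirmed alive.",
    "Mr Bean reruns air on TV this weekend.",
]
score = agreement_score(["alive"], sources)  # 2 of 3 sources support it
```

The percentage then reflects source agreement rather than model self-confidence, which is a much easier quantity to defend in a dissertation.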
mayodoctur@reddit (OP)
Interesting comment, I see what you mean. It would be a straightforward project, but it would be interesting to web scrape different sites and handle the different formatting etc., e.g. scraping Google News when the user asks for news. I like your idea of assessing the quality of the model's output; would this involve training a model on a bunch of test/validation data?
I was thinking, could I frame it as something like "Will LLMs replace search engines?" and include all of the research and projects under that? Do you think training the model and assessing the quality will be too complex a project for 5 months? I have until March/April to complete the project along with a dissertation.
Yapper_Zipper@reddit
Regarding the model quality, again this is just my 2 cents. There are already measurement scores people use to validate how good a model response is (e.g. the ROUGE score in summarization). One could use these scores directly, but collect news articles on the same topic from different sources to build a source of truth to compare the model's response against. That's one way, and there could be other approaches too.
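For context, ROUGE-1 recall is just clipped unigram overlap between a reference text and a candidate; a minimal from-scratch sketch (the real `rouge-score` package handles stemming, ROUGE-2, ROUGE-L, etc.):

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """ROUGE-1 recall: fraction of reference unigrams that also appear
    in the candidate, with per-word counts clipped to the reference."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

Here the "reference" would be the aggregated source of truth built from multiple news sources, and the "candidate" the model's summary.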
If there is a possibility, I'd suggest discussing the topic with your professors. This requires some research effort, and without proper guidance it would be difficult.
mayodoctur@reddit (OP)
I'd also like to incorporate some web scraping into it. Can I combine all of these things into one project/dissertation?
Inevitable-Start-653@reddit
I have a repo here:
https://github.com/RandomInternetPreson/Lucid_Autonomy
That sort of does this. It's still experimental, but it uses a vision model to identify UI elements for the LLM to click on.
JohnnyDaMitch@reddit
It seems like a good size for a final project. Particularly if it can work with different kinds of web sources, as your examples suggest.
Don't overcomplicate. Spend some time coming up with a solid design, but one you're pretty sure you can get through without running into any walls. You'll have time to refine it.
mayodoctur@reddit (OP)
Looks like there's already a project out there called perplexity.ai for this exact thing.
JohnnyDaMitch@reddit
Rome wasn't built in a day.
matteogeniaccio@reddit
You have at least three components to implement:

* A web browser and preprocessor. You can't just download the raw HTML, because most content is dynamically loaded and formatted; Reddit, for example, does this.
* Something that makes decisions. It could be an LLM combined with some control logic.
* A content processor that analyzes the data and produces the output. This is the LLM.
If you need an example, I implemented this as a hobby project using Selenium + Readability to load data, ReAct + Reflexion as control logic, and some prompt templates to process the data. The agent can answer questions like "give me a summary of the article on the home page of Hacker News that is most likely related to language models" or "what is the recipe of pasta alla carbonara?"
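The ReAct-style control loop described here can be sketched in a few lines. This is not GraphLLM's actual code: the "model" and "search tool" below are canned stubs so the think→act→observe control flow is visible without an LLM or a browser attached.

```python
# Minimal ReAct-style control loop. In a real agent, scripted_model
# would be an LLM call and search_tool would drive a browser
# (e.g. via Selenium); both are hypothetical stand-ins here.

def scripted_model(history):
    """Stand-in for the LLM: picks the next step from the transcript."""
    if "Observation:" not in history:
        return "Action: search[pasta alla carbonara recipe]"
    return "Answer: eggs, pecorino, guanciale, black pepper"

def search_tool(query):
    """Stand-in for the browse/search step."""
    return f"Top result for '{query}': a recipe page"

def react_loop(question, max_steps=5):
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = scripted_model(history)
        if step.startswith("Answer:"):
            return step.removeprefix("Answer: ").strip()
        # Parse the tool call, run it, and append the observation.
        query = step[len("Action: search["):-1]
        history += f"\n{step}\nObservation: {search_tool(query)}"
    return None  # gave up within the step budget
```

The step budget (`max_steps`) matters in practice: without it, a confused model can loop on the same action forever.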
You can check the code at GraphLLM
mayodoctur@reddit (OP)
Hi, this seems like exactly the thing I was looking to do. Do you think it is complicated enough to work on for the year?
matteogeniaccio@reddit
I'm not sure. The answer could range from "very easy" to "virtually impossible". You can code a proof of concept in a week and a basic prototype in a month, but you need to know exactly what you are doing.
There is quite a bit of research to do: you need to understand how websites are rendered and how to extract information from them, how to use an API to control a browser, how to prompt an LLM effectively, how to make decisions from its output, how to handle hallucinations, how to code your algorithm, etc...
As a starting point you could try to simulate your program manually by sending prompts to AI Studio or ChatGPT and watching the result. For example, you could write your question and ask the model what it should search on Google to find the answer, then give it the results and ask which website it should open...
BobFloss@reddit
Bro I have a shortcut on my iPhone that does this. It is very easy. You can use Tavily API and just enter a query, then make a prompt that says to use the data from the JSON to generate the response. You could also use spider.cloud (not affiliated just a good service) and Serper or Brave search engine APIs if Tavily is too expensive, too slow, or feels like cheating or something.
The specifics with getting a certain quantity seem to be the hardest part of this because you will need looping and self correcting mechanisms that use LLM-as-a-judge (just feeding in what happened to an LLM and asking if it worked, then retrying if it didn't).
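That loop-and-retry shape (fetch, judge, retry on failure) can be sketched like so. The fetch and judge functions are hypothetical stubs; in practice the judge would be an LLM call that inspects the results, not a length check.

```python
def fetch_posts(query, attempt):
    """Stand-in for a search API call (Tavily, Serper, etc.).
    Pretends that retries with broader parameters return more results."""
    return [f"post {i} about {query}" for i in range(3 * (attempt + 1))]

def judge(results, wanted):
    """Stand-in for LLM-as-a-judge: did the fetch satisfy the request?"""
    return len(results) >= wanted

def fetch_with_retries(query, wanted, max_attempts=3):
    """Self-correcting loop: retry the fetch until the judge accepts."""
    for attempt in range(max_attempts):
        results = fetch_posts(query, attempt)
        if judge(results, wanted):
            return results[:wanted]
    return None  # judge never accepted within the attempt budget
```

"Show me 5 reddit posts about xyz" maps onto `fetch_with_retries("xyz", 5)`: the first fetch falls short, the judge rejects it, and the retry succeeds.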
Sounds to me like you already totally have it figured out! Aim higher, you seem smart enough to do it. My hunch is that since you already have this good of an idea of how it works just off the top of your head, you should think more about the idea itself instead of how complicated it will be. You can figure it out; you already did.
The best thing you can do to quickly test something without having to hunch over at a computer is, if you have an iPhone, iPad, or Mac...open the Shortcuts app, and don't worry about code yet. Just use the Get Contents of URL node and you can do insane stuff chaining prompts together as long as each takes under 30 seconds. It's a stupid workflow but if you can't code or are an extremely lazy spineless weakling like myself, it's the best thing ever. You can make menus and test your prompts out on all kinds of LLMs and change it so easily. I recommend using Gemini Flash 1.5 and the Tavily API to start out. I'm sure there are Android apps that can do this kind of thing too, not sure how good they are but for me it's really cool because you can even use the share button in Safari and it will extract the body of any article you're on and feed it into the Shortcut so you can literally have summarization or whatever tf you want everywhere. It's insanely powerful and super slow and annoying but so easy to fuck with to make work
mayodoctur@reddit (OP)
Hey man, thanks for the comment. The Tavily API is the type of thing I'm looking to create, but more as a search engine, if you know what I mean. The issue is, if I want a good grade on this project, it has to be research-heavy and complex enough. Thanks for the links, I'll have a look.
Outrageous_Umpire@reddit
To stay within scope for your project, LangGraph would be a good option. It (and LangChain) have ready-made function-calling tools for things like web searching. You could set up nodes to evaluate and refine the results.
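The node pattern being suggested (search → evaluate → refine, looping until the evaluation passes) looks roughly like this. Note this is a pure-Python mock of the shape only, not LangGraph's real API (which builds a `StateGraph`); all node functions are invented for illustration.

```python
# Pure-Python sketch of a search -> evaluate -> refine node graph.
# Each node mutates shared state and names the next node to run.

def search_node(state):
    # Stand-in for a web-search tool call.
    state["results"] = [f"result about {state['query']}"]
    return "evaluate"

def evaluate_node(state):
    # Stand-in for an LLM judging whether the results suffice.
    good = len(state["results"]) >= state["wanted"]
    return "done" if good else "refine"

def refine_node(state):
    # Stand-in for an LLM rewriting the query before retrying.
    state["query"] += " (refined)"
    return "search"

NODES = {"search": search_node, "evaluate": evaluate_node, "refine": refine_node}

def run_graph(query, wanted):
    state = {"query": query, "results": [], "wanted": wanted}
    node = "search"
    for _ in range(10):  # safety bound so a bad loop terminates
        node = NODES[node](state)
        if node == "done":
            break
    return state
```

LangGraph adds the plumbing (typed state, checkpointing, streaming) on top of exactly this kind of cyclic control flow, which is why it fits the refine-until-good pattern better than a linear chain.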
mayodoctur@reddit (OP)
By using LangChain and LangGraph, would it make the project too easy? I might start the report with something like "Will LLMs replace search engines?" and slowly develop the project from there. My only worry is whether the project will be too small, since it might not require too much coding.
Ill_Yam_9994@reddit
My first thought is:
I think that's more or less how the web-result aspect of commercial solutions works.