How are you handling web access for local models without destroying context quality?

Posted by SharpRule4025@reddit | LocalLLaMA | 6 comments

Running Llama 3.3 70B locally for a research project, and the biggest friction point has been web access. Fetching a page and dumping it straight into context is brutal: a typical Wikipedia article converted to raw markdown runs 15,000-30,000 tokens, much of it navigation and boilerplate before you even reach the article body.
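To put numbers on that pain, a common rule of thumb is roughly 4 characters per token for English text (the exact ratio depends on the tokenizer; Llama 3's vocabulary differs from OpenAI's). A quick back-of-the-envelope estimator:

```python
# Rough context-cost estimate: ~4 chars per token is a common heuristic
# for English text. The real count depends on the model's tokenizer.
def estimate_tokens(text: str) -> int:
    return len(text) // 4

# A 100 KB page of raw markdown lands around 25k tokens before cleanup.
page = "x" * 100_000
print(estimate_tokens(page))  # 25000
```

Against a 70B model's practical context budget, one uncleaned page can eat most of the window.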

Been experimenting with a preprocessing step that strips navigation, extracts just the article body, and converts to clean text. It helps but feels like reimplementing something that should already exist.
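A minimal sketch of that preprocessing idea using only the standard library's `html.parser` (real pipelines usually reach for libraries like readability-lxml or trafilatura instead; this just illustrates the strip-the-chrome step):

```python
from html.parser import HTMLParser

# Tags whose subtrees are almost never article body.
SKIP = {"nav", "header", "footer", "aside", "script", "style", "form"}

class BodyExtractor(HTMLParser):
    """Collect visible text, skipping navigation/chrome subtrees."""
    def __init__(self):
        super().__init__()
        self.depth = 0       # nesting depth inside a skipped subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = BodyExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

html = "<nav><a href='/'>Home</a></nav><article><p>Actual content.</p></article>"
print(extract_text(html))  # Actual content.
```

This drops whole `<nav>`/`<footer>`-style subtrees rather than individual tags, which is what kills most of the token bloat on encyclopedia-style pages.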

What are others doing for web context with local models?

Reader APIs that return cleaned article text work well for blog posts and standard article pages, but fall over on product pages, documentation sites, and anything JS-heavy.
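One example of this pattern is Jina's reader endpoint, which returns a cleaned markdown rendering of whatever URL you append to it (treat the service details as illustrative; they may change):

```python
import urllib.request

# Sketch of the reader-API approach: prefix the target URL with the
# reader endpoint and GET the cleaned markdown back.
READER_PREFIX = "https://r.jina.ai/"

def reader_url(target: str) -> str:
    return READER_PREFIX + target

def fetch_clean(target: str, timeout: int = 30) -> str:
    # Network call -- only works when the service is reachable.
    with urllib.request.urlopen(reader_url(target), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

print(reader_url("https://en.wikipedia.org/wiki/Transformer"))
```

The appeal is zero local infrastructure; the downside, as noted, is that these services are tuned for article-shaped pages.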

Converting HTML to markdown, then making a cheap API call to extract the relevant sections. It works, but adds latency and cost per page.
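A sketch of that two-step flow. The tag-stripping here is a crude stand-in for a real HTML-to-markdown converter, and `call_cheap_model` is hypothetical -- it would be whatever inexpensive endpoint you route extraction through:

```python
import re

def html_to_text(html: str) -> str:
    """Crude tag-stripping stand-in for a real HTML->markdown converter."""
    text = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def extraction_prompt(text: str, query: str, max_chars: int = 12_000) -> str:
    # Truncation keeps the cheap call cheap; 12k chars is a guessed
    # budget, not a recommendation.
    return (
        f"From the page text below, extract only the sections relevant to: "
        f"{query}\n\n---\n{text[:max_chars]}"
    )

# The extraction request itself goes to your cheap endpoint, e.g.:
#   condensed = call_cheap_model(extraction_prompt(page_text, query))
page = "<html><body><p>GPU pricing tables and specs.</p></body></html>"
print(extraction_prompt(html_to_text(page), "GPU specs")[:60])
```

The latency/cost trade-off the comment mentions comes from that extra round trip on every fetched page.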

Running a small local model specifically for web content extraction before passing to the main model. Interesting but complex to maintain.
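A sketch of that two-stage setup against an Ollama-style `/api/generate` endpoint: a small extractor model condenses the page, and only the condensed text reaches the 70B. The host and model tags are assumptions about your local setup:

```python
import json
import urllib.request

# Assumed local Ollama server; adjust host and model tags to your setup.
OLLAMA = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

def answer_with_web_page(page_text: str, question: str) -> str:
    # Stage 1: small model (assumed installed) trims the page.
    condensed = generate(
        "llama3.2:3b",
        f"Extract only the parts relevant to '{question}':\n\n{page_text}",
    )
    # Stage 2: the big model sees only the condensed text.
    return generate("llama3.3:70b", f"{condensed}\n\nQuestion: {question}")
```

The maintenance complexity is real: you now have two models, two prompts, and a failure mode where the extractor silently drops the part you needed.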

Context window constraints are tighter for local models. Any approaches that work well across different page types?