How are you handling web access for local models without destroying context quality?

Posted by SharpRule4025@reddit | LocalLLaMA | 6 comments

Running Llama 3.3 70B locally for a research project, and the biggest friction point has been web access. Fetching a page and dumping it straight into context is brutal: a typical Wikipedia article converted to raw markdown runs 15,000-30,000 tokens, much of it navigation and boilerplate before you even reach the article body.
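To put numbers on that pain, a common rule of thumb is roughly 4 characters per token for English text (the exact ratio depends on the tokenizer; Llama 3's vocabulary differs from OpenAI's). A quick back-of-the-envelope estimator:

```python
# Rough context-cost estimate: ~4 chars per token is a common heuristic
# for English text. The real count depends on the model's tokenizer.
def estimate_tokens(text: str) -> int:
    return len(text) // 4

# A 100 KB page of raw markdown lands around 25k tokens before cleanup.
page = "x" * 100_000
print(estimate_tokens(page))  # 25000
```

Against a 70B model's practical context budget, one uncleaned page can eat most of the window.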

Been experimenting with a preprocessing step that strips navigation, extracts just the article body, and converts to clean text. It helps but feels like reimplementing something that should already exist.
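A minimal sketch of that preprocessing idea using only the standard library's `html.parser` (real pipelines usually reach for libraries like readability-lxml or trafilatura instead; this just illustrates the strip-the-chrome step):

```python
from html.parser import HTMLParser

# Tags whose subtrees are almost never article body.
SKIP = {"nav", "header", "footer", "aside", "script", "style", "form"}

class BodyExtractor(HTMLParser):
    """Collect visible text, skipping navigation/chrome subtrees."""
    def __init__(self):
        super().__init__()
        self.depth = 0       # nesting depth inside a skipped subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = BodyExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

html = "<nav><a href='/'>Home</a></nav><article><p>Actual content.</p></article>"
print(extract_text(html))  # Actual content.
```

This drops whole `<nav>`/`<footer>`-style subtrees rather than individual tags, which is what kills most of the token bloat on encyclopedia-style pages.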

What are others doing for web context with local models?

Reader APIs that return cleaned article text work well for blog posts and standard article pages, but fall over on product pages, documentation sites, and anything JS-heavy.
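One example of this pattern is Jina's reader endpoint, which returns a cleaned markdown rendering of whatever URL you append to it (treat the service details as illustrative; they may change):

```python
import urllib.request

# Sketch of the reader-API approach: prefix the target URL with the
# reader endpoint and GET the cleaned markdown back.
READER_PREFIX = "https://r.jina.ai/"

def reader_url(target: str) -> str:
    return READER_PREFIX + target

def fetch_clean(target: str, timeout: int = 30) -> str:
    # Network call -- only works when the service is reachable.
    with urllib.request.urlopen(reader_url(target), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

print(reader_url("https://en.wikipedia.org/wiki/Transformer"))
```

The appeal is zero local infrastructure; the downside, as noted, is that these services are tuned for article-shaped pages.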

Converting HTML to markdown, then making a cheap API call to extract the relevant sections. It works, but adds latency and cost per page.
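A sketch of that two-step flow. The tag-stripping here is a crude stand-in for a real HTML-to-markdown converter, and `call_cheap_model` is hypothetical -- it would be whatever inexpensive endpoint you route extraction through:

```python
import re

def html_to_text(html: str) -> str:
    """Crude tag-stripping stand-in for a real HTML->markdown converter."""
    text = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def extraction_prompt(text: str, query: str, max_chars: int = 12_000) -> str:
    # Truncation keeps the cheap call cheap; 12k chars is a guessed
    # budget, not a recommendation.
    return (
        f"From the page text below, extract only the sections relevant to: "
        f"{query}\n\n---\n{text[:max_chars]}"
    )

# The extraction request itself goes to your cheap endpoint, e.g.:
#   condensed = call_cheap_model(extraction_prompt(page_text, query))
page = "<html><body><p>GPU pricing tables and specs.</p></body></html>"
print(extraction_prompt(html_to_text(page), "GPU specs")[:60])
```

The latency/cost trade-off the comment mentions comes from that extra round trip on every fetched page.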

Running a small local model specifically for web content extraction before passing to the main model. Interesting but complex to maintain.
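A sketch of that two-stage setup against an Ollama-style `/api/generate` endpoint: a small extractor model condenses the page, and only the condensed text reaches the 70B. The host and model tags are assumptions about your local setup:

```python
import json
import urllib.request

# Assumed local Ollama server; adjust host and model tags to your setup.
OLLAMA = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

def answer_with_web_page(page_text: str, question: str) -> str:
    # Stage 1: small model (assumed installed) trims the page.
    condensed = generate(
        "llama3.2:3b",
        f"Extract only the parts relevant to '{question}':\n\n{page_text}",
    )
    # Stage 2: the big model sees only the condensed text.
    return generate("llama3.3:70b", f"{condensed}\n\nQuestion: {question}")
```

The maintenance complexity is real: you now have two models, two prompts, and a failure mode where the extractor silently drops the part you needed.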

Context window constraints are tighter for local models. Any approaches that work well across different page types?