How are you handling web crawling? Firecrawl is great, but I'm hitting limits.

Posted by Robertshee@reddit | LocalLLaMA | View on Reddit | 24 comments

Been experimenting with web search and content extraction for a small AI assistant project, and I'm hitting a few bottlenecks. My current setup is basically: 1) search for a batch of URLs, 2) scrape and extract the text, and 3) feed it to an LLM for answers.
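For reference, here's a minimal sketch of that three-step loop. `search_urls`, `fetch`, and the LLM call are hypothetical stand-ins (the post doesn't name specific providers), but the stdlib `html.parser` extraction step is real and runnable:

```python
# Sketch of the search -> scrape -> LLM pipeline described above.
# search_urls() / fetch() / the LLM call are placeholders; swap in your
# actual search API, an HTTP client with timeouts, and your LLM client.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)


def search_urls(query: str) -> list[str]:
    # Hypothetical: call your search API here and return result URLs.
    return ["https://example.com/page1"]


def fetch(url: str) -> str:
    # Hypothetical: in practice use requests/httpx with retries and timeouts.
    return "<html><body><p>Example content</p><script>x()</script></body></html>"


def answer(query: str) -> str:
    docs = [extract_text(fetch(u)) for u in search_urls(query)]
    context = "\n\n".join(docs)
    # Hypothetical LLM call would go here: prompt = context + query.
    return context  # placeholder for the LLM-generated answer


print(answer("what is example.com?"))
```

Even this toy version shows where the maintenance cost comes from: three separate integration points (search, fetching, generation), each with its own auth, rate limits, and failure modes.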

It works decently, but the main issue is managing multiple services: dealing with search APIs, scraping infrastructure, and LLM calls separately. Maintaining that pipeline feels heavier than it should.

Is there a better way to handle this? Ideally something that bundles search + content extraction + LLM generation together, so I'm not constantly juggling multiple services by hand.

Basically: I need a simpler dev stack for AI-powered, web-aware assistants that handles both data retrieval and answer generation cleanly. I wanna know if anyone has built this kind of pipeline in production.