Semantic search over 100M rows of data?

Posted by cryptoguy23@reddit | LocalLLaMA | 19 comments

Hi - I’m working with a large dataset and want to build a fast search engine that takes unstructured queries as input and finds matching products. Each product has a description, product name, color, weight, images, shipping time, and SKU. There are about 100M products, and the CSV file is 164 GB.

Would fine-tuning Llama 3.1 405B work for this?

What’s the best stack for this? What approximate cost am I looking at?