Do you pay for curated datasets, or is scraped/free data good enough?

Posted by Lost_Transportation1@reddit | LocalLLaMA

Genuine question about how people source training data for fine-tuning projects.

If you needed specialist visual data (say, historical documents, architectural drawings, handwritten manuscripts), would you:

a) Scrape what you can find and deal with the noise

b) Use existing open datasets even if they're not ideal

c) Pay for a curated, licensed dataset if the price is right

And if (c), what price range makes sense, and on what model: per image, per dataset, or subscription?