The Synthetic Data Playbook: Generating Trillions of the Finest Tokens
Posted by joelinho95@reddit | LocalLLaMA | 10 comments
Introducing the Synthetic Data Playbook: We generated over 1T tokens across 90 experiments, using 100k+ GPU hours, to figure out what makes good synthetic data and how to generate it at scale.
[https://huggingface.co/spaces/HuggingFaceFW/finephrase](https://huggingface.co/spaces/HuggingFaceFW/finephrase)
https://preview.redd.it/hq6abr3p3ung1.png?width=1200&format=png&auto=webp&s=1dd47fa704669648c5fab08b1a02552c0b2fe8ce