I think we should train LLMs in increasing complexity while avoiding material on the internet.

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA

I think the current approach of training LLMs on internet data is the wrong way to go. Instead, I feel we should train an LLM the way a child learns. Start with books you would show an infant, then a toddler, then a child, and so on. Eventually, you train it on graduate-level material, always using textbook-quality sources.

The issue I have with internet material is that the information might not actually be correct, yet most people believe it because it gets repeated so often. Also, a lot of the content on the internet is generated by AI now. You don't want hallucinated material in your training data.

I also feel that information should be taught in levels or layers, with the easiest concepts taught first and complexity and depth increasing from there. It shouldn't be limited to STEM, either. Consider psychology, sociology, criminal justice, and nursing. I'm a nurse by trade, and I feel that nursing specifically is really good material to train on. In a lot of ways, the material covers a ton of disciplines, from medicine to psychology, sociology, and math, and more importantly, it integrates them.

Finally, for fine-tuning, written works of all types should be the focus. Teach the LLM how to write and be personable.

I'm thinking out loud. I don't work in tech, but I find LLMs fascinating.
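
For anyone who wants to see the ordering idea in concrete terms, here is a rough sketch in Python. Training on easier material before harder material is usually called curriculum learning. Everything in the code is made up for illustration: the `Document` class, the `STAGES` list (the "teen" and "undergraduate" levels are my guess at what "etc." would cover), the `source` labels, and the tiny example corpus. It only shows how documents could be filtered and grouped by level, not how an actual training run would consume them.

```python
from dataclasses import dataclass
from typing import Iterator

# Hypothetical difficulty levels, easiest first. Only "infant", "toddler",
# "child", and "graduate" come from the post; the rest are assumptions.
STAGES = ["infant", "toddler", "child", "teen", "undergraduate", "graduate"]


@dataclass
class Document:
    text: str
    stage: str   # one of STAGES
    source: str  # hypothetical label: "textbook" or "web"


def curriculum(docs: list[Document]) -> Iterator[tuple[str, list[Document]]]:
    """Yield (stage, documents) pairs from easiest to hardest.

    Only textbook-sourced documents are kept, and stages with no
    documents are skipped.
    """
    vetted = [d for d in docs if d.source == "textbook"]
    for stage in STAGES:
        batch = [d for d in vetted if d.stage == stage]
        if batch:
            yield stage, batch


# Toy corpus, invented for this example.
corpus = [
    Document("Pathophysiology of heart failure", "graduate", "textbook"),
    Document("The cat sat on the mat.", "infant", "textbook"),
    Document("Random forum rumor", "child", "web"),
    Document("Plants need water and sunlight.", "child", "textbook"),
]

schedule = list(curriculum(corpus))
for stage, batch in schedule:
    print(stage, [d.text for d in batch])
```

In a real setup each stage's batch would be handed to a training step before moving on to the next stage. Deciding which level a document belongs to is the hard part, and this sketch simply assumes the labels already exist.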