Minimum viable LLM

Posted by Down_The_Rabbithole@reddit | LocalLLaMA

After trying the 125M MobileLLM released by Meta today and getting far more decent and coherent replies than expected, I wonder what the absolute minimum size of an LLM is that can still produce coherent text.

I should probably define what I mean by "coherent text":

* Text should be grammatically correct and understandable (English).
* Text should be at least related to whatever the user wrote. A greeting should receive a greeting back, and a question about a dog should receive an answer at least related to dogs, even if not factually correct or useful.

What is the absolute lowest parameter count that could still produce these outcomes? I honestly didn't expect we would get coherent models under GPT-2 size, which was about 1.5B. Experiencing a 125M model that clearly outperforms GPT-2 made me rethink this entirely.

How small can we go? 50M? 10M? 1M? 100K?