Minimum viable LLM
Posted by Down_The_Rabbithole@reddit | LocalLLaMA | 30 comments
After trying the 125M MobileLLM that Meta released today and getting far more decent and coherent replies than expected, I wonder what the absolute minimum size is for an LLM that can still produce coherent text.
I should probably define what I mean by "coherent text".
* Text should be grammatically correct and understandable (English)
* Text should be at least related to whatever the user wrote. A greeting should receive a greeting back, and a question about a dog should receive an answer at least related to dogs, even if it is not factually correct or useful.
What is the lowest parameter count that could still produce these outcomes? I honestly didn't expect coherent models below GPT-2 size, which was about 1.5B parameters.
Experiencing a 125M model that clearly outperforms GPT-2 made me rethink this entirely. How small can we go? 50M? 10M? 1M? 100K?
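To get a feel for what those parameter budgets mean, here is a rough back-of-the-envelope sketch. It assumes a plain GPT-style decoder where each layer costs about 12 * d_model^2 weights (attention plus a 4x-wide MLP), ignores biases, layer norms and position embeddings, and assumes tied input/output embeddings by default. The function name and the tiny example configuration are made up for illustration; only the GPT-2 XL shape (48 layers, width 1600, 50257-token vocabulary) is the published one.

```python
def estimate_params(vocab_size, d_model, n_layers, tie_embeddings=True):
    """Approximate parameter count of a GPT-style decoder (rough estimate only)."""
    # Token embedding table: one d_model-sized vector per vocabulary entry.
    embedding = vocab_size * d_model
    # Per layer: ~4*d^2 for attention (Q, K, V, output projections)
    # plus ~8*d^2 for an MLP with 4x expansion. Biases and norms are ignored.
    per_layer = 12 * d_model ** 2
    # An untied output head adds a second vocab_size x d_model matrix.
    output_head = 0 if tie_embeddings else vocab_size * d_model
    return embedding + n_layers * per_layer + output_head


# GPT-2 XL shape: lands close to the ~1.5B figure mentioned above.
gpt2_xl = estimate_params(vocab_size=50257, d_model=1600, n_layers=48)

# Hypothetical tiny model (illustrative numbers, not a real release).
tiny_vocab, tiny_width, tiny_layers = 32000, 64, 4
tiny = estimate_params(tiny_vocab, tiny_width, tiny_layers)
tiny_embedding_share = (tiny_vocab * tiny_width) / tiny

print(f"GPT-2 XL estimate: {gpt2_xl / 1e9:.2f}B parameters")
print(f"Tiny model estimate: {tiny / 1e6:.2f}M parameters")
print(f"Share spent on embeddings: {tiny_embedding_share:.0%}")
```

Under these assumptions, a model in the low millions of parameters spends most of its budget on the embedding table unless the vocabulary shrinks too, so the vocabulary size matters as much as depth and width when going this small.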