Minimum viable LLM

Posted by Down_The_Rabbithole@reddit | LocalLLaMA | 30 comments

After using the 125M MobileLLM that Meta released today, and getting far more decent and coherent replies than I expected, I wonder what the absolute minimum size of an LLM is that can still produce coherent text. I should probably define what I mean by "coherent text":

* Text should be grammatically correct and understandable (English).
* Text should at least be related to whatever the user said to it. A greeting should receive a greeting back; a question about a dog should receive an answer at least related to dogs, even if it is not factually correct or useful.

What is the absolute lowest parameter count that could still produce these outcomes? I honestly didn't expect we would get coherent models under GPT-2 size, which was about 1.5B. Experiencing a 125M model that clearly outperforms GPT-2 made me rethink this entirely. How small can we go? 50M? 10M? 1M? 100K?

30 Comments


FishDave@reddit

I recently experimented with the limits of the token/parameter ratio of LMs: I created synthetic datasets and distilled texts to reduce noise, yielding the cleanest possible texts in simple English. The results with a 15M-parameter model were pretty interesting: [tiny-lm-15M](https://huggingface.co/sixf0ur/tiny-lm-15M)

Down_The_Rabbithole@reddit (OP)

Wow, that's very impressive, thanks for sharing.

zrail@reddit

Is there an existing type of LLM that one can stream context updates into continuously and then ask questions about without having to re-parse the context every time? I'm thinking specifically about something like Home Assistant, where I could hook the LLM into the event feed and have it track the current state of every entity in real time, rather than having to send everything wholesale in the prompt.

hapliniste@reddit

I'm waiting for text models to be trained with RAG and masking, so that they don't train on the knowledge itself but on how to use it. Imagine a reflection model like o1, but for every query it does some RAG and gets 100k tokens of context from a knowledge base and web searches. With this, it does axiomatic reflection using the context. This way, it could output thousands of tokens per second and be trained to achieve much better axiomatic reflection instead of having a 2% error rate. This would not solve everything, because it wouldn't allow for "soft reflection", but it could be very good for some tasks like knowledge retrieval, solving logic problems and that sort of thing. You could then call it from a bigger LLM when that sort of thinking is required.

Effective-Distance53@reddit

You mean..., perplexity??

jack-in-the-sack@reddit

I thought the same thing a few months ago but I hadn't put it as eloquently as you did.

OfficialHashPanda@reddit

I think everyone has thought this, but it's not so easy to think through every detail of how to actually do it.

asankhs@reddit

We have implemented something similar in optillm (https://github.com/codelion/optillm). We have a short-term memory that can make the context of LLMs unbounded, and we already have the ability to read content from URLs. We benchmarked it recently with the FRAMES dataset from Google: https://www.reddit.com/r/LocalLLaMA/s/dGIZvCP7ww

genuinelytrying2help@reddit

Not for nothing, keep up the good work, but when we say we're "waiting" for this, all we mean is that even the best proprietary versions of this methodology still act like a grad student who's about to flunk out, instead of what laypeople need in order to actually rely on any results (the middle-aged PhD who runs the lab and is accountable for writing the damn grants).

besabestin@reddit

I have personally thought about this a lot, but I have always been puzzled by whether, or to what extent, you can separate knowledge from language. Your general level of understanding is embedded in your ability to articulate as well. I understand that well-read people know where to look things up, but I wonder to what degree this could translate to current transformer-based language models.

martinerous@reddit

Imagine something like Google's AlphaProof, not only for math but also for general logic, basic science and world rules that cannot be overridden by any kind of text. It would still require at least basic language training to be able to understand and generate human text. However, a single language should be enough. If it knows English, it could "learn" any other language through something like a "language LoRA" or even RAG. Not sure if that's possible, though, at least not with current architectures.

M4xM9450@reddit

The first part sounds eerily similar to Google’s REALM or RETRO models. They kind of overcomplicated things compared to current RAG, which just relies on a decoder-only model plus context in the prompt.

Ill_Yam_9994@reddit

This makes a lot of sense to me. Focus on giving the LLM soft skills, "common sense", and the ability to parse and retrieve data really intelligently instead of just getting it to memorize everything. Would help with hallucinations and stuff I'd imagine too.

sluuuurp@reddit

You could probably meet your specifications with 1000 if statements in python. I’d say thousands of parameters is definitely possible. But probably you actually want something different than your specifications.

Deathcrow@reddit

> You could probably meet your specifications with 1000 if statements in python

Absolutely not. Not even close. Traditional chatbots are incredibly difficult to make and turn out to be garbage if you deviate from the script even slightly.

sluuuurp@reddit

OP says it can be garbage as long as it’s on-topic garbage.
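To make the if-statement idea from this exchange concrete, here is a minimal keyword-rule responder. It is a sketch, not code anyone in the thread posted: the `RULES` table, the `reply` function and the canned answers are all hypothetical. It meets OP's two criteria for the topics it knows about, and it also illustrates Deathcrow's objection, since anything off-script falls through to a generic fallback.

```python
import re

# Hypothetical rule table: (trigger words, canned reply). First match wins.
RULES = [
    ({"hi", "hello", "hey"}, "Hello! How are you?"),
    ({"dog", "dogs", "puppy"}, "Dogs are friendly animals that like to play."),
    ({"cat", "cats", "kitten"}, "Cats like to sleep in the sun."),
]
FALLBACK = "I am not sure. Can you tell me more?"


def reply(message: str) -> str:
    """Return the first canned reply whose trigger words appear in the message."""
    # Lowercase and split into simple word tokens (letters and apostrophes).
    words = set(re.findall(r"[a-z']+", message.lower()))
    for triggers, answer in RULES:
        if words & triggers:  # any overlap between message words and triggers
            return answer
    return FALLBACK


print(reply("Hello there"))                # Hello! How are you?
print(reply("What do dogs eat?"))          # Dogs are friendly animals that like to play.
print(reply("Explain quantum tunneling"))  # I am not sure. Can you tell me more?
```

Every new topic needs another hand-written rule, which is exactly where this approach stops scaling.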

Everlier@reddit

A 100K-parameter model can be coherent with a vocabulary of 40-70 words.
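Why vocabulary size matters so much at this scale is easiest to see with a rough parameter count. The function below is a sketch for a GPT-2-style decoder-only transformer, assuming tied input/output embeddings, learned positional embeddings, a 4x MLP expansion and biases on every linear layer. The tiny configuration in the usage lines (64-token vocabulary, `d_model=64`, 2 layers, 128-token context) is a hypothetical choice, not a model anyone in the thread trained. With GPT-2 small's settings the formula reproduces its well-known 124M total.

```python
def gpt_param_count(vocab_size: int, d_model: int, n_layers: int,
                    context_len: int, tied_embeddings: bool = True) -> int:
    """Approximate parameter count of a GPT-2-style decoder-only transformer."""
    d = d_model
    token_emb = vocab_size * d            # token embedding table
    pos_emb = context_len * d             # learned positional embeddings
    attn = 4 * d * d + 4 * d              # Q, K, V and output projections + biases
    mlp = 2 * (4 * d * d) + 4 * d + d     # up (d -> 4d) and down (4d -> d) + biases
    norms = 2 * (2 * d)                   # two LayerNorms per block (weight + bias)
    per_layer = attn + mlp + norms
    final_norm = 2 * d                    # LayerNorm before the output head
    lm_head = 0 if tied_embeddings else vocab_size * d
    return token_emb + pos_emb + n_layers * per_layer + final_norm + lm_head


print(gpt_param_count(50257, 768, 12, 1024))  # 124439808 (GPT-2 small)
print(gpt_param_count(64, 64, 2, 128))        # 112384 (hypothetical tiny model)
print(gpt_param_count(50257, 64, 2, 128))     # 3324736 (same tiny model, GPT-2 vocab)
```

With a 64-word vocabulary the embedding table is only 4,096 of the roughly 112K parameters, whereas GPT-2's 50,257-token vocabulary would need over 3.2M embedding parameters at the same width, so a tiny vocabulary is what makes a ~100K model plausible at all.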

Dead_Internet_Theory@reddit

Dr. Seuss [could write a book](https://en.wikipedia.org/wiki/Green_Eggs_and_Ham#Writing_and_release) with that many words.

Felladrin@reddit

I once trained a 32M-parameter English chat model from scratch on less than 1 billion tokens, and it does give answers related to the user's questions when run with the Transformers Python library. (The GGUF version doesn't work well, so I don't recommend it.) It's on Hugging Face if you want to check it out: [Minueza-32M-UltraChat](https://huggingface.co/Felladrin/Minueza-32M-UltraChat)

ColorlessCrowfeet@reddit

Here you go: less than 0.01B parameters!

> In this work, we introduce **TinyStories**, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (**below 10 million total parameters**), or have much simpler architectures (with only one transformer block), yet **still produce fluent and consistent stories** with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities.

**TinyStories: How Small Can Language Models Be and Still Speak Coherent English?** [**https://arxiv.org/abs/2305.07759**](https://arxiv.org/abs/2305.07759)
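The quoted abstract describes a dataset restricted to words that young children understand. As a rough illustration of that kind of constraint (not the paper's actual pipeline, which the abstract says relied on GPT-3.5 and GPT-4 to generate the stories), here is a minimal post-hoc vocabulary check. The `out_of_vocabulary` helper and the `ALLOWED` word list are hypothetical.

```python
import re


def out_of_vocabulary(text: str, allowed: set[str]) -> list[str]:
    """Return words in `text` that are not in `allowed`, in order of first appearance."""
    missing: list[str] = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in allowed and word not in missing:
            missing.append(word)
    return missing


# Hypothetical toy vocabulary; a real one would hold many more simple words.
ALLOWED = {"the", "dog", "ran", "to", "park", "and", "played",
           "with", "a", "ball", "happy", "was"}

print(out_of_vocabulary("The dog ran to the park and played with a ball.", ALLOWED))  # []
print(out_of_vocabulary("The dog contemplated existential dread.", ALLOWED))
# ['contemplated', 'existential', 'dread']
```

A filter like this could be used to discard generated samples that stray outside the target vocabulary, keeping the training distribution narrow enough for a very small model to fit.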
