Why do base models give gibberish and need further 'fine-tuning'?
Posted by QFGTrialByFire@reddit | LocalLLaMA | View on Reddit | 14 comments
I'm trying to understand why something like, say, Llama 3.1 8B needs further instruction tuning with something like Alpaca. If you just load the base model and ask it something, it responds with gibberish. If you train it with even just 1,000 samples of Alpaca data, it starts responding coherently. But why does that happen when the original is already trained on next-token generation? The Q/A instruction training is also next-token generation, so why does a little nudge in the weights from Alpaca or other small datasets suddenly get it to respond coherently? When I've looked around on sites etc. they just say the further instruction gets the model to align its responses, but not why. How can a few samples (say just 1,000 Alpaca samples) of 'fine-tuning' on next-token generation suddenly take it from gibberish to coherent responses, when that is also just next-token generation? I get that the training is directed towards producing responses to questions, so it would shift the weights towards that, but the original next-token training would have had similar Q/A datasets in it already, so why doesn't it already do it?
Feztopia@reddit
Base models do not give gibberish. Like the question isn't even right. I have no words.
Mart-McUH@reddit
I don't use base models much, but you need a different way of thinking when using them, e.g. understanding that they are just continuing text. So if you ask a question, prefill it with the start of the answer and then it should follow naturally, I guess. Like:
---
What is the smallest city in Slovakia?
The smallest city in Slovakia is
---
And then it should provide some answer (probably wrong :-)) because that is the most likely continuation of the given text. Being pedantic, if you give it just a random question, the most likely continuation (at least for a human) would be something along the lines of "I don't know" (not very useful).
Another problem is they are often not really trained to stop. So it will then likely continue with whatever ramblings about the provided answer, which later can lead to anything not even related. Depends on the model I suppose; it's been ages since I tried a true base model.
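In code that's just plain completion, e.g. something like this with transformers (the checkpoint name is only an example, swap in whatever base model you actually have):

```python
from transformers import pipeline

# Base model = plain text completion; the checkpoint name is just an example.
generate = pipeline("text-generation", model="meta-llama/Llama-3.1-8B")

prompt = "What is the smallest city in Slovakia?\nThe smallest city in Slovakia is"

# Cap max_new_tokens yourself: a base model has no trained stop behaviour,
# so it will happily ramble on past the answer.
out = generate(prompt, max_new_tokens=30, do_sample=False)
print(out[0]["generated_text"])
```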
phree_radical@reddit
Just think of how many different types of documents it's trained on.
And fine-tuning to imitate a chat isn't really a "need"; instead, you should experiment with few-shot prompting like the examples here https://www.reddit.com/r/LocalLLaMA/comments/1c7r2jw/wow_llama38bs_incontext_learning_is_unbelievable/
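A toy example of what I mean by few-shot, just a plain completion prompt with no chat template at all (made-up reviews, obviously):

```python
# Give the base model a few worked examples and let it continue the pattern.
# Feed this to any completion endpoint; a decent base model will usually
# complete it with " negative".
few_shot_prompt = """Review: Setup took thirty seconds and it just works.
Sentiment: positive

Review: Great screen, terrible battery, constant crashes.
Sentiment: negative

Review: The hinge snapped after a week.
Sentiment:"""

print(few_shot_prompt)
```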
callmedevilthebad@reddit
Is it true for SLMs as well?
phree_radical@reddit
Not those of today. Smaller models need way more training to reach in-context learning capabilities.
QFGTrialByFire@reddit (OP)
Thanks for that, your link and summary are really useful; I was searching around and didn't come across that. Yes, I've noticed you need to run it on a little bit of something like Alpaca for it to pick up instruction following (if that's what you need), then switch to the pattern of input/output you want it to focus on, and that works. I guess I was expecting the base model to just respond to normal questions, so I was surprised it doesn't respond normally, and then surprised again at how little further training was needed to get it to suddenly sound coherent.
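For reference, the kind of tiny fine-tune I mean is roughly this (LoRA via trl/peft; just a sketch, not my exact setup, and argument names shift between library versions):

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# ~1000 Alpaca samples, as in the post; the dataset's "text" column already
# holds the full prompt + response string.
train = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B",  # example base checkpoint (gated on HF)
    train_dataset=train,
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="llama31-alpaca-lora",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
)
trainer.train()
```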
Herr_Drosselmeyer@reddit
Given an actual 'system prompt', i.e. a paragraph or two explaining the situation, like "This is a conversation between an AI assistant and a user. You are the AI assistant, giving helpful answers (etc.)", even base models should not produce nonsense.
Instruct-tuned models are basically just trained a little further on the question --> answer format, along with some special tokens, to make them more reliable in their role as a conversation partner vs. just being text completion machines.
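You can peek at that special-token format yourself via the tokenizer's chat template (model name is just an example, and it's gated, so this assumes you have access):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is the smallest city in Slovakia?"},
]

# Prints the raw string the instruct model is actually trained on,
# special tokens (<|start_header_id|>, <|eot_id|>, ...) included.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```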
I also think it depends on how the base model was trained. Many models these days include a large amount of synthetic data, such as from ChatGPT, so if the base already has a lot of this in it, instruct training becomes less important.
synn89@reddit
In the US, when I meet a stranger and say "How are you doing?", I'll very likely get an automatic "I'm fine, thank you." If I did the same thing in the UK, the "How are you doing?" would likely confuse them, because it's a very personal question to be asking a stranger.
The difference is that in the US we've been trained to understand that the "How are you doing" is a simple alternative pattern to "Hello" with a specific response, "I'm fine, thanks." The words are sort of "gibberish", unless you've been trained on the pattern of the back and forth flow of a specific style of conversation.
A base LLM has an understanding of language structure, but not the back-and-forth, question-and-response patterns that we call spoken language. A spoken language isn't simply proper sentences, but specific patterns, trained in popular culture, that we all recognize and pattern-match back to.
Another example of this: some Americans learn Japanese from watching anime. To someone in Japan hearing them speak, it feels "off": too formal, not really grounded in the modern day-to-day way of speaking in Japan. You may understand the words and how they connect in Japanese, but you also need the patterns for the proper back-and-forth of constructed sentences that a modern speaker uses.
Wheynelau@reddit
Base models are frequently trained on raw text, rather than instruction-style data like question/answer pairs or a chat format. You can imagine the first stage as sort of general grammar and vocabulary training, which usually means the output should actually be coherent, but may not be what you are looking for.
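To make the contrast concrete, a pretraining sample is just a chunk of text to continue, while an instruction-tuning sample (here the standard Alpaca prompt template, no-input variant) already has the question/answer scaffolding baked in:

```python
# A raw pretraining chunk: just web/book text for the model to continue.
pretrain_sample = (
    "Bratislava is the capital of Slovakia and its largest city, "
    "lying on the Danube near the borders with Austria and Hungary."
)

# An Alpaca-style instruction sample: same next-token objective,
# but the text itself encodes a question -> answer pattern.
alpaca_sample = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
What is the capital of Slovakia?

### Response:
The capital of Slovakia is Bratislava."""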
Pretraining usually does not have any QA sets. When you said gibberish, do you mean coherent gibberish or really gibberish?
GatePorters@reddit
Why do babies give gibberish and need further fine tuning?
The first phase of training forms a bunch of rough, loosely connected concepts in the high-dimensional latent space.
The second phase is to structure those into a more directed and cohesive network.
Any further phases are to fine tune based on use.
Kooshi_Govno@reddit
Base models are nothing more than autocomplete. You can actually use this to make them respond somewhat coherently if you give them enough context that it looks like they should respond to you. They will also just continue after that, though, responding as you as well.
They're very fun to play with in their own way.
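Something like this (rough sketch; the model name is just an example base checkpoint):

```python
from transformers import pipeline

generate = pipeline("text-generation", model="meta-llama/Llama-3.1-8B")

# Frame the context as a dialogue so that "answering" becomes the likely continuation.
prompt = (
    "A conversation between User and Assistant.\n"
    "User: Why is the sky blue?\n"
    "Assistant:"
)

text = generate(prompt, max_new_tokens=120, do_sample=False)[0]["generated_text"]

# Left to itself it will keep going and start writing the User's lines too,
# so cut the output off at the next "User:".
reply = text[len(prompt):].split("User:")[0].strip()
print(reply)
```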
YouDontSeemRight@reddit
Interesting concept; it's almost like a conversation simulation. If you could steer the conversation, it could be helpful.
Kooshi_Govno@reddit
It's been a couple years since I used it, but oobabooga's text-generation-webui has a nice Notebook mode where you can do this. It's basically just a text editor window where you can have the LLM start autocompleting, then you can stop it, edit, and have it continue.
YouDontSeemRight@reddit
Base models are internet regurgitation machines. Go to an obscure website and copy-paste half a paragraph into the model. See if it reproduces the other half.