Why do base models give gibberish and need further 'fine-tuning'?

Posted by QFGTrialByFire@reddit | LocalLLaMA

I'm trying to understand why something like Llama 3.1 8B needs further instruction tuning on something like Alpaca. If you just load the base model and ask it something, it responds with gibberish. If you train it on even just 1,000 samples of Alpaca data, it starts responding coherently. But why does that happen when the original model is already trained on next-token generation?

The Q/A instruction training is also next-token generation, so why does a little nudge in the weights from Alpaca or another small dataset suddenly take it from gibberish to coherent responses? The sites I've looked at just say that the further instruction tuning aligns the model to respond, but they don't say why.

I get that the training is directed towards producing responses to questions, so it would shift the weights in that direction. But the original next-token training data would already have included similar Q/A data, so why doesn't the base model already do this?
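To make concrete what "the instruction training is also next-token generation" means here, below is a minimal sketch of how one Alpaca-style sample could be flattened into a single string and turned into ordinary next-token training examples. The template is the commonly used no-input Alpaca prompt. The whitespace "tokenizer", the helper names `render` and `next_token_examples`, the sample record, and the choice to train only on response tokens are all assumptions for illustration, not a description of any particular training script.

```python
# Commonly used Alpaca prompt template (no-input variant).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def render(sample):
    """Turn one instruction/output record into (prompt, response) text.

    Hypothetical helper: the record layout mirrors Alpaca-style data.
    """
    prompt = PROMPT_TEMPLATE.format(instruction=sample["instruction"])
    return prompt, sample["output"]


def next_token_examples(prompt, response):
    """Build (context, target) pairs for the response tokens only.

    Assumption: a toy whitespace tokenizer stands in for the model's real
    tokenizer, and prompt tokens are used as context but not as targets.
    """
    prompt_tokens = prompt.split()
    tokens = prompt_tokens + response.split()
    examples = []
    for i in range(len(prompt_tokens), len(tokens)):
        # Same objective as pretraining: predict token i from tokens before it.
        examples.append((tokens[:i], tokens[i]))
    return examples


# Made-up sample record, for illustration only.
sample = {
    "instruction": "Name the capital of France.",
    "output": "The capital of France is Paris.",
}

prompt, response = render(sample)
examples = next_token_examples(prompt, response)

for context, target in examples:
    # Show only the last few context tokens to keep the printout short.
    print(" ".join(context[-3:]), "->", target)
```

The only thing that differs from pretraining in this sketch is the text itself: every sample has the same fixed scaffold ending in `### Response:`, followed by an answer.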