DISTILLATION is so underrated. I spent an hour and got a neat improvement in accuracy while keeping the costs low

I agree with you. It’s not about the percentage split between the train and test sets - it’s about how large the test set is in absolute terms. It needs to be large enough to form a representative sample of the distribution you are modeling.

[-]

waiting_for_zban@reddit

Can someone please explain why I'm being downvoted?

You could ask your local llm this question, but to save you few minutes, in such splits, a common, well known problem as overfitting arise.

[-]

coldrolledpotmetal@reddit

Yeah I'm aware that overfitting can be a problem, but splits can range anywhere from 50-50 to 95-5. Some LLM tasks even require a bit of overfitting anyways, if you really want to reduce hallucinations. OP shouldn't be getting downvoted so hard for saying something that isn't outlandish at all

[-]

Ambitious_Anybody855@reddit (OP)

Thanks u/coldrolledpotmetal for having my back <3

[-]

Su1tz@reddit

Soo, synthetic data fine tuning?

[-]

V0dros@reddit

But also has to be from a big model to a smaller one to be considered distillation

[-]

dp3471@reddit

knowledge distillation != model distillation != distillation

bad op

[-]

Ambitious_Anybody855@reddit (OP)

What should I interpret from this? Yes this is knowledge distillation but does that make the results incorrect or change anything?

[-]

SirRece@reddit

Cool work. I love how everyone just assumes blindly you don't know what distillation is when you clearly do. Love seeing homegrown stuff like this. 😂

[-]

ColorlessCrowfeet@reddit

Distillation == Fine-tuning?

[-]

Ambitious_Anybody855@reddit (OP)

Use cases are different for each. Distillation ensures a smaller model performs on par with a much larger model. It's 14x cheaper in this my example.
Finetuning is more to improve a model's performance on a specific task/domain. Not always done for a cost benefit

[-]

Psychological_Cry920@reddit

How did this explanation got so many down votes?

[-]

SirRece@reddit

Big distillation

[-]

ShadowbanRevival@reddit

If it is ensured why not distill the distilled model on and on until you get AGI in your basement?

[-]

Ambitious_Anybody855@reddit (OP)

Hahah! Spare me lord english is my second language

[-]

eleqtriq@reddit

You can combine the processes. You could distill domain knowledge into the smaller model, too.

[-]

getmevodka@reddit

do you have some video tutorials on the process to learn it for me ? i would love to create some distilled versions from bigger models on my m3 ultra :)

[-]

Ambitious_Anybody855@reddit (OP)

Not a video but a detailed step by step guide. Check my colab notebook for sentiment analysis here: https://github.com/bespokelabsai/curator
Tell me how it works out!!

[-]

getmevodka@reddit

hey thanks ! ill take a look, but i need to finish a feature at my own program first xD

[-]

verbari_dev@reddit

Distillation is specifically fine tuning a smaller model with a larger model's outputs

[-]

Unlucky_Lecture_7606@reddit

Do you have RAFT vs RAG vs Base comparison anywhere?

[-]

Ambitious_Anybody855@reddit (OP)

I don't have it but interesting idea

[-]

ReadyAndSalted@reddit

wait how did distillation give you an improvement in accuracy? The new smaller model be worse than the original larger model... When you say "improvement in accuracy", what are you comparing your new small model against?

[-]

Ambitious_Anybody855@reddit (OP)

I am comparing base small model with finetuned small model. Annotations from large model are treated as ground truth. In essence, I am able to replicate the performance of large model via the finetuned model at 92% accuracy (all this while being 14x cheaper than large model).
Hope this helps

[-]

You_Wen_AzzHu@reddit

Those accuracy rates of 90% and above seem almost too good to believe, tbh.

[-]

Ambitious_Anybody855@reddit (OP)

Not so much if the base model already had a 82.5% accuracy,right? Here's my colab notebook if you would like to check out where I could have gone wrong. https://colab.research.google.com/drive/1Zfl3g7POsqqYQqkzXdyhYRSAymLhZugn?usp=sharing

[-]

You_Wen_AzzHu@reddit

Thank you brother, I would love to know a way to improve my model like what you said.

[-]

Ambitious_Anybody855@reddit (OP)

Colab notebook added on my Github: https://github.com/bespokelabsai/curator