Dropping the learning rate fixed my QLoRA fine-tune more than anything else I tried
Posted by Scared-Biscotti2287@reddit | LocalLLaMA | 11 comments
Been fine-tuning Llama 3.1 8B with QLoRA for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data. Tried cleaning the dataset, tried different prompt templates, messed with rank and alpha. Nothing really changed.
Dropped the learning rate from 2e-4 to 1e-4 and bumped epochs from 3 to 5. Ran it on a 5090 I rent on Hyperai since our lab machines are always booked. Completely different results. Same data, same everything else.
2e-4 is just too aggressive when your dataset is that small. The model overfits in the first epoch and then just goes in circles for the rest of training. The lower LR gave it room to converge without blowing past everything.
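The overshoot failure mode is easy to see in a toy 1-D example (this is just an illustration of step size vs. convergence, not the actual fine-tune):

```python
# Plain gradient descent on f(w) = w^2, gradient = 2w.
# A step size that is too large flips the sign of w and grows it every step;
# a smaller one shrinks w toward the minimum. Same idea as an aggressive LR
# blowing past the loss basin on a small dataset.

def run_gd(lr, steps=50, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w
    return abs(w)

final_high = run_gd(lr=1.2)  # each step multiplies w by -1.4, so |w| explodes
final_low = run_gd(lr=0.1)   # each step multiplies w by 0.8, so w -> 0
```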
Also ended up cutting about a third of my dataset, mostly mislabeled and ambiguous stuff. Eval got better with less data, which yeah, everyone says that, but it's different when you see the numbers yourself lol
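The pruning step can be sketched as a simple filter. The field names here (`label_confidence`, `ambiguous`) are made up for illustration; in practice they'd come from annotator agreement or a reviewer pass:

```python
# Hypothetical pruning sketch: drop samples flagged as ambiguous or whose
# label confidence falls below a cutoff. Field names are assumptions.

def prune(dataset, min_confidence=0.8):
    return [ex for ex in dataset
            if not ex.get("ambiguous", False)
            and ex.get("label_confidence", 1.0) >= min_confidence]

data = [
    {"text": "clear positive", "label": 1, "label_confidence": 0.95},
    {"text": "borderline", "label": 0, "label_confidence": 0.55},
    {"text": "disputed", "label": 1, "ambiguous": True},
]
clean = prune(data)  # keeps only the first sample
```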
2e-4 is the default everywhere and I don't think it works well below a certain dataset size.
Far_Suit575@reddit
Anyone know if there are platforms with better pricing for quick experiments? I'm just doing small fine-tuning jobs and don't want to deal with per-minute billing
PrestigiousHeron827@reddit
Flat rates make way more sense for small jobs. I think Hyperai has some intro deal for new accounts, like 20hrs for $1 or something. Probably worth checking out before committing to any one platform.
FullOf_Bad_Ideas@reddit
LR is model-specific, batch-size-specific, and LoRA-rank-specific; it really differs depending on your exact configuration, even the length of your samples or whether you use sample packing. There's no real default.
do you track validation loss?
There are many moving parts with QLoRA; I had decent experience with LoRA+ and rsLoRA on top of plain LoRA.
8k samples is tiny so you can throw it in optuna and let it optimize hyperparams for lowest validation loss overnight.
Scared-Biscotti2287@reddit (OP)
Good call on validation loss. Will look into loraplus and try the optuna approach.
OldComposerbruh@reddit
Yes I prefer 1e-4 over a higher rate
llama-impersonator@reddit
5 epochs? bruh, make some more data.
Scared-Biscotti2287@reddit (OP)
Can certainly try more.
silenceimpaired@reddit
What are you using for training? If it's Unsloth, you should suggest they dynamically set the learning rate based on your dataset
Scared-Biscotti2287@reddit (OP)
Using Unsloth, yeah. I usually just set it manually out of habit, but dynamic makes sense for this.
BlueDolphinCute@reddit
The data pruning thing is real. Cut almost 40 percent of a dataset once and eval went up. Noise kills QLoRA runs more than missing volume does
Little_Tangelo2196@reddit
Had a similar issue with a different task. 2e-4 is fine for 50k+ samples but below that you gotta drop it. I usually start at 1e-4 and go lower if needed
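The thread's rule of thumb could be written down like this. The cutoffs are my own reading of the comments above (2e-4 at 50k+, 1e-4 below that, lower still for tiny sets), not a standard:

```python
# Hypothetical starting-LR heuristic based on dataset size; tune from here.
def starting_lr(n_samples):
    if n_samples >= 50_000:
        return 2e-4   # the common default is fine at this scale
    if n_samples >= 5_000:
        return 1e-4   # what worked for OP's 8k-sample run
    return 5e-5       # go lower still for very small sets

starting_lr(8_000)  # -> 1e-4
```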