How do I convert my fine-tuning from AdamW to Muon in PyTorch?
Posted by Ok_Warning2146@reddit | LocalLLaMA
My fine-tuning code originally used AdamW. I heard that the new Muon optimizer uses much less VRAM, so I wanted to take advantage of that. I upgraded PyTorch to 2.10.0 and changed just one line of my TrainingArguments:
`training_args = TrainingArguments(`
`output_dir=OUTPUT_DIR,`
`save_strategy="steps",`
`# optim="adamw_apex_fused",`
`optim=torch.optim.Muon(model.parameters(),adjust_lr_fn="match_rms_adamw"),`
`save_steps=32*197,`
`learning_rate=2e-5,`
`per_device_train_batch_size=BATCH_SIZE, # Adjust based on GPU memory`
`num_train_epochs=4,`
`weight_decay=0.01,`
`tf32=True,`
`gradient_checkpointing=True,`
`torch_compile=True,`
`torch_compile_backend="inductor",`
`dataloader_pin_memory=True,`
`dataloader_num_workers=3,`
`logging_dir='./logs',`
`logging_steps=197,`
`report_to="none"`
`)`
However, I am getting this error:
`ValueError: Muon only supports 2D parameters whereas we found a parameter with size: torch.Size([512])`
How do people get around this? Thanks a lot in advance.
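For context, the error says that `torch.optim.Muon` only accepts 2D parameters, so passing all of `model.parameters()` (which includes 1D biases and norm weights such as the `torch.Size([512])` one) fails. Below is a minimal sketch of one common workaround, assuming a hybrid setup where 2D hidden-layer weights go to Muon and everything else goes to AdamW. The helper names `split_params_for_muon` and `build_optimizers` and the name patterns `"embed"` / `"lm_head"` are hypothetical and would need adjusting to the actual model; the keyword arguments passed to `torch.optim.Muon` are assumed to match the installed PyTorch version.

```python
def split_params_for_muon(named_params, exclude_keywords=("embed", "lm_head")):
    """Split parameters into (muon_params, adamw_params).

    Muon gets only trainable 2D parameters whose names do not contain any
    of `exclude_keywords`. The keywords are an assumption about how the
    model names its embedding and output layers; adjust them as needed.
    """
    muon_params, adamw_params = [], []
    for name, param in named_params:
        # Skip frozen parameters entirely.
        if not getattr(param, "requires_grad", True):
            continue
        is_matrix = param.ndim == 2
        is_excluded = any(key in name for key in exclude_keywords)
        if is_matrix and not is_excluded:
            muon_params.append(param)
        else:
            # Biases, norm weights, embeddings, output head, etc.
            adamw_params.append(param)
    return muon_params, adamw_params


def build_optimizers(model, lr=2e-5, weight_decay=0.01):
    """Build one Muon and one AdamW optimizer over disjoint parameter sets."""
    import torch  # imported lazily so the helper above has no dependencies

    muon_params, adamw_params = split_params_for_muon(model.named_parameters())
    muon = torch.optim.Muon(
        muon_params,
        lr=lr,
        weight_decay=weight_decay,
        adjust_lr_fn="match_rms_adamw",  # same setting as in the post
    )
    adamw = torch.optim.AdamW(adamw_params, lr=lr, weight_decay=weight_decay)
    return muon, adamw
```

Note that `optim=` in `TrainingArguments` expects an optimizer name string rather than an optimizer instance, and `Trainer` takes a single optimizer through its `optimizers=(optimizer, scheduler)` argument. The two optimizers above would therefore need to be combined in a wrapper or stepped in a custom training loop, which this sketch does not cover.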