How to convert my fine-tuning from AdamW to Muon in PyTorch?

Posted by Ok_Warning2146@reddit | LocalLLaMA

My fine-tuning code originally used AdamW. I heard that the new Muon optimizer uses much less VRAM, so maybe I can take advantage of that. So I upgraded PyTorch to 2.10.0 and changed just one line of my TrainingArguments:

`training_args = TrainingArguments(`
`output_dir=OUTPUT_DIR,`
`save_strategy="steps",`
`# optim="adamw_apex_fused",`
`optim=torch.optim.Muon(model.parameters(),adjust_lr_fn="match_rms_adamw"),`
`save_steps=32*197,`
`learning_rate=2e-5,`
`per_device_train_batch_size=BATCH_SIZE, # Adjust based on GPU memory`
`num_train_epochs=4,`
`weight_decay=0.01,`
`tf32=True,`
`gradient_checkpointing=True,`
`torch_compile=True,`
`torch_compile_backend="inductor",`
`dataloader_pin_memory=True,`
`dataloader_num_workers=3,`
`logging_dir='./logs',`
`logging_steps=197,`
`report_to="none"`
`)`

However, I am getting this error:

`ValueError: Muon only supports 2D parameters whereas we found a parameter with size: torch.Size([512])`

How do people get around this? Thanks a lot in advance.
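One common way around this error, sketched here rather than taken from a confirmed fix, is to give Muon only the 2D weight matrices and leave everything else (biases, norm weights, and usually embeddings and the output head) on AdamW. In the sketch below, `split_params_for_muon`, `build_optimizers`, and the `"embed"` / `"lm_head"` name filters are hypothetical and would need adjusting to the actual model's parameter names. Note also that the `optim` field of `TrainingArguments` expects an optimizer name string, not an optimizer instance; an already-built optimizer is normally handed to `Trainer` through its `optimizers=(optimizer, scheduler)` argument.

```python
def split_params_for_muon(named_params, exclude_keywords=("embed", "lm_head")):
    """Split (name, param) pairs into Muon-eligible and fallback lists.

    A parameter goes to Muon only if it is exactly 2D and its name contains
    none of exclude_keywords (assumed names; adjust for your model).
    Everything else that is trainable goes to the fallback list.
    """
    muon_params, other_params = [], []
    for name, param in named_params:
        # Skip frozen parameters entirely.
        if not getattr(param, "requires_grad", True):
            continue
        is_matrix = param.ndim == 2
        is_excluded = any(key in name for key in exclude_keywords)
        if is_matrix and not is_excluded:
            muon_params.append(param)
        else:
            other_params.append(param)
    return muon_params, other_params


def build_optimizers(model, lr=2e-5, weight_decay=0.01):
    """Build one Muon and one AdamW optimizer over disjoint parameter sets."""
    # Imported lazily; requires a PyTorch build that ships torch.optim.Muon.
    import torch

    muon_params, other_params = split_params_for_muon(model.named_parameters())
    muon = torch.optim.Muon(
        muon_params,
        lr=lr,
        weight_decay=weight_decay,
        adjust_lr_fn="match_rms_adamw",
    )
    adamw = torch.optim.AdamW(other_params, lr=lr, weight_decay=weight_decay)
    return muon, adamw


# Usage in a hand-written training loop (not the Hugging Face Trainer):
#   muon, adamw = build_optimizers(model)
#   loss.backward()
#   muon.step(); adamw.step()
#   muon.zero_grad(); adamw.zero_grad()
```

Because `Trainer` takes a single optimizer object, using both optimizers there would need a small wrapper that steps the two together, or a custom training loop; neither is shown here. Since only the 2D matrices move to Muon, any VRAM saving applies to that subset of parameters only.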