CUDA out of memory when fine-tuning a downloaded Llama 3.1 model on an instruction dataset

Posted by Reasonable-Phase1881@reddit | LocalLLaMA | View on Reddit | 6 comments

Hi guys, I have an NVIDIA RTX 4090 with 24 GB VRAM, plus an integrated Intel GPU with 64 GB of (shared system) memory.

When I run the downloaded Llama 3.1 8B model on a Linux system to fine-tune it on my instruct dataset, I get a CUDA out of memory error. I first hit the error in float32, and then in float16 as well.
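For context, here is my rough back-of-the-envelope memory estimate (assuming standard mixed-precision full fine-tuning with AdamW and no offloading), which seems to explain the OOM:

```python
# Rough memory estimate for full fine-tuning of an 8B model with AdamW
# (assumed setup: bf16/fp16 weights + fp32 master weights + fp32 Adam states).
params = 8e9                  # Llama 3.1 8B parameter count

weights_half = params * 2     # bf16/fp16 model weights
grads_half   = params * 2     # bf16/fp16 gradients
master_fp32  = params * 4     # fp32 master copy of the weights
adam_states  = params * 8     # fp32 exp_avg + exp_avg_sq (4 bytes each)

total_gb = (weights_half + grads_half + master_fp32 + adam_states) / 1e9
print(f"~{total_gb:.0f} GB before counting activations")  # ~128 GB
```

That works out to roughly 16 bytes per parameter, so the weights plus optimizer states blow well past 24 GB before any activations are even counted.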

I don't want to quantize it to 4-bit or 8-bit. To get it running I also tried a 128 GB GPU, but hit the same CUDA out of memory problem.

Should I use vLLM? Is there any code or documentation for fine-tuning without PEFT/LoRA, since I have enough compute? I think I need to dig into CUDA memory semantics.
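In case it helps, this is roughly the kind of script I'm trying to get working (a minimal sketch with Hugging Face Transformers; the model path, prompt format, toy dataset, and hyperparameters are just placeholders, not my actual code). As far as I can tell, even with bf16, gradient checkpointing, and Adafactor, full fine-tuning of the 8B model still needs more than 24 GB, so I assume it only becomes realistic with multi-GPU sharding or CPU offload (FSDP / DeepSpeed ZeRO):

```python
# Minimal full fine-tuning sketch (no PEFT/LoRA, no quantization).
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForCausalLM, AutoTokenizer,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

model_id = "meta-llama/Llama-3.1-8B"  # or the local path of the downloaded weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token       # Llama tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                 # bf16 instead of fp32 halves weight memory
    attn_implementation="sdpa",                 # memory-efficient attention kernel
)

class ToyInstructDataset(Dataset):
    """Placeholder: two tokenized examples so the script runs end to end."""
    def __init__(self):
        texts = [
            "### Instruction:\nSay hello.\n\n### Response:\nHello!",
            "### Instruction:\nAdd 2 and 2.\n\n### Response:\n4",
        ]
        self.enc = [tokenizer(t, truncation=True, max_length=256) for t in texts]
    def __len__(self):
        return len(self.enc)
    def __getitem__(self, i):
        return {k: self.enc[i][k] for k in ("input_ids", "attention_mask")}

args = TrainingArguments(
    output_dir="llama31-full-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    bf16=True,
    gradient_checkpointing=True,                # trades compute for activation memory
    optim="adafactor",                          # much smaller optimizer state than AdamW
    logging_steps=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ToyInstructDataset(),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```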