Our 3rd AMA: Unsloth Team, Creators of the lightning-fast Unsloth fine-tuning library! (Wednesday, 10 AM-1 PM PST)

[-]

yoracale@reddit

It was live here: https://www.reddit.com/r/LocalLLaMA/comments/1ndjxdt/ama_with_the_unsloth_team/ We're still answering any questions people may have!

Reply

[-]

I have a question I've really never seen addressed well in all of the many fine-tuning videos, blogs, articles, etc., as most of them focus on training LLMs to respond to chats or instructions in a certain style or format.

At our work we use a specialized piece of software which is similar to VB but highly customized, to the point where even a coding LLM that was trained on VB would still get things wrong. I have plenty of code examples as well as the developer documentation, which is highly detailed and definitely contains everything one would need to know in order to properly script something. I understand the concepts of fine-tuning and have done it plenty of times with text and image based models, but when it comes to training a coding LLM I get stuck.

If you know of any good resources that go into greater detail on how best to do this, I'd love to know about them. Perhaps you might even consider creating a fine-tuning notebook or blog article specifically about best practices for training a coding model.

Ideally, I'd like to have a model (or two, depending on suggestions) that can both generate code (input the requirements, get code out) as well as something that can be used conversationally to answer questions about the language, suggest code improvements, help correct errors in code, etc.

Some of the things that I get stuck on:

* Should I train a base model first to let it 'learn the patterns' of the language, then do instruction tuning for generating code and answering questions, or is the current state of models / fine-tuning sufficient to where I can skip straight to an existing instruction-trained coding model (perhaps one already trained on VB)?
* Between documentation, code examples, archived conversations between developers discussing the software and scripting concepts (email, forum posts) and synthetically generated Q&A or instructions/outputs, roughly how much of each should there be in the training data?
* How should chunking be approached with code? Even with some of the content I've found specifically about creating training data for coding LLMs, it's for languages which are easily split into multiple files and thus an entire file can fit into the context window. In the case of my custom scripting language, all code for a particular use case must be contained in a single file and can get quite large. If I have example code that's too long for the model's context window, do I simply throw it out? Cut out what I can so that it still remains valid? Simply truncate the file and add an indicator at the cut points that it's continued from elsewhere?
* When it comes to fine-tuning coding LLMs, how much training data should I aim for? (I suppose this might differ based on whether I'm using a model which is already familiar with VB vs. one only trained for the usual languages: Python, HTML/CSS/JS, etc.)
* Any model suggestions for my use case? I started down this road back when the first major Llama model came out and when Unsloth first came on the scene. I've been wanting to give it another shot with some of the newer models out there, but it seems like if you stop paying attention to the space for a week you're already out of date!

I know I asked a lot of questions. Any guidance you can provide on any of these points would be a tremendous help! Thanks in advance and thanks for all the work you've done for the community.

Reply

[-]

thesillystudent@reddit

waiting for multi GPU training :)

Reply

[-]

danielhanchen@reddit

It technically works! See https://docs.unsloth.ai/basics/multi-gpu-training-with-unsloth - we're still working to make it much better and much more efficient!

Reply

[-]

danielhanchen@reddit

Hey guys excited to be doing the AMA tomorrow!

Reply

[-]

sammcj@reddit

Daniel you're such a legend in the community, we're lucky to have you join this!

Reply

[-]

danielhanchen@reddit

Appreciate it :))

Reply

[-]

yoracale@reddit

Also excited to participate in tomorrow's AMA. 🥰

Reply

[-]

Mother_Context_2446@reddit

Thanks for all of your hard work. Just a small query from my end. When does the team think it will be possible to fine-tune 120B GPT OSS and export to vLLM in 4bit? I believe it’s currently limited to FP16. Thanks!!!

Reply

[-]

danielhanchen@reddit

Thank you! Oh bitsandbytes 4bit?

Reply

[-]

Mother_Context_2446@reddit

That or MXFP4 - personally I have a novel use case for GPT-OSS 120B and love that it can fit into 1x H100. But as far as I understand, if we want to fine-tune it, we have to use the FP16 version, which has much higher VRAM requirements. Thanks again
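For a rough sense of the gap being described, here is a back-of-the-envelope sketch of weight storage alone. It assumes a nominal 120 billion parameters and treats every weight as stored at the stated precision, which is a simplification (real checkpoints mix precisions). The 4.25 bits per weight for MXFP4 comes from 4-bit elements plus one shared 8-bit scale per block of 32. Fine-tuning needs additional memory for activations, gradients and optimizer state, so these are lower bounds only.

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 120e9   # nominal parameter count (assumption for illustration)
H100_GB = 80       # memory of a single 80 GB H100

fp16_gb = weight_gb(N_PARAMS, 16)      # 16 bits per weight
mxfp4_gb = weight_gb(N_PARAMS, 4.25)   # 4-bit values + 8-bit scale per 32 weights

print(f"16-bit weights: ~{fp16_gb:.0f} GB, fits on one H100: {fp16_gb <= H100_GB}")
print(f"MXFP4 weights:  ~{mxfp4_gb:.0f} GB, fits on one H100: {mxfp4_gb <= H100_GB}")
```

Under these assumptions the 16-bit weights alone are roughly three H100s' worth of memory, while the 4-bit weights fit on one card with some room left over.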

Reply

[-]

danielhanchen@reddit

Oh ok let me get back to you on this! I'll see if I can implement it ASAP!

Reply

[-]

Mother_Context_2446@reddit

You’re a legend!

Reply

[-]

danielhanchen@reddit

:)

Reply

[-]

chlobunnyy@reddit

So excited! Very cool ^-^

Reply

[-]

danielhanchen@reddit

Pumped for tomorrow!!

Reply

[-]

TheLocalDrummer@reddit

Better dataset utilities like Axolotl

Reply

[-]

danielhanchen@reddit

Hey! Great work with the Drummer models as usual! I remember you mentioned highlighting of dataset roles during the preparation stage - is this something that's still of interest?

Reply

[-]

TheLocalDrummer@reddit

Thank you! Agatha v1 and a couple more models were tuned using Unsloth because of the insane optimization tricks you guys did.

Helper functions for manipulating and previewing the dataset. In Axolotl, they do the following:

* Prints several samples from the dataset for inspection.
* Prints masked tokens in the color red, prints unmasked tokens in the color green.
* Prints the respective token id and attention mask values beside every token in the sample.
* Sample packing for even distribution (e.g., when I set seq_len to 16k with sample packing, then I know the model is exposed to ~16k * bsz in every training step)

There's probably a bunch more I've forgotten since we discussed these a few months ago.
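The color-coded preview described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Axolotl's or Unsloth's actual code: the function name `render_sample` and the sample values are hypothetical, and it assumes masked positions carry the label `-100`, the value PyTorch's cross-entropy loss ignores by default.

```python
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"
IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch default ignore_index)


def render_sample(tokens, input_ids, labels, attention_mask, color=True):
    """Format each token as 'text'(token_id, attention_mask).

    Tokens whose label is IGNORE_INDEX (masked from the loss) are red,
    tokens that contribute to the loss are green.
    """
    cells = []
    for tok, tid, lab, att in zip(tokens, input_ids, labels, attention_mask):
        cell = f"{tok!r}({tid}, {att})"
        if color:
            cell = (RED if lab == IGNORE_INDEX else GREEN) + cell + RESET
        cells.append(cell)
    return " ".join(cells)


# Hypothetical sample: the user turn is masked, the assistant turn is trained on.
tokens = ["User:", " hi", "\n", "Assistant:", " hello"]
input_ids = [11, 12, 13, 14, 15]
labels = [IGNORE_INDEX, IGNORE_INDEX, IGNORE_INDEX, 14, 15]
attention_mask = [1, 1, 1, 1, 1]

print(render_sample(tokens, input_ids, labels, attention_mask))
```

Printing a handful of samples this way before training makes it easy to spot a template or masking mistake by eye.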

Reply

[-]

danielhanchen@reddit

Oh ok thanks! Appreciate it! I'll jot these down and work on them! Thanks for the suggestions!

Reply

[-]

yoracale@reddit

What specific dataset preparation features would you like to see in Unsloth? We currently have:

* Training on completions, which is actually very hard to implement
* Data preparation for vision datasets
* Tokenizer chat template preparation
* Synthetic data generation

and more! But we're always looking to improve Unsloth, so please list the top things you want included and we'll try to make it happen.

Reply

[-]

TheLocalDrummer@reddit

Are you referring to chat completions? Since prepping text completion is just tokenizing everything for training.

Reply

[-]

danielhanchen@reddit

Masking out the input (user prompt) tokens, so that the loss is computed only on the assistant's responses, generally increases accuracy by 1% or more, as seen in the [QLoRA paper](https://arxiv.org/pdf/2305.14314)

https://preview.redd.it/x9of8euyb7of1.jpeg?width=1200&format=pjpg&auto=webp&s=61e87499af8b77772be796a0729a31714f653585

The issue is that it's actually very complex, since tokenizers can tokenize combined tokens or newlines differently, so one has to be careful about masking out the correct tokens. Simply tokenizing assistant and user prompts separately unfortunately does not work, so we had to create a universal custom masking in Unsloth as well. More details in our hyperparameters [guide](https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/lora-hyperparameters-guide#training-on-completions-only-masking-out-inputs)
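To illustrate the newline pitfall mentioned here, below is a toy sketch. It is not Unsloth's masking implementation: the vocabulary, the greedy tokenizer and `build_labels` are all hypothetical, chosen so that `"\n\n"` is a single merged token the way it is in many real tokenizers. It shows that tokenizing the prompt and the response separately yields a different sequence than tokenizing the full text, and one simple way (character offsets) to mask the full-text tokenization instead.

```python
IGNORE_INDEX = -100  # label value skipped by the loss

# Hypothetical toy vocabulary; a token's id is its position in this list.
VOCAB = ["\n\n", "\n", "User:", " hi", " hello"]


def tokenize(text):
    """Greedy longest-match tokenizer returning (token, start_offset) pairs."""
    out, i = [], 0
    while i < len(text):
        candidates = [t for t in VOCAB if text.startswith(t, i)]
        tok = max(candidates, key=len, default=text[i])
        out.append((tok, i))
        i += len(tok)
    return out


def build_labels(prompt, response):
    """Tokenize the full text once, then mask tokens that start inside the prompt."""
    pairs = tokenize(prompt + response)
    ids = [VOCAB.index(tok) for tok, _ in pairs]
    labels = [
        tid if start >= len(prompt) else IGNORE_INDEX
        for tid, (_, start) in zip(ids, pairs)
    ]
    return ids, labels


prompt, response = "User: hi\n", "\n hello"

separate = [t for t, _ in tokenize(prompt)] + [t for t, _ in tokenize(response)]
combined = [t for t, _ in tokenize(prompt + response)]
print(separate)  # the two newlines stay as two tokens
print(combined)  # the two newlines merge into one token across the boundary

ids, labels = build_labels(prompt, response)
print(ids, labels)
```

In this sketch a token that straddles the prompt/response boundary is masked, which is one conservative choice among several; real tokenizers add further complications beyond what this toy example covers.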

Reply

[-]

Educational_Rent1059@reddit

What’s up with all these Axolotl fanboys zerging every Unsloth thread/topic? Did you even read the comment and reply, or is trolling the only thing you are seeking? Also, that’s not how training on completions works, "just tokenize everything", do you have any clue? Like wtf are you on about? Why not reply to the question? What utilities? Jesus…

Reply

[-]

Educational_Rent1059@reddit

Love to listen to you guys! Looking forward to this, big thanks 🙏

Reply

[-]

danielhanchen@reddit

Thank you! :)

Reply

[-]

XMasterrrr@reddit (OP)

Hi r/LocalLLaMA 👋

We're excited for tomorrow's guests, **The Unsloth Team!** They're the folks behind the blazing-fast Unsloth fine-tuning library and a slew of community notebooks.

**Kicking things off tomorrow (Wednesday, Sept. 10th) 10 AM–1 PM PST**

⚠️ **Note:** The AMA itself will be hosted in a **separate thread,** please don’t post questions here.

Reply
