
Our 3rd AMA: Unsloth Team, Creators of the lightning-fast Unsloth fine-tuning library! (Wednesday, 10 AM-1 PM PST)

Posted by XMasterrrr@reddit | LocalLLaMA | 28 comments



28 Comments

Rukelele_Dixit21@reddit

When is the AMA?

yoracale@reddit

It was live here: https://www.reddit.com/r/LocalLLaMA/comments/1ndjxdt/ama_with_the_unsloth_team/ We're still answering any questions people may have!

samplebitch@reddit

I have a question I've never really seen addressed well in the many fine-tuning videos, blogs, and articles out there, since most of them focus on training LLMs to respond to chats or instructions in a certain style or format. At work we use a specialized piece of software whose scripting language is similar to VB, but customized to the point where even a coding LLM trained on VB would still get things wrong. I have plenty of code examples, as well as highly detailed developer documentation that definitely contains everything one would need to know to script something properly.

I understand the concepts of fine-tuning and have done it plenty of times with text- and image-based models, but when it comes to training a coding LLM I get stuck. If you know of any good resources that go into greater detail on how best to do this, I'd love to hear about them. Perhaps you might even consider creating a fine-tuning notebook or blog article specifically about best practices for training a coding model.

Ideally, I'd like a model (or two, depending on your suggestions) that can both generate code (input the requirements, get code out) and be used conversationally to answer questions about the language, suggest code improvements, help correct errors in code, etc.

Some of the things I get stuck on:

* Should I first train a base model to let it 'learn the patterns' of the language, then do instruction tuning for generating code and answering questions? Or are current models and fine-tuning methods good enough that I can skip straight to an existing instruction-tuned coding model (perhaps one already trained on VB)?
* Between documentation, code examples, archived conversations between developers discussing the software and scripting concepts (email, forum posts), and synthetically generated Q&A or instruction/output pairs, roughly how much of each should the training data contain?
* How should chunking be approached with code? Even the content I've found specifically about creating training data for coding LLMs targets languages that are easily split into multiple files, so an entire file fits into the context window. In my custom scripting language, all code for a particular use case must live in a single file, which can get quite large. If an example is too long for the model's context window, do I simply throw it out? Cut out what I can so that it still remains valid? Or truncate the file and add an indicator at the cut points that it continues elsewhere?
* When fine-tuning coding LLMs, how much training data should I aim for? (I suppose this might differ depending on whether the model is already familiar with VB or was only trained on the usual languages like Python, HTML/CSS/JS, etc.)
* Any model suggestions for my use case?

I started down this road back when the first major Llama model came out and Unsloth first came on the scene. I've been wanting to give it another shot with some of the newer models, but it seems that if you stop paying attention to the space for a week, you're already out of date!

I know I asked a lot of questions; any guidance you can provide on any of these points would be a tremendous help! Thanks in advance, and thanks for all the work you've done for the community.
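One of the options raised in the chunking question (keep an over-long file, split it, and mark the cut points) could look roughly like the sketch below. It is an illustration under stated assumptions, not an established recipe: `chunk_script` is a hypothetical helper, character counts stand in for token counts (a real pipeline would measure length with the target model's tokenizer), and the `'` marker assumes VB-style comment syntax.

```python
def chunk_script(source: str, max_chars: int = 4000, overlap_lines: int = 5,
                 marker: str = "' ... continued from previous chunk") -> list[str]:
    """Hypothetical helper: split `source` into line-aligned chunks of at most
    `max_chars` characters (assuming no single line exceeds the budget),
    repeating `overlap_lines` lines of context and prefixing every chunk
    after the first with `marker`. Characters stand in for tokens here."""
    lines = source.splitlines()
    chunks: list[str] = []
    start = 0
    while start < len(lines):
        # Later chunks reserve room for the marker line plus its newline.
        budget = max_chars if start == 0 else max_chars - len(marker) - 1
        size, end = 0, start
        # Greedily take whole lines so no statement is cut mid-line.
        while end < len(lines) and size + len(lines[end]) + 1 <= budget:
            size += len(lines[end]) + 1
            end += 1
        if end == start:
            end = start + 1  # a single over-long line: emit it alone instead of looping forever
        body = "\n".join(lines[start:end])
        chunks.append(body if start == 0 else f"{marker}\n{body}")
        if end >= len(lines):
            break
        # Step back a few lines so the next chunk keeps some preceding context,
        # but always advance by at least one line.
        start = max(end - overlap_lines, start + 1)
    return chunks
```

Splitting on line boundaries (rather than raw character offsets) is a deliberate choice here: each chunk stays syntactically closer to valid code, though blocks that span a cut point will still be incomplete.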

thesillystudent@reddit

waiting for multi GPU training :)

danielhanchen@reddit

It technically works! See https://docs.unsloth.ai/basics/multi-gpu-training-with-unsloth - we're still working to make it much better and much more efficient!

danielhanchen@reddit

Hey guys excited to be doing the AMA tomorrow!

sammcj@reddit

Daniel you're such a legend in the community, we're lucky to have you join this!

danielhanchen@reddit

Appreciate it :))

yoracale@reddit

Also excited to participate in tomorrow's AMA. 🥰

Mother_Context_2446@reddit

Thanks for all of your hard work. Just a small query from my end: when does the team think it will be possible to fine-tune gpt-oss-120b and export it to vLLM in 4-bit? I believe it's currently limited to FP16. Thanks!!!

danielhanchen@reddit

Thank you! Oh bitsandbytes 4bit?

Mother_Context_2446@reddit

That or MXFP4. Personally, I have a novel use case for gpt-oss-120b and love that it fits on 1x H100. But as far as I understand, if we want to fine-tune it, we have to use the FP16 version, which needs much more VRAM. Thanks again
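The VRAM gap described above can be made concrete with a weight-only back-of-the-envelope estimate. This is a simplification: it assumes a nominal 120e9 parameters with every weight in the same format, and ignores activations, KV cache, LoRA adapters, gradients, and optimizer state. MXFP4 is counted as 4.25 bits per parameter (4-bit values plus one shared 8-bit scale per 32-value block).

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Weight-only memory in GB (1e9 bytes): parameters x bits / 8."""
    return n_params * bits_per_param / 8 / 1e9

N = 120e9                    # nominal parameter count (assumption)
bf16 = weight_gb(N, 16)      # 16-bit weights (FP16/BF16)
mxfp4 = weight_gb(N, 4.25)   # 4-bit values + 8-bit scale per 32 values
print(f"FP16/BF16: {bf16:.1f} GB, MXFP4: {mxfp4:.2f} GB, H100: 80 GB")
# prints "FP16/BF16: 240.0 GB, MXFP4: 63.75 GB, H100: 80 GB"
```

Under these assumptions the 4-bit weights fit within a single 80 GB H100 while the 16-bit weights alone need roughly three such GPUs, which is why fine-tuning from the FP16 checkpoint is so much more demanding.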

danielhanchen@reddit

Oh ok let me get back to you on this! I'll see if I can implement it ASAP!

Mother_Context_2446@reddit

You’re a legend!

danielhanchen@reddit

:)

chlobunnyy@reddit

So excited! Very cool ^-^

danielhanchen@reddit

Pumped for tomorrow!!

TheLocalDrummer@reddit

Better dataset utilities, like Axolotl's

danielhanchen@reddit

Hey! Great work with the Drummer models as usual! I remember you mentioned highlighting of dataset roles during the preparation stage - is this something that's still of interest?

TheLocalDrummer@reddit

Thank you! Agatha v1 and a couple more models were tuned using Unsloth because of the insane optimization tricks you guys did.

Helper functions for manipulating and previewing the dataset. In Axolotl, they do the following:

* Print several samples from the dataset for inspection.
* Print masked tokens in red and unmasked tokens in green.
* Print the respective token ID and attention mask value beside every token in the sample.
* Sample packing for even distribution (e.g., when I set seq_len to 16k with sample packing, I know the model is exposed to ~16k × bsz tokens in every training step).

There's probably a bunch more I've forgotten since we discussed these a few months ago.
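The preview and packing helpers in the list above might be approximated as follows. This is a hypothetical sketch, not Axolotl's or Unsloth's code: `preview_sample` and `pack_first_fit_decreasing` are illustrative names, it assumes the common convention that a label of -100 marks a token ignored by the loss (PyTorch's cross-entropy default), and it uses first-fit-decreasing bin packing as one simple packing strategy.

```python
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"  # ANSI terminal colors
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy by default

def preview_sample(tokens, input_ids, labels, attention_mask):
    """One line per token: masked tokens in red, trained tokens in green,
    followed by (token id, label, attention mask)."""
    rows = []
    for tok, tid, lab, am in zip(tokens, input_ids, labels, attention_mask):
        color = RED if lab == IGNORE_INDEX else GREEN
        rows.append(f"{color}{tok!r}{RESET} ({tid}, {lab}, {am})")
    return "\n".join(rows)

def pack_first_fit_decreasing(lengths, seq_len):
    """Group sample indices into packs of at most seq_len tokens, placing
    longest samples first into the first pack with room."""
    packs = []  # each pack: [tokens_used, [sample indices]]
    for i in sorted(range(len(lengths)), key=lambda j: -lengths[j]):
        if lengths[i] > seq_len:
            continue  # too long to pack; would need truncation or chunking
        for pack in packs:
            if pack[0] + lengths[i] <= seq_len:
                pack[0] += lengths[i]
                pack[1].append(i)
                break
        else:  # no existing pack had room
            packs.append([lengths[i], [i]])
    return [indices for _, indices in packs]

print(preview_sample(["<user>", "Hi", "<asst>", "Hello"],
                     [1, 2, 3, 4], [-100, -100, -100, 4], [1, 1, 1, 1]))
print(pack_first_fit_decreasing([9000, 7000, 6000, 3000, 500], 16000))
```

With packs filled close to `seq_len`, each training step sees close to `seq_len × batch size` tokens, which is the predictability the comment describes.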

danielhanchen@reddit

Oh ok thanks! Appreciate it! I'll jot these down and work on them! Thanks for the suggestions!

yoracale@reddit

What specific dataset preparation features would you like to see in Unsloth? We currently have:

* Training on completions (which is actually very hard to implement)
* Data preparation for vision datasets
* Tokenizer chat template preparation
* Synthetic data generation

...and more! But we're always looking to improve Unsloth, so please list the top things you'd like us to add and we'll try to make it happen.

TheLocalDrummer@reddit

Are you referring to chat completions? Prepping text completions is just tokenizing everything for training.

danielhanchen@reddit

Masking out the input (user prompt) tokens, so that the loss is computed only on the assistant's responses, generally increases accuracy by 1% or more, as seen in the [QLoRA paper](https://arxiv.org/pdf/2305.14314):

https://preview.redd.it/x9of8euyb7of1.jpeg?width=1200&format=pjpg&auto=webp&s=61e87499af8b77772be796a0729a31714f653585

The issue is that it's actually very complex: tokenizers can merge adjacent tokens or tokenize newlines differently depending on context, so one has to be careful to mask out exactly the right tokens. Simply tokenizing the assistant and user prompts separately unfortunately does not work, so we also had to create universal custom masking in Unsloth. More details are in our hyperparameters [guide](https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/lora-hyperparameters-guide#training-on-completions-only-masking-out-inputs)
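The point about not tokenizing turns separately suggests one general approach: render the full conversation, tokenize it once with character offsets, and keep labels only for tokens lying entirely inside assistant text. The sketch below is not Unsloth's implementation; `toy_tokenize` is a regex stand-in for a real tokenizer (Hugging Face fast tokenizers can return comparable offsets via `return_offsets_mapping=True`), the `<|role|>` template is hypothetical, and token ids are just positions.

```python
import re

IGNORE_INDEX = -100  # label value PyTorch's cross-entropy ignores by default

def toy_tokenize(text: str) -> list[tuple[str, int, int]]:
    """Stand-in tokenizer returning (token, start, end) character offsets:
    runs of word characters, runs of newlines, or any other single character."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|\n+|[^\w\n]", text)]

def completion_only_labels(turns: list[tuple[str, str]]):
    """Render the whole conversation once, tokenize it once, and keep a label
    only for tokens lying entirely inside an assistant reply."""
    text, assistant_spans = "", []
    for role, content in turns:
        text += f"<|{role}|>\n"  # hypothetical chat template
        start = len(text)
        text += content
        if role == "assistant":
            assistant_spans.append((start, len(text)))
        text += "\n"
    tokens = toy_tokenize(text)
    labels = []
    for token_id, (_, s, e) in enumerate(tokens):  # toy ids: token positions
        inside = any(a <= s and e <= b for a, b in assistant_spans)
        # Tokens straddling a boundary (e.g. merged newlines) are masked too.
        labels.append(token_id if inside else IGNORE_INDEX)
    return text, tokens, labels
```

If a reply ends with a newline, the toy tokenizer merges it with the template's trailing newline into one `"\n\n"` token that crosses the span boundary. This sketch conservatively masks such tokens; a real implementation has to pick a rule for them explicitly.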

Educational_Rent1059@reddit

What’s up with all these Axolotl fanboys zerging every Unsloth thread/topic? Did you even read the comment you replied to, or is trolling the only thing you're after? Also, that’s not how training on completions works ("just tokenize everything"). Do you have any clue? Like, wtf are you on about? Why not answer the question? What utilities? Jesus…

Educational_Rent1059@reddit

Love to listen to you guys! Looking forward to this, big thanks 🙏

danielhanchen@reddit

Thank you! :)

XMasterrrr@reddit (OP)

Hi r/LocalLLaMA 👋 We're excited for tomorrow's guests, **The Unsloth Team!** They're the folks behind the blazing-fast Unsloth fine-tuning library and a slew of community notebooks. **Kicking things off tomorrow (Wednesday, Sept. 10th) 10 AM–1 PM PST** ⚠️ **Note:** The AMA itself will be hosted in a **separate thread**, so please don’t post questions here.