Unsloth announces Unsloth Studio - a competitor to LMStudio?

[-]

danielhanchen@reddit

Oh hey! There's a tonne of features in it! 1. Chat UI has auto healing tool calling, **Python & bash code execution**, web search, image, docs input + more! 2. Finetune audio, vision, LLMs with an **Auto Assist data prep** (all local) 3. Supports GGUFs, **Mac, Windows, Linux** \+ **audio generation** as well 4. Has SVG rendering, exporting to GGUF inside of it 5. gpt-oss harmony rendering, all inference params are pre-set to recommended ones 6. A Data designer + **synthetic data generation** system! 7. Fast parallel data prep as well + embedding finetuning! 8. And much more at https://github.com/unslothai/unsloth. To install it, try:  pip install unsloth unsloth studio setup unsloth studio -H 0.0.0.0 -p 8888

Reply

[-]

thereisonlythedance@reddit

Brilliant. Any plans to integrate RAG? Apart from OpenWebUI (which has its issues) there’s not a great UI that supports it.

Reply

[-]

danielhanchen@reddit

Definitely yes :)

Reply

[-]

mecshades@reddit

In RE: to RAG, please consider allowing parsing a directory. The company I work for has a messy file store of documentation semi-organized in directories by name. Open WebUI fails because it requires the user to manually upload & manage files or we are forced to use an API to upload files individually. Ideally, the best RAG should let us point to a directory and process all knowledge in that directory, eliminating the need of hitting an API or uploading manually. That should be baked in, IMO!

Reply

[-]

pepe256@reddit

It sounds like your workflow could benefit from using an agentic harness like Claude Code/Codex/OpenCode/etc. They're able to look at whole directories, and grep to find info, run any other commands, or read whole files. I heard there are RAG MCPs you can connect them to. I don't personally use any because I don't need them

Reply

[-]

Psychological-Lynx29@reddit

Yeah, this is not so cool when you dont want to share private information.

Reply

[-]

zxyzyxz@reddit

But you can do all that locally too, harnesses and MCPs aren't limited to the cloud.

Reply

[-]

richardstevenhack@reddit

Look at MSTY KnowledgeStacks.

Reply

[-]

BrewboBaggins@reddit

Where do I type this in at? a command prompt? power shell? is it going to ask me where I want to install it to? is going to install everything in a virtual environment or will it overwrite all the stuff I already have installed. How about a "windows.exe" like LMS,Jan,Kobold,etc for us non coders. or at least a windows.bat like Ooba. Help a NonTechBro out here...

Reply

[-]

Refefer@reddit

Am I reading this right? Linux, Mac, and Windows work out of the box?

Reply

[-]

danielhanchen@reddit

Yes! So finetuning not yet if you don't have a GPU, but CPU inference works on all 3!

Reply

[-]

power97992@reddit

What kind of gpu? Cuda only? What about mlx and MPS?

Reply

[-]

inevitabledeath3@reddit

They only support Nvidia for now but are planning to add Intel, AMD, and Apple support in the future.

Reply

[-]

NoahFect@reddit

What's up with this? [WARN] Node v22.9.0 / npm 10.8.3 too old. Installing Node.js LTS via winget... [ERROR] Could not install Node.js automatically. Please install Node.js >= 20 from https://nodejs.org/

Reply

[-]

FORNAX_460@reddit

So there is no option to load already downloaded models for chatting!? All i see is that its only reading from the hf cache directory.

Reply

[-]

anantj@reddit

Does it support AMD gpus on windows? Is Fine tuning supported on windows+AMD?

Reply

[-]

white_december@reddit

ANE inference support?

Reply

[-]

Hot-Section1805@reddit

This happens on my Mac Mini: Collecting xformers>=0.0.27.post2 (from unsloth) Downloading xformers-0.0.35.tar.gz (4.3 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 4.7 MB/s 0:00:00 Installing build dependencies ... error error: subprocess-exited-with-error × installing build dependencies for xformers did not run successfully. │ exit code: 1 ╰─> [4 lines of output] Collecting setuptools>=64 Using cached setuptools-82.0.1-py3-none-any.whl.metadata (6.5 kB) ERROR: Could not find a version that satisfies the requirement torch>=2.10 (from versions: 1.9.0, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.6.0, 2.7.0, 2.7.1, 2.8.0) ERROR: No matching distribution found for torch>=2.10 [end of output] note: This error originates from a subprocess, and is likely not a problem with pip. ERROR: Failed to build 'xformers' when installing build dependencies for xformers

Reply

[-]

Devatator_@reddit

So, I could install this on my VPS and access it from any device?

Reply

[-]

Succubus-Empress@reddit

Exl2?

Reply

[-]

danielhanchen@reddit

For now llama.cpp is powering it - we'll add vLLM, but I can check Exl2!

Reply

[-]

Zestyclose_Yak_3174@reddit

Please try to also support MLX in the future. This will make your app so much better for Apple Silicon users. I use both Llama.cpp and MLX for inference, I recon many others too. Especially since your quants are just better.

Reply

[-]

ambassadortim@reddit

Does this tool have a web interface so I can use it from my phone?

Reply

[-]

danielhanchen@reddit

Yes yes it can be accessed throughout the world!

Reply

[-]

BobbingtonJJohnson@reddit

Does that include GGUF finetuning?

Reply

[-]

fiery_prometheus@reddit

It would be awesome, if you can do checks for contamination as part of the suite, would do the community a huge favor, having an easy way to check and understand these things.

Reply

[-]

-Cubie-@reddit

Does it also work with the new embedding finetuning?

Reply

[-]

danielhanchen@reddit

Yes it does!

Reply

[-]

Pro-editor-1105@reddit

Another banger from you guys!

Reply

[-]

danielhanchen@reddit

Thanks!

Reply

[-]

j_osb@reddit

In what world was LM Studio the go-to solution for 'advanced' users? That was always vLLM or directly llama.cpp.

Reply

[-]

TheLexoPlexx@reddit

In the same world of people that don't know ollama wraps llama.cpp

Reply

[-]

k_means_clusterfuck@reddit

"Actually ollama is newer and c++ is old it's from likevel the 20th century" type shii

Reply

[-]

RedParaglider@reddit

C++ is just a wrapper for Assembly.

Reply

[-]

NewMetroid@reddit

Assembly is just a wrapper for binary.

Reply

[-]

DominusIniquitatis@reddit

Binary is just a wrapper for electricity.

Reply

[-]

thrownawaymane@reddit

Electricity is just a wrapper for electrons

Reply

[-]

Ill_Barber8709@reddit

Electrons are just a wrapper around the nucleus.

Reply

[-]

apidekachu@reddit

Electrons are just a wrapper around strings

Reply

[-]

NoahFect@reddit

HDL: "Oh hai. I didn't see you all the way over there."

Reply

[-]

j_osb@reddit

If I am forced to use VHDL over verilog (or like, chisel) again I WILL put that person to justice.

Reply

[-]

NoahFect@reddit

Yeah, you'll notice that the letter V is conspicuous by its absence

Reply

[-]

RedParaglider@reddit

Everything is a wrapper. Except Ed. Ed is the default UNIX text editor. It says so right in the man page.

Reply

[-]

temperature_5@reddit

I have no idea what you guys are talking about. Now to get back to wire wrapping the TTL gates for this logic circuit...

Reply

[-]

Electrical-Risk445@reddit

Calling my core memory on that one.

Reply

[-]

NewAlexandria@reddit

LLM's are just compilers that use natural language

Reply

[-]

huffalump1@reddit

"ollama is easier to use, just `ollama run`, llama cpp is more complex" That makes me wanna say: you have all the knowledge of the world at your fingertips, it is not hard to ask an LLM to read llama.cpp docs and give you a command :|

Reply

[-]

Aerroon@reddit

> it is not hard to ask an LLM to read llama.cpp docs and give you a command :| So you gotta first run ollama to tell you how to run llamacpp, I see.

Reply

[-]

Conscious-content42@reddit

It's like dating apps, "the app that is made to be deleted." Once you make your match with llama.cpp

Reply

[-]

edankwan@reddit

It is like when I newly installed Windows to use Edge to download Chrome and then delete Edge

Reply

[-]

Conscious-content42@reddit

Then use chrome to install Firefox and delete chrome😁

Reply

[-]

TheLexoPlexx@reddit

Then delete Windows and use Linux

Reply

[-]

Conscious-content42@reddit

Then use Linux to download a BSD image, use a USB boot drive and overwrite Linux partition.

Reply

[-]

catch-10110@reddit

It’s more like: Ollama is easy and it works great. I have never seen a reason to use llama.cpp directly other than for philosophical open source reasons. If you’re not “into” open source as a philosophy then as far as I can tell Ollama is the best and easiest solution. I am entirely willing to be told that I am wrong about this to be clear. But I have asked before on this sub and the only answer I got came down to people not liking Ollama for reasons of open source philosophy rather than any substantive reason. I would actually love to be told I am wrong. I love learning about all of this as a relative newcomer.

Reply

[-]

qrayons@reddit

I don't understand all the ollama hate here. I feel like ollama is built for ease of use and llama.cpp is built for performance. If I'm messing around with some new models then I don't care if it runs a bit slower on ollama, especially if it saves me the headache of getting stuff working in the first place. I'm sure llama.cpp is super intuitive and easy to use for everyone that has been using it every day for over a year, but that could be said about any software that someone uses daily.

Reply

[-]

tmflynnt@reddit

I hear you and tbh despite being a huge llama.cpp fan and contributor I do realize it has some significant gaps in regard to the intimidation and ease of use factors. It is improving majorly though through things like its evolving web UI, model routing features, "fit" parameters for easy config, and ongoing efforts toward easier install scripts and such, but it still has some ways to go and some of these elements are probably never going to be its top strongsuit. For those reasons I do understand that some new people to the scene will often start out with other apps but I do feel llama.cpp is worth the time investment once people feel comfortable to explore a bit and I feel that time investment has been getting more and more manageable. But, in regard to Ollama and the negative feelings people have around it, I would just add that there are some [very specific reasons](https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing_ollama_isnt_just_a_pleasure_its_a_duty/o3lcjz5/) that people feel that way if you care to look into it more.

Reply

[-]

CatEatsDogs@reddit

Totally agree. I can install and run llamma.cpp but I don't like to build it constantly by myself and remembering all these parameters to launch a lmm. And does it have a "pull" command? It's like a ready-to-cook product vs ready to use hamburger.

Reply

[-]

DeepOrangeSky@reddit

>Ollama is easy and it works great Not for me it wasn't as someone who came into all this a few months ago as a complete and total beginner, as someone who doesn't know anything about how to use computers for things besides checking email, browsing youtube, etc. I.e. no clue what JSON files are, or templates or how to use command lines or any of that kind of stuff. Slowly starting to learn, but only like 1% of the way so far. I really wish I had started with LM Studio instead of Ollama, tbh. The reason I started with Ollama was when I was browsing on here, people basically said what you said in the threads, that it was the beginner friendly easy one, and they said that LM Studio was the other beginner/easy one, except that one was closed-source and Ollama was open-source or something like that. So, I figured in that case I would start off with Ollama. The only problem was, as soon as I started trying to acquire models directly from huggingface to try to make sure I got the exact quants of whatever obscure fine-tunes/merges I wanted to try, and save them to my external hard drive, it was this whole nightmare with blobs and modelfiles, etc, and having to ask Gemini how to point Ollama to my drive and use some weird two-terminal thing where I served models to myself (or whatever the terminology would be, I dunno, I felt like I was a golden retriever trying to type hacker stuff like in that 1990s move Hackers, and pressing keys with my paws. It fucking sucked, dude. And if you download the model with ollama with the pull command, then if you want to move it off your computer to your external storage to make room for more models, then it'll break Ollama, because you aren't allowed to just move models around I guess, or else it shatters the blobs like some delicate glass vase and you have to like, I dunno, delete all your models that you ever pulled with ollama onto your computer all off your computer and delete ollama and restart from scratch. And if you just download a GGUF from huggingface, and then try using the FROM thing to create a text file and then use the ollama create command to turn it into an ollama modelfile to try to use it via ollama, then if you don't know about the that list of template stuff and params/prompt/stops etc stuff that you need to go find somewhere and paste somewhere, your model either runs super weird or doesn't work properly at all (which might sound like a rookie mistake, but well, yea, like I said at the start, I was a complete beginner, so that didn't go so great either). Just started trying LM Studio out very recently the past week or so. I'm realizing now like, oh shit, I fucked up baaaaaad by starting with Ollama, lol. I should've just started with LM Studio. I was paranoid that since it is closed source that means they can just install some like, I dunno, government backdoor or some shit to make sure you didn't invent a quark star in your garage by asking DeepSeek Speciale how to do quark-fusion or some shit (lol), so I was like "fuck closed source, the men in black are going to wipe my memory with one of those silver eyeball-tampon things! Noooo!" and things of that nature. But yea, I should've just started off with LM Studio tbh. Also, I didn't even realize apparently Ollama's GUI isn't even open source anyway (not sure if that's correct, but I think so, right?) although I was ironically using the Terminal to run Ollama even though I'd never used terminal before, since I hated the GUI app thing so much. I dunno, I'm sure there is some person out there who likes the modelfile system for some reason of some sort (supposedly it is good for "dev teams" or something? I dunno, I'm just some random individual noob, no, not really sure). But, yea, I didn't like it at all, and not because it was too nice and clean and easy, but the exact opposite, I feel like I probably could've spent the hours and hours I spent asking Gemini how to do all the annoying shit I had to figure out how to do on Ollama to just learn how to use llama.cpp instead, and then I'd be further along by this point instead of just now starting basically back to square one with a way better option in LM Studio. Although looks like since the closed source aspect was the one thing I don't like about LM Studio, maybe even that last downside will be done with if this unsloth thing ends up being good, since I can just switch to using that instead. Well, with Unsloth the main thing I am most excited about is the training-models-made-easy thing, if that ends up being legit. This whole entire time the past few months ever since I first got into all this, I've been wishing I could figure out enough about computers and become computer literate enough to be able to start doing some fine-tuning experiments on models, and have been super impatient and frustrated about not being able to do it yet, since I suck at computers. So, if that thing allows me to just click some buttons and start messing around with fine-tunes, that would be the coolest shit ever. So, pretty excited for this thing.

Reply

[-]

huffalump1@reddit

Nah man I agree. If you like it, and it works for you, that's good enough *(Personally I prefer more control over models/quants and settings, plus I've had worse performance with ollama, and I usually use a different web UI anyway so llama.cpp is better for my use case)*

Reply

[-]

thrownawaymane@reddit

See the comment above yours for a good summation.

Reply

[-]

NoahFect@reddit

Exactly. The LLM *is* the shell.

Reply

[-]

tiffanytrashcan@reddit

https://preview.redd.it/spys1kehfrpg1.jpeg?width=1080&format=pjpg&auto=webp&s=f94513e2f69c1b7e05e44e8c02ee5a33403eb97a

Reply

[-]

No-Refrigerator-1672@reddit

Well, given that OpenWebUI now has OpenTerminal project that gives LLM direct control over a shell - you are literally correct.

Reply

[-]

The_frozen_one@reddit

Why do people act like they are mutually exclusive? OSes of all types implement services as a distinct type of program because they represent a useful paradigm. You *could* manually run openssh or dropbear when you want enable ssh access on your computer, or you can have systemd/launchd/rc run the ssh service and never think about it like how 99% of the world does it.

Reply

[-]

droptableadventures@reddit

Also ollama users: - "Why is it doing weird things once my prompt gets over 4096 tokens" - "Why does it sometimes run really slow and not use my GPUs?" - "It says I'm running DeepSeek but I thought that was a much bigger model" - "Everyone says that model is good, but the output seems to be total garbage" - "Why is my hard disk full? Where did all these downloaded models actually go?"

Reply

[-]

citrusalex@reddit

Doesn't it also have its own separate backend that is only partially based on llama.cpp (i think gguf parts)?

Reply

[-]

deepspace86@reddit

The whole thing us basically docker engine refactored to use llama.cpp

Reply

[-]

citrusalex@reddit

What does docker engine have to do with it

Reply

[-]

deepspace86@reddit

The group that built it also built ollama, so if you understand how docker engine works, you have a pretty good understanding of how ollama works too.

Reply

[-]

relmny@reddit

[https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing\_ollama\_isnt\_just\_a\_pleasure\_its\_a\_duty/](https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing_ollama_isnt_just_a_pleasure_its_a_duty/)

Reply

[-]

tmflynnt@reddit

Thank you for posting that. Here is also the direct link to Georgi's post: [https://github.com/ggml-org/llama.cpp/pull/19324#issuecomment-3847213274](https://github.com/ggml-org/llama.cpp/pull/19324#issuecomment-3847213274)

Reply

[-]

relmny@reddit

"That was in the past, for x months is not and is all ollama now" that's what people here kept saying, meanwhile: [https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing\_ollama\_isnt\_just\_a\_pleasure\_its\_a\_duty/](https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing_ollama_isnt_just_a_pleasure_its_a_duty/)

Reply

[-]

BreakfastFriendly728@reddit

mlx: in the same way i was ignored

Reply

[-]

Craygen9@reddit

Agree, they should have said average user. To be fair LM Studio makes it easy to download and evaluate models and settings before deploying on llama.cpp

Reply

[-]

inevitabledeath3@reddit

Yes. I can work with llama.cpp directly but why bother when LMStudio exists and has helpful features like estimating the RAM usage when playing with different loading parameters. There is also the whole LMStudio link thing I find quite helpful. I could setup reverse proxies or Tailscale myself and manage API keys. It's much easier to just download LMStudio on a second machine and setup the link, and that also gives me an easy way to load and unload models remotely as well.

Reply

[-]

No-Refrigerator-1672@reddit

Actual reasons: 1) LM Studio, being a wrapper around llama.cpp, is slow. Really slow. You don't feel it when running chats, but the moment you want your AI to do deep research, edit a large document, do agentic coding, give a summary about hour long video, etc - you'll find out that waiting half an hours for the task to complete with errors just to rerun it isn't very fun, and very quickly learn how to use vllm. 2) LM Studio is too local. By hosting OpenWebUI, I get to use my personal AI from my smartphone, my tablet, my job laptop, my gaming pc, as well as share it with friends offsetting some of the hardware costs, all without carrying around the beefy PC that can actually do AI.

Reply

[-]

fastheadcrab@reddit

Using vLLM is the main reason but it can be a pain to setup, especially if you've never seen a console before in your life.

Reply

[-]

No-Refrigerator-1672@reddit

Maybe, if you're on Windows. If you're on Linux, it is actually easier to setup than llama.cpp: it gets installed with one cli command; and you don't even need to understand lauch options, vllm docs have Ai chat companion that uses RAG over docs and github and will help you set up your launch parameters with zero prior knowledge. P.S. oh, and llama.cpp actually requires console too.

Reply

[-]

inevitabledeath3@reddit

This is a lie. VLLM has quite specific requirements. I guess you could use the docker container instead, but even then you can spend forever messing with parameters just to get it to fit the model in your GPU. The default config is not VRAM efficient at all. SGLang might be better, but it has it's own issues. Generally speaking these tools are made for the big players, not for running a few models at home.

Reply

[-]

No-Refrigerator-1672@reddit

I'm answering only to a single message to not split the conversation. Yes, vllm is more vram intensive. It pays for itself with unparalleled speed. I'm runing all of my models on it, and I have published intesnive comparison of performance for all possible scenarios [here](https://www.reddit.com/r/LocalLLaMA/s/hMTXaBKZDp). It is anywhere from 2x to 5x faster than llama.cpp on Ampere for single request. You might get bad result just because you're trying to get cpu offloaded model into vllm - it's designed to have the full model in vram, this alonhside having a Volta or newer GPU is the only real requirement to get it running.

Reply

[-]

inevitabledeath3@reddit

Thought I would reply after trying VLLM some more. You are right it is indeed faster in many cases, though I think there are some model architectures and edge cases where ik_llama.cpp or even llama.cpp is still faster. Talking with you was actually the push I needed to practice using tools like VLLM and SGLang.

Reply

[-]

No-Refrigerator-1672@reddit

Hi! Thank you, I love to see that my conversation efforts push people to learn more. From my experience, I'd say that llama cpp or ik_llama is faster for extremely short sequences (less than 1k tokens), which I tend to dismiss as this is too rare of a scenario, they are faster for unsupported quants: i.e. nvfp4 for previous gen cards, and they handle cpu offloading better. I would also recommend you to install litellm (the one that's shipped as docker container), as it'll allow you to measurw how long are your real prompts, including system messages, see the details about ai requests made, and gather usage statistics that is split by different apps you have. It'll allow you to understand your specific workload better.

Reply

[-]

inevitabledeath3@reddit

Nope, model easily fit in VRAM using llama.cpp.

Reply

[-]

No-Refrigerator-1672@reddit

If it fits with llama.cpp, it also fits with vllm. The only real difference is in KV cache handling, vllm mostly forces fp16 cache with quantized models, while llama.cpp allows to quantize cache easily; and vllm will take at most 1 gb extra for cuda graphs and compute buffers.

Reply

[-]

inevitabledeath3@reddit

As I said it fit fully in VRAM even in vLLM. So what's your excuse for it being slower?

Reply

[-]

No-Refrigerator-1672@reddit

There could be only one reason: you failed to read the docs and deploy the model properly. I have given you a link to my tests, with comprehensive numbers and exact cli commands to launch each engine, you can study it to learn how to do things.

Reply

[-]

inevitabledeath3@reddit

I've looked at your tests. You aren't using the same model or architecture I am looking at. I can also see instances in your tests where llama.cpp is faster anyway, although it's only a few tests where this is the case.

Reply

[-]

No-Refrigerator-1672@reddit

Yeah, it's faster for sequences under 4k tokens, which is shorter than system prompt for like any AI tool. This is irrelevant. It is also faster for mxfp4 in some scenarios, which is unsupported on Ampere. As GPT-OSS is basically the only model that uses this forman natively, I'm not accepting that as llama.cpp being universally faster. >You aren't using the same model or architecture I am looking at. I have tried vLLM and llama.cpp with most popular models that were released since November 2025 and fit inside 40GB VRAM. vLLM always wins; the shape of the graph stays the same, only the vertical axis changes.

Reply

[-]

inevitabledeath3@reddit

Yeah so November 2025 was a while ago now. Is it faster in Qwen 3.5? How about GLM 4.7 Flash or Nemotron V3 or LFM 2 24B A2B? Also I was looking at LM Studio GGUF Vs vLLM AWQ since my understanding was that AWQ was better supported in vLLM than GGUF format was and was probably faster anyway. Not having KV Cache quantisation would be a serious limitation. I know there is FP8 available but I am not sure that will perform well on a RTX 3090.

Reply

[-]

No-Refrigerator-1672@reddit

Read carefully, >with most popular models that were released **since** November 2025 Yes, it is faster for sequences over 8k for Qwen 3 VL 30B, 32B, and Qwen 3.5 27B, 35B. It is faster for GLM 4.7 Flash. I did not try nemotron or LFM. >LM Studio GGUF Vs vLLM AWQ vLLM Docs actually state that GGUF isn't optimised yet (and, looks like it won't be ever, there's no need). You should use AWQ if you're running vLLM. >Not having KV Cache quantisation would be a serious limitation. This is questionable. First of all, vLLM does support KV quantization down to 4 bits; but when running quantized models, you're supposed to provide calibration coefficients. The broader picture is that any KV quantisation degrades performance of the model, so you need to offset this back by computing the corrections. llama.cpp quantizes cache without this, and you're hurting yourself without knowing about it. vLLM provides tools for you to create a specially quantized model that can to both weight and KV quantization preserving as much intelligence as possible, you can find those prepared models by "W8A8" or "W4A4" keyword on huggingface.

Reply

[-]

inevitabledeath3@reddit

I think part of the issue I had was trying to do tensor parallelism without NVLink. Now I have an NVLink bridge it seems to work much better. Getting 60-70 token/s with Qwen 3.5 27B which is much faster than LM Studio. Will have to try some more models. How well does vLLM work with GGUF? Or is it better to just use AWQ?

Reply

[-]

No-Refrigerator-1672@reddit

Tensor parallel workload hits inter-gpu bandwidth very hard. It does, indeed, require NVLink, or at least full PCIe x16 to each of the card, which never happens on consume motherboards. vLLM docs officialy [state](https://docs.vllm.ai/en/stable/features/quantization/gguf/) that GGUF is unoptimized, and that GGUF does not support loading vision encoder. I exercise "AWQ or nothing" approach when working with vLLM. It also decently supports GPTQ which is an older method, you'll find a lot of 1+ year old models in it, should you need to run one.

Reply

[-]

inevitabledeath3@reddit

I had full Gen 4 X16 thanks to it being an Epyc board rather than consumer, but I guess that wasn't enough.

Reply

[-]

inevitabledeath3@reddit

Fair enough that makes sense. I think I will try again at some point, maybe I was doing something wrong. I did not realize that llama.cpp had such issues with KV Cache quantisation. I remember trying vLLM and it taking forever to even get a model running using it without having out of memory errors, then to find the performance was visibly slower than llama.cpp or LM Studio I basically just gave up. IMHO someone should make something like LM Studio or Unsloth Studio for vLLM. All the easy to use and setup tools are llama.cpp based. I was intending to deploy SGLang, vLLM, or TensorRT LLM at work on some DGX Sparks as we are intending to run some small to medium models at scale and need all the performance we can get for that. I just didn't think it made sense to do that on a home setup.

Reply

[-]

fastheadcrab@reddit

I'm referring to LMStudio lmao

Reply

[-]

inevitabledeath3@reddit

I've tried VLLM. It eats way more VRAM and was actually slower than LM Studio on my hardware. It's designed for inference at large scale with many parallel generations, not for inference at home with a limited amount of hardware and few parallel generations. It's also much harder to use than you are making out. I had initially assumed that MTP would allow VLLM to be faster even at small scale, but the actual tests proved me wrong at least for the setup I tried. I actually host my LM Studio on a separate machine and use the API with things like OpenWebUI and various AI agents and tools. It even has a Daemon version specifically for servers and headless setups. So it's not "too local". You just don't know how to actually use it. Skill issue. I've looked at ik_llama.cpp as well. It's a fork of llama.cpp with faster kernels for certain specific model architectures and support for a few more quantisation techniques. If you aren't using those features it's pointless using that fork. Much like ktransformers it's a fairly specialised tool.

Reply

[-]

FusionX@reddit

I was playing around with local LLMs after Qwen's release. Personally, I was surprised when LM Studio ran quite a bit slower for me with its defaults, than llama.cpp. In some cases, it also ran out of memory much faster. I'm sure it has to do with some runtime parameters, but its curious that the defaults for llama.cpp, intended for intermediate users, achieved much better results than LM Studio, which is much more opinionated and geared towards an average user.

Reply

[-]

shing3232@reddit

average user don't use unsloth. it's a training first software

Reply

[-]

Hoodfu@reddit

For all the people running large models on macs. Llama.cpp is much slower and we can't run vllm. Lm studio is the best way to handle multiple models for mlx.

Reply

[-]

egomarker@reddit

What? Llama.cpp is much slower? )))

Reply

[-]

Hoodfu@reddit

yeah on mac it's much slower. mlx converted safetensors utilize all of the M\* chip cores whereas ggufs etc do not so you're losing about 30% of the possible speed.

Reply

[-]

egomarker@reddit

Sigh, you have no idea what you are talking about.

Reply

[-]

Velocita84@reddit

Well that has nothing to do with the GGUF ecosystem then does it

Reply

[-]

Hoodfu@reddit

There are an ever growing number of unsloth releases that have been converted to mlx. As macs are increasingly utilized here, one hopes that the unsloth will also support mlx with their new tool.

Reply

[-]

arkham00@reddit

try this [https://github.com/jundot/omlx](https://github.com/jundot/omlx)

Reply

[-]

footyballymann@reddit

Wait for real? I don’t own a Mac but how did this interact with the whole metal thing? Mlx?

Reply

[-]

BlobbyMcBlobber@reddit

I'd say advanced users probably use vllm or sglang.

Reply

[-]

the__storm@reddit

I use vllm at work, but llama.cpp at home - Vulkan is very convenient.

Reply

[-]

BlobbyMcBlobber@reddit

Same. llama.cpp is much easier to tinker with.

Reply

[-]

atape_1@reddit

same. Going to be honest, I've heard of LM Studio, googled it once, and ignored it ever since.

Reply

[-]

Eupolemos@reddit

Eh, LLM Studio was nice. You could visually see your options like flash attention, cache quantization, context length, gpu offload. Even predicts the mem usage of your settings. But now I can't update it nor fully uninstall it, and every time I say something to my model, the ethernet spikes 😐 So it's high time to move onwards to llama.cpp

Reply

[-]

uncoolcat@reddit

I had a similar issue with LM Studio where I was unable to update it. You may have tried this already, but in my case to correct it: first a reboot (to ensure all LM Studio setup-related processes/files had been closed) and then manually downloaded the latest setup from their website and reinstalled it. All of my settings/models/etc were retained after the reinstall. I haven't noticed any network spikes while using it, though now I'm curious and will do a wireshark and/or procmon trace to see if similar is happening with my install.

Reply

[-]

Eupolemos@reddit

Username doesn't check out.

Reply

[-]

atape_1@reddit

Ethernet spikes? There goes all my fanfiction porn...

Reply

[-]

iMakeSense@reddit

oh damn didn't know about the ethernet spikes. Oops.

Reply

[-]

Eyelbee@reddit

Yeah it's closed source freeware. Probably has a decent ui but no reason to use it over llama.cpp

Reply

[-]

Cool-Hornet4434@reddit

The only thing lm studio does right is MCP... I've not seen any other that works as smoothly with the same protection as Claude's MCP tools. But yeah, lm studio is a closed source UI on top of open source.... also lm studio's SWA implementation sucks

Reply

[-]

marhalt@reddit

It allows for a lot of flexibility. I can load models, use the backend for my own scripts, see what the server receives and send, change the model, use a small model to do something and a big model to do something else, both loaded into memory... All of it in a nice UI, with easy to see settings... I don't get the snobbery of people for good GUI tools. Not everything has to be a CLI, and this is one of those cases where I have no interest in learning the 3,200 command line parameters I need to run llama.cpp to use a MLX model or to run a model with a different context length and different parameters... The whole idea of CLI was for simple, easy to use and chain tools. Loading LLMs is the opposite of that - it needs an intuitive interface unless people are willing to invest a lot of time to master commands of 100+ characters.

Reply

[-]

ldn-ldn@reddit

In the world where you need multiple runners, advanced options and features. But if you're a casual, go ahead, run llama.cpp directly.

Reply

[-]

Far-Low-4705@reddit

tbf, if you are using it on the same machine you are running, a standalone, self contained app is more convenient even if you're an "advanced user"

Reply

[-]

inevitabledeath3@reddit

I thought vLLM was mainly used with safetensors, not with GGUF. This may be out of date but I heard that vLLM support for GGUF was experimental compared to llama.cpp or even SGLang.

Reply

[-]

j_osb@reddit

Eh. It does have somewhat decent support nowadays. Overtaken llama.cpp in speed outside of TTFT on a lot of gguf models nowadays for me.

Reply

[-]

Broad_Fact6246@reddit

IYKYK! I used LM Studio as a more capable OpenClaw before OpenClaw became a thing. When I need granular HITL, it has been my go-to from the beginning. (AnythingLLM was promising but unnecessarily complicated). I load 5 to 10 MCP tools in an LMS chat and build, configure, and test entire projects, bootstrapping backends and even using Playwright to configure web UI's in some cases. LMS Tool calls are very verbose for me to interactively stop and steer agents before catastrophe, looping, or for getting on-track. Easy GUI to experiment with temperature and parameters and easily reload. Linux Natives and devs who know how to read the tool outputs can drive LM Studio very effectively and augment their dev abilities. Script kiddies who need a black box that does everything without a brain cell fired on their part won't do so well...and I wouldn't consider them "advanced users." That's only on the front-end. LMS expanded the REST API server and now allow has parallel processing. Serving up my local compute for Openclaw has been more stable in LM Studio than vLLM so far. I haven't tried Ollama because I don't need to. (Does ollama still import models as sha256 files? That was so inconvenient.) I only wish the GPU slicing and provisioning across multiple GPUs was more granular in LMS like in vLLM or whatnot.

Reply

[-]

Old-Storm696@reddit

Fair point - LM Studio was more for the "average advanced user" who wanted a GUI. The真正的 hardcore folks were always on llama.cpp CLI or vLLM. Unsloth Studio having training + inference in one tool could change that dynamic though.

Reply

[-]

muntaxitome@reddit

Well as far as LLM users go just merely using local llm's probably puts you in the top 1% most advanced users

Reply

[-]

robberviet@reddit

OP mean ollama for sure.

Reply

[-]

ProfessionalSpend589@reddit

In the same world where React and Linux are the two notable projects which something surpassed in stars…

Reply

[-]

Decaf_GT@reddit

In the world where people like OP use LLMs to write their Reddit posts for them and don't realize that it's filled with Twitter/LinkedIn Hypebeast language. In general, you should disregard any post/tweet that has the word "game changer" in it. The game has been "changed" so many times now...

Reply

[-]

AurumDaemonHD@reddit

Sglang - radix agentic once in blue moon i manage to get a wuant that works with some config Vllm - best community support for tp Llamacpp - cpu offloading

Reply

[-]

Flimsy_DragonFly973@reddit

I dunno but I’ve been giving it a run for the last few hours. Considering it has a GUI for post training I can see something like me making my own model with nanochat and then post training with Unsloth studio. Nanochat + Unsloth Studio = MEGATRON-GPT

Reply

[-]

quasoft@reddit

Is it possible to use the chat Feature with CPU only? Tried running \`unsloth studio setup\`, but says it does not support CPU only, and refuse to do one time setup (both pip package and install -e from main branch).

Reply

[-]

user92554125@reddit

I recall enabling long paths being a part of the default options in the Python setup/installation on Windows. It is probably related to that, but I could be wrong. Can't check or reproduce as i'm not on Windows.

Reply

[-]

jeffwadsworth@reddit

If you find out please post here

Reply

[-]

Adventurous-Gold6413@reddit

OH MY GOD A WEBUI FOR TRAINING!!! Yess

Reply

[-]

danielhanchen@reddit

+ Inference :) + Synthetic Data Gen + Exporting + Training + Much much much more!

Reply

[-]

stopbanni@reddit

Is training like finetuning or from scratch?

Reply

[-]

spaceman_@reddit

This is awesome, looking forward to AMD support to play with this! Thank you all at Unsloth for all your work!

Reply

[-]

Potential-Bet-1111@reddit

Daniel I have tons of idle compute. Tell me something cool I can do with your product.

Reply

[-]

power97992@reddit

Does it support mlx and mlx fine tuning and quantization and mechanistic interpretation tools ?

Reply

[-]

BillDStrong@reddit

They are working on mlx, not currently supported. I believe yes to quantization. Not sure yet about the others, downloading it now.

Reply

[-]

ZachCope@reddit

Commenting for posterity in Daniel Unsloth’s shadow

Reply

[-]

dreamai87@reddit

Thanks it’s really great, I did not have any issue running, tested chat model finetuning and synthetic data generation text2sql ocr.

Reply

[-]

ReasonablePossum_@reddit

Any privacy features?

Reply

[-]

Finanzamt_Endgegner@reddit

you are amazing!!!

Reply

[-]

Adventurous-Gold6413@reddit

Thanks so much! Been waiting for this for over a year!!!

Reply

[-]

exaknight21@reddit

I <3 u yall

Reply

[-]

mmkzero0@reddit

WHAT

Reply

[-]

Old-Storm696@reddit

Finally! The GGUF ecosystem has needed a proper training UI for years. LM Studio has been great for inference but training was always CLI-only or required external tools. Unsloth's training optimizations (LoRA, QLoRA) combined with a real UI is going to democratize fine-tuning for everyone who isn't comfortable with the terminal.

Reply

[-]

sean_hash@reddit

Having fine-tuning and inference in the same tool is nice, right now you need like three different projects to get that working

Reply

[-]

emprahsFury@reddit

Mozilla has had Transformers Lab for tears now.

Reply

[-]

user92554125@reddit

Pretty awesome! Had no idea it existed. Thanks.

Reply

[-]

FullOf_Bad_Ideas@reddit

tbh Llama-Factory was doing it for more than a year now. They don't have dataset builder flow though. And it's probably less polished.

Reply

[-]

danielhanchen@reddit

More coming soon to it!!

Reply

[-]

neuralnomad@reddit

Wait no AMD or vullkan? (Yet?) *kicks rocks while everyone else plays*

Reply

[-]

aliensorsomething@reddit

Interesting, if it had an installer and simple way to use I would use it over LM Studio because it has support for tts.

Reply

[-]

FatheredPuma81@reddit

Nice: \--- cmake configure --- \[FAILED\] llama.cpp build failed at step: cmake configure (0m 15.4s) To retry: delete C:\\Users\\Fathe\\.unsloth\\llama.cpp and re-run setup.

Reply

[-]

DeliciousMagician925@reddit

I got the same issue. Reinstalled Cmake with windows installer and no luck so far. Any suggestions?

Reply

[-]

FatheredPuma81@reddit

Wait. I guess they're working on it.

Reply

[-]

oh_my_right_leg@reddit

One of LM Studio's best features is the possibility to use LM Studio on a remote PC as a server and then seamlessly using LM Studio on another PC as the client. It would be awesome if you could replicate such functionality.

Reply

[-]

kavakravata@reddit

To a noobie, what does this have except for being OS, than Lm studio / jan.ai / opencode doesnt have?

Reply

[-]

Ylsid@reddit

I hope so, I'm tired of LM studio being the easiest way to test models while being closed source bs

Reply

[-]

Trick-One7944@reddit

Just keep in mind, free is relative depending on your requirements. No world is pro up to 8 gpu and free is up to 7. So we can expect a pay to play in here for the options many of us were thinking might become point and click. That ease of use gui has a cost attached, they earned it, but just a reality check https://unsloth.ai/pricing?hl=en-US

Reply

[-]

Hefty_Acanthaceae348@reddit

Very exciting, I was thinking that there were a lot of use cases were synthetic data generation (and therefore finetuning) should be easy, but finetuning isn't all that approchable. I do hope there will also be something added for easy customization of tokenizers too.

Reply

[-]

jduartedj@reddit

wait this actually includes training AND inference in the same app? thats huge if the training part works well. i've been using their notebook stuff for LoRA finetuning and its already way faster than vanilla peft, so having a proper UI for it would save me so much time setting up configs manually the synthetic data gen part is what really caught my eye though. being able to generate training data, finetune, and then test the result all in one place... thats basically the entire workflow without needing 5 different tools

Reply

[-]

Dazz9@reddit

Is there any performance loss when compared to running llama.cpp? What is the performance gain when compared to Ollama?

Reply

[-]

redoubt515@reddit

\> Until now, LMStudio has basically been the "go-to" solution for more advanced LLM users in the GGUF ecosystem I wasn't aware. I've always seen it framed as the *entry level* solution for *less technical* people. What gives you the impression LMstudio is the "go-to" for "advanced users"? (or is this an AI generated post and AI is just engaging in empty hyperbole and buzz words?)

Reply

[-]

ImpressiveSuperfluit@reddit

No idea what they mean, but there certainly is the fact that installing a software, being presented with a million models, gigabytes each, whose names tell you nothing, only to then be presented with a bunch of knobs... is already qualifying as quite advanced. It's like asking a random 14 year old to drive on the autobahn. Sure, they've been in a car before, and the autobahn is not exactly a complicated traffic situation, but they also don't know where the freaking break is, so maybe you're just in a bit of a bubble situation there. I think you're taking for granted how much background knowledge is actually required to not get immediately overwhelmed with this, seemingly simple, task. When, in reality, we're entering an era where a good chunk of people have never touched an actual desktop pc. Laptop, at best. Yea, those people are in their 20s now, shit's crazy!

Reply

[-]

TechnoByte_@reddit

> installing a software, being presented with a million models, gigabytes each, whose names tell you nothing, only to then be presented with a bunch of knobs... is already qualifying as quite advanced. That's the basic skills you need to run LLMs locally though. It's really not that hard considering how many guides there are out there and since people can just ask others or an LLM for help The post specifically says "advanced LLM users" LMStudio is a closed source electron UI designed to make running LLMs as simple as possible, you don't have to deal with the commandline For advanced LLM users, just using llama.cpp or vLLM directly makes a lot more sense

Reply

[-]

ImpressiveSuperfluit@reddit

Everything past openening a browser or simple app can easily be classified as advanced. This very much includes LLMs, anything past typing stuff into ChatGPT is advanced to the vast majority of people. You guys really do take a lot of stuff for granted that just isn't normal anymore. I understand that the vaguely millennial aged people, which includes myself, have a hard time picturing this, but people are already back to not growing up with computers, and have been for a long time. If it doesn't have an app that runs on your phone (or its browser), it's pretty much immediately an advanced topic. And LMStudio isn't plug and play, either. It technically kinda can be, but only if you ignore all the knobs and baubles and already know when you can do so. One OOM error and even the average somewhat advanced user is going home. Genuinely, spaces like this (and foss in general, looking at you, Linux community) would do well to remember that the vast majority of their knowledge can not be taken for granted, no matter how much they want to pretend that it can be. We're just not in the 2000s anymore. Owning a computing device no longer means that you have ANY idea how anything behind the UI works. Having said that, I understand, of course, that it's reasonable to call LMstudio an entry level software when you're talking to people that would ever find themselves here to begin with, but I'm bothered when people pretend that people are just randomly born with all this tech knowledge. You'd be shocked how some people can manage to somehow mess up heating water. Things are just really, really hard when you miss all the fundamentals that everyone just assumes you have.

Reply

[-]

redoubt515@reddit

>I think you're taking for granted how much background knowledge is actually required to not get immediately overwhelmed with this, seemingly simple, task. When, in reality, we're entering an era where a good chunk of people have never touched an actual desktop pc. Laptop, at best. Yea, those people are in their 20s now, shit's crazy! You make a good point, I see where you are coming from.

Reply

[-]

C_Coffie@reddit

That awesome! My only question is, will it support Strix Halo?

Reply

[-]

yoracale@reddit

Inference should work, training works in Unsloth, will work on Unsloth Studio a bit later

Reply

[-]

ComprehensiveBed5368@reddit

what about intel Arc gpus? inference and training

Reply

[-]

Odd-Ordinary-5922@reddit

llamac++ runs on the backend so yea

Reply

[-]

Legumbrero@reddit

Thanks for this guys, will definitely be trying it out but featureset and license already sounds really great!

Reply

[-]

ab2377@reddit

💯👍🥺🍩🤞

Reply

[-]

Just-Winner-9155@reddit

Unsloth Studio's Apache license and Llama.cpp compatibility are solid wins for open-source folks. I've been using it for smaller models—less resource-heavy than LMStudio, which is great for budget builds. The UI feels more streamlined, but LMStudio still nails multi-model management. If you're running 7B+ models, stick with LMStudio; for 3-5B or lower, Unsloth is a smoother ride. Just watch VRAM usage if you're mixing model sizes.

Reply

[-]

skillshub-ai@reddit

Really excited about this for domain-specific agent fine-tuning. There are thousands of high-quality SKILL.md files on GitHub now (structured agent instructions from Anthropic, Microsoft, Trail of Bits, etc). Fine-tuning a small model on 100 of these so it internalizes the methodology instead of just following instructions at runtime could be a game-changer for local agent quality.

Reply

[-]

Own-Relationship-362@reddit

This is exciting for fine-tuning models on domain-specific skills. I've been working with SKILL.md files — structured markdown that teaches agents specific methodologies — and the biggest bottleneck is that base models don't follow them consistently. A good fine-tuning UI could let you train a model specifically on your skill set. Imagine fine-tuning Llama on 100 SKILL.md files so it internalizes the methodology instead of just following instructions.

Reply

[-]

bityard@reddit

Warning: If you're like me and like to maintain strict control over your machine and home directory for both safety and security reasons, you are NOT going to want to follow the installation instructions in the docs blindly! Even after you've cloned the repo and installed the dependencies, the setup script installs even _more_ things _outside_ your virtualenv, such as node/npm **without asking**. Probably best to use the docker image, or install this in a VM if you are just testing it out. (Note: I am not implying the unsloth guys have any malicious intent whatsoever, I was just very surprised to see a Python project installing all kinds of extra stuff on my computer without at least telling me first.)

Reply

[-]

Wemos_D1@reddit

Hello I tried it and it's already a good start, I really like the new features that we don't find at any other places, and the fact it loads the correct settings directly makes me happy. But I have 2 things that I think might be usefull to implement: 1) Ability to link existing models that are already downloaded locally and be able to have it in the dropdown. 2) When I use a model in the chat, it doesn't use the graphic card ? I saw in the documentation it is said but why isn't it used ? Thank you again for your good job and I wish the project will be popular enough to be a competitor to lm studio.

Reply

[-]

Sandyyy_9866@reddit

Interesting! Unsloth Studio could shake things up, especially since it's compatible with llama.cpp. I've been using CLBlast for optimizing operations in RAG pipelines, and it's crazy how much a good runner can improve efficiency. If Unsloth nails the UI and execution speed, it might streamline workflows for those of us constantly tweaking our models. Definitely keeping an eye on this one.

Reply

[-]

fluecured@reddit

Can you use it with an older CPU without AVX instructions?

Reply

[-]

HopePupal@reddit

okay i gotta ask, and this is as someone who does have use cases for CPU inference: how did you find a CPU that old? and what is the rest of your setup like? AVX started shipping from AMD and Intel in _2011_. https://www.reddit.com/r/LocalLLaMA/comments/1muwsj6/llamacpp_nonavx_processors/ apparently you're not the only one and llama.cpp itself still supports pre-AVX CPUs with the right build options but _dang_

Reply

[-]

fluecured@reddit

I have an Intel Core i7 930 (Bloomfield) with 12 GB RAM and an Nvidia RTX 3060 with 12GB VRAM. Using Oobabooga's webui and ExLlama, I can run many exl2 or exl3 models up to 12B-14B parameters if there are suitable quants available. (Sadly, Oobabooga's dropping support for ExLlamav2, which has ten times as many models as v3.) The sweet spot for the 3060 is a Mistral-type 12B at 5.0bpw with 32768 Q4 context. Muse-12B and Mag-Mell-R1 flavors are astounding small models. I tried to build my own llama.cpp binary but got stuck configuring curl and Gemini was unable to help unstick me. Thus, I've never been able to load a gguf, only gptq, exl2, or exl3. ExLlama is pretty great since I can quantize the kv_cache to q4 and not notice any difference. I've saved up for a lab-worthy computer, but each time I've been ready to buy/build, this one guy does something wacky that turns everything into a confusing lottery. I'm kind of old and housebound, and that inertia makes it harder to deal with the scarcity of components. The *"I'm getting too old for this shit"* is real.

Reply

[-]

HopePupal@reddit

ahh with the 3060 doing most of the work, makes sense that lack of AVX wouldn't be a total showstopper. and yeah… really tired of everything constantly being on fire for no reason. felt.

Reply

[-]

DrBearJ3w@reddit

I purely use LMStudio out of convenience, simplicity and good Windows support. I tried ollama,few times, and it always surprised me that there are always some bug here and there that I should fix. Switched to llama CPP and never looked back. Unsloth on the other hand can be main hub for ALL operations. We will see.

Reply

[-]

IONaut@reddit

So does this allow you to run a local API endpoint?

Reply

[-]

General_Arrival_9176@reddit

unsloth studio is their training UI getting a run UI. makes sense - they have the quantize/finetune stack already, adding a playground on top is low effort. the real question is whether it competes with lmstudio for inference. lmstudio has the model discovery and easy setup thing locked down. unsloth's advantage is their quantization quality. well see if they can bridge that. i use lmstudio daily but their quantization is not unsloth level so if they ship good quants with the ui it could be a real alternative.

Reply

[-]

FRAIM_Erez@reddit

How many more ‘Studio’ apps do we need before gta 6?

Reply

[-]

fiery_prometheus@reddit

An apache 2 license and completely open source?? Praise the llamas!

Reply

[-]

FullOf_Bad_Ideas@reddit

Unsloth Studio is AGPL 3.0.

Reply

[-]

fiery_prometheus@reddit

The license in the repo says apache 2.0 [https://github.com/unslothai/unsloth/blob/main/LICENSE](https://github.com/unslothai/unsloth/blob/main/LICENSE)

Reply

[-]

FullOf_Bad_Ideas@reddit

"Unsloth now licensed under AGPL-3.0? No. The main Unsloth package is still licensed under Apache 2.0. Only certain optional components, such as the Unsloth Studio UI, are under the AGPL-3.0 open-source license. Unsloth now has dual-licensing where some parts of the codebase are licensed Apache 2.0, while others are licensed AGPL-3.0. This structure helps support ongoing Unsloth development while keeping the project open-source and enabling the ecosystem to grow." https://unsloth.ai/docs/new/studio (FAQ section)

Reply

[-]

fiery_prometheus@reddit

Would be great if this was more clear, a lot of people aren't going to dig around more than finding a license file in the root of the repo.

Reply

[-]

FullOf_Bad_Ideas@reddit

Yeah, since it's a single repo and this information is clearly laid out mainly in a blog post, sooner or later someone will take Unsloth Studio code and unknowingly act like it's Apache 2.0 /u/danielhanchen /u/yoracale As it's apparent, licensing of Unsloth Studio isn't clear to everyone who just goes on Github and skips your blog posts, so I think you should consider increasing readability of the github repo to make it more obvious.

Reply

[-]

yoracale@reddit

https://preview.redd.it/s4gorb3cropg1.png?width=694&format=png&auto=webp&s=3bb793f6f0ab663c020f9a7cfb4297fab2000685 Thank you for the feedback, if you go to our github mainpage and look at the sidebar, it says AGPL3 licensing found, also our notebooks state it as well but we'll make it more prominent. Also we put the AGPL3 license header in nearly all the file components of studio

Reply

[-]

FullOf_Bad_Ideas@reddit

Yup, I noticed the sidebar, though when I clicked on the AGPL 3.0 it was not clear that this is related to Unsloth Studio exclusively. With a single repo and 2 licenses it will be hard to communicate licensing to users in an unobtrusive way. A note in readme would make it clear but it also may be too obtrusive for you. I would think that you're doing all the right calls here, yet, in this comment chain someone didn't catch it so clearly it's not 100% clear to everyone.

Reply

[-]

yoracale@reddit

Thank you for the feedback, if you go to our github mainpage and look at the sidebar, it says AGPL3 licensing found, also our notebooks state it as well but we'll make it more prominent.Also we put the AGPL3 license header in nearly all the file components of studio https://preview.redd.it/uusw3eeoropg1.png?width=634&format=png&auto=webp&s=f4511ee4085cd6a85c9d386b6a08538aedf392a6

Reply

[-]

PM_ME_UR_COFFEE_CUPS@reddit

Apache license???? Sweeeeeeeet

Reply

[-]

ghulamalchik@reddit

Gobbless

Reply

[-]

alew3@reddit

LM Studio = inference, Unsloth Studio = training

Reply

[-]

rorowhat@reddit

Can it also finetune locally?

Reply

[-]

Best-Echidna-5883@reddit

Please fix the apparent contradiction regarding running this with CPU only for Windows. Thanks. The Windows installation requires an NVIDIA GPU.

Reply

[-]

jeffwadsworth@reddit

It states that it works for cpu only for chat but how do you install the Studio? Perhaps not possible.

Reply

[-]

jeffwadsworth@reddit

Wow. This is a dream come true. Finally something akin to that Mac Inference app.

Reply

[-]

z_latent@reddit

So glad it's open-source. Thank you Unsloth team :)

Reply

[-]

soyalemujica@reddit

Does this mean that it can run NVFP4 in Blackwell ?

Reply

[-]

danielhanchen@reddit

Not yet, but soon!

Reply

[-]

Secure_Archer_1529@reddit

So for the dgx spark nvfp4 is a no go atm. Have you solved this with nvidia or is it still the same issues carrying over to unsloth until nvidia finally fixes it? Unsloth studio sounds amazing though! Thanks for all the work you guys put into it.

Reply

[-]

inevitabledeath3@reddit

There is actually a VLLM fork with proper NVFP4 support. Here: https://github.com/jleighfields/vllm-dgx-spark I haven't gotten to try it myself yet, but if this works like they say it does then it's a viable solution.

Reply

[-]

drink_with_me_to_day@reddit

What to use in Windows? I've tried vllm, lm studio,but they fail to install or load models

Reply

[-]

NoahFect@reddit

This works in Windows (I have t running now), but you probably don't want to set it up yourself. See my other comment.

Reply

[-]

SourceCodeplz@reddit

Took Codex about 51 minutes to install it on a fresh Windows 11 with a SATA SSD. https://preview.redd.it/61gn9f4x3opg1.png?width=1267&format=png&auto=webp&s=09963f8df3411e57dbc59a27e21717c6ea57dfe7

Reply

[-]

Certain-Cod-1404@reddit

How are you guys so consistently good at everything you do ? you guys are a blessing to the open source community, thank you so much !

Reply

[-]

dreamai87@reddit

I really like the way they brought finetuning so easy for people on consumer level hardware and always shares the colab notebook. No doubt for gguf bartwaski and unsloth are always the first choice, though I appreciate others those are contributing in this space, kudos to all of you 👏 My first preference is always and will always be llama.cpp. Sure now unsloth studio will be the another one that allows finetuning/validating/inferencing models. Its great to see how everyone pushing the boundaries and making this stack accessible

Reply

[-]

toothpastespiders@reddit

>I really like the way they brought finetuning so easy for people on consumer level hardware and always share the colab notebook. Same here. Personally I just prefer using axolotl, but those notebooks are an amazing resource to help people understand how training works.

Reply

[-]

ItankForCAD@reddit

The blog post is confusing. It states that chat inference is supported by llama.cpp and transformers. However, the installation section mentions that AMD, Intel and etc support is coming soon. Is the upcoming support aimed at training or inference as well? It seems strange that only the cuda version of llama.cpp is built at installation. Building the Vulkan backend would allow all gpus to work for inference at least. Can an external llama-server instance be pointed at unsloth studio?

Reply

[-]

yoracale@reddit

For AMD and intel it's related to training. Our main unsloth package works with AMD and intel, but not Unsloth Studio. Currently inference should work for AMD devices but haven't verified

Reply

[-]

seanthenry@reddit

So llama.cpp will be build for ROCM if available?

Reply

[-]

shing3232@reddit

LMstudio is just front end for inference and Unsloth is training. How could you get it mix up？

Reply

[-]

mrkstu@reddit

Will give it a go on my Windows box, but would love it if you add proper Mac MLX/training support and be able to move over completely.

Reply

[-]

HopePupal@reddit

u/danielhanchen happy to see AMD support for inference listed, even if it's "coming soon"! are we likely to see AMD support for fine-tuning in Studio ever, or should we plan to keep using the unsloth amd Python branch directly?

Reply

[-]

RandumbRedditor1000@reddit

I hope this means more people will start fine-tuning

Reply

[-]

hugthemachines@reddit

"gamechanger" That's a red flag.

Reply

[-]

ilintar@reddit (OP)

You are absolutely right! 👍🚩

Reply

[-]

mtomas7@reddit

If US Studio can also run safetensor models, is there a particular reason to use GGUF models?

Reply

[-]

RandumbRedditor1000@reddit

Inference speed and vram requirements

Reply

[-]

jabr7@reddit

Am I reading this right? It also has an UI for fine-tuning? Is it PEFT or?

Reply

[-]

SGmoze@reddit

would it be possible to connect to colab notebook and trigger training from this?

Reply

[-]

separatelyrepeatedly@reddit

Does it havee API support?

Reply

[-]

Adventurous-Paper566@reddit

J'espère qu'on pourra attribuer une configuration GPU pour chaque modèle.

Reply

[-]

unspkblhorrr@reddit

I am a no-code local AI user so bear with me, learning this stuff like as I go... is this preferable to Open webUI? Does it have RAG?

Reply

[-]

revilo-1988@reddit

Ich mag LmStudio jedoch find ich super wenn es weitere Mitbewerber gibt

Reply

[-]

Zestyclose_Yak_3174@reddit

Awesome! Hopefully becomes a solid replacement for LM studio on my Mac

Reply

[-]

RevolutionaryLime758@reddit

Because it has studio in the name? Is that your thought process? Lmao.

Reply

[-]

BringMeTheBoreWorms@reddit

Would be nice if they could do a llamacpp compile for Vulcan and rocm instead of just cuda. It’s not a hard thing to add basic support for these days. The nvidia monopoly needs a bit of a kick in the nuts

Reply

[-]

Nodja@reddit

If the unsloth team wants this to succeed they have to make it piss-easy to install for the average user. This could be a good gateway app for getting people into training models and stuff, like A1111 did for lora trainers back in the day. Lucky for them uv exists and they can just bundle the code + uv and let uv do the hard work of installing python and setting up the venv then bootstrap the app the same way they're doing it now. I have lm studio installed and use it's local server to semi-automate certain tasks, lmstudio lets me easily load/unload models and have TTLs/etc. + the models come with sensible defaults (I only really change the context size) and makes it painless to try out new models without fucking around with llama.cpp params, hopefully unsloth studio will reach parity and I can get rid of the only closed source LLM software I have installed.

Reply

[-]

_hephaestus@reddit

I think here I pretty much just see lmstudio for mlx inference, and oMLX has replaced that for me. Really curious about mlx training going forward though.

Reply

[-]

DMmeurHappiestMemory@reddit

This is bonkers.

Reply

[-]

ArsNeph@reddit

This is genuinely amazing, props to Unsloth team for single-handedly propping up the .gguf and fine-tuning local ecosystem! I'll definitely give this a try and provide feedback when I get a chance!

Reply

[-]

Smashy404@reddit

Does it support voice?

Reply

[-]

xXprayerwarrior69Xx@reddit

fuck yes

Reply

[-]

the_real_druide67@reddit

But is the GUI really what matters here, or is it the underlying engine? For example on Apple Silicon, you can wrap llama.cpp in the prettiest UI you want : native MLX will still be significantly faster on the same model and quantization. The real competition isn't between GUIs, it's between inference backends.

Reply

[-]

Danmoreng@reddit

React frontend python backend 😩

Reply

[-]

Significant_Fig_7581@reddit

Oh I'm in

Reply

[-]

danielhanchen@reddit

:)

Reply

[-]

Technical-Earth-3254@reddit

Damn, love to see it

Reply

[-]

danielhanchen@reddit

Thanks!!

Reply

[-]

sine120@reddit

LM Studio has been my "I'm lazy and want to try this" solution that I find a little easier to test MCP's. If I'm actually minmaxxing my inference speed and want the bleeding edge of new models, I have to use llama.cpp. I love llama.cpp, but I hate messing with the commands, guessing and checking VRAM usage, etc. If someone else can come along and make it easier for me to get the performance of latest llama.cpp, host a chat page/ web search mcp's, OpenCode endpoings, etc, I'll be a happy man.

Reply

[-]

Helicopter-Mission@reddit

Agreed. It’s much easier to test different models and configs. vLLM your model when you have something you want to run for a while and already know how you configure it. If I’m model hopping, I’d just lm studio.

Reply

[-]

Potential-Leg-639@reddit

New super power unlocked by Unsloth. Congrats!

Reply

[-]

CalvaoDaMassa@reddit

A competitor? Man, I think that Unsloth Studio will become the #1 tool easily.

Reply

[-]

Relevant-Audience441@reddit

just another webui ¯\\\_(ツ)\_/¯

Reply

[-]

ambassadortim@reddit

Does this one have a web interface?

Reply

[-]

TopChard1274@reddit

isn’t LMStudio running over Llama.cpp? How’s adding another Llama.cpp wrapper into the mix a “game changer”?

Reply

[-]

Odd-Ordinary-5922@reddit

because lmstudio is closed source

Reply

[-]

Pro-editor-1105@reddit

holy shit this is amazing

Reply

[-]

Right-Law1817@reddit

This is awesome.

Reply

[-]

qwen_next_gguf_when@reddit

Go-to my blad azz.

Reply

[-]

egomarker@reddit

It's not a competitor for LM Studio, this one has emphasis on nvidia and training, LM Studio has emphasis on MCP support and good built-in api server.

Reply

[-]

Specter_Origin@reddit

This is awesome, i just hate the closed source nature of lm-studio

Reply

[-]

EffectiveCeilingFan@reddit

It isn't trying to compete with LMStudio tho. The ability actually run LLMs is just one of the features. It's moreso a model training workspace.

Reply

[-]

ilintar@reddit (OP)

It might not end up trying to compete with LMStudio, but it's ability to run GGUFs combined with Apache License will make it an automatic competitor.

Reply

[-]

_raydeStar@reddit

This one looks like it's focused on sanitizing training data and running it. In that case it's not quite apples to apples comparison. Definitely interested in playing with it. I've only ever trained image models.

Reply

[-]

Zemanyak@reddit

Oh, interesting. I'm gonna try it.

Reply

[-]

krileon@reddit

Another webui.. weeooooo.. pass. Waiting for the day LM Studio or something like it also implements image generation. Still lacking an all-in-one tool that's just a simple install and run.

Reply

[-]

willitexplode@reddit

Cool, I've been seconds away from ending my laziness and ditching lmstudio for more cli work. Now I don't have to, yay!

Reply

[-]

netikas@reddit

More like good ol' text-generation-webui

Reply

[-]

Emotional-Breath-838@reddit

Oh hell yes. Do want.

Reply

Reply to Post

270 Comments