TheaterFire

Unsloth announces Unsloth Studio - a competitor to LMStudio?

Posted by ilintar@reddit | LocalLLaMA | View on Reddit | 270 comments

Until now, LMStudio has basically been the "go-to" solution for more advanced LLM users in the GGUF ecosystem, but Unsloth releasing an (Apache-licensed) runner compatible with Llama.cpp might actually be a gamechanger.

Reply to Post

270 Comments

danielhanchen@reddit

Oh hey! There's a tonne of features in it! 1. Chat UI has auto healing tool calling, **Python & bash code execution**, web search, image, docs input + more! 2. Finetune audio, vision, LLMs with an **Auto Assist data prep** (all local) 3. Supports GGUFs, **Mac, Windows, Linux** \+ **audio generation** as well 4. Has SVG rendering, exporting to GGUF inside of it 5. gpt-oss harmony rendering, all inference params are pre-set to recommended ones 6. A Data designer + **synthetic data generation** system! 7. Fast parallel data prep as well + embedding finetuning! 8. And much more at https://github.com/unslothai/unsloth. To install it, try: ​ pip install unsloth unsloth studio setup unsloth studio -H 0.0.0.0 -p 8888
View on Reddit #80916893

thereisonlythedance@reddit

Brilliant. Any plans to integrate RAG? Apart from OpenWebUI (which has its issues) there’s not a great UI that supports it.
View on Reddit #80918622

danielhanchen@reddit

Definitely yes :)
View on Reddit #80918919

mecshades@reddit

In RE: to RAG, please consider allowing parsing a directory. The company I work for has a messy file store of documentation semi-organized in directories by name. Open WebUI fails because it requires the user to manually upload & manage files or we are forced to use an API to upload files individually. Ideally, the best RAG should let us point to a directory and process all knowledge in that directory, eliminating the need of hitting an API or uploading manually. That should be baked in, IMO!
View on Reddit #80931847

pepe256@reddit

It sounds like your workflow could benefit from using an agentic harness like Claude Code/Codex/OpenCode/etc. They're able to look at whole directories, and grep to find info, run any other commands, or read whole files. I heard there are RAG MCPs you can connect them to. I don't personally use any because I don't need them
View on Reddit #80936381

Psychological-Lynx29@reddit

Yeah, this is not so cool when you dont want to share private information.
View on Reddit #80960721

zxyzyxz@reddit

But you can do all that locally too, harnesses and MCPs aren't limited to the cloud.
View on Reddit #84190575

richardstevenhack@reddit

Look at MSTY KnowledgeStacks.
View on Reddit #80973695

BrewboBaggins@reddit

Where do I type this in at? a command prompt? power shell? is it going to ask me where I want to install it to? is going to install everything in a virtual environment or will it overwrite all the stuff I already have installed. How about a "windows.exe" like LMS,Jan,Kobold,etc for us non coders. or at least a windows.bat like Ooba. Help a NonTechBro out here...
View on Reddit #80971577

Refefer@reddit

Am I reading this right? Linux, Mac, and Windows work out of the box?
View on Reddit #80917730

danielhanchen@reddit

Yes! So finetuning not yet if you don't have a GPU, but CPU inference works on all 3!
View on Reddit #80918026

power97992@reddit

What kind of gpu? Cuda only? What about mlx and MPS? 
View on Reddit #80933479

inevitabledeath3@reddit

They only support Nvidia for now but are planning to add Intel, AMD, and Apple support in the future.
View on Reddit #80939178

NoahFect@reddit

What's up with this? [WARN] Node v22.9.0 / npm 10.8.3 too old. Installing Node.js LTS via winget... [ERROR] Could not install Node.js automatically. Please install Node.js >= 20 from https://nodejs.org/
View on Reddit #80934616

FORNAX_460@reddit

So there is no option to load already downloaded models for chatting!? All i see is that its only reading from the hf cache directory.
View on Reddit #80932973

anantj@reddit

Does it support AMD gpus on windows? Is Fine tuning supported on windows+AMD?
View on Reddit #80930114

white_december@reddit

ANE inference support?
View on Reddit #80928652

Hot-Section1805@reddit

This happens on my Mac Mini: Collecting xformers>=0.0.27.post2 (from unsloth) Downloading xformers-0.0.35.tar.gz (4.3 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 4.7 MB/s 0:00:00 Installing build dependencies ... error error: subprocess-exited-with-error × installing build dependencies for xformers did not run successfully. │ exit code: 1 ╰─> [4 lines of output] Collecting setuptools>=64 Using cached setuptools-82.0.1-py3-none-any.whl.metadata (6.5 kB) ERROR: Could not find a version that satisfies the requirement torch>=2.10 (from versions: 1.9.0, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.6.0, 2.7.0, 2.7.1, 2.8.0) ERROR: No matching distribution found for torch>=2.10 [end of output] note: This error originates from a subprocess, and is likely not a problem with pip. ERROR: Failed to build 'xformers' when installing build dependencies for xformers
View on Reddit #80926323

Devatator_@reddit

So, I could install this on my VPS and access it from any device?
View on Reddit #80926212

Succubus-Empress@reddit

Exl2?
View on Reddit #80918374

danielhanchen@reddit

For now llama.cpp is powering it - we'll add vLLM, but I can check Exl2!
View on Reddit #80918952

Zestyclose_Yak_3174@reddit

Please try to also support MLX in the future. This will make your app so much better for Apple Silicon users. I use both Llama.cpp and MLX for inference, I recon many others too. Especially since your quants are just better.
View on Reddit #80926049

ambassadortim@reddit

Does this tool have a web interface so I can use it from my phone?
View on Reddit #80920115

danielhanchen@reddit

Yes yes it can be accessed throughout the world!
View on Reddit #80922090

BobbingtonJJohnson@reddit

Does that include GGUF finetuning?
View on Reddit #80923650

fiery_prometheus@reddit

It would be awesome, if you can do checks for contamination as part of the suite, would do the community a huge favor, having an easy way to check and understand these things.
View on Reddit #80922441

-Cubie-@reddit

Does it also work with the new embedding finetuning?
View on Reddit #80920910

danielhanchen@reddit

Yes it does!
View on Reddit #80922100

Pro-editor-1105@reddit

Another banger from you guys!
View on Reddit #80918121

danielhanchen@reddit

Thanks!
View on Reddit #80918306

j_osb@reddit

In what world was LM Studio the go-to solution for 'advanced' users? That was always vLLM or directly llama.cpp.
View on Reddit #80913841

TheLexoPlexx@reddit

In the same world of people that don't know ollama wraps llama.cpp
View on Reddit #80914268

k_means_clusterfuck@reddit

"Actually ollama is newer and c++ is old it's from likevel the 20th century" type shii
View on Reddit #80915821

RedParaglider@reddit

C++ is just a wrapper for Assembly.
View on Reddit #80932068

NewMetroid@reddit

Assembly is just a wrapper for binary.
View on Reddit #80933506

DominusIniquitatis@reddit

Binary is just a wrapper for electricity.
View on Reddit #80940935

thrownawaymane@reddit

Electricity is just a wrapper for electrons
View on Reddit #80948752

Ill_Barber8709@reddit

Electrons are just a wrapper around the nucleus.
View on Reddit #81218218

apidekachu@reddit

Electrons are just a wrapper around strings
View on Reddit #83952704

NoahFect@reddit

HDL: "Oh hai. I didn't see you all the way over there."
View on Reddit #80934026

j_osb@reddit

If I am forced to use VHDL over verilog (or like, chisel) again I WILL put that person to justice.
View on Reddit #80935743

NoahFect@reddit

Yeah, you'll notice that the letter V is conspicuous by its absence
View on Reddit #80936026

RedParaglider@reddit

Everything is a wrapper. Except Ed. Ed is the default UNIX text editor. It says so right in the man page.
View on Reddit #80936310

temperature_5@reddit

I have no idea what you guys are talking about. Now to get back to wire wrapping the TTL gates for this logic circuit...
View on Reddit #80937527

Electrical-Risk445@reddit

Calling my core memory on that one.
View on Reddit #80940061

NewAlexandria@reddit

LLM's are just compilers that use natural language
View on Reddit #80962209

huffalump1@reddit

"ollama is easier to use, just `ollama run`, llama cpp is more complex" That makes me wanna say: you have all the knowledge of the world at your fingertips, it is not hard to ask an LLM to read llama.cpp docs and give you a command :|
View on Reddit #80923365

Aerroon@reddit

> it is not hard to ask an LLM to read llama.cpp docs and give you a command :| So you gotta first run ollama to tell you how to run llamacpp, I see.
View on Reddit #80943186

Conscious-content42@reddit

It's like dating apps, "the app that is made to be deleted." Once you make your match with llama.cpp
View on Reddit #80949863

edankwan@reddit

It is like when I newly installed Windows to use Edge to download Chrome and then delete Edge
View on Reddit #80951018

Conscious-content42@reddit

Then use chrome to install Firefox and delete chrome😁
View on Reddit #80955776

TheLexoPlexx@reddit

Then delete Windows and use Linux
View on Reddit #80968135

Conscious-content42@reddit

Then use Linux to download a BSD image, use a USB boot drive and overwrite Linux partition.
View on Reddit #81046533

catch-10110@reddit

It’s more like: Ollama is easy and it works great. I have never seen a reason to use llama.cpp directly other than for philosophical open source reasons. If you’re not “into” open source as a philosophy then as far as I can tell Ollama is the best and easiest solution. I am entirely willing to be told that I am wrong about this to be clear. But I have asked before on this sub and the only answer I got came down to people not liking Ollama for reasons of open source philosophy rather than any substantive reason. I would actually love to be told I am wrong. I love learning about all of this as a relative newcomer.
View on Reddit #80948592

qrayons@reddit

I don't understand all the ollama hate here. I feel like ollama is built for ease of use and llama.cpp is built for performance. If I'm messing around with some new models then I don't care if it runs a bit slower on ollama, especially if it saves me the headache of getting stuff working in the first place. I'm sure llama.cpp is super intuitive and easy to use for everyone that has been using it every day for over a year, but that could be said about any software that someone uses daily.
View on Reddit #80953089

tmflynnt@reddit

I hear you and tbh despite being a huge llama.cpp fan and contributor I do realize it has some significant gaps in regard to the intimidation and ease of use factors. It is improving majorly though through things like its evolving web UI, model routing features, "fit" parameters for easy config, and ongoing efforts toward easier install scripts and such, but it still has some ways to go and some of these elements are probably never going to be its top strongsuit. For those reasons I do understand that some new people to the scene will often start out with other apps but I do feel llama.cpp is worth the time investment once people feel comfortable to explore a bit and I feel that time investment has been getting more and more manageable. But, in regard to Ollama and the negative feelings people have around it, I would just add that there are some [very specific reasons](https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing_ollama_isnt_just_a_pleasure_its_a_duty/o3lcjz5/) that people feel that way if you care to look into it more.
View on Reddit #81005047

CatEatsDogs@reddit

Totally agree. I can install and run llamma.cpp but I don't like to build it constantly by myself and remembering all these parameters to launch a lmm. And does it have a "pull" command? It's like a ready-to-cook product vs ready to use hamburger.
View on Reddit #80966824

DeepOrangeSky@reddit

>Ollama is easy and it works great Not for me it wasn't as someone who came into all this a few months ago as a complete and total beginner, as someone who doesn't know anything about how to use computers for things besides checking email, browsing youtube, etc. I.e. no clue what JSON files are, or templates or how to use command lines or any of that kind of stuff. Slowly starting to learn, but only like 1% of the way so far. I really wish I had started with LM Studio instead of Ollama, tbh. The reason I started with Ollama was when I was browsing on here, people basically said what you said in the threads, that it was the beginner friendly easy one, and they said that LM Studio was the other beginner/easy one, except that one was closed-source and Ollama was open-source or something like that. So, I figured in that case I would start off with Ollama. The only problem was, as soon as I started trying to acquire models directly from huggingface to try to make sure I got the exact quants of whatever obscure fine-tunes/merges I wanted to try, and save them to my external hard drive, it was this whole nightmare with blobs and modelfiles, etc, and having to ask Gemini how to point Ollama to my drive and use some weird two-terminal thing where I served models to myself (or whatever the terminology would be, I dunno, I felt like I was a golden retriever trying to type hacker stuff like in that 1990s move Hackers, and pressing keys with my paws. It fucking sucked, dude. And if you download the model with ollama with the pull command, then if you want to move it off your computer to your external storage to make room for more models, then it'll break Ollama, because you aren't allowed to just move models around I guess, or else it shatters the blobs like some delicate glass vase and you have to like, I dunno, delete all your models that you ever pulled with ollama onto your computer all off your computer and delete ollama and restart from scratch. And if you just download a GGUF from huggingface, and then try using the FROM thing to create a text file and then use the ollama create command to turn it into an ollama modelfile to try to use it via ollama, then if you don't know about the that list of template stuff and params/prompt/stops etc stuff that you need to go find somewhere and paste somewhere, your model either runs super weird or doesn't work properly at all (which might sound like a rookie mistake, but well, yea, like I said at the start, I was a complete beginner, so that didn't go so great either). Just started trying LM Studio out very recently the past week or so. I'm realizing now like, oh shit, I fucked up baaaaaad by starting with Ollama, lol. I should've just started with LM Studio. I was paranoid that since it is closed source that means they can just install some like, I dunno, government backdoor or some shit to make sure you didn't invent a quark star in your garage by asking DeepSeek Speciale how to do quark-fusion or some shit (lol), so I was like "fuck closed source, the men in black are going to wipe my memory with one of those silver eyeball-tampon things! Noooo!" and things of that nature. But yea, I should've just started off with LM Studio tbh. Also, I didn't even realize apparently Ollama's GUI isn't even open source anyway (not sure if that's correct, but I think so, right?) although I was ironically using the Terminal to run Ollama even though I'd never used terminal before, since I hated the GUI app thing so much. I dunno, I'm sure there is some person out there who likes the modelfile system for some reason of some sort (supposedly it is good for "dev teams" or something? I dunno, I'm just some random individual noob, no, not really sure). But, yea, I didn't like it at all, and not because it was too nice and clean and easy, but the exact opposite, I feel like I probably could've spent the hours and hours I spent asking Gemini how to do all the annoying shit I had to figure out how to do on Ollama to just learn how to use llama.cpp instead, and then I'd be further along by this point instead of just now starting basically back to square one with a way better option in LM Studio. Although looks like since the closed source aspect was the one thing I don't like about LM Studio, maybe even that last downside will be done with if this unsloth thing ends up being good, since I can just switch to using that instead. Well, with Unsloth the main thing I am most excited about is the training-models-made-easy thing, if that ends up being legit. This whole entire time the past few months ever since I first got into all this, I've been wishing I could figure out enough about computers and become computer literate enough to be able to start doing some fine-tuning experiments on models, and have been super impatient and frustrated about not being able to do it yet, since I suck at computers. So, if that thing allows me to just click some buttons and start messing around with fine-tunes, that would be the coolest shit ever. So, pretty excited for this thing.
View on Reddit #80960478

huffalump1@reddit

Nah man I agree. If you like it, and it works for you, that's good enough *(Personally I prefer more control over models/quants and settings, plus I've had worse performance with ollama, and I usually use a different web UI anyway so llama.cpp is better for my use case)*
View on Reddit #80949072

thrownawaymane@reddit

See the comment above yours for a good summation.
View on Reddit #80948701

NoahFect@reddit

Exactly. The LLM *is* the shell.
View on Reddit #80934137

tiffanytrashcan@reddit

https://preview.redd.it/spys1kehfrpg1.jpeg?width=1080&format=pjpg&auto=webp&s=f94513e2f69c1b7e05e44e8c02ee5a33403eb97a
View on Reddit #80969320

No-Refrigerator-1672@reddit

Well, given that OpenWebUI now has OpenTerminal project that gives LLM direct control over a shell - you are literally correct.
View on Reddit #80943607

The_frozen_one@reddit

Why do people act like they are mutually exclusive? OSes of all types implement services as a distinct type of program because they represent a useful paradigm. You *could* manually run openssh or dropbear when you want enable ssh access on your computer, or you can have systemd/launchd/rc run the ssh service and never think about it like how 99% of the world does it.
View on Reddit #80959073

droptableadventures@reddit

Also ollama users: - "Why is it doing weird things once my prompt gets over 4096 tokens" - "Why does it sometimes run really slow and not use my GPUs?" - "It says I'm running DeepSeek but I thought that was a much bigger model" - "Everyone says that model is good, but the output seems to be total garbage" - "Why is my hard disk full? Where did all these downloaded models actually go?"
View on Reddit #80944778

citrusalex@reddit

Doesn't it also have its own separate backend that is only partially based on llama.cpp (i think gguf parts)?
View on Reddit #80922768

deepspace86@reddit

The whole thing us basically docker engine refactored to use llama.cpp
View on Reddit #80956779

citrusalex@reddit

What does docker engine have to do with it
View on Reddit #80967077

deepspace86@reddit

The group that built it also built ollama, so if you understand how docker engine works, you have a pretty good understanding of how ollama works too.
View on Reddit #81016092

relmny@reddit

[https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing\_ollama\_isnt\_just\_a\_pleasure\_its\_a\_duty/](https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing_ollama_isnt_just_a_pleasure_its_a_duty/)
View on Reddit #80973448

tmflynnt@reddit

Thank you for posting that. Here is also the direct link to Georgi's post: [https://github.com/ggml-org/llama.cpp/pull/19324#issuecomment-3847213274](https://github.com/ggml-org/llama.cpp/pull/19324#issuecomment-3847213274)
View on Reddit #81005648

relmny@reddit

"That was in the past, for x months is not and is all ollama now" that's what people here kept saying, meanwhile: [https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing\_ollama\_isnt\_just\_a\_pleasure\_its\_a\_duty/](https://www.reddit.com/r/LocalLLaMA/comments/1qvq0xe/bashing_ollama_isnt_just_a_pleasure_its_a_duty/)
View on Reddit #80974080

BreakfastFriendly728@reddit

mlx: in the same way i was ignored
View on Reddit #80918896

Craygen9@reddit

Agree, they should have said average user. To be fair LM Studio makes it easy to download and evaluate models and settings before deploying on llama.cpp
View on Reddit #80915618

inevitabledeath3@reddit

Yes. I can work with llama.cpp directly but why bother when LMStudio exists and has helpful features like estimating the RAM usage when playing with different loading parameters. There is also the whole LMStudio link thing I find quite helpful. I could setup reverse proxies or Tailscale myself and manage API keys. It's much easier to just download LMStudio on a second machine and setup the link, and that also gives me an easy way to load and unload models remotely as well.
View on Reddit #80921506

No-Refrigerator-1672@reddit

Actual reasons: 1) LM Studio, being a wrapper around llama.cpp, is slow. Really slow. You don't feel it when running chats, but the moment you want your AI to do deep research, edit a large document, do agentic coding, give a summary about hour long video, etc - you'll find out that waiting half an hours for the task to complete with errors just to rerun it isn't very fun, and very quickly learn how to use vllm. 2) LM Studio is too local. By hosting OpenWebUI, I get to use my personal AI from my smartphone, my tablet, my job laptop, my gaming pc, as well as share it with friends offsetting some of the hardware costs, all without carrying around the beefy PC that can actually do AI.
View on Reddit #80944164

fastheadcrab@reddit

Using vLLM is the main reason but it can be a pain to setup, especially if you've never seen a console before in your life.
View on Reddit #80946256

No-Refrigerator-1672@reddit

Maybe, if you're on Windows. If you're on Linux, it is actually easier to setup than llama.cpp: it gets installed with one cli command; and you don't even need to understand lauch options, vllm docs have Ai chat companion that uses RAG over docs and github and will help you set up your launch parameters with zero prior knowledge. P.S. oh, and llama.cpp actually requires console too.
View on Reddit #80946506

inevitabledeath3@reddit

This is a lie. VLLM has quite specific requirements. I guess you could use the docker container instead, but even then you can spend forever messing with parameters just to get it to fit the model in your GPU. The default config is not VRAM efficient at all. SGLang might be better, but it has it's own issues. Generally speaking these tools are made for the big players, not for running a few models at home.
View on Reddit #80950028

No-Refrigerator-1672@reddit

I'm answering only to a single message to not split the conversation. Yes, vllm is more vram intensive. It pays for itself with unparalleled speed. I'm runing all of my models on it, and I have published intesnive comparison of performance for all possible scenarios [here](https://www.reddit.com/r/LocalLLaMA/s/hMTXaBKZDp). It is anywhere from 2x to 5x faster than llama.cpp on Ampere for single request. You might get bad result just because you're trying to get cpu offloaded model into vllm - it's designed to have the full model in vram, this alonhside having a Volta or newer GPU is the only real requirement to get it running.
View on Reddit #80964965

inevitabledeath3@reddit

Thought I would reply after trying VLLM some more. You are right it is indeed faster in many cases, though I think there are some model architectures and edge cases where ik_llama.cpp or even llama.cpp is still faster. Talking with you was actually the push I needed to practice using tools like VLLM and SGLang.
View on Reddit #81831028

No-Refrigerator-1672@reddit

Hi! Thank you, I love to see that my conversation efforts push people to learn more. From my experience, I'd say that llama cpp or ik_llama is faster for extremely short sequences (less than 1k tokens), which I tend to dismiss as this is too rare of a scenario, they are faster for unsupported quants: i.e. nvfp4 for previous gen cards, and they handle cpu offloading better. I would also recommend you to install litellm (the one that's shipped as docker container), as it'll allow you to measurw how long are your real prompts, including system messages, see the details about ai requests made, and gather usage statistics that is split by different apps you have. It'll allow you to understand your specific workload better.
View on Reddit #81845154

inevitabledeath3@reddit

Nope, model easily fit in VRAM using llama.cpp.
View on Reddit #81070096

No-Refrigerator-1672@reddit

If it fits with llama.cpp, it also fits with vllm. The only real difference is in KV cache handling, vllm mostly forces fp16 cache with quantized models, while llama.cpp allows to quantize cache easily; and vllm will take at most 1 gb extra for cuda graphs and compute buffers.
View on Reddit #81070400

inevitabledeath3@reddit

As I said it fit fully in VRAM even in vLLM. So what's your excuse for it being slower?
View on Reddit #81070757

No-Refrigerator-1672@reddit

There could be only one reason: you failed to read the docs and deploy the model properly. I have given you a link to my tests, with comprehensive numbers and exact cli commands to launch each engine, you can study it to learn how to do things.
View on Reddit #81071008

inevitabledeath3@reddit

I've looked at your tests. You aren't using the same model or architecture I am looking at. I can also see instances in your tests where llama.cpp is faster anyway, although it's only a few tests where this is the case.
View on Reddit #81071165

No-Refrigerator-1672@reddit

Yeah, it's faster for sequences under 4k tokens, which is shorter than system prompt for like any AI tool. This is irrelevant. It is also faster for mxfp4 in some scenarios, which is unsupported on Ampere. As GPT-OSS is basically the only model that uses this forman natively, I'm not accepting that as llama.cpp being universally faster. >You aren't using the same model or architecture I am looking at. I have tried vLLM and llama.cpp with most popular models that were released since November 2025 and fit inside 40GB VRAM. vLLM always wins; the shape of the graph stays the same, only the vertical axis changes.
View on Reddit #81071561

inevitabledeath3@reddit

Yeah so November 2025 was a while ago now. Is it faster in Qwen 3.5? How about GLM 4.7 Flash or Nemotron V3 or LFM 2 24B A2B? Also I was looking at LM Studio GGUF Vs vLLM AWQ since my understanding was that AWQ was better supported in vLLM than GGUF format was and was probably faster anyway. Not having KV Cache quantisation would be a serious limitation. I know there is FP8 available but I am not sure that will perform well on a RTX 3090.
View on Reddit #81071839

No-Refrigerator-1672@reddit

Read carefully, >with most popular models that were released **since** November 2025 Yes, it is faster for sequences over 8k for Qwen 3 VL 30B, 32B, and Qwen 3.5 27B, 35B. It is faster for GLM 4.7 Flash. I did not try nemotron or LFM. >LM Studio GGUF Vs vLLM AWQ vLLM Docs actually state that GGUF isn't optimised yet (and, looks like it won't be ever, there's no need). You should use AWQ if you're running vLLM. >Not having KV Cache quantisation would be a serious limitation. This is questionable. First of all, vLLM does support KV quantization down to 4 bits; but when running quantized models, you're supposed to provide calibration coefficients. The broader picture is that any KV quantisation degrades performance of the model, so you need to offset this back by computing the corrections. llama.cpp quantizes cache without this, and you're hurting yourself without knowing about it. vLLM provides tools for you to create a specially quantized model that can to both weight and KV quantization preserving as much intelligence as possible, you can find those prepared models by "W8A8" or "W4A4" keyword on huggingface.
View on Reddit #81073465

inevitabledeath3@reddit

I think part of the issue I had was trying to do tensor parallelism without NVLink. Now I have an NVLink bridge it seems to work much better. Getting 60-70 token/s with Qwen 3.5 27B which is much faster than LM Studio. Will have to try some more models. How well does vLLM work with GGUF? Or is it better to just use AWQ?
View on Reddit #81124030

No-Refrigerator-1672@reddit

Tensor parallel workload hits inter-gpu bandwidth very hard. It does, indeed, require NVLink, or at least full PCIe x16 to each of the card, which never happens on consume motherboards. vLLM docs officialy [state](https://docs.vllm.ai/en/stable/features/quantization/gguf/) that GGUF is unoptimized, and that GGUF does not support loading vision encoder. I exercise "AWQ or nothing" approach when working with vLLM. It also decently supports GPTQ which is an older method, you'll find a lot of 1+ year old models in it, should you need to run one.
View on Reddit #81129527

inevitabledeath3@reddit

I had full Gen 4 X16 thanks to it being an Epyc board rather than consumer, but I guess that wasn't enough.
View on Reddit #81138296

inevitabledeath3@reddit

Fair enough that makes sense. I think I will try again at some point, maybe I was doing something wrong. I did not realize that llama.cpp had such issues with KV Cache quantisation. I remember trying vLLM and it taking forever to even get a model running using it without having out of memory errors, then to find the performance was visibly slower than llama.cpp or LM Studio I basically just gave up. IMHO someone should make something like LM Studio or Unsloth Studio for vLLM. All the easy to use and setup tools are llama.cpp based. I was intending to deploy SGLang, vLLM, or TensorRT LLM at work on some DGX Sparks as we are intending to run some small to medium models at scale and need all the performance we can get for that. I just didn't think it made sense to do that on a home setup.
View on Reddit #81078610

fastheadcrab@reddit

I'm referring to LMStudio lmao
View on Reddit #80946679

inevitabledeath3@reddit

I've tried VLLM. It eats way more VRAM and was actually slower than LM Studio on my hardware. It's designed for inference at large scale with many parallel generations, not for inference at home with a limited amount of hardware and few parallel generations. It's also much harder to use than you are making out. I had initially assumed that MTP would allow VLLM to be faster even at small scale, but the actual tests proved me wrong at least for the setup I tried. I actually host my LM Studio on a separate machine and use the API with things like OpenWebUI and various AI agents and tools. It even has a Daemon version specifically for servers and headless setups. So it's not "too local". You just don't know how to actually use it. Skill issue. I've looked at ik_llama.cpp as well. It's a fork of llama.cpp with faster kernels for certain specific model architectures and support for a few more quantisation techniques. If you aren't using those features it's pointless using that fork. Much like ktransformers it's a fairly specialised tool.
View on Reddit #80949903

FusionX@reddit

I was playing around with local LLMs after Qwen's release. Personally, I was surprised when LM Studio ran quite a bit slower for me with its defaults, than llama.cpp. In some cases, it also ran out of memory much faster. I'm sure it has to do with some runtime parameters, but its curious that the defaults for llama.cpp, intended for intermediate users, achieved much better results than LM Studio, which is much more opinionated and geared towards an average user.
View on Reddit #80928960

shing3232@reddit

average user don't use unsloth. it's a training first software
View on Reddit #80936911

Hoodfu@reddit

For all the people running large models on macs. Llama.cpp is much slower and we can't run vllm. Lm studio is the best way to handle multiple models for mlx.
View on Reddit #80915549

egomarker@reddit

What? Llama.cpp is much slower? )))
View on Reddit #80915888

Hoodfu@reddit

yeah on mac it's much slower. mlx converted safetensors utilize all of the M\* chip cores whereas ggufs etc do not so you're losing about 30% of the possible speed.
View on Reddit #80954283

egomarker@reddit

Sigh, you have no idea what you are talking about.
View on Reddit #80978419

Velocita84@reddit

Well that has nothing to do with the GGUF ecosystem then does it
View on Reddit #80916157

Hoodfu@reddit

There are an ever growing number of unsloth releases that have been converted to mlx. As macs are increasingly utilized here, one hopes that the unsloth will also support mlx with their new tool.
View on Reddit #80954347

arkham00@reddit

try this [https://github.com/jundot/omlx](https://github.com/jundot/omlx)
View on Reddit #80944807

footyballymann@reddit

Wait for real? I don’t own a Mac but how did this interact with the whole metal thing? Mlx?
View on Reddit #80917387

BlobbyMcBlobber@reddit

I'd say advanced users probably use vllm or sglang.
View on Reddit #80933839

the__storm@reddit

I use vllm at work, but llama.cpp at home - Vulkan is very convenient.
View on Reddit #80940829

BlobbyMcBlobber@reddit

Same. llama.cpp is much easier to tinker with.
View on Reddit #80962375

atape_1@reddit

same. Going to be honest, I've heard of LM Studio, googled it once, and ignored it ever since.
View on Reddit #80914598

Eupolemos@reddit

Eh, LLM Studio was nice. You could visually see your options like flash attention, cache quantization, context length, gpu offload. Even predicts the mem usage of your settings. But now I can't update it nor fully uninstall it, and every time I say something to my model, the ethernet spikes 😐 So it's high time to move onwards to llama.cpp
View on Reddit #80919746

uncoolcat@reddit

I had a similar issue with LM Studio where I was unable to update it. You may have tried this already, but in my case to correct it: first a reboot (to ensure all LM Studio setup-related processes/files had been closed) and then manually downloaded the latest setup from their website and reinstalled it. All of my settings/models/etc were retained after the reinstall. I haven't noticed any network spikes while using it, though now I'm curious and will do a wireshark and/or procmon trace to see if similar is happening with my install.
View on Reddit #80937649

Eupolemos@reddit

Username doesn't check out.
View on Reddit #80939523

atape_1@reddit

Ethernet spikes? There goes all my fanfiction porn...
View on Reddit #80920214

iMakeSense@reddit

oh damn didn't know about the ethernet spikes. Oops.
View on Reddit #80919920

Eyelbee@reddit

Yeah it's closed source freeware. Probably has a decent ui but no reason to use it over llama.cpp
View on Reddit #80914854

Cool-Hornet4434@reddit

The only thing lm studio does right is MCP... I've not seen any other that works as smoothly with the same protection as Claude's MCP tools. But yeah,  lm studio is a closed source UI on top of open source.... also lm studio's SWA implementation sucks
View on Reddit #80916218

marhalt@reddit

It allows for a lot of flexibility. I can load models, use the backend for my own scripts, see what the server receives and send, change the model, use a small model to do something and a big model to do something else, both loaded into memory... All of it in a nice UI, with easy to see settings... I don't get the snobbery of people for good GUI tools. Not everything has to be a CLI, and this is one of those cases where I have no interest in learning the 3,200 command line parameters I need to run llama.cpp to use a MLX model or to run a model with a different context length and different parameters... The whole idea of CLI was for simple, easy to use and chain tools. Loading LLMs is the opposite of that - it needs an intuitive interface unless people are willing to invest a lot of time to master commands of 100+ characters.
View on Reddit #80934340

ldn-ldn@reddit

In the world where you need multiple runners, advanced options and features. But if you're a casual, go ahead, run llama.cpp directly.
View on Reddit #80929695

Far-Low-4705@reddit

tbf, if you are using it on the same machine you are running, a standalone, self contained app is more convenient even if you're an "advanced user"
View on Reddit #80927203

inevitabledeath3@reddit

I thought vLLM was mainly used with safetensors, not with GGUF. This may be out of date but I heard that vLLM support for GGUF was experimental compared to llama.cpp or even SGLang.
View on Reddit #80921187

j_osb@reddit

Eh. It does have somewhat decent support nowadays. Overtaken llama.cpp in speed outside of TTFT on a lot of gguf models nowadays for me.
View on Reddit #80924035

Broad_Fact6246@reddit

IYKYK! I used LM Studio as a more capable OpenClaw before OpenClaw became a thing. When I need granular HITL, it has been my go-to from the beginning. (AnythingLLM was promising but unnecessarily complicated). I load 5 to 10 MCP tools in an LMS chat and build, configure, and test entire projects, bootstrapping backends and even using Playwright to configure web UI's in some cases. LMS Tool calls are very verbose for me to interactively stop and steer agents before catastrophe, looping, or for getting on-track. Easy GUI to experiment with temperature and parameters and easily reload. Linux Natives and devs who know how to read the tool outputs can drive LM Studio very effectively and augment their dev abilities. Script kiddies who need a black box that does everything without a brain cell fired on their part won't do so well...and I wouldn't consider them "advanced users." That's only on the front-end. LMS expanded the REST API server and now allow has parallel processing. Serving up my local compute for Openclaw has been more stable in LM Studio than vLLM so far. I haven't tried Ollama because I don't need to. (Does ollama still import models as sha256 files? That was so inconvenient.) I only wish the GPU slicing and provisioning across multiple GPUs was more granular in LMS like in vLLM or whatnot.
View on Reddit #80918571

Old-Storm696@reddit

Fair point - LM Studio was more for the "average advanced user" who wanted a GUI. The真正的 hardcore folks were always on llama.cpp CLI or vLLM. Unsloth Studio having training + inference in one tool could change that dynamic though.
View on Reddit #80918542

muntaxitome@reddit

Well as far as LLM users go just merely using local llm's probably puts you in the top 1% most advanced users
View on Reddit #80916690

robberviet@reddit

OP mean ollama for sure.
View on Reddit #80916447

ProfessionalSpend589@reddit

In the same world where React and Linux are the two notable projects which something surpassed in stars…
View on Reddit #80916253

Decaf_GT@reddit

In the world where people like OP use LLMs to write their Reddit posts for them and don't realize that it's filled with Twitter/LinkedIn Hypebeast language. In general, you should disregard any post/tweet that has the word "game changer" in it. The game has been "changed" so many times now...
View on Reddit #80916159

AurumDaemonHD@reddit

Sglang - radix agentic once in blue moon i manage to get a wuant that works with some config Vllm - best community support for tp Llamacpp - cpu offloading
View on Reddit #80915411

Flimsy_DragonFly973@reddit

I dunno but I’ve been giving it a run for the last few hours. Considering it has a GUI for post training I can see something like me making my own model with nanochat and then post training with Unsloth studio. Nanochat + Unsloth Studio = MEGATRON-GPT
View on Reddit #81958253

quasoft@reddit

Is it possible to use the chat Feature with CPU only? Tried running \`unsloth studio setup\`, but says it does not support CPU only, and refuse to do one time setup (both pip package and install -e from main branch).
View on Reddit #80933496

user92554125@reddit

I recall enabling long paths being a part of the default options in the Python setup/installation on Windows. It is probably related to that, but I could be wrong. Can't check or reproduce as i'm not on Windows.
View on Reddit #81176325

jeffwadsworth@reddit

If you find out please post here
View on Reddit #80940336

Adventurous-Gold6413@reddit

OH MY GOD A WEBUI FOR TRAINING!!! Yess
View on Reddit #80915994

danielhanchen@reddit

+ Inference :) + Synthetic Data Gen + Exporting + Training + Much much much more!
View on Reddit #80919069

stopbanni@reddit

Is training like finetuning or from scratch?
View on Reddit #81175048

spaceman_@reddit

This is awesome, looking forward to AMD support to play with this! Thank you all at Unsloth for all your work!
View on Reddit #80971756

Potential-Bet-1111@reddit

Daniel I have tons of idle compute. Tell me something cool I can do with your product.
View on Reddit #80948992

power97992@reddit

Does it support mlx and mlx fine tuning and quantization and mechanistic interpretation tools ? 
View on Reddit #80933393

BillDStrong@reddit

They are working on mlx, not currently supported. I believe yes to quantization. Not sure yet about the others, downloading it now.
View on Reddit #80948820

ZachCope@reddit

Commenting for posterity in Daniel Unsloth’s shadow 
View on Reddit #80941890

dreamai87@reddit

Thanks it’s really great, I did not have any issue running, tested chat model finetuning and synthetic data generation text2sql ocr.
View on Reddit #80939377

ReasonablePossum_@reddit

Any privacy features?
View on Reddit #80932910

Finanzamt_Endgegner@reddit

you are amazing!!!
View on Reddit #80923614

Adventurous-Gold6413@reddit

Thanks so much! Been waiting for this for over a year!!!
View on Reddit #80923122

exaknight21@reddit

I <3 u yall
View on Reddit #80920689

mmkzero0@reddit

WHAT
View on Reddit #80919713

Old-Storm696@reddit

Finally! The GGUF ecosystem has needed a proper training UI for years. LM Studio has been great for inference but training was always CLI-only or required external tools. Unsloth's training optimizations (LoRA, QLoRA) combined with a real UI is going to democratize fine-tuning for everyone who isn't comfortable with the terminal.
View on Reddit #80961897

sean_hash@reddit

Having fine-tuning and inference in the same tool is nice, right now you need like three different projects to get that working
View on Reddit #80914912

emprahsFury@reddit

Mozilla has had Transformers Lab for tears now.
View on Reddit #80956236

user92554125@reddit

Pretty awesome! Had no idea it existed. Thanks.
View on Reddit #81113786

FullOf_Bad_Ideas@reddit

tbh Llama-Factory was doing it for more than a year now. They don't have dataset builder flow though. And it's probably less polished.
View on Reddit #80936098

danielhanchen@reddit

More coming soon to it!!
View on Reddit #80919079

neuralnomad@reddit

Wait no AMD or vullkan? (Yet?) *kicks rocks while everyone else plays*
View on Reddit #81046306

aliensorsomething@reddit

Interesting, if it had an installer and simple way to use I would use it over LM Studio because it has support for tts.
View on Reddit #81043344

FatheredPuma81@reddit

Nice: \--- cmake configure --- \[FAILED\] llama.cpp build failed at step: cmake configure (0m 15.4s) To retry: delete C:\\Users\\Fathe\\.unsloth\\llama.cpp and re-run setup.
View on Reddit #80964975

DeliciousMagician925@reddit

I got the same issue. Reinstalled Cmake with windows installer and no luck so far. Any suggestions?
View on Reddit #81010179

FatheredPuma81@reddit

Wait. I guess they're working on it.
View on Reddit #81041796

oh_my_right_leg@reddit

One of LM Studio's best features is the possibility to use LM Studio on a remote PC as a server and then seamlessly using LM Studio on another PC as the client. It would be awesome if you could replicate such functionality.
View on Reddit #81025917

kavakravata@reddit

To a noobie, what does this have except for being OS, than Lm studio / jan.ai / opencode doesnt have?
View on Reddit #81006600

Ylsid@reddit

I hope so, I'm tired of LM studio being the easiest way to test models while being closed source bs
View on Reddit #80996417

Trick-One7944@reddit

Just keep in mind, free is relative depending on your requirements. No world is pro up to 8 gpu and free is up to 7. So we can expect a pay to play in here for the options many of us were thinking might become point and click. That ease of use gui has a cost attached, they earned it, but just a reality check https://unsloth.ai/pricing?hl=en-US
View on Reddit #80981860

Hefty_Acanthaceae348@reddit

Very exciting, I was thinking that there were a lot of use cases were synthetic data generation (and therefore finetuning) should be easy, but finetuning isn't all that approchable. I do hope there will also be something added for easy customization of tokenizers too.
View on Reddit #80980239

jduartedj@reddit

wait this actually includes training AND inference in the same app? thats huge if the training part works well. i've been using their notebook stuff for LoRA finetuning and its already way faster than vanilla peft, so having a proper UI for it would save me so much time setting up configs manually the synthetic data gen part is what really caught my eye though. being able to generate training data, finetune, and then test the result all in one place... thats basically the entire workflow without needing 5 different tools
View on Reddit #80979977

Dazz9@reddit

Is there any performance loss when compared to running llama.cpp? What is the performance gain when compared to Ollama?
View on Reddit #80977117

redoubt515@reddit

\> Until now, LMStudio has basically been the "go-to" solution for more advanced LLM users in the GGUF ecosystem I wasn't aware. I've always seen it framed as the *entry level* solution for *less technical* people. What gives you the impression LMstudio is the "go-to" for "advanced users"? (or is this an AI generated post and AI is just engaging in empty hyperbole and buzz words?)
View on Reddit #80928505

ImpressiveSuperfluit@reddit

No idea what they mean, but there certainly is the fact that installing a software, being presented with a million models, gigabytes each, whose names tell you nothing, only to then be presented with a bunch of knobs... is already qualifying as quite advanced. It's like asking a random 14 year old to drive on the autobahn. Sure, they've been in a car before, and the autobahn is not exactly a complicated traffic situation, but they also don't know where the freaking break is, so maybe you're just in a bit of a bubble situation there. I think you're taking for granted how much background knowledge is actually required to not get immediately overwhelmed with this, seemingly simple, task. When, in reality, we're entering an era where a good chunk of people have never touched an actual desktop pc. Laptop, at best. Yea, those people are in their 20s now, shit's crazy!
View on Reddit #80966276

TechnoByte_@reddit

> installing a software, being presented with a million models, gigabytes each, whose names tell you nothing, only to then be presented with a bunch of knobs... is already qualifying as quite advanced. That's the basic skills you need to run LLMs locally though. It's really not that hard considering how many guides there are out there and since people can just ask others or an LLM for help The post specifically says "advanced LLM users" LMStudio is a closed source electron UI designed to make running LLMs as simple as possible, you don't have to deal with the commandline For advanced LLM users, just using llama.cpp or vLLM directly makes a lot more sense
View on Reddit #80970198

ImpressiveSuperfluit@reddit

Everything past openening a browser or simple app can easily be classified as advanced. This very much includes LLMs, anything past typing stuff into ChatGPT is advanced to the vast majority of people. You guys really do take a lot of stuff for granted that just isn't normal anymore. I understand that the vaguely millennial aged people, which includes myself, have a hard time picturing this, but people are already back to not growing up with computers, and have been for a long time. If it doesn't have an app that runs on your phone (or its browser), it's pretty much immediately an advanced topic. And LMStudio isn't plug and play, either. It technically kinda can be, but only if you ignore all the knobs and baubles and already know when you can do so. One OOM error and even the average somewhat advanced user is going home. Genuinely, spaces like this (and foss in general, looking at you, Linux community) would do well to remember that the vast majority of their knowledge can not be taken for granted, no matter how much they want to pretend that it can be. We're just not in the 2000s anymore. Owning a computing device no longer means that you have ANY idea how anything behind the UI works. Having said that, I understand, of course, that it's reasonable to call LMstudio an entry level software when you're talking to people that would ever find themselves here to begin with, but I'm bothered when people pretend that people are just randomly born with all this tech knowledge. You'd be shocked how some people can manage to somehow mess up heating water. Things are just really, really hard when you miss all the fundamentals that everyone just assumes you have.
View on Reddit #80971344

redoubt515@reddit

>I think you're taking for granted how much background knowledge is actually required to not get immediately overwhelmed with this, seemingly simple, task. When, in reality, we're entering an era where a good chunk of people have never touched an actual desktop pc. Laptop, at best. Yea, those people are in their 20s now, shit's crazy! You make a good point, I see where you are coming from.
View on Reddit #80969246

C_Coffie@reddit

That awesome! My only question is, will it support Strix Halo?
View on Reddit #80917945

yoracale@reddit

Inference should work, training works in Unsloth, will work on Unsloth Studio a bit later
View on Reddit #80918756

ComprehensiveBed5368@reddit

what about intel Arc gpus? inference and training
View on Reddit #80970670

Odd-Ordinary-5922@reddit

llamac++ runs on the backend so yea
View on Reddit #80918788

Legumbrero@reddit

Thanks for this guys, will definitely be trying it out but featureset and license already sounds really great!
View on Reddit #80965373

ab2377@reddit

💯👍🥺🍩🤞
View on Reddit #80964645

Just-Winner-9155@reddit

Unsloth Studio's Apache license and Llama.cpp compatibility are solid wins for open-source folks. I've been using it for smaller models—less resource-heavy than LMStudio, which is great for budget builds. The UI feels more streamlined, but LMStudio still nails multi-model management. If you're running 7B+ models, stick with LMStudio; for 3-5B or lower, Unsloth is a smoother ride. Just watch VRAM usage if you're mixing model sizes.
View on Reddit #80963834

skillshub-ai@reddit

Really excited about this for domain-specific agent fine-tuning. There are thousands of high-quality SKILL.md files on GitHub now (structured agent instructions from Anthropic, Microsoft, Trail of Bits, etc). Fine-tuning a small model on 100 of these so it internalizes the methodology instead of just following instructions at runtime could be a game-changer for local agent quality.
View on Reddit #80961919

Own-Relationship-362@reddit

This is exciting for fine-tuning models on domain-specific skills. I've been working with SKILL.md files — structured markdown that teaches agents specific methodologies — and the biggest bottleneck is that base models don't follow them consistently. A good fine-tuning UI could let you train a model specifically on your skill set. Imagine fine-tuning Llama on 100 SKILL.md files so it internalizes the methodology instead of just following instructions.
View on Reddit #80961631

bityard@reddit

Warning: If you're like me and like to maintain strict control over your machine and home directory for both safety and security reasons, you are NOT going to want to follow the installation instructions in the docs blindly! Even after you've cloned the repo and installed the dependencies, the setup script installs even _more_ things _outside_ your virtualenv, such as node/npm **without asking**. Probably best to use the docker image, or install this in a VM if you are just testing it out. (Note: I am not implying the unsloth guys have any malicious intent whatsoever, I was just very surprised to see a Python project installing all kinds of extra stuff on my computer without at least telling me first.)
View on Reddit #80954787

Wemos_D1@reddit

Hello I tried it and it's already a good start, I really like the new features that we don't find at any other places, and the fact it loads the correct settings directly makes me happy. But I have 2 things that I think might be usefull to implement: 1) Ability to link existing models that are already downloaded locally and be able to have it in the dropdown. 2) When I use a model in the chat, it doesn't use the graphic card ? I saw in the documentation it is said but why isn't it used ? Thank you again for your good job and I wish the project will be popular enough to be a competitor to lm studio.
View on Reddit #80954322

Sandyyy_9866@reddit

Interesting! Unsloth Studio could shake things up, especially since it's compatible with llama.cpp. I've been using CLBlast for optimizing operations in RAG pipelines, and it's crazy how much a good runner can improve efficiency. If Unsloth nails the UI and execution speed, it might streamline workflows for those of us constantly tweaking our models. Definitely keeping an eye on this one.
View on Reddit #80954310

fluecured@reddit

Can you use it with an older CPU without AVX instructions?
View on Reddit #80930238

HopePupal@reddit

okay i gotta ask, and this is as someone who does have use cases for CPU inference: how did you find a CPU that old? and what is the rest of your setup like? AVX started shipping from AMD and Intel in _2011_. https://www.reddit.com/r/LocalLLaMA/comments/1muwsj6/llamacpp_nonavx_processors/ apparently you're not the only one and llama.cpp itself still supports pre-AVX CPUs with the right build options but _dang_
View on Reddit #80935813

fluecured@reddit

I have an Intel Core i7 930 (Bloomfield) with 12 GB RAM and an Nvidia RTX 3060 with 12GB VRAM. Using Oobabooga's webui and ExLlama, I can run many exl2 or exl3 models up to 12B-14B parameters if there are suitable quants available. (Sadly, Oobabooga's dropping support for ExLlamav2, which has ten times as many models as v3.) The sweet spot for the 3060 is a Mistral-type 12B at 5.0bpw with 32768 Q4 context. Muse-12B and Mag-Mell-R1 flavors are astounding small models. I tried to build my own llama.cpp binary but got stuck configuring curl and Gemini was unable to help unstick me. Thus, I've never been able to load a gguf, only gptq, exl2, or exl3. ExLlama is pretty great since I can quantize the kv_cache to q4 and not notice any difference. I've saved up for a lab-worthy computer, but each time I've been ready to buy/build, this one guy does something wacky that turns everything into a confusing lottery. I'm kind of old and housebound, and that inertia makes it harder to deal with the scarcity of components. The *"I'm getting too old for this shit"* is real.
View on Reddit #80948282

HopePupal@reddit

ahh with the 3060 doing most of the work, makes sense that lack of AVX wouldn't be a total showstopper. and yeah… really tired of everything constantly being on fire for no reason. felt.
View on Reddit #80952130

DrBearJ3w@reddit

I purely use LMStudio out of convenience, simplicity and good Windows support. I tried ollama,few times, and it always surprised me that there are always some bug here and there that I should fix. Switched to llama CPP and never looked back. Unsloth on the other hand can be main hub for ALL operations. We will see.
View on Reddit #80950964

IONaut@reddit

So does this allow you to run a local API endpoint?
View on Reddit #80950468

General_Arrival_9176@reddit

unsloth studio is their training UI getting a run UI. makes sense - they have the quantize/finetune stack already, adding a playground on top is low effort. the real question is whether it competes with lmstudio for inference. lmstudio has the model discovery and easy setup thing locked down. unsloth's advantage is their quantization quality. well see if they can bridge that. i use lmstudio daily but their quantization is not unsloth level so if they ship good quants with the ui it could be a real alternative.
View on Reddit #80950051

FRAIM_Erez@reddit

How many more ‘Studio’ apps do we need before gta 6?
View on Reddit #80950027

fiery_prometheus@reddit

An apache 2 license and completely open source?? Praise the llamas!
View on Reddit #80921543

FullOf_Bad_Ideas@reddit

Unsloth Studio is AGPL 3.0.
View on Reddit #80936281

fiery_prometheus@reddit

The license in the repo says apache 2.0 [https://github.com/unslothai/unsloth/blob/main/LICENSE](https://github.com/unslothai/unsloth/blob/main/LICENSE)
View on Reddit #80944890

FullOf_Bad_Ideas@reddit

"Unsloth now licensed under AGPL-3.0? No. The main Unsloth package is still licensed under Apache 2.0. Only certain optional components, such as the Unsloth Studio UI, are under the AGPL-3.0 open-source license. Unsloth now has dual-licensing where some parts of the codebase are licensed Apache 2.0, while others are licensed AGPL-3.0. This structure helps support ongoing Unsloth development while keeping the project open-source and enabling the ecosystem to grow." https://unsloth.ai/docs/new/studio (FAQ section)
View on Reddit #80945199

fiery_prometheus@reddit

Would be great if this was more clear, a lot of people aren't going to dig around more than finding a license file in the root of the repo.
View on Reddit #80945438

FullOf_Bad_Ideas@reddit

Yeah, since it's a single repo and this information is clearly laid out mainly in a blog post, sooner or later someone will take Unsloth Studio code and unknowingly act like it's Apache 2.0 /u/danielhanchen /u/yoracale As it's apparent, licensing of Unsloth Studio isn't clear to everyone who just goes on Github and skips your blog posts, so I think you should consider increasing readability of the github repo to make it more obvious.
View on Reddit #80947159

yoracale@reddit

https://preview.redd.it/s4gorb3cropg1.png?width=694&format=png&auto=webp&s=3bb793f6f0ab663c020f9a7cfb4297fab2000685 Thank you for the feedback, if you go to our github mainpage and look at the sidebar, it says AGPL3 licensing found, also our notebooks state it as well but we'll make it more prominent. Also we put the AGPL3 license header in nearly all the file components of studio
View on Reddit #80947669

FullOf_Bad_Ideas@reddit

Yup, I noticed the sidebar, though when I clicked on the AGPL 3.0 it was not clear that this is related to Unsloth Studio exclusively. With a single repo and 2 licenses it will be hard to communicate licensing to users in an unobtrusive way. A note in readme would make it clear but it also may be too obtrusive for you. I would think that you're doing all the right calls here, yet, in this comment chain someone didn't catch it so clearly it's not 100% clear to everyone.
View on Reddit #80949793

yoracale@reddit

Thank you for the feedback, if you go to our github mainpage and look at the sidebar, it says AGPL3 licensing found, also our notebooks state it as well but we'll make it more prominent.Also we put the AGPL3 license header in nearly all the file components of studio https://preview.redd.it/uusw3eeoropg1.png?width=634&format=png&auto=webp&s=f4511ee4085cd6a85c9d386b6a08538aedf392a6
View on Reddit #80947683

PM_ME_UR_COFFEE_CUPS@reddit

Apache license???? Sweeeeeeeet
View on Reddit #80946945

ghulamalchik@reddit

Gobbless
View on Reddit #80946517

alew3@reddit

LM Studio = inference, Unsloth Studio = training
View on Reddit #80945831

rorowhat@reddit

Can it also finetune locally?
View on Reddit #80943284

Best-Echidna-5883@reddit

Please fix the apparent contradiction regarding running this with CPU only for Windows. Thanks. The Windows installation requires an NVIDIA GPU.
View on Reddit #80940690

jeffwadsworth@reddit

It states that it works for cpu only for chat but how do you install the Studio? Perhaps not possible.
View on Reddit #80940210

jeffwadsworth@reddit

Wow. This is a dream come true. Finally something akin to that Mac Inference app.
View on Reddit #80939460

z_latent@reddit

So glad it's open-source. Thank you Unsloth team :)
View on Reddit #80939406

soyalemujica@reddit

Does this mean that it can run NVFP4 in Blackwell ?
View on Reddit #80915803

danielhanchen@reddit

Not yet, but soon!
View on Reddit #80919093

Secure_Archer_1529@reddit

So for the dgx spark nvfp4 is a no go atm. Have you solved this with nvidia or is it still the same issues carrying over to unsloth until nvidia finally fixes it? Unsloth studio sounds amazing though! Thanks for all the work you guys put into it.
View on Reddit #80922741

inevitabledeath3@reddit

There is actually a VLLM fork with proper NVFP4 support. Here: https://github.com/jleighfields/vllm-dgx-spark I haven't gotten to try it myself yet, but if this works like they say it does then it's a viable solution.
View on Reddit #80939310

drink_with_me_to_day@reddit

What to use in Windows? I've tried vllm, lm studio,but they fail to install or load models
View on Reddit #80917578

NoahFect@reddit

This works in Windows (I have t running now), but you probably don't want to set it up yourself. See my other comment.
View on Reddit #80939021

SourceCodeplz@reddit

Took Codex about 51 minutes to install it on a fresh Windows 11 with a SATA SSD. https://preview.redd.it/61gn9f4x3opg1.png?width=1267&format=png&auto=webp&s=09963f8df3411e57dbc59a27e21717c6ea57dfe7
View on Reddit #80938396

Certain-Cod-1404@reddit

How are you guys so consistently good at everything you do ? you guys are a blessing to the open source community, thank you so much !
View on Reddit #80937882

dreamai87@reddit

I really like the way they brought finetuning so easy for people on consumer level hardware and always shares the colab notebook. No doubt for gguf bartwaski and unsloth are always the first choice, though I appreciate others those are contributing in this space, kudos to all of you 👏 My first preference is always and will always be llama.cpp. Sure now unsloth studio will be the another one that allows finetuning/validating/inferencing models. Its great to see how everyone pushing the boundaries and making this stack accessible
View on Reddit #80915188

toothpastespiders@reddit

>I really like the way they brought finetuning so easy for people on consumer level hardware and always share the colab notebook. Same here. Personally I just prefer using axolotl, but those notebooks are an amazing resource to help people understand how training works.
View on Reddit #80937878

ItankForCAD@reddit

The blog post is confusing. It states that chat inference is supported by llama.cpp and transformers. However, the installation section mentions that AMD, Intel and etc support is coming soon. Is the upcoming support aimed at training or inference as well? It seems strange that only the cuda version of llama.cpp is built at installation. Building the Vulkan backend would allow all gpus to work for inference at least. Can an external llama-server instance be pointed at unsloth studio?
View on Reddit #80918293

yoracale@reddit

For AMD and intel it's related to training. Our main unsloth package works with AMD and intel, but not Unsloth Studio. Currently inference should work for AMD devices but haven't verified
View on Reddit #80919608

seanthenry@reddit

So llama.cpp will be build for ROCM if available?
View on Reddit #80937102

shing3232@reddit

LMstudio is just front end for inference and Unsloth is training. How could you get it mix up?
View on Reddit #80936870

mrkstu@reddit

Will give it a go on my Windows box, but would love it if you add proper Mac MLX/training support and be able to move over completely.
View on Reddit #80935881

HopePupal@reddit

u/danielhanchen happy to see AMD support for inference listed, even if it's "coming soon"! are we likely to see AMD support for fine-tuning in Studio ever, or should we plan to keep using the unsloth amd Python branch directly?
View on Reddit #80935074

RandumbRedditor1000@reddit

I hope this means more people will start fine-tuning
View on Reddit #80934347

hugthemachines@reddit

"gamechanger" That's a red flag.
View on Reddit #80931078

ilintar@reddit (OP)

You are absolutely right! 👍🚩
View on Reddit #80933217

mtomas7@reddit

If US Studio can also run safetensor models, is there a particular reason to use GGUF models?
View on Reddit #80930618

RandumbRedditor1000@reddit

Inference speed and vram requirements
View on Reddit #80932094

jabr7@reddit

Am I reading this right? It also has an UI for fine-tuning? Is it PEFT or?
View on Reddit #80929108

SGmoze@reddit

would it be possible to connect to colab notebook and trigger training from this?
View on Reddit #80928974

separatelyrepeatedly@reddit

Does it havee API support?
View on Reddit #80928700

Adventurous-Paper566@reddit

J'espère qu'on pourra attribuer une configuration GPU pour chaque modèle.
View on Reddit #80928436

unspkblhorrr@reddit

I am a no-code local AI user so bear with me, learning this stuff like as I go... is this preferable to Open webUI? Does it have RAG?
View on Reddit #80926736

revilo-1988@reddit

Ich mag LmStudio jedoch find ich super wenn es weitere Mitbewerber gibt
View on Reddit #80926370

Zestyclose_Yak_3174@reddit

Awesome! Hopefully becomes a solid replacement for LM studio on my Mac
View on Reddit #80925905

RevolutionaryLime758@reddit

Because it has studio in the name? Is that your thought process? Lmao.
View on Reddit #80925685

BringMeTheBoreWorms@reddit

Would be nice if they could do a llamacpp compile for Vulcan and rocm instead of just cuda. It’s not a hard thing to add basic support for these days. The nvidia monopoly needs a bit of a kick in the nuts
View on Reddit #80925151

Nodja@reddit

If the unsloth team wants this to succeed they have to make it piss-easy to install for the average user. This could be a good gateway app for getting people into training models and stuff, like A1111 did for lora trainers back in the day. Lucky for them uv exists and they can just bundle the code + uv and let uv do the hard work of installing python and setting up the venv then bootstrap the app the same way they're doing it now. I have lm studio installed and use it's local server to semi-automate certain tasks, lmstudio lets me easily load/unload models and have TTLs/etc. + the models come with sensible defaults (I only really change the context size) and makes it painless to try out new models without fucking around with llama.cpp params, hopefully unsloth studio will reach parity and I can get rid of the only closed source LLM software I have installed.
View on Reddit #80924273

_hephaestus@reddit

I think here I pretty much just see lmstudio for mlx inference, and oMLX has replaced that for me. Really curious about mlx training going forward though.
View on Reddit #80923832

DMmeurHappiestMemory@reddit

This is bonkers.
View on Reddit #80923303

ArsNeph@reddit

This is genuinely amazing, props to Unsloth team for single-handedly propping up the .gguf and fine-tuning local ecosystem! I'll definitely give this a try and provide feedback when I get a chance!
View on Reddit #80923258

Smashy404@reddit

Does it support voice?
View on Reddit #80922427

xXprayerwarrior69Xx@reddit

fuck yes
View on Reddit #80922328

the_real_druide67@reddit

But is the GUI really what matters here, or is it the underlying engine? For example on Apple Silicon, you can wrap llama.cpp in the prettiest UI you want : native MLX will still be significantly faster on the same model and quantization. The real competition isn't between GUIs, it's between inference backends.
View on Reddit #80922273

Danmoreng@reddit

React frontend python backend 😩
View on Reddit #80922257

Significant_Fig_7581@reddit

Oh I'm in
View on Reddit #80916072

danielhanchen@reddit

:)
View on Reddit #80922127

Technical-Earth-3254@reddit

Damn, love to see it
View on Reddit #80917695

danielhanchen@reddit

Thanks!!
View on Reddit #80922121

sine120@reddit

LM Studio has been my "I'm lazy and want to try this" solution that I find a little easier to test MCP's. If I'm actually minmaxxing my inference speed and want the bleeding edge of new models, I have to use llama.cpp. I love llama.cpp, but I hate messing with the commands, guessing and checking VRAM usage, etc. If someone else can come along and make it easier for me to get the performance of latest llama.cpp, host a chat page/ web search mcp's, OpenCode endpoings, etc, I'll be a happy man.
View on Reddit #80921171

Helicopter-Mission@reddit

Agreed. It’s much easier to test different models and configs. vLLM your model when you have something you want to run for a while and already know how you configure it. If I’m model hopping, I’d just lm studio.
View on Reddit #80921823

Potential-Leg-639@reddit

New super power unlocked by Unsloth. Congrats!
View on Reddit #80920748

CalvaoDaMassa@reddit

A competitor? Man, I think that Unsloth Studio will become the #1 tool easily.
View on Reddit #80920733

Relevant-Audience441@reddit

just another webui ¯\\\_(ツ)\_/¯
View on Reddit #80913761

ambassadortim@reddit

Does this one have a web interface?
View on Reddit #80920193

TopChard1274@reddit

isn’t LMStudio running over Llama.cpp? How’s adding another Llama.cpp wrapper into the mix a “game changer”?
View on Reddit #80917664

Odd-Ordinary-5922@reddit

because lmstudio is closed source
View on Reddit #80918650

Pro-editor-1105@reddit

holy shit this is amazing
View on Reddit #80918100

Right-Law1817@reddit

This is awesome.
View on Reddit #80917727

qwen_next_gguf_when@reddit

Go-to my blad azz.
View on Reddit #80917095

egomarker@reddit

It's not a competitor for LM Studio, this one has emphasis on nvidia and training, LM Studio has emphasis on MCP support and good built-in api server.
View on Reddit #80916325

Specter_Origin@reddit

This is awesome, i just hate the closed source nature of lm-studio
View on Reddit #80916281

EffectiveCeilingFan@reddit

It isn't trying to compete with LMStudio tho. The ability actually run LLMs is just one of the features. It's moreso a model training workspace.
View on Reddit #80915741

ilintar@reddit (OP)

It might not end up trying to compete with LMStudio, but it's ability to run GGUFs combined with Apache License will make it an automatic competitor.
View on Reddit #80916131

_raydeStar@reddit

This one looks like it's focused on sanitizing training data and running it. In that case it's not quite apples to apples comparison. Definitely interested in playing with it. I've only ever trained image models.
View on Reddit #80915973

Zemanyak@reddit

Oh, interesting. I'm gonna try it.
View on Reddit #80915719

krileon@reddit

Another webui.. weeooooo.. pass. Waiting for the day LM Studio or something like it also implements image generation. Still lacking an all-in-one tool that's just a simple install and run.
View on Reddit #80915582

willitexplode@reddit

Cool, I've been seconds away from ending my laziness and ditching lmstudio for more cli work. Now I don't have to, yay!
View on Reddit #80915168

netikas@reddit

More like good ol' text-generation-webui
View on Reddit #80914674

Emotional-Breath-838@reddit

Oh hell yes. Do want.
View on Reddit #80913755