Lm Studio is very good. I have used it since its inception, and although I do not really agree with the new changes that were made to its interface, it is still an excellent option. What I really don't understand is how someone can use Ollama. Look, I have analyzed it from all points of view and I don't understand why someone would use a program like that.
Because when ollama is set up it will run in the background and launch on startup and you can just assume it’s there when your client applications need to call it.
Probably? Look, don’t get me wrong, I love LMStudio and it is my main drive for downloading and trying new models and sometimes I’ll start it server as a quick endpoint for an app.
But for stuff I want available all the time, I get set up on Ollama. It seems really smart about being able to load and unload models on demand, so with LM you’re loading the model yourself when you need it, and ejecting it when you’re done to manage memory. And if your app was connected to it, and expecting a different model, it would fail until you go into LM and manually load it.
Depends on your use case, I use it to spin up models and then I connect to the models with other tools and in that scenario I dont need all the fluff of lmstudio. sure I can do that with lmstudio but why have all the extra fluff when I'm up and running with 1 terminal command
The answer is simple: People don't want all-in-one software. Especially companies, teams, groups etc. don't want all-in-one software. They want to host one more models and expose them via an OpenAI compatible API and then connect clients of all kind to it instead of having a full blown gaming PC under each desk.
Well, there are several reasons:
1. Ollama runs as a server and you can connect different GUIs and tools, such as excellent Open-WebUI.
2. Open-source vs closed source.
3. Licensing. Ollama has MIT license, LM Studio requires you to contact them if you want to use it for work purposes.
Yes, I guess it's because that's the most famous open source one because it created early. Even a lot of developers integrate ollama into their projects first. I used ollama first because it's popular. Then I tried others and settled down with LMS.
(not proven personally and i dont have the data but) could it be that the "interface-less" console base ollama uses less system resources than lm studio? dont get me wrong, i love lm studio showing me all the settings and configs running the model
guess what im saying is if i know what im doing, i'd use the least resources required app to run the model so those resources can be allocated elsewhere (tts, stt, rag etc)
I recommend trying LM studio before all others because it runs GGUFs faster than any other program I've seen (tied with ollama but ollama is annoying to set up if you're not techy). so you'll know how fast your computer can run them when it's set up right.
I started with Oobabooga and just accepted the speeds of GGUFs were 4-9t/s. Nope. LM studio gets them up to **40t/s**.
Comparing LLM studio and Kobold is a fool's errand. You don't know what code is being used. One may be 15% faster, but still 80% slower than possible. You need to load the model yourself with barebones code to get the optimal estimate. Not claiming that's an easy task if you aren't an ML engineer, but it's the only correct way at this time.
Quite the opposite but to be honest there are a few things that you need to set up or check with Kobold. First, set GPU layers to -1 so it shows max layers then set it manually. (according to your VRAM)
Go to tokens and enable flash attention. Did you get the cuda12 exe version? (if you have a newer Nvidia GPU you should)
ps. don't mind the downvotes, people thought it was an opinion when you are just trying to learn
They might mean CPU only, in my CPU only laptop it runs way way faster then ooogabooogs does, a ton faster. Loads models instantly rather then taking forty seconds and responds so much faster
I agree, LM studio was faster than other like msty imo. I run on very high end hardware and my best performance was up to 110 token/s for a model on gpu (but was halved on others). Now I know these programs are frontends product so I am not saying that there are not capable of same performance.
But out of the box, LM studio allowed me a faster result.
Used to be a pain. If you haven't seen it, you can now add most hugging face models straight from their page to Ollama. Still prefer LM Studio over Ollama, but Msty above it.
I have a bunch of models downloaded already and making a model file is freaking annoying compared to every other app’s “tell me what folder to use and then click a button to load the model”
No, definitely not. I offloaded all 33 layers of the model I was trying in oobabooga, and all 33 in Lm studio, and used the exact same model file - not even a copy, I literally used the same file.
I believe that oobabooga installs llama.cpp differently than with all the optimizations that Lm studio does. It’s the only thing that makes sense.
Or maybe you are comparing two different values.
When i was using ooba before, it was giving t/s based on the total time which include prompt processing while lm studio give the t/s based on the generation time only which doesn't include the prompt processing.
That said, i noticed that [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) was a bit slower than normal llama.cpp but clearly not 4x less.
No. I’m talking about both cases where there’s no prompt to process other than “write a giant paragraph” AND cases where there’s a long context. Both do the same.
I think you don't get what i'm saying.
Ooba display on the last line
something like "Output generated in 3.63 seconds (23.99 tokens/s, 87 tokens, context 1050, seed 593086777)" if you compare this value to the one displayed on LM studio UI it's wrong.
You need to check eval time on the console for both something like "eval time = 1371.41 ms / 87 runs ( 15.76 ms per token, 63.44 tokens per second)"
And i just tested to confirm if it was still the case and yes it does still display the total.
Btw just compared by curiosity and the performance between both is actually close, even closer than i thought it would be.
You are actually correct!! Holy shit, I've never been more blatantly wrong in my life. I should check my condescension because that was absolutely not warranted.
I greatly appreciate you. After running about 5 tests each, the difference is about 1.25x, so a 1 minute response in Ooba would only take about 48 seconds in LM studio. That is faster, but nowhere near as fast as the token values implied. I was wrong about everything, including being able to truly tell the difference between token speeds without actually measuring it.
I gave it a long prompt, turned the temp to 0, and then told it to just repeat the prompt, and it did, and then I timed that. Really neat, again I was COMPLETELY wrong, I WAS stupid, and you were 100% right and I love you. Forgive me? :')
Bro you don’t even know. I was lowkey so mad. I was like “HOW DARE THEY THINK THAT THE HOURS I SPENT ON THIS WERE WRONG”, I just didn’t want to accept it LMAO
But I try (keyword try lol) to be a good scientist. And I’m actually really happy that you corrected me. I was so mean I need to work on that 💀 I took it like a personal attack because I spent hours and hours trying to figure it out and never did. seriously thanks lol
I did not 'downvote' your comment, but I will take the liberty of doing so for this one since it implies a distasteful conspiracy theory and is a comment one might expect to hear at a café counter. :)
Are there any particular features that would be ideal for you?
LM Studio is a gentle entry into fetching and running LLMs locally with your own docs for more recent and specialized information. It's fine for casual use. But it loses its lustre when more intensively with agentic calls and multiple sources from a large library of proprietary documents. You may want to try GPT4ALL, AnyLLM, or OpenWeb UI, but each program has its own strengths and quirks.
Surprised that nobody mentioned [https://msty.app/](https://msty.app/) yet. I started with LM Studio, also tried [jan.ai](http://jan.ai) and settled on Msty.
I've been using msty for a few months and have generally very good experience with it.
Question though: For running local models on laptop without GPU, would any other packages allow for better optimization of these models than msty could?
Tried. It's running ollama for local LLM. I hope they can run llama.cpp instead of ollama. Ollama changes model file extension to none. It's too annoying. I don't know why they do it.
Seconding Msty. I run one on my desktop, while connecting another Msty from my laptope. Shared workspaces are nice too, including RAG, and the detailed analytics.
Their site looks so......hahas
I prefer streamlit but it needs all that accounts paid plan which is annoying, I hate everything nowadays trying to build extra ecosystems, can everything just plug-in-play? you want A, they need you to install B, C, D, E, F....
While these are good, they fill an extremely different niche
Where lm studio shines is the single install executable and you're up and running like a normal Windows/Mac application, no mess no fuss no barrier of entry
Is it the best? No probably not, but the fact that it's extremely good while also being extremely easy makes it very valuable
Yeah sure I'm just pointing it out for others, and if they started with LM Studio and they're on windows, they're *likely* not the kind of user to figure out these more difficult tools, as great as they are they're just too different and too difficult for the average user
No offense to anyone who uses lm studio of course or who is an "average user"
Yeah I think it’s pretty good
I added a tutorial to run LM Studio local models on any chatbot
https://interworky.gitbook.io/interworky/getting-started/lm-studio-self-hosted-llm#running-locally-note-for-locally-hosted-servers
[Anything LLM](https://anythingllm.com/) is pretty good for simple plug n play (it's an Ollama wrapper).
I switched from Jan (too much jank) and Anything LLM (the RAG was pretty bad) over to [Nvidia's ChatRTX](https://www.nvidia.com/en-us/ai-on-rtx/chatrtx/). If you have an Nvidia card, it's the most performant option on the platform. It's also probably the only one of the many listed on this page that's actually a Windows-focused app and not an afterthought port.
Are there alternatives that also have a Python code interpreter built in, that can execute code and include it in the chat? It would be great to have an offline alternative to the ChatGPT data analyst.
I use open webui and continue.dev for ollama.
Continue is a vscode extension so it's directly in your IDE.
Open webui has artifact (like claude artifact or chatgpt canvas) for html+js. For python you can install a function (right now it's top no 2 function)
Depends on what you want to do. If you want to run a local server your services can use, I much prefer Ollama. LM studio is okay (especially how you download new models, I quite like and changing settings is pretty easy), but feels quite slow as a server and often I just want to close and my server if it crashes and it's so much faster in ollama than going to the server option in LM studio.
It's really good, but there are concerns about data security and privacy due to unaudited closed source code. It's probably okay, but who knows.
Don't feed it anything unless you don't mind 3rd parties reading and exploiting your data.
[Pinokio](https://pinokio.computer/) has many AI apps including LLMs, installs take one click. [Open WebUI](https://pinokio.computer/item?uri=https://github.com/pinokiofactory/open-webui) in Pinokio is a pretty amazing Ollama frontend. It has many features and options, plus an online [repository](https://openwebui.com/#open-webui-community) with plugins and documentation and such.
https://preview.redd.it/dfhrfz4efjvd1.png?width=2935&format=png&auto=webp&s=f6fdbe9359bea05638ef4a135b445cef64e4210a
I used to in my "exploration phase". I ended up with Kobold.cpp since for me that's what ran the fastest (using an AMD Laptop without a GPU), and that it has no installation or anything required: just a single file. Unless development gets dropped at some point, I doubt I'll ever switch to anything else.
Consider Backyard for role-play. ChatGPT4all was good last time I looked.
LM Studio uses to run quite slow but it's improved. It can run as a backend for other front-ends if you want to experiment and can't get your head around Ollama.
91 Comments
_underlines_@reddit
macheteBlade@reddit
pablogabrieldias@reddit
GimmePanties@reddit
Flimsy-Tonight-6050@reddit
Impossible-Value5126@reddit
GimmePanties@reddit
ImaginaryRea1ity@reddit
GimmePanties@reddit
ramplank@reddit
muxxington@reddit
pablogabrieldias@reddit
muxxington@reddit
Eugr@reddit
first2wood@reddit
constPxl@reddit
Ill-Total9416@reddit
cantgetthistowork@reddit
Atari-Katana@reddit
visionsmemories@reddit
bearbarebere@reddit
road-runn3r@reddit
bearbarebere@reddit
entangledloops@reddit
road-runn3r@reddit
CoyRogers@reddit
road-runn3r@reddit
mamba436@reddit
d3ftcat@reddit
bearbarebere@reddit
d3ftcat@reddit
SpareIcy8308@reddit
Deluded-1b-gguf@reddit
bearbarebere@reddit
Deluded-1b-gguf@reddit
bearbarebere@reddit
Zangwuz@reddit
bearbarebere@reddit
Zangwuz@reddit
bearbarebere@reddit
Zangwuz@reddit
bearbarebere@reddit
Zangwuz@reddit
bearbarebere@reddit
noneabove1182@reddit
a_beautiful_rhind@reddit
Elgamer_795@reddit
Pro-editor-1105@reddit
Pro-editor-1105@reddit
mamba436@reddit
Awario_time@reddit
TaggM@reddit
Orlando_orchids@reddit
BeYourBestVersion@reddit
Longjumping_Ad5434@reddit
first2wood@reddit
circlesqrd@reddit
Pristine_Income9554@reddit
maxidev0x@reddit
Historical_Scholar35@reddit
Possible-Basis-6623@reddit
Historical_Scholar35@reddit
maxidev0x@reddit
noneabove1182@reddit
Pristine_Income9554@reddit
noneabove1182@reddit
Salt-Bread4114@reddit
Desmack1@reddit
ianwill93@reddit
No-Detective-5352@reddit
ies7@reddit
MoreFoxBeans@reddit
Future_Might_8194@reddit
vaksninus@reddit
unlikely_ending@reddit
smshrimant@reddit
NextTo11@reddit
arthurtully@reddit
Different-Effect-724@reddit
AccidentAnnual@reddit
shyam667@reddit
AyraWinla@reddit
RandumbRedditor1000@reddit
RealBiggly@reddit
Revolutionary_Put475@reddit
Arkonias@reddit
Weary_Long3409@reddit
Some_Endian_FP17@reddit
-Hello2World@reddit
utf80@reddit
SAPPHIR3ROS3@reddit