A Unified Model Registry for all your Local AI Apps
Posted by EvanZhouDev@reddit | LocalLLaMA | 11 comments
A problem I’ve had is that local AI apps like Ollama, LM Studio, and Jan each download and store their own copy of any model you use. Using multiple tools leads to duplicate model files eating up disk space.
So, I created UMR, the Unified Model Registry for all your local AI Apps!
It lets you add one canonical copy of whatever model you’re using, then link it to tools like Ollama, LM Studio, or Jan. Linking uses the same model you already downloaded, doesn’t require extra storage, and is super fast.
How to Set it Up
See the second image for a more graphical step-by-step.
- Install UMR via NPM or your favorite JS package manager:
npm i -g umr-cli
- Add any Hugging Face GGUF model you want. The CLI will let you interactively choose a quant file if applicable. After the download finishes, you’ll get its UMR Model ID. HF models already on your device will be added straight from the HF cache.
umr add hf ggml-org/gemma-4-E2B-it-GGUF
- Use that model ID to link the model in any supported local AI app. For example, for the q8 quant, linking looks like this:
Link the model to Ollama:
umr link ollama gemma-4-e2b-it-q8-0
Link the model to LM Studio:
umr link lmstudio gemma-4-e2b-it-q8-0
Link the model to Jan:
umr link jan gemma-4-e2b-it-q8-0
Now, the model should be available to use in each of those platforms!
How Does It Work?
UMR itself does not necessarily store your models. It simply knows where to find them after you register them. For example, when you add hf a model, it is still downloaded to (or fetched from) the Hugging Face cache; UMR just takes note of where it is.
You can also add a model manually with umr add ./path/to/file.gguf, which will clone it locally into UMR's own store.
Then, when you link to a client app like LM Studio, UMR intelligently chooses between hardlinking the model file into the app's own store and simply pointing the app at UMR's managed path, making the process fast while using no extra storage.
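The space-saving property of hardlinking is easy to demonstrate with plain coreutils. This is only a sketch of the mechanism; the /tmp/umr-demo paths are made up for the demo and are not UMR's actual store layout:

```shell
# Hypothetical paths standing in for a registry store and a client app's model dir
rm -rf /tmp/umr-demo && mkdir -p /tmp/umr-demo/store /tmp/umr-demo/app
printf 'fake gguf bytes' > /tmp/umr-demo/store/model.gguf

# A hardlink gives the app its own filename for the same underlying file,
# so no bytes are copied and no extra disk space is used
ln /tmp/umr-demo/store/model.gguf /tmp/umr-demo/app/model.gguf

# Both names report the same inode number and a link count of 2
stat -c '%i %h' /tmp/umr-demo/store/model.gguf /tmp/umr-demo/app/model.gguf
```

Because both directory entries reference the same inode, deleting one name doesn't affect the other, and the file's blocks are only freed once every name is gone.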
Feedback and Contribution
I'm open to feedback, including new features/client apps you want to see me integrate, new model sources you want to see me add, and questions!
UMR is also completely Open Source on GitHub: https://github.com/EvanZhouDev/umr
Feel free to contribute!
ContextLengthMatters@reddit
Why not move off of ollama and lm studio at this point? This is solving a problem that shouldn't exist once a user knows how these systems work. It's like putting lipstick on a pig.
EvanZhouDev@reddit (OP)
If you're only using one unified place for inference, I agree! And I'm sure many higher-level users like you already have your own solutions that don't require these basic apps. But many people out there still use Ollama, LM Studio, and other tools. Even if they're not the "best," they're still easy to use and get started with. That being said, UMR also helps you manage all your model paths in one place, so you can use them in llama.cpp or other runtimes directly too.
ContextLengthMatters@reddit
But what I'm saying is that once you recognize that these different servers are all managing the components independently, you should be graduating into understanding how to actually configure a server to pull from a shared repository.
This is like taking the training wheels off the Ollama bike and throwing them on a motorcycle. It doesn't necessarily make sense to me. You end up choosing tools based on what this supports rather than taking the tiny step of understanding your underlying tool configurations, out of laziness.
It needlessly puts restraints on you.
SvanseHans@reddit
Wdym?
ContextLengthMatters@reddit
You can pull from Hugging Face and just configure your inference servers. You don't need yet another manager of sorts. This just adds another thing that needs to be learned and supported for no reason.
pas_possible@reddit
Why not just use symbolic links? That's what they're made for.
EvanZhouDev@reddit (OP)
Under the hood, UMR does use hardlinks or direct pointers to model files in the UMR registry to link them to LM Studio, Ollama, etc. UMR itself just keeps track of where the models are (such as in the HF cache); it doesn't store them itself if they're already on your system.
DegenDataGuy@reddit
You made an entire npm package to create a symlink? You could have just created a batch file that checks for common directories, creates the symlink to the HF cache, and then gives you an option to download a model via its HF name.
last_llm_standing@reddit
Nobody wants another layer on top. Why don't you make a blog post on how to handle multiple model downloads, where to locate them, and how to download one model and use all the other tools on top of it? That would be a more informative post.
EvanZhouDev@reddit (OP)
This tool is essentially what you are describing, if I'm understanding correctly? It lets you handle all your model downloads in one place (umr add hf), locate them (umr list and umr show <model>), and use tools on top of that (umr link <client> <model>). This simply makes the workflow reproducible.
Emotional-Baker-490@reddit
eww, ollama.