Need your honest feedback on a new LLM server I'm building.

Posted by YannMasoch@reddit | LocalLLaMA

Hi all, I am building a high-performance, highly customizable local LLM server written 100% in Rust, with custom CUDA kernels, very low latency, near-immediate TTFT, and plenty of other features. I plan to publish it on GitHub as open source soon.

Probably like most of you, I was not happy with Ollama, llama.cpp, and others, so I decided to build something new.

I'm not here to hype or promote; I'm just a tinkerer and a user like you, looking for input from the community before throwing it on GitHub.

If anyone's interested, I'm happy to share more details and hear your honest feedback.