Need your honest feedback on a new LLM server I'm building.

Posted by YannMasoch@reddit | LocalLLaMA

Hi all, I am building a high-performance, highly customizable local LLM server written 100% in Rust, with custom CUDA kernels, very low latency, near-immediate TTFT, and plenty of other features. I plan to publish it on GitHub as open source soon.

Probably like most of you, I was not happy with Ollama, llama.cpp, and others, so I decided to build something new.

I'm not here to hype or promote; I'm just a tinkerer and a user like you, looking for input from the community before throwing it on GitHub.

If anyone's interested, I'm happy to share more details and hear your honest feedback.