3B and 8B Serving with New Hardware

Posted by No-Fig-8614@reddit | LocalLLaMA

I don't want this to be a promotional post, even though it kind of is. We are looking for people who want to host 3B/8B models from the Llama, Gemma, and Mistral model families. We are working towards expanding to Qwen and eventually larger model sizes: https://www.positron.ai/snap-serve

We are running an experiment to test our hardware at $30 a month for 3B models and $60 a month for 8B models. If you have a fine-tune you want running and can help test our hardware, the first 5 people will get a free month at the 3B size and half off at the 8B size. We are looking for folks to try out the system on this new, non-NVIDIA hardware.

This isn't tiny LoRA adapters running on crowded public serverless endpoints: we run your entire custom model on a dedicated instance, for an incredible price, with tokens-per-second rates double those of comparable NVIDIA options.
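For anyone wondering what querying a dedicated instance would look like in practice, here's a minimal Python sketch. It assumes the instance exposes an OpenAI-compatible chat completions API; the base URL, model name, and API key below are placeholders I made up for illustration, not actual Positron endpoints:

```python
import requests

# Hypothetical values -- substitute the endpoint and credentials
# you'd receive when your instance is provisioned.
BASE_URL = "https://your-instance.example.com/v1"
API_KEY = "your-api-key"
MODEL = "your-fine-tuned-3b"

# Standard OpenAI-style chat completions request against the
# dedicated instance.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The upshot of a dedicated instance is that there's no adapter swapping or queueing behind other tenants; the full weights of your fine-tune stay resident.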

Would love for some people to give it a try. I know the parameter sizes and model family selection aren't ideal yet, but it's just the start as we continue to build it all out.