Best cloud GPU / inference option / costs for per-hour agentic coding

Posted by AdSuccessful4905@reddit | LocalLLaMA | View on Reddit | 4 comments

Hey folks,

I'm finding Copilot is sometimes quite slow, and I'd like to be able to choose models and hosting options instead of paying the large flat fee. I'm part of a software engineering team and we'd like to find a solution. Does anyone have suggestions for GPU cloud hosts that can run modern coding models? I was thinking about Qwen3 Coder: what kind of GPU would be required to run the smaller 30B model versus the larger 480B-parameter one? Or are there newer SOTA models that outperform it as well?
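For reference, the basic self-hosted setup is fairly short. This is a minimal sketch assuming vLLM on a single rented GPU and the 30B MoE variant; the exact Hugging Face model ID and the flag values here are assumptions, not tested recommendations:

```shell
# Minimal sketch: serve an OpenAI-compatible endpoint that coding
# agents/editors can point at. Model ID and context length are
# illustrative assumptions, not vetted settings.
pip install vllm

vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
  --max-model-len 32768 \
  --tensor-parallel-size 1
```

The 480B variant is a different story: at that size you'd likely need tensor parallelism across several 80 GB-class GPUs even with quantization, which is the scale where a managed inference service tends to be simpler.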

I have been researching GPU cloud providers and am curious about running our own inference on https://northflank.com/pricing or something like that. Do folks think that would take a lot of time to set up, and would the costs be significantly greater than using an inference service such as Fireworks.AI or DeepInfra?
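To frame the cost question, here's a rough back-of-the-envelope break-even sketch. Every number in it is a placeholder assumption for illustration, not an actual price from Northflank, Fireworks.AI, or DeepInfra:

```python
# Break-even sketch: dedicated hourly GPU rental vs. a pay-per-token API.
# All figures are placeholder assumptions, not real provider prices.

GPU_COST_PER_HOUR = 2.50          # assumed hourly rate for a rented GPU ($)
API_COST_PER_M_TOKENS = 0.60      # assumed blended API price ($ per 1M tokens)
GPU_THROUGHPUT_TOK_PER_SEC = 70   # assumed sustained single-stream tokens/sec

def api_cost(tokens: int) -> float:
    """Cost of generating `tokens` via a per-token API."""
    return tokens / 1_000_000 * API_COST_PER_M_TOKENS

def self_host_cost(tokens: int) -> float:
    """Cost of generating `tokens` on an hourly GPU, assuming you only
    pay for hours in which the GPU is actually generating."""
    hours = tokens / GPU_THROUGHPUT_TOK_PER_SEC / 3600
    return hours * GPU_COST_PER_HOUR

# Tokens per hour you must sustain before the rented GPU beats the API:
breakeven_tokens_per_hour = GPU_COST_PER_HOUR / API_COST_PER_M_TOKENS * 1_000_000
print(f"Self-hosting breaks even above ~{breakeven_tokens_per_hour:,.0f} tokens/hour")
```

Under these assumptions the rented GPU only wins if the team keeps it busy with millions of tokens per hour (i.e. heavy batched/concurrent use); for bursty individual use, per-token APIs usually come out cheaper, before even counting setup time.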

Thanks,
Mark