Setting up local LLM system and charging tokens back to company
Posted by Wa1ker1@reddit | LocalLLaMA | 7 comments
With all the recent issues with Claude, and the issues I'm having with Codex, it's more and more clear to me that I need a comparable large LLM of my own for reliable work assistance.
I have a company myself, but I also work with another company that refuses to hire more staff for me. For two weeks I've been arguing that I need more people on staff and have gotten pushback, even though they keep expanding the workload. They would rather outsource the work or pay more for AI services. An example: when I asked them for $10-12k plus $6k a month for an employee, they instead signed a 1-year contract at $25k a month for something we can't even work with.
After speaking with our CFO, the best solution is for me to build out what I need out of pocket, cancel the current services, and bill them monthly for token usage at fair market value prices, rather than them buying equipment a little at a time.
This would give me the immediate deduction for the equipment and a path back to profitability in a couple of years. It would also let me charge the other clients I work with for token usage directly, and to meter the extra electricity usage and charge that back too.
I'll be heavily reliant on new models coming out from Kimi and MiniMax, but hopefully without the issues I currently have with downtime and the models seeming to get dumber by the day. A local system should at least be reliable.
I'm not talking about building a system for 50 users, just myself and maybe one or two more on the team.
Has anyone done this, or have thoughts on whether it's worth it? I also have two companies I may contract with in the next couple of months that have agreed to a $10-12k equipment expense budget.
BobbyL2k@reddit
You can probably use one of the LLM gateways, there’s quite a few.
But if you're going to host open models, the price on those tokens is incredibly low. I don't think you'll make a return on the hardware investment, especially given your situation with a low number of users.
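To put numbers on that point, here is a rough back-of-envelope break-even check. All figures (hardware cost, token volume, $/Mtok rates, power bill) are hypothetical placeholders, not quotes from any provider:

```python
# Back-of-envelope: months of token billing needed to recover a hardware
# outlay. Every number below is a made-up placeholder for illustration.

def breakeven_months(hardware_cost: float,
                     tokens_per_month: float,
                     price_per_mtok: float,
                     power_cost_per_month: float) -> float:
    """Months until cumulative token margin covers the hardware spend."""
    revenue = tokens_per_month / 1_000_000 * price_per_mtok
    margin = revenue - power_cost_per_month
    if margin <= 0:
        return float("inf")  # never breaks even at these rates
    return hardware_cost / margin

# Hypothetical $12k rig, 200M tokens/month billed out, $150/month power:
months_low = breakeven_months(12_000, 200_000_000, 0.60, 150)  # open-model-ish rate
months_hi = breakeven_months(12_000, 200_000_000, 5.00, 150)   # frontier-API-ish rate
```

At open-weight gateway prices the margin can go negative and the rig never pays for itself, while at frontier-API-level billing the same volume recovers the hardware in just over a year, which is why the agreed billing rate matters so much here.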
ahjorth@reddit
The key missing pieces of information are: what are they willing to pay per token, do they guarantee a minimum number of tokens per month, and are they loyal or will they suddenly shift to someone else? Unless you know this, you can't really put together a business case, so I'd start there.
Wa1ker1@reddit (OP)
It's my decision which AI I want to use, so they won't switch as long as I'm with them. The CFO said fair market value, so there are no issues if there's an audit later; they don't want it to seem like I'm overcharging. I would have to compare against Claude enterprise API pricing, Codex, and Cursor. I brought up the electrical cost increase I would have, and his recommendation was to put a meter on the system and charge that separately as a line-item cost reimbursement.
cryyingboy@reddit
Ran a similar setup for our small team last year. The hardware costs add up faster than you think tbh, especially if you're trying to match something like Claude quality. We built a simple token tracking and billing layer that took way longer than expected, but now it basically pays for itself. The electricity monitoring part is where most people underestimate costs imo.
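The core of a tracking-and-billing layer like the one described can be very small. A minimal sketch, where the client name and $/Mtok rate are made-up placeholders and real deployments would persist this to a database instead of memory:

```python
# Minimal per-client token ledger: record usage per request, invoice at
# an agreed $/Mtok rate. Names and rates are illustrative placeholders.
from collections import defaultdict

class TokenLedger:
    def __init__(self, price_per_mtok: float):
        self.price_per_mtok = price_per_mtok
        self.usage = defaultdict(int)  # client -> total tokens this period

    def record(self, client: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one request's token counts to the client's running total."""
        self.usage[client] += prompt_tokens + completion_tokens

    def invoice(self, client: str) -> float:
        """Dollar amount owed for the billing period at the agreed rate."""
        return self.usage[client] / 1_000_000 * self.price_per_mtok

ledger = TokenLedger(price_per_mtok=3.00)
ledger.record("acme", prompt_tokens=120_000, completion_tokens=45_000)
ledger.record("acme", prompt_tokens=80_000, completion_tokens=30_000)
```

The long-tail effort usually isn't this accounting loop but the plumbing around it: pulling token counts out of every serving endpoint, handling retries, and generating auditable monthly statements.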
Wa1ker1@reddit (OP)
Yeah, electricity was my concern as well. He said to put a meter between the hardware and the supply to track usage, and bill for that usage separately each month at my local rate.
The big issue I'm concerned about with hardware is making sure the system is upgradable as new models come out that require a larger system.
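The metered charge-back described above is just a kWh delta between two readings times the local rate. A tiny sketch, with meter readings and the $/kWh rate as hypothetical example values:

```python
# Monthly electricity line item from a dedicated meter on the rig:
# (end reading - start reading) in kWh, times the local rate.
# All numbers below are illustrative placeholders.

def electricity_line_item(meter_start_kwh: float,
                          meter_end_kwh: float,
                          rate_per_kwh: float) -> float:
    """Reimbursement amount in dollars, rounded to cents."""
    used_kwh = meter_end_kwh - meter_start_kwh
    return round(used_kwh * rate_per_kwh, 2)

# e.g. meter went 1000.0 -> 1450.0 kWh over the month at $0.14/kWh
charge = electricity_line_item(1000.0, 1450.0, 0.14)
```

Keeping the raw meter readings alongside each invoice also gives you the audit trail the CFO is after, since the line item is reproducible from two numbers and a published utility rate.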
Party-Special-5177@reddit
What a lovely spot to be in! The big question is: do you *really* want to become a data center? Because that's the main discriminant. (As soon as you start charging for it, people will start expecting certain minimum levels of service, etc.)
Keep in mind that paying for electricity (not just for the cluster, but also the AC/cooling) is really going to hurt you here. The new AI data centers coming online don't pay for grid electricity; the big thing these days is off-grid "microgrids" that run the center off solar. If you plan to grow this long term, you should consider the capital costs of that infrastructure too.
I suspect you are thinking like a hobbyist (e.g. "I'll just buy a bunch of Pro 6000s" [possibly also "and just stuff them in that old janky rig in the attic"]), but at a certain price point buying old datacenter hardware starts to make more sense (e.g. 8x A100s start around $60k and you get SXM NVLink, vs. six Pro 6000s in servers on PCIe) and will be more reliable in the long run.
I looked at all this toward the close of last year, and the capital outlay was surprisingly more than expected, and at the end you've turned into a data center.
… to be fair, some guys love that idea. Not my idea of a good time lol.