Why run local? Count the money

Posted by Badger-Purple@reddit | LocalLLaMA | View on Reddit | 128 comments

I’m not a coder, but I run local models. I gave in to agent hype (I was building my own, but there is so much to do) and installed Hermes. Running with Qwen-397b out of a 2 spark cluster.
So…I asked Hermes today to tally the token count, and the result…200 million tokens. In 5 days.

At this rate, using an agent for tasks like installing software and debugging things I want to try out, what is the cost I am saving? Artificial Analysis says the price is about 1.25 dollars per million tokens on average from providers. At current pricing per Artificial Analysis, that gives me about 1250 dollars per month, and my sparks will pay themselves by 6 months.

So, caveats of course I bought them at cheaper prices than today, but it’s a simple estimate that there is some valid reasons to go local.

Like I said, I am not programming and I know there are programmers that easily triple my token count in the same time. That implies that if you use 100 million tokens per day, the return on investment is still there today, even with crazy computer prices.

To me, local AI is about the desire to utilize a cool technology without the strings attached that threaten individual privacy and intellectual property. But knowing that my investment is not just purely hobbyism gives me more conviction that local AI is the future.

I know I am preaching to the choir…So the question is, has anyone else felt their rig is becoming more sustainable now than 6 months ago, price wise? Would love to hear!!

[-]

Juan_Valadez@reddit

In my case, I run LLMs locally for all these reasons:

- Privacy.

- Availability.

- Consistency.

- Customization.

- No usage limits.

- Price.

- And simply because I like it.

[-]

iVtechboyinpa@reddit

> Availability, usage limits & pricing

I pay for the $200/month Claude Code plan. I would be surprised if they go up to $250, $300, $350, etc.

At that point, why am I paying a car payment for AI? $200 is already tough to swallow - that’s $2,400/years like WTAF lmao. But it’s fine, I’ll take it on the chin because it works, it’s simple, ion gotta configure nothing.

But if they ever raise the price, or are on their (usual) extended outage, or change usage limits and I can’t use the model, I’ve got my trusty ‘ole server @ home.

[-]

CabinetMain3163@reddit

I pay 100$ for unlimited usage featherless a month, still cheaper than getting gpu that could handle 256k tokens context and everything around that gpu...

[-]

gambit700@reddit

This is seriously why I'm considering getting a strix halo or spark. $2400 a year for claude is nuts.

[-]

CabinetMain3163@reddit

I get most of that using featherless though and front cost of gpu to do the same would be a lot

[-]

nochkin@reddit

The "I like it" part should be the first one in the list.

[-]

eribob@reddit

I agree!

[-]

No-Equivalent-2440@reddit

amen to that

[-]

kevin_1994@reddit

for me it goes something like:

is it cheaper? no
is it more performant? no
do I care about privacy? no
do I like to tell people I run chatGPT on my gaming pc? ye

[-]

OuterKey@reddit

The fact it's even possible on consumer hardware (in some cases older hardware!) and now becoming more and more useful. Price wise is iffy in the short term (gpus can be expensive) but in the long run would be cheaper than a subscription.

No need for internet is a big win too

[-]

misanthrophiccunt@reddit

I think Privacy is not written there enough times. I would have written it after every other line. People without NDAs do not realise how essential that part is.

[-]