What is the best use of local llms?
Posted by Responsible-Lie-7159@reddit | LocalLLaMA | View on Reddit | 15 comments
I am a beginner here, got a mac studio 64 gb( that’s what my budget could afford). I am genuinely curious to understand what kind of use cases do you guys use the local llms for?
AXYZE8@reddit
With your setup I would use oMLX with Qwen3 27B + DFlash / fresh Qwen 3.6 35B for agentic coding.
On the other hand, why did you spend everything you could afford on a beefy Mac Studio if you didn't research what you'll use it for? Different use cases may require completely different kinds of hardware, and that Mac Studio could turn out to be not good enough, or overkill. It especially won't help that the majority of us are using CUDA.
Responsible-Lie-7159@reddit (OP)
From youtube videos and a bit of ai research, it showed as the most affordable one. Although slow, it’s cheaper than RTX gpus.
AXYZE8@reddit
I wrote about agentic coding with Qwen models. If that's what you wanted to do, then an AMD V620 32GB is ~530 USD right now and would be a lot faster than a Mac Studio, so your choice wasn't the most affordable one for that. This is the exact problem I wrote about... you should research what you need before spending the money.
Apple (MLX) is a very niche and small ecosystem compared to CUDA. Tons of things need to be done differently, to the point where you need different LLM quants depending on whether you have an M2 Ultra or an M3 Ultra (non-quant dtype float16 for M1/M2, bfloat16 for M3+) for optimal performance.
The majority of people here are running CUDA. Whatever is easy for them (for example, video generation) on even the cheapest RTXes, like an RTX 5060 8GB, is out of reach for you, and whatever is easy for you is hard for them.
You can do whatever you want with your money, of course, but you'll waste time on recommendations that aren't suited to your hardware, even if you think they should work because they work well on some cheap RTX.
Responsible-Lie-7159@reddit (OP)
Can you name some budget graphics cards that would give better performance?
I will sell my mac studio to some friends if this option is indeed better.
dilberx@reddit
I think the Mac will be better in terms of energy efficiency. This costs 4k USD, right? Why not an NVIDIA DGX Spark?
Responsible-Lie-7159@reddit (OP)
The DGX Spark reviews compared against the Mac Studio aren't that great.
RottenPingu1@reddit
Local chat and teacher for tech and language. I do not have access to these things in my community.
jikilan_@reddit
A local AI assistant you can speak to that controls home devices. It works even when the internet is down.
o0genesis0o@reddit
They are slowly becoming less like niche toys or low-cost overnight batch processors, and more like useful personal agents.
The other day, I attached Gemma 4 E2B at Q6 to my agent harness and hammered it with a 16k system prompt right off the bat. I also made a mistake: I forgot to include one tool that I had instructed the agent to use. Even the OSS 20B from last year would have had problems with that. To my surprise, though, this small model thought coherently, reasoned its way around the missing tool, and finished the task with the right tool call, without overthinking. With better configuration on my side, I can see this E2B taking care of daily personal-assistant tasks like briefings, scheduling, logging, web search Q&A, RAG, and maybe even some simple translation.
With 64GB, you could run the bigger 26B MoE Gemma 4 model or the 35B MoE Qwen 3.6. I fully believe they would drive an agent harness well enough to be useful.
Waarheid@reddit
I dropped E4B in a rather large and complex codebase (mono repo for a large data interface at work), asked it some niche questions, and it was able to explore the codebase and answer my questions. This was on the pi.dev harness. It's also excelled at simpler multi-turn tasks, like using research tools in a loop then using a notification tool to send me the result.
I had another issue that Sonnet had solved, where some mis-entered configuration value overrides in a file, including the path to a local database file, were causing a failure on startup for the application. I pointed it to the Docker container's logs, the docker compose file, and the env file. E4B got it wrong (it asked me to delete the db and restart), while 26B-A4B got it right, arriving at the same solution as Sonnet: remove the extra database file path variable from the env file. This was basically a "can you find the simple error amongst the noise" test.
For me, I think I am happy with E4B (at Q8_K_XL) as a general agent, and 26B (Q4_K_XL) might be okay for simpler debugging and development problems, but I will stick with Sonnet for those since it is for my work. The memory and speed gains of E4B over 26B are great on my modest 32GB M1 Max.
SM8085@reddit
Summarization, for one. Feeding YouTube subtitles into the bot and asking it to summarize them, so I don't have to watch the video, is nice. You or the bot can whip up a Python script that uses yt-dlp to feed the subtitles to the bot.
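A minimal sketch of that pipeline, assuming a local OpenAI-compatible server (the endpoint URL, model name, and subtitle output filename are my assumptions, not anything specific the commenter described):

```python
import json
import subprocess
import urllib.request

def vtt_to_text(vtt: str) -> str:
    """Strip WEBVTT headers, cue numbers, and timestamps; keep the spoken text."""
    lines = []
    for line in vtt.splitlines():
        line = line.strip()
        if not line or line.startswith("WEBVTT") or "-->" in line or line.isdigit():
            continue
        lines.append(line)
    # Collapse consecutive duplicate lines (common in auto-generated subs).
    deduped = [l for i, l in enumerate(lines) if i == 0 or l != lines[i - 1]]
    return " ".join(deduped)

def summarize(url: str, api_base: str = "http://localhost:8080/v1") -> str:
    # Grab English auto-subtitles only, no video download.
    subprocess.run(["yt-dlp", "--skip-download", "--write-auto-subs",
                    "--sub-langs", "en", "--sub-format", "vtt",
                    "-o", "subs", url], check=True)
    with open("subs.en.vtt", encoding="utf-8") as f:
        transcript = vtt_to_text(f.read())
    req = urllib.request.Request(
        f"{api_base}/chat/completions",
        data=json.dumps({
            "model": "local",
            "messages": [{"role": "user",
                          "content": f"Summarize this transcript:\n\n{transcript}"}],
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Long videos may exceed the model's context window, so in practice you'd chunk the transcript and summarize per chunk.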
Image analysis. Images and video (video is just multiple frames) can be analyzed with modern bots like Qwen3.5/3.6. You can have it tag and sort things. My video editing script cut out 2/3 of a video that was mostly just an empty room, because it was recorded live footage and the streamer was AFK, etc.
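A rough sketch of how a frame-sampling pass like that video trim could work: grab a frame every N seconds with ffmpeg, ask a vision model whether each frame is worth keeping, then collapse the answers into time ranges. The vision-model call itself is omitted here; only the ffmpeg call and the range logic are shown, and all names are mine.

```python
import subprocess

def grab_frame(video: str, t: float, out: str) -> None:
    """Extract a single frame at time t (seconds) using ffmpeg."""
    subprocess.run(["ffmpeg", "-y", "-ss", str(t), "-i", video,
                    "-frames:v", "1", out],
                   check=True, capture_output=True)

def keep_ranges(flags: list[tuple[float, bool]]) -> list[tuple[float, float]]:
    """Collapse per-timestamp keep/drop flags into (start, end) ranges to keep.

    flags is a time-ordered list of (timestamp, keep?) pairs, e.g. the result
    of asking a vision model "is anything happening in this frame?" per sample.
    """
    ranges, start = [], None
    for t, keep in flags:
        if keep and start is None:
            start = t          # a keep-worthy segment begins
        elif not keep and start is not None:
            ranges.append((start, t))  # segment ends at the first dropped frame
            start = None
    if start is not None:
        ranges.append((start, flags[-1][0]))
    return ranges
```

The resulting ranges could then be fed back to ffmpeg to cut the video.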
Coding. Qwen3.5-27B/Qwen3.5-122B-A10B can take care of a lot of my scrub problems. I can bump up to a frontier model through an API when the local one can't accomplish the task.
Audio multimodality is newer and potentially interesting. One concept I was playing with was a teleprompter that analyzed whether I said the line correctly, and kept taking takes until I got it right. Making something that generates subtitles via these bots could also be interesting, compared to options like whisper.
I was also testing speaker diarization with a Rick and Morty clip/short to see if the bot could figure it out. It makes mistakes occasionally...
techno156@reddit
I use it to make descriptive filenames for things. If I have a batch of camera photos, instead of DSC_0001.jpeg, I have some code that fires each one off to a local model to generate a filename, which is then used instead of the original.
I could rename them all by hand, but I'd be spending days when the local model could do it in about an hour.
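A sketch of that renaming flow against a local OpenAI-compatible vision endpoint. The endpoint URL, model name, and prompt are assumptions for illustration, not the commenter's actual code:

```python
import base64
import json
import pathlib
import re
import urllib.request

def slugify(title: str, ext: str) -> str:
    """Turn a model-suggested title into a safe, lowercase filename."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:60]
    return f"{slug or 'untitled'}{ext}"

def suggest_name(image_path: pathlib.Path,
                 api_base: str = "http://localhost:8080/v1") -> str:
    # Send the image inline as base64 to the local vision model.
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    payload = {
        "model": "local",
        "messages": [{"role": "user", "content": [
            {"type": "text",
             "text": "Describe this photo in five words or fewer."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    }
    req = urllib.request.Request(f"{api_base}/chat/completions",
                                 data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        title = json.loads(resp.read())["choices"][0]["message"]["content"]
    return slugify(title, image_path.suffix)

def rename_all(folder: str) -> None:
    for photo in pathlib.Path(folder).glob("DSC_*.jpeg"):
        photo.rename(photo.with_name(suggest_name(photo)))
```

Calling `rename_all("photos")` would then batch-rename the whole folder; collisions between similar photos would need a dedup suffix in practice.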
TutorDry3089@reddit
Personally, I find local LLMs inadequate for serious work. While they can handle simple tasks like cleaning up emails or summarizing news articles, they fall short of more complex demands. Serious work requires sophisticated models that are currently impractical for local setups.
Responsible-Lie-7159@reddit (OP)
I do some of my coding with it now too, mostly frontend development, but I want to understand whether it's more of a passion-and-research thing or whether it actually delivers.
TutorDry3089@reddit
In my opinion, local setups are more of a toy than a practical tool at this point. Others might disagree.