Help! What and how to run on M3 Ultra 512GB? (Coding)

Posted by matyhaty@reddit | LocalLLaMA

Hello everyone

I could really do with some advice and help on which local coding AI to host on my Mac Studio M3 Ultra with 512GB. We will only use it for coding.

As I discovered over the weekend, it's not just a matter of which model to run, but also which server to run it on.

So far, I have found that LM Studio is completely unusable and spends ninety percent of the time processing the prompt.
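For reference, this is roughly how I've been timing the prompt-processing problem. It's a minimal sketch assuming an OpenAI-compatible local server (LM Studio's built-in server defaults to port 1234 as far as I know; llama.cpp's llama-server and Ollama expose similar endpoints), and the model name is just a placeholder:

```python
# Minimal sketch: measure prompt processing (time to first token) vs generation time.
# Assumes an OpenAI-compatible local server; adjust base_url for your setup.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

start = time.time()
first_token_at = None
stream = client.chat.completions.create(
    model="local-model",  # placeholder: use whatever name your server reports
    messages=[{"role": "user", "content": "Write a Python function that parses a CSV file."}],
    stream=True,
)
chunks = 0
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.time()  # prompt processing roughly ends here
        chunks += 1
total = time.time() - start

if first_token_at is not None:
    ttft = first_token_at - start
    print(f"prompt processing (time to first token): {ttft:.1f}s")
    print(f"generation: {total - ttft:.1f}s for ~{chunks} chunks")
```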

I haven't had much time with Ollama, but I have experimented with llama.cpp and MLX. Both of those seem better, but not perfect. Then it's whether to use GGUF or MLX, then which quant, then which lab's quants (Unsloth, etc.), and before you know it my head is fried.
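For the MLX side, this is the kind of minimal script I've been testing with, assuming `pip install mlx-lm`; the model repo below is just a placeholder MLX community quant, so swap in whatever model and quant you're actually evaluating:

```python
# Minimal sketch of the MLX path, assuming `pip install mlx-lm`.
# The repo below is a placeholder MLX-community quant; substitute your own.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-Coder-32B-Instruct-4bit")  # placeholder

# Most instruct models expect their chat template to be applied first.
messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```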

As for models, we did loads of tests prior to purchase and found that GLM 5 is really good, but it's quite a big model and seems quite slow.

Obviously, having a very large amount of VRAM opens a lot of doors, but this isn't just for one user, so it's a balance between reasonable speed and quality of output. If I had to choose, I would choose quality of output above all else.
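For the multi-user side, this is roughly how I've been sanity-checking shared throughput. Again, it assumes an OpenAI-compatible endpoint; the URL, model name, and number of simulated users are all placeholders:

```python
# Minimal sketch: see how a shared local server holds up with several concurrent users.
# Assumes an OpenAI-compatible endpoint; URL, model name, and user count are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
N_USERS = 4  # placeholder: number of simultaneous requests

def one_request(i: int) -> float:
    start = time.time()
    client.chat.completions.create(
        model="local-model",  # placeholder
        messages=[{"role": "user", "content": f"User {i}: refactor this function for clarity."}],
        max_tokens=256,
    )
    return time.time() - start

with ThreadPoolExecutor(max_workers=N_USERS) as pool:
    latencies = list(pool.map(one_request, range(N_USERS)))

print("per-user latencies:", [f"{t:.1f}s" for t in latencies])
```

As far as I can tell, llama.cpp's llama-server only actually handles requests in parallel if you give it multiple slots (the --parallel flag); otherwise they just queue up.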

I welcome any opinions and thoughts, especially on the things that confuse me, like which server to run and the settings for it. Model-wise, we will just test them all!!!

Thank you.