nderstand2grow

Don't take Apple MLX too seriously, it's not going to last, here's why (serious post)

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 56 comments
Q2 models are utterly useless. Q4 is the minimum quantization level that doesn't ruin the model (at least for MLX). Example with Mistral Small 24B at Q2 ↓

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 94 comments
Here's how to turn off "thinking" in Qwen 3: add "/no_think" to your prompt or system message.

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 36 comments
ollama's enshitification has begun! open-source is not their priority anymore, because they're YC-backed and must become profitable for VCs... Meanwhile llama.cpp remains free, open-source, and easier-than-ever to run! No more ollama

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 289 comments
What LLM benchmarks actually measure (explained intuitively)

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 15 comments
Are there any attempts at CPU-only LLM architectures? I know Nvidia doesn't like it, but the biggest threat to their monopoly is AI models that don't need that much GPU compute

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 124 comments
Budget is $30,000. What future-proof hardware (GPU cluster) can I buy to train and inference LLMs? Is it better to build it myself or purchase a complete package from websites like SuperMicro?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 85 comments
Opinion: Ollama is overhyped. And it's unethical that they didn't give credit to llama.cpp which they used to get famous. Negative comments about them get flagged on HN (is Ollama part of Y-combinator?)

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 127 comments
This map of Dallas that's used in EVGateway app. Never seen it rotated like that!

Posted by nderstand2grow@reddit | Dallas | View on Reddit | 8 comments
Storing LLM models on external SSD: Is the SSD speed important? Samsung T7 w/ USB 3.2 (read: 1050MB/s) vs. Fantom w/ Thunderbolt 3/4 (read: 2800MB/s)

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 1 comments
DeepSeek R1 32B is way better than 7B Distill, even at Q4 quant

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 51 comments
Are the rumors about Dallasians attitude true? I've heard people say Dallasians think they're better than others (obviously a generalization) and want to adjust my expectations when facing people there

Posted by nderstand2grow@reddit | Dallas | View on Reddit | 8 comments
About to move to Dallas. Are the rumors about Dallasians attitude true? I've heard people say Dallasians think they're better than others (obviously a generalization) and want to adjust my expectations when facing people there

Posted by nderstand2grow@reddit | Dallas | View on Reddit | 1 comments
Which areas of Dallas are walkable to coffee shops and restaurants while still less than 30 min drive to campus? (Something like east-coast NYC vibes but in Dallas?)

Posted by nderstand2grow@reddit | askdfw | View on Reddit | 25 comments
Which areas of Dallas are walkable to coffee shops and restaurants while still less than 30 min drive to campus? (Something like east-coast NYC vibes but in Dallas?)

Posted by nderstand2grow@reddit | Dallas | View on Reddit | 14 comments
Walkable neighborhoods (with coffee shops, restaurants, etc.) for living near UT Dallas?

Posted by nderstand2grow@reddit | askdfw | View on Reddit | 18 comments
 Does Apple shifting from cars to AI mean they didn't have a secret AI project after all? Every one thought Apple was late to the party because they wanted to create the best user experience with their AI. I think the simpler explanation is that they just didn't take LLMs seriously at all

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 2 comments
Pre-configured Computers for local LLM inference be like:

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 14 comments
For those who don't know what different model formats (GGUF, GPTQ, AWQ, EXL2, etc.) mean ↓

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 10 comments
Llama 4 announced

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 75 comments
Is there a formula or rule of thumb about the effect of increasing context size on tok/sec speed? Does it *linearly* slow down, or *exponentially* or ...?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 9 comments
Auto-Approve MCP Requests in the Claude App

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 0 comments
Llama 4 performance is poor and Meta wants to brute force good results into a bad model. But even Llama 2/3 were not impressive compared to Mistral, Mixtral, Qwen, etc. Is Meta's hype finally over?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 45 comments
Gemini 2.5 Pro isn't multimodal, but IMO it's Hyped: Asked it to turn a scenic view photo be like taken at night. Its response: "this is a car".

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 8 comments
Quantization Method Matters: MLX Q2 vs GGUF Q2_K: MLX ruins the model performance whereas GGUF keeps it useable

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 37 comments
Exolab: NVIDIA's Digits Outperforms Apple's M4 Chips in AI Inference

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 196 comments
I need your expert recommendation: Best setup for <$30,000 to train, fine tune, and inference LLMs? 2xM3 Ultras vs 8x5090 vs other options?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 25 comments
Llama-3-70B is insanely good at following format instructions—if you don't like boilerplate talk, it avoids it and keeps the responses to the point

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 3 comments
"Claude 3 > GPT-4" and "Mistral going closed-source" again reminded me that open-source LLMs will never be as capable and powerful as closed-source LLMs. Even the costs of open-source (renting GPU servers) can be larger than closed-source APIs. What's the goal of open-source in this field? (serious)

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 17 comments
How to test if a model is truly UNCENSORED?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 5 comments
Apple has not released any capable open-source LLM despite their MLX framework which is highly optimized for Apple Silicon.

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 74 comments
New research shows RLHF heavily reduces LLM creativity and output variety

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 50 comments
My new LLM test that R1 got right on the first try: "Jeff has two brothers and each of his brothers has three sisters and each of the sisters has four step brothers and each of the step brothers has five step sisters. How many siblings are there in this family?"

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 12 comments
DeepSeek under DDOS attacks or simply too busy?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 7 comments
China's DeepSeek triggers global tech sell-off

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 0 comments
DeepSeek R1 Plus Subscription at $5/mo?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 3 comments
I broke DeepSeek R1 Distill Llama 8B GGUF Q8_0 with this question: """Jeff has two brothers and each of his brothers has three sisters and each of the sisters has four step brothers. How many step brothers does each brother have?"""

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 12 comments
Why are LLM benchmarks run only on individual models, and not on systems composed of models? For example, benchmarking "GPT-4" (just a model) vs "GPT-3.5 + Chain of Thought Reasoning + a bunch of other cool tricks" (a system) would've likely shown the GPT-3.5 system performs better than GPT-4...

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 5 comments
LiesLLM: I found out that Perplexity's PPLX-"online" model is not actually online. They cache/index the internet and feed to the LLM, so the LLM doesn't actually have browsing/internet-access.

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 3 comments
Asking for hardware recommendations for a personal machine capable of running +70B models. With cloud options I have to re-download the model every time. Should I bite the bullet and get Mac Studio M2 Ultra ($7000 after tax), or build a PC? What specs do you recommend?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 13 comments
LLMs be like... reminder for the GPU poor among us :‘(

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 4 comments
Based on CES '25 announcements, what's the best "stackable" GPU rig for running big models at high tok/s?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 17 comments
macro-o1 (open-source o1) gives the *cutest* AI response to the question "Which is greater, 9.9 or 9.11?" :)

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 109 comments
Doing a non-CS PhD, want to get hired in AI. What are my chances? I have extensive experience with local LLMs: running, serving, quantization, finetuning, building web apps based on LLMs, structured output using JSON and grammars, etc.

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 19 comments
Cohere's Command R Plus deserves more love! This model is at the GPT-4 league, and the fact that we can download and run it on our own servers gives me hope about the future of Open-Source/Weight models.

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 123 comments
iPhone 16 Pro: What are some local models to run on the new iPhone with only 8GB of RAM? Is the RAM really that low compared to Pixel 9 Pro which has 16GB and Galaxy S24 Ultra with 12GB? How can Apple Intelligence run on 8GB then?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 61 comments
Gemini Pro fails the famous 6 sisters vs. 1 sister test

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 5 comments
People here say they use local LLMs for story telling, chat, etc., but what "stories" are they telling and in what applications? I imagine just building a PC rig to have a story-telling LLM is overkill. Am I missing something? Please let us know if you're using LLMs in a creative way!

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 1 comments
Did OpenAI just kill llama.cpp's GBNF grammars (used for guaranteed structured outputs) without acknowledging that their idea came from open-source? What advantages do llama.cpp's grammars have now that OpenAI supports something similar?

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 11 comments
The Myth of Open Source Large Language Models: A Critical Perspective

Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 18 comments