nderstand2grow
-
Don't take Apple MLX too seriously, it's not going to last, here's why (serious post)
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 56 comments
-
Q2 models are utterly useless. Q4 is the minimum quantization level that doesn't ruin the model (at least for MLX). Example with Mistral Small 24B at Q2 ↓
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 94 comments
-
Here's how to turn off "thinking" in Qwen 3: add "/no_think" to your prompt or system message.
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 36 comments
-
ollama's enshittification has begun! open-source is not their priority anymore, because they're YC-backed and must become profitable for VCs... Meanwhile llama.cpp remains free, open-source, and easier-than-ever to run! No more ollama
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 289 comments
-
What LLM benchmarks actually measure (explained intuitively)
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 15 comments
-
Are there any attempts at CPU-only LLM architectures? I know Nvidia doesn't like it, but the biggest threat to their monopoly is AI models that don't need that much GPU compute
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 124 comments
-
Budget is $30,000. What future-proof hardware (GPU cluster) can I buy to train and inference LLMs? Is it better to build it myself or purchase a complete package from websites like SuperMicro?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 85 comments
-
Opinion: Ollama is overhyped. And it's unethical that they didn't give credit to llama.cpp, which they used to get famous. Negative comments about them get flagged on HN (is Ollama part of Y Combinator?)
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 127 comments
-
This map of Dallas that's used in EVGateway app. Never seen it rotated like that!
Posted by nderstand2grow@reddit | Dallas | View on Reddit | 8 comments
-
Storing LLM models on external SSD: Is the SSD speed important? Samsung T7 w/ USB 3.2 (read: 1050MB/s) vs. Fantom w/ Thunderbolt 3/4 (read: 2800MB/s)
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 1 comment
-
DeepSeek R1 32B is way better than 7B Distill, even at Q4 quant
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 51 comments
-
Are the rumors about Dallasians' attitude true? I've heard people say Dallasians think they're better than others (obviously a generalization) and want to adjust my expectations when facing people there
Posted by nderstand2grow@reddit | Dallas | View on Reddit | 8 comments
-
About to move to Dallas. Are the rumors about Dallasians' attitude true? I've heard people say Dallasians think they're better than others (obviously a generalization) and want to adjust my expectations when facing people there
Posted by nderstand2grow@reddit | Dallas | View on Reddit | 1 comment
-
Which areas of Dallas are walkable to coffee shops and restaurants while still less than 30 min drive to campus? (Something like east-coast NYC vibes but in Dallas?)
Posted by nderstand2grow@reddit | askdfw | View on Reddit | 25 comments
-
Which areas of Dallas are walkable to coffee shops and restaurants while still less than 30 min drive to campus? (Something like east-coast NYC vibes but in Dallas?)
Posted by nderstand2grow@reddit | Dallas | View on Reddit | 14 comments
-
Walkable neighborhoods (with coffee shops, restaurants, etc.) for living near UT Dallas?
Posted by nderstand2grow@reddit | askdfw | View on Reddit | 18 comments
-
Does Apple shifting from cars to AI mean they didn't have a secret AI project after all? Everyone thought Apple was late to the party because they wanted to create the best user experience with their AI. I think the simpler explanation is that they just didn't take LLMs seriously at all
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Pre-configured Computers for local LLM inference be like:
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 14 comments
-
For those who don't know what different model formats (GGUF, GPTQ, AWQ, EXL2, etc.) mean ↓
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 10 comments
-
Llama 4 announced
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 75 comments
-
Is there a formula or rule of thumb about the effect of increasing context size on tok/sec speed? Does it *linearly* slow down, or *exponentially* or ...?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 9 comments
-
Auto-Approve MCP Requests in the Claude App
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Llama 4 performance is poor and Meta wants to brute force good results into a bad model. But even Llama 2/3 were not impressive compared to Mistral, Mixtral, Qwen, etc. Is Meta's hype finally over?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 45 comments
-
Gemini 2.5 Pro isn't multimodal, but IMO it's hyped: I asked it to make a scenic-view photo look like it was taken at night. Its response: "this is a car".
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 8 comments
-
Quantization Method Matters: MLX Q2 vs GGUF Q2_K: MLX ruins the model performance whereas GGUF keeps it useable
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 37 comments
-
Exolab: NVIDIA's Digits Outperforms Apple's M4 Chips in AI Inference
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 196 comments
-
I need your expert recommendation: Best setup for <$30,000 to train, fine tune, and inference LLMs? 2xM3 Ultras vs 8x5090 vs other options?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 25 comments
-
Llama-3-70B is insanely good at following format instructions—if you don't like boilerplate talk, it avoids it and keeps the responses to the point
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 3 comments
-
"Claude 3 > GPT-4" and "Mistral going closed-source" again reminded me that open-source LLMs will never be as capable and powerful as closed-source LLMs. Even the costs of open-source (renting GPU servers) can be larger than closed-source APIs. What's the goal of open-source in this field? (serious)
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 17 comments
-
How to test if a model is truly UNCENSORED?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 5 comments
-
Apple has not released any capable open-source LLM despite their MLX framework which is highly optimized for Apple Silicon.
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 74 comments
-
New research shows RLHF heavily reduces LLM creativity and output variety
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 50 comments
-
My new LLM test that R1 got right on the first try: "Jeff has two brothers and each of his brothers has three sisters and each of the sisters has four step brothers and each of the step brothers has five step sisters. How many siblings are there in this family?"
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 12 comments
-
DeepSeek under DDoS attacks or simply too busy?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 7 comments
-
China's DeepSeek triggers global tech sell-off
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 0 comments
-
DeepSeek R1 Plus Subscription at $5/mo?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 3 comments
-
I broke DeepSeek R1 Distill Llama 8B GGUF Q8_0 with this question: """Jeff has two brothers and each of his brothers has three sisters and each of the sisters has four step brothers. How many step brothers does each brother have?"""
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 12 comments
-
Why are LLM benchmarks run only on individual models, and not on systems composed of models? For example, benchmarking "GPT-4" (just a model) vs "GPT-3.5 + Chain of Thought Reasoning + a bunch of other cool tricks" (a system) would've likely shown the GPT-3.5 system performs better than GPT-4...
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 5 comments
-
LiesLLM: I found out that Perplexity's PPLX-"online" model is not actually online. They cache/index the internet and feed to the LLM, so the LLM doesn't actually have browsing/internet-access.
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Asking for hardware recommendations for a personal machine capable of running 70B+ models. With cloud options I have to re-download the model every time. Should I bite the bullet and get a Mac Studio M2 Ultra ($7000 after tax), or build a PC? What specs do you recommend?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 13 comments
-
LLMs be like... reminder for the GPU poor among us :'(
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 4 comments
-
Based on CES '25 announcements, what's the best "stackable" GPU rig for running big models at high tok/s?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 17 comments
-
Marco-o1 (open-source o1) gives the *cutest* AI response to the question "Which is greater, 9.9 or 9.11?" :)
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 109 comments
-
Doing a non-CS PhD, want to get hired in AI. What are my chances? I have extensive experience with local LLMs: running, serving, quantization, finetuning, building web apps based on LLMs, structured output using JSON and grammars, etc.
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 19 comments
-
Cohere's Command R Plus deserves more love! This model is in the GPT-4 league, and the fact that we can download and run it on our own servers gives me hope about the future of Open-Source/Weight models.
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 123 comments
-
iPhone 16 Pro: What are some local models to run on the new iPhone with only 8GB of RAM? Is the RAM really that low compared to Pixel 9 Pro which has 16GB and Galaxy S24 Ultra with 12GB? How can Apple Intelligence run on 8GB then?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 61 comments
-
Gemini Pro fails the famous 6 sisters vs. 1 sister test
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 5 comments
-
People here say they use local LLMs for story telling, chat, etc., but what "stories" are they telling and in what applications? I imagine just building a PC rig to have a story-telling LLM is overkill. Am I missing something? Please let us know if you're using LLMs in a creative way!
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Did OpenAI just kill llama.cpp's GBNF grammars (used for guaranteed structured outputs) without acknowledging that their idea came from open-source? What advantages do llama.cpp's grammars have now that OpenAI supports something similar?
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 11 comments
-
The Myth of Open Source Large Language Models: A Critical Perspective
Posted by nderstand2grow@reddit | LocalLLaMA | View on Reddit | 18 comments