-
Have long-context models solved attention dilution yet?
Posted by yuch85@reddit | LocalLLaMA | View on Reddit | 10 comments
-
Any idea when RAM prices will be “normal” again?
Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 213 comments
-
PrimeIntellect is actually awesome
Posted by Icy_Gas8807@reddit | LocalLLaMA | View on Reddit | 18 comments
-
Switching from Ollama to llama-swap + llama.cpp on NixOS: why I finally made the jump after adding a second RTX 3090
Posted by basnijholt@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Trained a chess LLM locally that beats GPT-5 (technically)
Posted by KingGongzilla@reddit | LocalLLaMA | View on Reddit | 34 comments
-
Is vLLM worth it?
Posted by Smooth-Cow9084@reddit | LocalLLaMA | View on Reddit | 44 comments
-
Users of Qwen3-Next-80B-A3B-Instruct-GGUF, How are Performance & Benchmarks?
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 25 comments
-
Insert pauses into a text file for Kokoro
Posted by dts-five@reddit | LocalLLaMA | View on Reddit | 24 comments
-
nvidia/Orchestrator-8B · Hugging Face
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 13 comments
-
Need advice upgrading an old gaming desktop with a 5090 for AI
Posted by dtdisapointingresult@reddit | LocalLLaMA | View on Reddit | 9 comments
-
$6k AMD AI Build (2x R9700, 64GB VRAM) - Worth it for a beginner learning fine-tuning vs. Cloud?
Posted by buenavista62@reddit | LocalLLaMA | View on Reddit | 30 comments
-
Optimizing Token Generation in llama.cpp's CUDA Backend
Posted by am17an@reddit | LocalLLaMA | View on Reddit | 22 comments
-
I spent 2 years building privacy-first local AI. My conclusion: Ingestion is the bottleneck, not the Model. (Showcase: Ollama + Docling RAG Kit)
Posted by ChapterEquivalent188@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Recommendations for Production Hardware for inference and fine-tuning
Posted by Whyme-__-@reddit | LocalLLaMA | View on Reddit | 8 comments
-
DGX Spark: reproducing NVIDIA's training benchmarks
Posted by khoka_x9@reddit | LocalLLaMA | View on Reddit | 2 comments
-
RAG of financial statements
Posted by Less_Piccolo_6218@reddit | LocalLLaMA | View on Reddit | 1 comment
-
TOON is terrible, so I invented a new format (TRON) to prove a point
Posted by No-Olive342@reddit | LocalLLaMA | View on Reddit | 74 comments
-
Recommendations for summarization and structured data extraction
Posted by cachophonic@reddit | LocalLLaMA | View on Reddit | 10 comments
-
One Bottleneck After Another - First GPU & now RAM
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 10 comments
-
unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF · Hugging Face
Posted by WhaleFactory@reddit | LocalLLaMA | View on Reddit | 110 comments
-
A 4B Model That Outperforms 32B on GUI Tasks, Fully Open-Source
Posted by Successful-Bill-5543@reddit | LocalLLaMA | View on Reddit | 11 comments
-
Can anyone share their experience of how a local LLM helps them build software?
Posted by National-Fold-2375@reddit | LocalLLaMA | View on Reddit | 4 comments
-
Why are people on Reddit triggered about LLMs being smarter than humans?
Posted by aizvo@reddit | LocalLLaMA | View on Reddit | 101 comments
-
Looking for an open-source 10B model comparable to GPT-4o-mini
Posted by bohemianLife1@reddit | LocalLLaMA | View on Reddit | 48 comments
-
To what degree do PCIe lanes x16 vs x4 or x1 matter in a multi-GPU setup for running LLMs?
Posted by fabkosta@reddit | LocalLLaMA | View on Reddit | 18 comments
-
NeKot - a terminal interface for interacting with local and cloud LLMs
Posted by Balanceballs@reddit | LocalLLaMA | View on Reddit | 25 comments
-
Gemma3 27 heretic, lower divergence than mlabonne/gemma3
Posted by coder3101@reddit | LocalLLaMA | View on Reddit | 16 comments
-
Local Suno just dropped
Posted by Different_Fix_2217@reddit | LocalLLaMA | View on Reddit | 100 comments
-
What's the best LLM Router right now, and why?
Posted by desexmachina@reddit | LocalLLaMA | View on Reddit | 56 comments
-
(very low effort) I designed a simple SSM head
Posted by smoothbrain_1947@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Best LLM router: comparison
Posted by GrandMoo1@reddit | LocalLLaMA | View on Reddit | 17 comments
-
Newbie Question about GPU choice
Posted by mundane_marietta@reddit | LocalLLaMA | View on Reddit | 11 comments
-
How the heck is Qwen3-Coder so fast? Nearly 10x other models.
Posted by CSEliot@reddit | LocalLLaMA | View on Reddit | 25 comments
-
ArliAI/gpt-oss-120b-Derestricted · Hugging Face
Posted by Arli_AI@reddit | LocalLLaMA | View on Reddit | 42 comments
-
Optimising NVIDIA’s DGX Spark (Grace + Blackwell) – 1.5× PyTorch speedup with custom build
Posted by guigsss@reddit | LocalLLaMA | View on Reddit | 28 comments
-
Docling, how does it work with VLM?
Posted by gevorgter@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Unlocked LM Studio Backends (v1.59.0): AVX1 & More Supported – Testers Wanted
Posted by TheSpicyBoi123@reddit | LocalLLaMA | View on Reddit | 11 comments
-
How do you choose your open-source LLM without having to test them all?
Posted by Holiday-Case-4524@reddit | LocalLLaMA | View on Reddit | 23 comments
-
Smart small LLM for 8GB RAM without censorship
Posted by Ok_Recognition9457@reddit | LocalLLaMA | View on Reddit | 8 comments
-
The official vLLM support for the Ryzen AI Max+ 395 is here! (the whole AI 300 series, i.e. gfx1150 and gfx1151)
Posted by waiting_for_zban@reddit | LocalLLaMA | View on Reddit | 12 comments
-
I'm the author of LocalAI (the local OpenAI-compatible API). We just released v3.7.0 with full Agentic Support (tool use!), Qwen 3 VL, and the latest llama.cpp
Posted by mudler_it@reddit | LocalLLaMA | View on Reddit | 17 comments
-
Watch as my Llama.cpp and FastAPI servers process requests from my Unity game
Posted by LandoRingel@reddit | LocalLLaMA | View on Reddit | 11 comments
-
How are you handling web crawling? Firecrawl is great, but I'm hitting limits.
Posted by Robertshee@reddit | LocalLLaMA | View on Reddit | 24 comments
-
Workflow comparison: Running Llama 3.2 locally with LangChain vs n8n. Why I stopped coding my agents.
Posted by jokiruiz@reddit | LocalLLaMA | View on Reddit | 0 comments
-
New llama.cpp options make MoE offloading trivial: `--n-cpu-moe`
Posted by Pristine-Woodpecker@reddit | LocalLLaMA | View on Reddit | 101 comments
-
Are any of you using local LLMs for "real" work?
Posted by hmsenterprise@reddit | LocalLLaMA | View on Reddit | 154 comments
-
Qwen3 Next imatrix GGUFs up!
Posted by noneabove1182@reddit | LocalLLaMA | View on Reddit | 41 comments
-
Yet another reason to stick with local models
Posted by nekofneko@reddit | LocalLLaMA | View on Reddit | 79 comments
-
How I replaced Gemini CLI & Copilot with a local stack using Ollama, Continue.dev and MCP servers
Posted by aaronsky@reddit | LocalLLaMA | View on Reddit | 18 comments
-
GPT-2 using MLX
Posted by Disastrous-Maybe2501@reddit | LocalLLaMA | View on Reddit | 3 comments