TheaterFire


Currently browsing tags:

  • LocalLLaMA

Posts:

  • Have long context models solved attention dilution yet?

    Posted by yuch85@reddit | LocalLLaMA | View on Reddit | 10 comments

  • Any idea when RAM prices will be “normal” again?

    Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 213 comments

  • PrimeIntellect is actually awesome

    Posted by Icy_Gas8807@reddit | LocalLLaMA | View on Reddit | 18 comments

  • Switching from Ollama to llama-swap + llama.cpp on NixOS: why I finally made the jump after adding a second RTX 3090

    Posted by basnijholt@reddit | LocalLLaMA | View on Reddit | 1 comment

  • Trained a chess LLM locally that beats GPT-5 (technically)

    Posted by KingGongzilla@reddit | LocalLLaMA | View on Reddit | 34 comments

  • Is vLLM worth it?

    Posted by Smooth-Cow9084@reddit | LocalLLaMA | View on Reddit | 44 comments

  • Users of Qwen3-Next-80B-A3B-Instruct-GGUF, How is Performance & Benchmarks?

    Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 25 comments

  • Insert pauses into text file for kokoro

    Posted by dts-five@reddit | LocalLLaMA | View on Reddit | 24 comments

  • nvidia/Orchestrator-8B · Hugging Face

    Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 13 comments

  • Need advice upgrading an old gaming desktop with a 5090 for AI

    Posted by dtdisapointingresult@reddit | LocalLLaMA | View on Reddit | 9 comments

  • $6k AMD AI Build (2x R9700, 64GB VRAM) - Worth it for a beginner learning fine-tuning vs. Cloud?

    Posted by buenavista62@reddit | LocalLLaMA | View on Reddit | 30 comments

  • Optimizing Token Generation in llama.cpp's CUDA Backend

    Posted by am17an@reddit | LocalLLaMA | View on Reddit | 22 comments

  • I spent 2 years building privacy-first local AI. My conclusion: Ingestion is the bottleneck, not the Model. (Showcase: Ollama + Docling RAG Kit)

    Posted by ChapterEquivalent188@reddit | LocalLLaMA | View on Reddit | 2 comments

  • Recommendation for Production Hardware for inference and fine tuning.

    Posted by Whyme-__-@reddit | LocalLLaMA | View on Reddit | 8 comments

  • DGX Spark reproducing the benchmarks by NVIDIA for training

    Posted by khoka_x9@reddit | LocalLLaMA | View on Reddit | 2 comments

  • RAG of financial statements

    Posted by Less_Piccolo_6218@reddit | LocalLLaMA | View on Reddit | 1 comment

  • TOON is terrible, so I invented a new format (TRON) to prove a point

    Posted by No-Olive342@reddit | LocalLLaMA | View on Reddit | 74 comments

  • Recommendations for summarization and structured data extraction

    Posted by cachophonic@reddit | LocalLLaMA | View on Reddit | 10 comments

  • One Bottleneck After Another - First GPU & now RAM

    Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 10 comments

  • unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF · Hugging Face

    Posted by WhaleFactory@reddit | LocalLLaMA | View on Reddit | 110 comments

  • A 4B Model That Outperforms 32B on GUI Tasks, Fully Open-Source

    Posted by Successful-Bill-5543@reddit | LocalLLaMA | View on Reddit | 11 comments

  • Can anyone share their experience on how a local LLM helps them in building software?

    Posted by National-Fold-2375@reddit | LocalLLaMA | View on Reddit | 4 comments

  • Why are people on Reddit triggered about LLMs being smarter than humans?

    Posted by aizvo@reddit | LocalLLaMA | View on Reddit | 101 comments

  • Looking for open source 10B model that is comparable to gpt4o-mini

    Posted by bohemianLife1@reddit | LocalLLaMA | View on Reddit | 48 comments

  • To what degree do PCIe lanes x16 vs x4 or x1 matter in a multi-GPU setup for running LLMs?

    Posted by fabkosta@reddit | LocalLLaMA | View on Reddit | 18 comments

  • NeKot - a terminal interface for interacting with local and cloud LLMs

    Posted by Balanceballs@reddit | LocalLLaMA | View on Reddit | 25 comments

  • Gemma3 27 heretic, lower divergence than mlabonne/gemma3

    Posted by coder3101@reddit | LocalLLaMA | View on Reddit | 16 comments

  • Local Suno just dropped

    Posted by Different_Fix_2217@reddit | LocalLLaMA | View on Reddit | 100 comments

  • What's the best LLM Router right now, and why?

    Posted by desexmachina@reddit | LocalLLaMA | View on Reddit | 56 comments

  • (very low effort) i designed a simple SSM head

    Posted by smoothbrain_1947@reddit | LocalLLaMA | View on Reddit | 2 comments

  • Best LLM router: comparison

    Posted by GrandMoo1@reddit | LocalLLaMA | View on Reddit | 17 comments

  • Newbie Question about GPU choice

    Posted by mundane_marietta@reddit | LocalLLaMA | View on Reddit | 11 comments

  • How the heck is Qwen3-Coder so fast? Nearly 10x other models.

    Posted by CSEliot@reddit | LocalLLaMA | View on Reddit | 25 comments

  • ArliAI/gpt-oss-120b-Derestricted · Hugging Face

    Posted by Arli_AI@reddit | LocalLLaMA | View on Reddit | 42 comments

  • Optimising NVIDIA’s DGX Spark (Grace + Blackwell) – 1.5× PyTorch speedup with custom build

    Posted by guigsss@reddit | LocalLLaMA | View on Reddit | 28 comments

  • Docling, how does it work with VLM?

    Posted by gevorgter@reddit | LocalLLaMA | View on Reddit | 1 comment

  • Unlocked LM Studio Backends (v1.59.0): AVX1 & More Supported – Testers Wanted

    Posted by TheSpicyBoi123@reddit | LocalLLaMA | View on Reddit | 11 comments

  • How do you choose your open-source LLM without having to test them all?

    Posted by Holiday-Case-4524@reddit | LocalLLaMA | View on Reddit | 23 comments

  • Smart small llm for 8gb ram without censorship

    Posted by Ok_Recognition9457@reddit | LocalLLaMA | View on Reddit | 8 comments

  • The official vLLM support for the Ryzen AI Max+ 395 is here! (the whole AI 300 series, ie gfx1150 and gfx1151)

    Posted by waiting_for_zban@reddit | LocalLLaMA | View on Reddit | 12 comments

  • I'm the author of LocalAI (the local OpenAI-compatible API). We just released v3.7.0 with full Agentic Support (tool use!), Qwen 3 VL, and the latest llama.cpp

    Posted by mudler_it@reddit | LocalLLaMA | View on Reddit | 17 comments

  • Watch as my Llama.cpp and FastAPI servers process requests from my Unity game

    Posted by LandoRingel@reddit | LocalLLaMA | View on Reddit | 11 comments

  • How are you handling web crawling? Firecrawl is great, but I'm hitting limits.

    Posted by Robertshee@reddit | LocalLLaMA | View on Reddit | 24 comments

  • Workflow comparison: Running Llama 3.2 locally with LangChain vs n8n. Why I stopped coding my agents.

    Posted by jokiruiz@reddit | LocalLLaMA | View on Reddit | 0 comments

  • New llama.cpp options make MoE offloading trivial: `--n-cpu-moe`

    Posted by Pristine-Woodpecker@reddit | LocalLLaMA | View on Reddit | 101 comments

  • Are any of you using local llms for "real" work?

    Posted by hmsenterprise@reddit | LocalLLaMA | View on Reddit | 154 comments

  • Qwen3 Next imatrix GGUFs up!

    Posted by noneabove1182@reddit | LocalLLaMA | View on Reddit | 41 comments

  • Yet another reason to stick with local models

    Posted by nekofneko@reddit | LocalLLaMA | View on Reddit | 79 comments

  • How I replaced Gemini CLI & Copilot with a local stack using Ollama, Continue.dev and MCP servers

    Posted by aaronsky@reddit | LocalLLaMA | View on Reddit | 18 comments

  • GPT2 using MLX

    Posted by Disastrous-Maybe2501@reddit | LocalLLaMA | View on Reddit | 3 comments
