-
PSA: Guide for Installing Flash Attention 2 on Windows
Posted by RokHere@reddit | LocalLLaMA | View on Reddit | 2 comments
-
DISTILLATION is so underrated. I spent an hour and got a neat improvement in accuracy while keeping the costs low
Posted by Ambitious_Anybody855@reddit | LocalLLaMA | View on Reddit | 19 comments
-
University of Hong Kong releases Dream 7B (Diffusion reasoning model). Highest performing open-source diffusion model to date. You can adjust the number of diffusion timesteps for speed vs accuracy
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 114 comments
-
Anyone with experience combining Nvidia system & mac over llama-rpc?
Posted by segmond@reddit | LocalLLaMA | View on Reddit | 6 comments
-
LMSYS (LMarena.ai) is highly susceptible to manipulation
Posted by Economy_Apple_4617@reddit | LocalLLaMA | View on Reddit | 0 comments
-
What are the best value, energy-efficient options with 48GB+ VRAM for AI inference?
Posted by PangurBanTheCat@reddit | LocalLLaMA | View on Reddit | 55 comments
-
The Candle Test - most LLMs fail to generalise at this simple task
Posted by Everlier@reddit | LocalLLaMA | View on Reddit | 174 comments
-
Just upgraded my RTX 3060 with 192GB of VRAM
Posted by Wrong_User_Logged@reddit | LocalLLaMA | View on Reddit | 81 comments
-
SGLang. Some problems, but significantly better performance compared to vLLM
Posted by Sadeghi85@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Real-Time Speech-to-Speech Chatbot: Whisper, Llama 3.1, Kokoro, and Silero VAD 🚀
Posted by martian7r@reddit | LocalLLaMA | View on Reddit | 31 comments
-
Are there official (from Google) quantized versions of Gemma 3?
Posted by lostmsu@reddit | LocalLLaMA | View on Reddit | 8 comments
-
MacBook M3, 24GB ram. What's best for LLM engine?
Posted by Familyinalicante@reddit | LocalLLaMA | View on Reddit | 42 comments
-
MN-GRAND-Gutenburg-Lyra4-Lyra-23.5B - Long Form Output / NON "AI" prose.
Posted by Dangerous_Fix_5526@reddit | LocalLLaMA | View on Reddit | 28 comments
-
Mac Studio M3 Ultra 512GB DeepSeek V3-0324 IQ2_XXS (2.0625 bpw) llamacpp performance
Posted by WhereIsYourMind@reddit | LocalLLaMA | View on Reddit | 12 comments
-
Best bang for the buck GPU
Posted by Ok-Cucumber-7217@reddit | LocalLLaMA | View on Reddit | 85 comments
-
MacBook M4 Max isn't great for LLMs
Posted by val_in_tech@reddit | LocalLLaMA | View on Reddit | 247 comments
-
Sharing HallOumi-8B, an open-source hallucination detector usable with any LLM!
Posted by jeremy_oumi@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Proxmox or Native Ubuntu
Posted by pipaman@reddit | LocalLLaMA | View on Reddit | 14 comments
-
ClaudePlaysPokemon Open Sourced - Benchmark AI by letting it play Pokémon
Posted by MaruluVR@reddit | LocalLLaMA | View on Reddit | 8 comments
-
Dual RTX 3060 Setup for Deep Learning – PCIe x8/x4 Concerns?
Posted by Ahmedsaed26@reddit | LocalLLaMA | View on Reddit | 1 comment
-
PayPal launches remote and local MCP servers
Posted by init0@reddit | LocalLLaMA | View on Reddit | 5 comments
-
Instructional Writeup: How to Make LLMs Reason Deep and Build Entire Projects
Posted by No-Mulberry6961@reddit | LocalLLaMA | View on Reddit | 22 comments
-
Is there any work towards an interactive manga translation tool?
Posted by Tmmrn@reddit | LocalLLaMA | View on Reddit | 2 comments
-
What is the best model for generating images?
Posted by rez45gt@reddit | LocalLLaMA | View on Reddit | 11 comments
-
LLM amateur with a multi-GPU question. How to optimize for speed?
Posted by William-Riker@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Smallest model capable of detecting profane/nsfw language?
Posted by ohcrap___fk@reddit | LocalLLaMA | View on Reddit | 60 comments
-
koboldcpp-1.87.1: Merged Qwen2.5VL support! :)
Posted by Snail_Inference@reddit | LocalLLaMA | View on Reddit | 4 comments
-
Why isn't the whole industry focusing on online-learning?
Posted by unraveleverything@reddit | LocalLLaMA | View on Reddit | 15 comments
-
LiveBench team just dropped a leaderboard for coding agent tools
Posted by ihexx@reddit | LocalLLaMA | View on Reddit | 55 comments
-
Qwen3 will be released in the second week of April
Posted by AaronFeng47@reddit | LocalLLaMA | View on Reddit | 77 comments
-
Anyone try a 5090 yet?
Posted by 4hometnumberonefan@reddit | LocalLLaMA | View on Reddit | 12 comments
-
PAI: your personal AI 100% local inspired by Google's Project Astra
Posted by Such_Advantage_6949@reddit | LocalLLaMA | View on Reddit | 8 comments
-
What's the fastest inference API provider? (Not OpenAI)
Posted by Hungry-Connection645@reddit | LocalLLaMA | View on Reddit | 12 comments
-
An idea: an LLM trapped in the past
Posted by Vehnum@reddit | LocalLLaMA | View on Reddit | 49 comments
-
Nemotron-49B uses 70% less KV cache compared to source Llama-70B
Posted by Ok_Warning2146@reddit | LocalLLaMA | View on Reddit | 43 comments
-
How do I tell llama-stack where to install models?
Posted by ImpossibleBritches@reddit | LocalLLaMA | View on Reddit | 1 comment
-
You can now check if your Laptop/ Rig can run a GGUF directly from Hugging Face! 🤗
Posted by vaibhavs10@reddit | LocalLLaMA | View on Reddit | 64 comments
-
canvas for code and local model
Posted by sunole123@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Another coding model; achieves strong performance on software engineering tasks, including a 37.2% resolve rate on SWE-Bench Verified.
Posted by Ornery_Local_6814@reddit | LocalLLaMA | View on Reddit | 18 comments
-
Does the new Jetson Orin Nano Super make sense for a home setup?
Posted by Initial-Image-1015@reddit | LocalLLaMA | View on Reddit | 84 comments
-
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Posted by ninjasaid13@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Best way to do Multi GPU
Posted by SalmonSoup15@reddit | LocalLLaMA | View on Reddit | 7 comments
-
How to do citations in local Web Search?
Posted by tilmx@reddit | LocalLLaMA | View on Reddit | 1 comment
-
How do I tell the 'llama' command where to install models?
Posted by ImpossibleBritches@reddit | LocalLLaMA | View on Reddit | 1 comment
-
DeepMind will delay sharing research to remain competitive
Posted by mayalihamur@reddit | LocalLLaMA | View on Reddit | 118 comments
-
I made a Grammarly alternative without clunky UI. Completely free with Gemini Nano (in-browser AI). Helps you with writing emails, articles, social media posts, etc.
Posted by WordyBug@reddit | LocalLLaMA | View on Reddit | 75 comments
-
Kyutai Labs finally release finetuning code for Moshi - We can now give it any voice we wish!
Posted by JawGBoi@reddit | LocalLLaMA | View on Reddit | 7 comments
-
SOTA 3d?
Posted by Charuru@reddit | LocalLLaMA | View on Reddit | 11 comments
-
Thinking about running dual 4060TIs 16gb. But is there a way to limit power on linux? Am I going to sweat myself to death in the summer?
Posted by LanceThunder@reddit | LocalLLaMA | View on Reddit | 9 comments
-
Top reasoning LLMs failed horribly on USA Math Olympiad (maximum 5% score)
Posted by Kooky-Somewhere-2883@reddit | LocalLLaMA | View on Reddit | 211 comments