-
Better quantization: Yet Another Quantization Algorithm
Posted by tsengalb99@reddit | LocalLLaMA | View on Reddit | 34 comments
-
I built an app that turns your photos into smart packing lists — all on your iPhone, 100% private, no APIs, no data collection!
Posted by w-zhong@reddit | LocalLLaMA | View on Reddit | 48 comments
-
what's the case against flash attention?
Posted by Responsible-Crew1801@reddit | LocalLLaMA | View on Reddit | 23 comments
-
Is this the largest "No synthetic data" open weight LLM? (142B)
Posted by AaronFeng47@reddit | LocalLLaMA | View on Reddit | 28 comments
-
Hot Take: Gemini 2.5 Pro Makes Too Many Assumptions About Your Code
Posted by HideLord@reddit | LocalLLaMA | View on Reddit | 119 comments
-
Guys, real question: where are Llama 4 Behemoth and the thinking model??
Posted by Independent-Wind4462@reddit | LocalLLaMA | View on Reddit | 18 comments
-
Sparse Transformers: Run LLMs 2x faster with 30% less memory
Posted by Economy-Mud-6626@reddit | LocalLLaMA | View on Reddit | 66 comments
-
Is there appetite for hosting 3b/8b size models at an affordable rate?
Posted by No-Fig-8614@reddit | LocalLLaMA | View on Reddit | 19 comments
-
Do LLMs have opinions?
Posted by WeAllFuckingFucked@reddit | LocalLLaMA | View on Reddit | 31 comments
-
Git for Idiots (Broken down to Four Commands)
Posted by Consistent-Disk-7282@reddit | LocalLLaMA | View on Reddit | 11 comments
-
I built a platform that generates overviews of codebases and creates a map of the codebase dependencies
Posted by ComfortableArm121@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Stop over-engineering AI apps: just use Postgres
Posted by Worldly_Expression43@reddit | LocalLLaMA | View on Reddit | 65 comments
-
AI server help: dual K80s, LocalAGI
Posted by JcorpTech@reddit | LocalLLaMA | View on Reddit | 7 comments
-
Pocketflow is now a workflow generator called Osly!! All you need to do is describe your idea
Posted by Weak_Birthday2735@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Hugging Face Just Dropped Its MCP Server
Posted by eternviking@reddit | LocalLLaMA | View on Reddit | 9 comments
-
llama-server is cooking! gemma3 27b, 100K context, vision on one 24GB GPU.
Posted by No-Statement-0001@reddit | LocalLLaMA | View on Reddit | 54 comments
-
Now I need to explain this to her...
Posted by XMasterrrr@reddit | LocalLLaMA | View on Reddit | 515 comments
-
China is leading open source
Posted by TheLogiqueViper@reddit | LocalLLaMA | View on Reddit | 292 comments
-
What are the top creative writing models?
Posted by TheArchivist314@reddit | LocalLLaMA | View on Reddit | 18 comments
-
Smallest LLM that can help with text rearrangement
Posted by Away_Expression_3713@reddit | LocalLLaMA | View on Reddit | 4 comments
-
After court order, OpenAI is now preserving all ChatGPT and API logs
Posted by iGermanProd@reddit | LocalLLaMA | View on Reddit | 279 comments
-
MiniCPM4: 7x faster decoding than Qwen3-8B
Posted by Lynncc6@reddit | LocalLLaMA | View on Reddit | 24 comments
-
Current best model for technical documentation text generation for RAG / fine tuning?
Posted by OkAstronaut4911@reddit | LocalLLaMA | View on Reddit | 1 comment
-
MiniCPM4: Ultra-Efficient LLMs on End Devices
Posted by adefa@reddit | LocalLLaMA | View on Reddit | 7 comments
-
Even DeepSeek switched from OpenAI to Google
Posted by Utoko@reddit | LocalLLaMA | View on Reddit | 174 comments
-
So cool! Imagine if it was local. Any similar localLLM projects out there?
Posted by Own-Potential-2308@reddit | LocalLLaMA | View on Reddit | 1 comment
-
New embedding model "Qwen3-Embedding-0.6B-GGUF" just dropped.
Posted by Proto_Particle@reddit | LocalLLaMA | View on Reddit | 97 comments
-
MSI PC with NVIDIA GB10 Superchip - 6144 CUDA Cores and 128GB LPDDR5X Confirmed
Posted by shakhizat@reddit | LocalLLaMA | View on Reddit | 62 comments
-
Help with Proxmox + Debian + Docker /w Nvidia 5060TI
Posted by EarEquivalent3929@reddit | LocalLLaMA | View on Reddit | 12 comments
-
Is there a video, article, or book where a lot of real-world datasets are used to train an industry-level LLM, with all the code?
Posted by Happysedits@reddit | LocalLLaMA | View on Reddit | 14 comments
-
Build LLM from Scratch | Mega Playlist of 43 videos
Posted by OtherRaisin3426@reddit | LocalLLaMA | View on Reddit | 10 comments
-
New Bielik models have been released
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 19 comments
-
AMA – I’ve built 7 commercial RAG projects. Got tired of copy-pasting boilerplate, so we open-sourced our internal stack.
Posted by Loud_Picture_1877@reddit | LocalLLaMA | View on Reddit | 98 comments
-
How does gemma3:4b-it-qat fare against OpenAI models on MMLU-Pro benchmark? Try for yourself in Excel
Posted by Kapperfar@reddit | LocalLLaMA | View on Reddit | 21 comments
-
Is it possible to run deepseek-r1-0528 in non-reasoning mode?
Posted by relmny@reddit | LocalLLaMA | View on Reddit | 21 comments
-
Initial thoughts on Google Jules
Posted by maaakks@reddit | LocalLLaMA | View on Reddit | 59 comments
-
China's Xiaohongshu(Rednote) released its dots.llm open source AI model
Posted by Fun-Doctor6855@reddit | LocalLLaMA | View on Reddit | 124 comments
-
Real-time conversation with a character on your local machine
Posted by ResolveAmbitious9572@reddit | LocalLLaMA | View on Reddit | 33 comments
-
Terrible Hindi translation, missing text, paused timeline with Whisper?
Posted by jadhavsaurabh@reddit | LocalLLaMA | View on Reddit | 1 comment
-
I created a totally free and local subtitle generator and renderer that works in browser!
Posted by Qunit-Essential@reddit | LocalLLaMA | View on Reddit | 53 comments
-
What is the best value card I could buy for decent performance?
Posted by equinoxel@reddit | LocalLLaMA | View on Reddit | 6 comments
-
The new king? M3 Ultra, 80 Core GPU, 512GB Memory
Posted by Hanthunius@reddit | LocalLLaMA | View on Reddit | 294 comments
-
Can a model be so radically altered that its origin can no longer be recognized? YES!
Posted by Sicarius_The_First@reddit | LocalLLaMA | View on Reddit | 29 comments
-
Multi modality is currently terrible in open source
Posted by Unusual_Guidance2095@reddit | LocalLLaMA | View on Reddit | 28 comments
-
Real-time conversational AI running 100% locally in-browser on WebGPU
Posted by xenovatech@reddit | LocalLLaMA | View on Reddit | 121 comments
-
How fast can I run models?
Posted by feelin-lonely-1254@reddit | LocalLLaMA | View on Reddit | 3 comments
-
3b and 7b Serving with new Hardware
Posted by No-Fig-8614@reddit | LocalLLaMA | View on Reddit | 4 comments
-
New model - Qwen3 Embedding + Reranker
Posted by koc_Z3@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Cannot even run the smallest model on system RAM?
Posted by FloJak2004@reddit | LocalLLaMA | View on Reddit | 21 comments
-
Semantic routing and caching doesn't work - task specific LLMs (TLMs) ftw!
Posted by AdditionalWeb107@reddit | LocalLLaMA | View on Reddit | 9 comments