jaMMint
Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer
Posted by One_Slip1455@reddit | LocalLLaMA | View on Reddit | 226 comments
jaMMint@reddit
local vibe coding
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 146 comments
jaMMint@reddit
Getting slow speeds with RTX 5090 and 64gb ram. Am I doing something wrong?
Posted by Virtual-Listen4507@reddit | LocalLLaMA | View on Reddit | 37 comments
jaMMint@reddit
Local programming vs cloud
Posted by Photo_Sad@reddit | LocalLLaMA | View on Reddit | 59 comments
jaMMint@reddit
Local programming vs cloud
Posted by Photo_Sad@reddit | LocalLLaMA | View on Reddit | 59 comments
jaMMint@reddit
Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following
Posted by Sensitive_Sweet_1850@reddit | LocalLLaMA | View on Reddit | 39 comments
jaMMint@reddit
I made Soprano-80M: Stream ultra-realistic TTS in <15ms, up to 2000x realtime, and <1 GB VRAM, released under Apache 2.0!
Posted by eugenekwek@reddit | LocalLLaMA | View on Reddit | 108 comments
jaMMint@reddit
Gertrude, a 94-year-old widow, was heartbroken after her husband Harold died. She decided to end it all with his old Army pistol.
Posted by dinosaurer@reddit | Jokes | View on Reddit | 75 comments
jaMMint@reddit
I can't be the only one annoyed that AI agents never actually improve in production
Posted by GloomyEquipment2120@reddit | LocalLLaMA | View on Reddit | 12 comments
jaMMint@reddit
Most Economical Way to Run GPT-OSS-120B for ~10 Users
Posted by theSavviestTechDude@reddit | LocalLLaMA | View on Reddit | 44 comments
jaMMint@reddit
[Looking for model suggestion] <=32GB reasoning model but strong with tool-calling?
Posted by ForsookComparison@reddit | LocalLLaMA | View on Reddit | 20 comments
jaMMint@reddit
Local models handle tools way better when you give them a code sandbox instead of individual tools
Posted by juanviera23@reddit | LocalLLaMA | View on Reddit | 43 comments
jaMMint@reddit
How you get over 200 tok/s on full Kimi K2 Thinking (or any other big MoE Model) on cheapish hardware - llama.cpp dev pitch
Posted by _serby_@reddit | LocalLLaMA | View on Reddit | 35 comments
jaMMint@reddit
Dynamic LLM generated UI
Posted by ItzCrazyKns@reddit | LocalLLaMA | View on Reddit | 7 comments
jaMMint@reddit
How much VRAM needed for Qwen3-VL-235B-A22B
Posted by Ok_Television_9000@reddit | LocalLLaMA | View on Reddit | 11 comments
jaMMint@reddit
How much VRAM needed for Qwen3-VL-235B-A22B
Posted by Ok_Television_9000@reddit | LocalLLaMA | View on Reddit | 11 comments
jaMMint@reddit
Finishing touches on dual RTX 6000 build
Posted by ikkiyikki@reddit | LocalLLaMA | View on Reddit | 165 comments
jaMMint@reddit
Finishing touches on dual RTX 6000 build
Posted by ikkiyikki@reddit | LocalLLaMA | View on Reddit | 165 comments
jaMMint@reddit
Top small LLM as of September '25
Posted by _-inside-_@reddit | LocalLLaMA | View on Reddit | 40 comments
jaMMint@reddit
GPT-OSS 120B is unexpectedly fast on Strix Halo. Why?
Posted by RaltarGOTSP@reddit | LocalLLaMA | View on Reddit | 65 comments
jaMMint@reddit
GPT-OSS 120B is unexpectedly fast on Strix Halo. Why?
Posted by RaltarGOTSP@reddit | LocalLLaMA | View on Reddit | 65 comments
jaMMint@reddit
Apple M3 Ultra 512GB vs NVIDIA RTX 3090 LLM Benchmark
Posted by ifioravanti@reddit | LocalLLaMA | View on Reddit | 57 comments
jaMMint@reddit
How close can non big tech people get to ChatGPT and Claude speed locally? If you had $10k, how would you build infrastructure?
Posted by EducationalText9221@reddit | LocalLLaMA | View on Reddit | 156 comments
jaMMint@reddit
Building a RAG-based Bot with a large knowledge base.
Posted by champ_undisputed@reddit | LocalLLaMA | View on Reddit | 9 comments
jaMMint@reddit
New code benchmark puts Qwen 3 Coder at the top of the open models
Posted by mr_riptano@reddit | LocalLLaMA | View on Reddit | 103 comments
jaMMint@reddit
Testing qwen3-30b-a3b-q8_0 with my RTX Pro 6000 Blackwell MaxQ. Significant speed improvement. Around 120 t/s.
Posted by swagonflyyyy@reddit | LocalLLaMA | View on Reddit | 50 comments
jaMMint@reddit
From 4090 to 5090 to RTX PRO 6000… in record time
Posted by Fabix84@reddit | LocalLLaMA | View on Reddit | 259 comments
jaMMint@reddit
OpenAI gpt-oss-20b & 120 model performance on the RTX Pro 6000 Blackwell vs RTX 5090M
Posted by traderjay_toronto@reddit | LocalLLaMA | View on Reddit | 43 comments
jaMMint@reddit
OpenAI gpt-oss-20b & 120 model performance on the RTX Pro 6000 Blackwell vs RTX 5090M
Posted by traderjay_toronto@reddit | LocalLLaMA | View on Reddit | 43 comments
jaMMint@reddit
OpenAI gpt-oss-20b & 120 model performance on the RTX Pro 6000 Blackwell vs RTX 5090M
Posted by traderjay_toronto@reddit | LocalLLaMA | View on Reddit | 43 comments
jaMMint@reddit
Local LLM Deployment for 50 Users
Posted by NoobLLMDev@reddit | LocalLLaMA | View on Reddit | 56 comments
jaMMint@reddit
Looking for help with terrible vLLM performance
Posted by Render_Arcana@reddit | LocalLLaMA | View on Reddit | 32 comments
jaMMint@reddit
Looking for help with terrible vLLM performance
Posted by Render_Arcana@reddit | LocalLLaMA | View on Reddit | 32 comments
jaMMint@reddit
Is it just me or is Qwen3-235B is bad at coding ?
Posted by maayon@reddit | LocalLLaMA | View on Reddit | 18 comments
jaMMint@reddit
How we used NVIDIA TensorRT-LLM with Blackwell B200 to achieve 303 output tokens per second on DeepSeek R1
Posted by avianio@reddit | LocalLLaMA | View on Reddit | 16 comments
jaMMint@reddit
Most people are worried about LLM's executing code. Then theres me...... 😂
Posted by DataScientist305@reddit | LocalLLaMA | View on Reddit | 43 comments
jaMMint@reddit
I am considering buying a Mac Studio for running local LLMs. Going for maximum RAM but does the GPU core count make a difference that justifies the extra $1k?
Posted by mehyay76@reddit | LocalLLaMA | View on Reddit | 358 comments
jaMMint@reddit
I am considering buying a Mac Studio for running local LLMs. Going for maximum RAM but does the GPU core count make a difference that justifies the extra $1k?
Posted by mehyay76@reddit | LocalLLaMA | View on Reddit | 358 comments
jaMMint@reddit
I haven't seen many quad GPU setups so here is one
Posted by dazzou5ouh@reddit | LocalLLaMA | View on Reddit | 124 comments
jaMMint@reddit
I haven't seen many quad GPU setups so here is one
Posted by dazzou5ouh@reddit | LocalLLaMA | View on Reddit | 124 comments
jaMMint@reddit
I haven't seen many quad GPU setups so here is one
Posted by dazzou5ouh@reddit | LocalLLaMA | View on Reddit | 124 comments
jaMMint@reddit
o3-mini is now the SOTA coding model. It is truly something to behold. Procedural clouds in one-shot.
Posted by LocoMod@reddit | LocalLLaMA | View on Reddit | 228 comments
jaMMint@reddit
What's the Best Current Setup for Multi Document (10k+) Retrieval-Augmented Generation (RAG)? Need Accuracy and Citations
Posted by United-Rush4073@reddit | LocalLLaMA | View on Reddit | 13 comments
jaMMint@reddit
Running Deepseek on a CPU only cluster of machines?
Posted by jaMMint@reddit | LocalLLaMA | View on Reddit | 28 comments
jaMMint@reddit (OP)
Running Deepseek on a CPU only cluster of machines?
Posted by jaMMint@reddit | LocalLLaMA | View on Reddit | 28 comments
jaMMint@reddit (OP)
Running Deepseek on a CPU only cluster of machines?
Posted by jaMMint@reddit | LocalLLaMA | View on Reddit | 28 comments
jaMMint@reddit (OP)
Running Deepseek on a CPU only cluster of machines?
Posted by jaMMint@reddit | LocalLLaMA | View on Reddit | 28 comments
jaMMint@reddit (OP)
Mac Pro 2019 with DeepSeek R1
Posted by skipfish@reddit | LocalLLaMA | View on Reddit | 12 comments
jaMMint@reddit
Berkley AI research team claims to reproduce DeepSeek core technologies for $30
Posted by Slasher1738@reddit | LocalLLaMA | View on Reddit | 254 comments