fragment_me
In Q8_0 weight quantization, why can't we just skip blocks of 32 that have very large outliers?
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 18 comments
fragment_me@reddit (OP)
I trusted random person on this subreddit and bought 3080 20gb made of chinesium
Posted by SwimmerJazzlike@reddit | LocalLLaMA | View on Reddit | 243 comments
fragment_me@reddit
Family member just passed away this morning , need a distraction. Any good 1b models you can suggest for layla ??
Posted by Opening-Ad6258@reddit | LocalLLaMA | View on Reddit | 14 comments
fragment_me@reddit
Has anyone experimented with stabilizing low quant models with lower temp and top p?
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 12 comments
fragment_me@reddit (OP)
Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 24 comments
fragment_me@reddit (OP)
I tested MTP on vLLM and llama.cpp for Gemma 4 & Qwen 3.6 — 3.34x faster inference, here are my findings RTX 6000 PRO.
Posted by FantasticNature7590@reddit | LocalLLaMA | View on Reddit | 24 comments
fragment_me@reddit
8GB 2017 MacBook Air breaks record with Quantum Processor help on tuning a 30B Qwen MoE model - Quantum 15,489% boost!
Posted by Overall-Importance54@reddit | LocalLLaMA | View on Reddit | 57 comments
fragment_me@reddit
Qwen3.6-27B Quantization Benchmark
Posted by bobaburger@reddit | LocalLLaMA | View on Reddit | 73 comments
fragment_me@reddit
Info: Nvidia Cuda 13.3 landed
Posted by parrot42@reddit | LocalLLaMA | View on Reddit | 47 comments
fragment_me@reddit
SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More
Posted by CuriousPlatypus1881@reddit | LocalLLaMA | View on Reddit | 41 comments
fragment_me@reddit
Decent deal on RTX 3080 20GB on ebay - $30 per GB
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 39 comments
fragment_me@reddit (OP)
NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable)
Posted by Gailenstorm@reddit | LocalLLaMA | View on Reddit | 44 comments
fragment_me@reddit
Next year we're getting 0.5T model from Grok
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 200 comments
fragment_me@reddit
Tinygrad: Hacked 4090 driver to enable P2P
Posted by mrdevlar@reddit | LocalLLaMA | View on Reddit | 17 comments
fragment_me@reddit
Decent deal on RTX 3080 20GB on ebay - $30 per GB
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 39 comments
fragment_me@reddit (OP)
AMD BC-250 and the search for Cheap Compute
Posted by dugganmania@reddit | LocalLLaMA | View on Reddit | 41 comments
fragment_me@reddit
I guess 4 units wasn’t enough.
Posted by Simple_Library_2700@reddit | LocalLLaMA | View on Reddit | 35 comments
fragment_me@reddit
Heretic has been served a legal notice by Meta, Inc.
Posted by -p-e-w-@reddit | LocalLLaMA | View on Reddit | 349 comments
fragment_me@reddit
Decent deal on RTX 3080 20GB on ebay - $30 per GB
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 39 comments
fragment_me@reddit (OP)
Let’s talk quants of Gemma and Qwen - 16 vs Q8 vs Q4 - any experiences?
Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 93 comments
fragment_me@reddit
Decent deal on RTX 3080 20GB on ebay - $30 per GB
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 39 comments
fragment_me@reddit (OP)
I hope that someday we will have a 124B Gemma.
Posted by cgs019283@reddit | LocalLLaMA | View on Reddit | 77 comments
fragment_me@reddit
Developers who use local AI - Q4_0 vs Q8_0 KV quant?
Posted by Jorlen@reddit | LocalLLaMA | View on Reddit | 89 comments
fragment_me@reddit
"Elias Thorne" is what eight different LLMs name a lighthouse keeper. He's also selling cancer treatment advice on Amazon
Posted by prescorn@reddit | LocalLLaMA | View on Reddit | 54 comments
fragment_me@reddit
MTP support merged into llama.cpp
Posted by tacticaltweaker@reddit | LocalLLaMA | View on Reddit | 108 comments
fragment_me@reddit
MTP PR Merged!!!
Posted by Valuable_Touch5670@reddit | LocalLLaMA | View on Reddit | 101 comments
fragment_me@reddit
Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 24 comments