fragment_me
-
In Q8_0 weight quantization, why can't we just skip blocks of 32 that have very large outliers?
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 18 comments
-
Has anyone experimented with stabilizing low quant models with lower temp and top p?
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 12 comments
-
Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 24 comments
-
Decent deal on RTX 3080 20GB on ebay - $30 per GB
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 39 comments
-
Does anyone have a usable vLLM setup with Qwen3.6 27B + pipeline parallelism + MTP?
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 34 comments
-
IK_LLAMA now supports Qwen3.5 MTP :O
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 37 comments
-
Has anyone here successfully extended Qwen3.5 or 3.6 context length past 260k?
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 11 comments
-
Interesting new model scoring strong on SWE bench - Multilingual-Multimodal-NLP/IndustrialCoder
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Is there a way to prioritize llama-cpp VRAM allocations to maximize local LLM usage alongside other apps?
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Are there any plugins or all-in-one solutions for TTS interfacing with other local models?
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Info on performance (accuracy) when context window reaches a certain size?
Posted by fragment_me@reddit | LocalLLaMA | View on Reddit | 2 comments