shing3232
-
MLA optimization with FlashAttention for llama.cpp: MLA + FA now only uses K-cache - 47% saving on KV-cache size
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 45 comments
-
New DeepseekV3 as well
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 11 comments
-
DeepSeek launches new DSv3 as well
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Fine-tuning LLMs to 1.58bit: extreme quantization experiment
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 14 comments
-
Day 6: One More Thing, DeepSeek-V3/R1 Inference System Overview
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 42 comments
-
DeepSeek-R1 and DeepSeek-R1-Zero repos are preparing to launch?
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 5 comments
-
Qwen2.5: A Party of Foundation Models!
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 234 comments
-
New paper noise_step: TRAINING IN 1.58B WITH NO GRADIENT MEMORY
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 14 comments
-
DeepSeek-V3 releases base model
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 5 comments
-
Looks like deepseekv3 API is up
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 24 comments
-
Codeqwen 1.5 is out with GQA
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 13 comments
-
QWEN1.5 110B just out!
Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 9 comments