shing3232

MLA optimization with FlashAttention for llama.cpp, MLA + FA now only uses K-cache - 47% saving on KV-cache size

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 45 comments
New DeepseekV3 as well

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 11 comments
Deepseek launches new DSv3 as well

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 0 comments
Fine-tuning LLMs to 1.58bit: extreme quantization experiment

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 14 comments
Day 6: One More Thing, DeepSeek-V3/R1 Inference System Overview

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 42 comments
Deepseek-R1 and Deepseek-R1-zero repos are preparing to launch?

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 5 comments
Qwen2.5: A Party of Foundation Models!

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 234 comments
New paper noise_step: TRAINING IN 1.58B WITH NO GRADIENT MEMORY

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 14 comments
Deepseekv3 base model released

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 5 comments
Looks like deepseekv3 API is up

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 24 comments
Codeqwen 1.5 is out with GQA

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 13 comments
QWEN1.5 110B just out!

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 9 comments