muxxington

brute-llama - A llama.cpp llama-server testbench

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 9 comments
Lads, time to recompile llama.cpp

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 56 comments
😭

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 9 comments
Solution for Qwen3-Coder-Next with llama.cpp/llama-server and Opencode tool calling issue

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 10 comments
Conclusion: Sesame has shown us a CSM. Then Sesame announced that it would publish... something. Sesame then released a TTS, which they obviously misleadingly and falsely called a CSM. Do I see that correctly?

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 106 comments
Poor man's X79 motherboard ETH79-X5

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 20 comments
:|

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 18 comments
There it is: https://github.com/SesameAILabs/csm

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 76 comments
OpenAI compatible API for Flowise

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 0 comments
Gotcha!

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 17 comments
Gotcha!

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 0 comments
I managed to reduce Tesla P40 idle power consumption

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 40 comments
P40 still worth it?

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 16 comments
How to unlock cheap mining boards like the ETH79-X5 to support unsupported GPUs

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 0 comments
Is there an aider equivalent for sysadmins?

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 8 comments
Improved handling of multiple Tesla P40/P100 with multiple llama.cpp instances

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 15 comments
gppm now manages your llama.cpp instances seamlessly with a touch of kubernetes ...besides saving 40 watts of idle power per Tesla P40 or P100 GPU

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 5 comments
gppm now handles your llama.cpp instances with a touch of kubernetes ...besides saving 40 watts of idle power per GPU

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 0 comments
gppm now launches llama.cpp with Tesla P40 or P100 with a touch of kubernetes

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 0 comments
For those who run multiple llama.cpp instances sharing Tesla P40

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 0 comments
Question on FP32 FlashAttention in llama.cpp

Posted by muxxington@reddit | LocalLLaMA | View on Reddit | 8 comments