Total-Resort-3120

ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 33 comments
gemma-4-31B-it-DFlash has been released

Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 33 comments
ZAYA1-8B: Frontier intelligence density.

Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 26 comments
Ban phrases on llama.cpp with this script.

Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 31 comments
Want your LLM to use the internet? Here's an MCP server for that.

Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 7 comments
Let your LLM browse books locally so that it can write better stories.

Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 1 comment
DFlash: Block Diffusion for Flash Speculative Decoding.

Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 127 comments
DFlash: Block Diffusion for Flash Speculative Decoding.

Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 0 comments