Total-Resort-3120
-
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 33 comments
-
gemma-4-31B-it-DFlash has been released
Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 33 comments
-
ZAYA1-8B: Frontier intelligence density.
Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 26 comments
-
Ban phrases on llama.cpp with this script.
Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 31 comments
-
Want your LLM to use the internet? Here's an MCP server for that.
Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 7 comments
-
Let your LLM browse books locally so that it can write better stories.
Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 1 comment
-
DFlash: Block Diffusion for Flash Speculative Decoding.
Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 127 comments
-
DFlash: Block Diffusion for Flash Speculative Decoding.
Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 0 comments