New_Spray_7886
server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 40 comments
What is the smallest amount of RAM sufficient to run any available on HF GGUF LLM model locally?
Posted by alex20_202020@reddit | LocalLLaMA | View on Reddit | 36 comments
New_Spray_7886@reddit
Using Intel Arc Pro series, any thoughts?
Posted by BikerBoyRoy123@reddit | LocalLLaMA | View on Reddit | 32 comments
New_Spray_7886@reddit
If human brains are equivalent to 100T param LLMs and current SOTA local models are 1-2T params (basically cat brains) are we going to hit an intelligence wall for local models soon?
Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 38 comments
New_Spray_7886@reddit
AMD Hipfire - a new inference engine optimized for AMD GPUs
Posted by Thrumpwart@reddit | LocalLLaMA | View on Reddit | 87 comments
New_Spray_7886@reddit
FINAL-Bench/Darwin-36B-Opus · Hugging Face
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 21 comments
New_Spray_7886@reddit
RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the --n-cpu-moe flag is the most important part.
Posted by marlang@reddit | LocalLLaMA | View on Reddit | 152 comments