audioen
throw on the pile of thoughts to consider when pondering the ethics of Singularitarianism, data centers, etc.
Posted by hoodiemonster@reddit | collapse | 18 comments
How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?
Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | 132 comments
Qwen3.6-27B on 2x3090s: llama.cpp vs vLLM, all the flags, and the MTP acceptance/inference speed/context
Posted by Sisuuu@reddit | LocalLLaMA | 40 comments
qwen35: use post-norm hidden state for MTP by am17an · Pull Request #24025 · ggml-org/llama.cpp
Posted by jacek2023@reddit | LocalLLaMA | 21 comments
In Q8_0 weight quantization, why can't we just skip blocks of 32 that have very large outliers?
Posted by fragment_me@reddit | LocalLLaMA | 19 comments
Would you consider getting an NVIDIA RTX Spark laptop?
Posted by gamblingapocalypse@reddit | LocalLLaMA | 175 comments
Genuinely what do we do about the bot comments in this sub
Posted by Borkato@reddit | LocalLLaMA | 102 comments
DIY Local 2x DGX Spark cluster cooler with automatic temperature controlled fan.
Posted by Porespellar@reddit | LocalLLaMA | 6 comments
What's this sub's general opinion on quantizing the KV cache?
Posted by misanthrophiccunt@reddit | LocalLLaMA | 91 comments
Flash Attention for llama.cpp on RDNA3: 47% less KV VRAM than Vulkan f16 K, KLD almost lossless on F16 K / q4_0 V. Part 1.
Posted by DrBearJ3w@reddit | LocalLLaMA | 13 comments
Qwen3.6-27B Quantization Benchmark
Posted by bobaburger@reddit | LocalLLaMA | 74 comments
Is he crazy to say that?
Posted by pmv143@reddit | LocalLLaMA | 203 comments
Breaking the music supply constraint
Posted by entsnack@reddit | LocalLLaMA | 317 comments
I tested MTP on vLLM and llama.cpp for Gemma 4 & Qwen 3.6 — 3.34x faster inference, here are my findings RTX 6000 PRO.
Posted by FantasticNature7590@reddit | LocalLLaMA | 24 comments
Qwen 3.6 27B overdoing it
Posted by WhatererBlah555@reddit | LocalLLaMA | 68 comments
How do I make MTP work in llama-server?
Posted by Ok_Warning2146@reddit | LocalLLaMA | 28 comments
Large Language Models Report Subjective Experience Under Self-Referential Processing
Posted by SrijSriv211@reddit | LocalLLaMA | 12 comments
The Nouveau driver will finally support the NVIDIA GA100 in Linux 7.2
Posted by somerandomxander@reddit | linux | 23 comments
QEMU is deciding to shift its AI policy, now allowing some AI/LLM-generated contributions
Posted by somerandomxander@reddit | linux | 196 comments
Ubuntu 26.04 on DGX Spark
Posted by ArtisticHamster@reddit | LocalLLaMA | 11 comments
I'm seeing low draft acceptance when using Qwen3.x MTP, what am I doing wrong?
Posted by spaceman_@reddit | LocalLLaMA | 29 comments
Q4_K_M is fine for chat and a trap for agents. Here is math mathing.
Posted by Napster3301@reddit | LocalLLaMA | 55 comments
Okay 27B made me a believer
Posted by Forward_Jackfruit813@reddit | LocalLLaMA | 148 comments
Llamacpp server : How do the -np and -c flags interact?
Posted by Doug_Fripon@reddit | LocalLLaMA | 15 comments
Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs.
Posted by fallingdowndizzyvr@reddit | LocalLLaMA | 77 comments
Why not dynamic active parameters (and other questions for the knowledgeable)
Posted by mouseofcatofschrodi@reddit | LocalLLaMA | 14 comments
OpenBMB presents the model BitCPM-CANN 1.58 bit
Posted by Illustrious-Swim9663@reddit | LocalLLaMA | 30 comments
Qwen 3.6 struggling with German
Posted by xchris1337xy@reddit | LocalLLaMA | 33 comments
Some tests with qwen3.6 27b + 35b a3b about MTP vs ngram-mod
Posted by mr_Owner@reddit | LocalLLaMA | 19 comments
For the users who have had bad luck with QWEN 3.6 27B, and Gemma 4 31B. "Actually..wait..actually". Endless reasoning. Horrible output. I found a solution. rtx pro 6000.
Posted by Juulk9087@reddit | LocalLLaMA | 41 comments
Do you think there is room for optimization? llama.cpp/qwen3.6 27b on two 6000 Blackwell
Posted by q-admin007@reddit | LocalLLaMA | 62 comments
Do smaller quants silently break tool calls / JSON output?
Posted by Fun_Employment6042@reddit | LocalLLaMA | 23 comments
Why might MTP be net negative for tool heavy agentic flows?
Posted by Substantial_Step_351@reddit | LocalLLaMA | 13 comments
Lemonade v10.5.1: an MTP + ROCm 7.13 quick start for Strix Halo
Posted by jfowers_amd@reddit | LocalLLaMA | 22 comments
MTP (Multi-Token Prediction): 2x Faster Token Generation on AMD Strix Halo & Radeon 9700 AI Pro
Posted by Intrepid_Rub_3566@reddit | LocalLLaMA | 18 comments
I can't get Qwen3.6 27B to outperform Qwen-Coder-Next and I'm not sure why
Posted by Forward_Jackfruit813@reddit | LocalLLaMA | 56 comments
How does Pi coding agent control Qwen's thinking verbosity? (Qwen 35B A3B, llama-server)
Posted by pilibitti@reddit | LocalLLaMA | 28 comments
Developers who use local AI - Q4_0 vs Q8_0 KV quant?
Posted by Jorlen@reddit | LocalLLaMA | 89 comments
Convert With MPT Support?
Posted by chibop1@reddit | LocalLLaMA | 9 comments
What infrastructure systems would realistically fail first in a slow maintenance collapse?
Posted by Spark_Hank@reddit | collapse | 67 comments
Qwen 3.6-27B Dense with MTP on Strix Halo Windows - Benchmarks
Posted by PromptInjection_@reddit | LocalLLaMA | 19 comments
Very happy with Qwen 3.5 122B output. But is slowness expected?
Posted by breksyt@reddit | LocalLLaMA | 45 comments