Marcuss2

KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag)

Posted by acluk90@reddit | LocalLLaMA | View on Reddit | 94 comments

Marcuss2@reddit

I am quite skeptical of these quantization schemes. I think most of them "work" because most models are actually quite inefficient at storing information in the KV cache. I would like to see results on the Qwen3.5 and DeepSeek V4 architectures, where information is stored much more densely.

New DeepSWE benchmark finds Claude Opus cheats

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 92 comments

Marcuss2@reddit

There is no way GPT-5.4 mini beats Kimi K2.6. In my experience that model just plain gets stuck in a loop. Something is off about this benchmark.

ZAYA1-8B: Frontier intelligence density, trained on AMD

Posted by carbocation@reddit | LocalLLaMA | View on Reddit | 108 comments

Kimi K2.6 vs DeepSeek V4 Pro

Posted by bigboyparpa@reddit | LocalLLaMA | View on Reddit | 38 comments

Marcuss2@reddit

Myself, I have tested DeepSeek V4 Flash and it is better than Kimi K2.5, as in it could do tasks Kimi K2.5 couldn't do. With Pro, I would wait for the actual release as this is a preview, but I will likely make V4 Flash a workhorse model.

ibm-granite/granite-4.1-8b · Hugging Face

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 35 comments

Kimi K2.6 Released (huggingface)

Posted by BiggestBau5@reddit | LocalLLaMA | View on Reddit | 277 comments

Llama4 108b $800 setup

Posted by kylerrr02@reddit | LocalLLaMA | View on Reddit | 13 comments

Experiment: Olmo 3 7B Instruct Q1_0

Posted by butlan@reddit | LocalLLaMA | View on Reddit | 47 comments

It looks like there are no plans for smaller GLM models

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 128 comments

Announcement: Temporary LLM Content Ban

Posted by ChemicalRascal@reddit | programming | View on Reddit | 326 comments

PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs

Posted by brown2green@reddit | LocalLLaMA | View on Reddit | 182 comments

Marcuss2@reddit

Went through the paper; the way they measure knowledge density is somewhat questionable. For example, we already quantize models to 4 bits, yet they almost always use full bf16 weights for the other models. They also measure intelligence per GB, but intelligence does not scale linearly with size; it scales logarithmically.
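
To put rough numbers on the bf16 vs 4-bit point, here is a minimal sketch of the weight-size arithmetic. The 8B parameter count is a hypothetical example, not a figure from the paper, and the formula ignores quantization metadata such as scales and zero points.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 8e9  # hypothetical 8B-parameter model, chosen only for illustration

bf16_gb = weight_size_gb(n_params, 16)    # full-precision baseline
q4_gb = weight_size_gb(n_params, 4)       # common 4-bit quantization
one_bit_gb = weight_size_gb(n_params, 1)  # idealized 1-bit model

# How much smaller the 1-bit model looks depends heavily on the baseline.
print(f"vs bf16:  {bf16_gb / one_bit_gb:.0f}x smaller")
print(f"vs 4-bit: {q4_gb / one_bit_gb:.0f}x smaller")
```

Under these assumptions, any "per GB" metric gets a 4x head start just from picking bf16 instead of 4-bit as the baseline for the other models.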

Kimi K2.6 will drop in the next 2 weeks, K3 is WIP and will be huge

Posted by No-Thought-4995@reddit | LocalLLaMA | View on Reddit | 68 comments

Introducing ARC-AGI-3

Posted by Complete-Sea6655@reddit | LocalLLaMA | View on Reddit | 100 comments

OpenCode source code audit: 7 external domains contacted, no privacy policy, 12 community PRs unmerged for 3+ months

Posted by Spotty_Weldah@reddit | LocalLLaMA | View on Reddit | 48 comments

Marcuss2@reddit

https://github.com/Kilo-Org/kilocode is right now built on top of opencode. I know they strip some of the telemetry stuff. I wonder how it compares.

Total beginner here—Why is LM Studio making me do the "heavy lifting" manually?

Posted by Ofer1984@reddit | LocalLLaMA | View on Reddit | 121 comments

Marcuss2@reddit

Honestly, to get started, install `kilo` or `opencode`, run it as a CLI, and tell it what you need using the free models they provide.

Application code has dozens of static analyzers, SQL has almost nothing, here's what exists.

Posted by Anonymedemerde@reddit | programming | View on Reddit | 29 comments

Marcuss2@reddit

Actually, in the Rust world, SQLx exists for SQL server interactions. By default, its query macros connect to your SQL server at compile time and verify your queries against it, as well as type-checking between SQL and Rust.

Qwen3.5 27B vs Devstral Small 2 - Next.js & Solidity (Hardhat)

Posted by Holiday_Purpose_3166@reddit | LocalLLaMA | View on Reddit | 43 comments

Marcuss2@reddit

Why are you running different quantizations? I would understand if you tried to match it size for size, but no, you are using far better quantization on a larger model.

24gb M4 Mac Mini vs 9070XT + 32gb system RAM. What to expect?

Posted by Soft-Distance-6571@reddit | LocalLLaMA | View on Reddit | 17 comments

Marcuss2@reddit

Absolutely you will. One of the main bottlenecks is memory bandwidth, at least when you offload some or all of the weights to system RAM.
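
A rough way to see why bandwidth dominates: a back-of-the-envelope sketch, assuming a dense model where every weight is read once per generated token and compute, KV cache reads, and transfer overhead are ignored. The function name and all numbers below are illustrative placeholders, not measurements of either machine.

```python
def tokens_per_second(weights_gb: float, fraction_in_ram: float,
                      vram_bw_gbs: float, ram_bw_gbs: float) -> float:
    """Upper bound on decode speed when weights are split between VRAM and RAM.

    Each token requires streaming all weights once, so the time per token is
    the sum of the time spent reading each portion at its own bandwidth.
    """
    ram_gb = weights_gb * fraction_in_ram
    vram_gb = weights_gb - ram_gb
    seconds_per_token = vram_gb / vram_bw_gbs + ram_gb / ram_bw_gbs
    return 1.0 / seconds_per_token


# Illustrative numbers only: 20 GB of weights, 600 GB/s VRAM, 80 GB/s system RAM.
all_in_vram = tokens_per_second(20, 0.0, 600, 80)
half_offloaded = tokens_per_second(20, 0.5, 600, 80)
all_in_ram = tokens_per_second(20, 1.0, 600, 80)

print(f"all in VRAM:    {all_in_vram:.1f} tok/s")
print(f"half offloaded: {half_offloaded:.1f} tok/s")
print(f"all in RAM:     {all_in_ram:.1f} tok/s")
```

In this simplified model the slow portion dominates quickly: offloading half the weights costs far more than half the speed.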

Why Senior Engineers Let Bad Projects Fail

Posted by Ordinary_Leader_2971@reddit | programming | View on Reddit | 121 comments

D7VK 1.1 adds experimental Direct3D 6 support for classic PC games on Linux

Posted by RenatsMC@reddit | linux | View on Reddit | 18 comments

Marcuss2@reddit

As I said in another comment, Mali and Adreno support OpenGL ES but not full-fat OpenGL. Android also requires Vulkan support, but not OpenGL support.

D7VK 1.1 adds experimental Direct3D 6 support for classic PC games on Linux

Posted by RenatsMC@reddit | linux | View on Reddit | 18 comments

Marcuss2@reddit

There might be games which work with one and not the other. Also, there are many chips which don't support OpenGL. Vulkan support is far more common.

NVIDIA Nemotron 3 Nano 30B A3B released

Posted by rerri@reddit | LocalLLaMA | View on Reddit | 96 comments

Aquif 3.5 Max 1205 (42B-A3B)

Posted by Holiday_Purpose_3166@reddit | LocalLLaMA | View on Reddit | 56 comments

Micron Announces Exit from Crucial Consumer Business

Posted by FullstackSensei@reddit | LocalLLaMA | View on Reddit | 190 comments

Marcuss2@reddit

I suspect there is more behind it, like OpenAI paying them to do this. They can literally get a lot more profit from it right now.

Qwen3 Next almost ready in llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 36 comments

Marcuss2@reddit

Kimi-Linear next. I do expect that one to be a lot faster as the linear part is very similar and MLA transformer is already implemented.

AMD Ryzen AI Max 395+ 256/512 GB Ram?

Posted by quantier@reddit | LocalLLaMA | View on Reddit | 91 comments

New Qwen models are unbearable

Posted by kevin_1994@reddit | LocalLLaMA | View on Reddit | 293 comments

MiniMax LLM head confirms: new model M2.1 coming soon

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 8 comments

Want to run claude like model on ~$10k budget. Please help me with the machine build. I don't want to spend on cloud.

Posted by LordSteinggard@reddit | LocalLLaMA | View on Reddit | 131 comments

Kimi Linear released

Posted by Badger-Purple@reddit | LocalLLaMA | View on Reddit | 65 comments

Marcuss2@reddit

Welch Labs made a video on MLA, comparing it to other approaches: https://www.youtube.com/watch?v=0VLAoVGf_74 TL;DR: MLA makes the model compress its KV cache into a smaller space; this is actually more efficient and more performant than GQA, which most modern models use. Hence I expect an MLA-based transformer to be better than the "regular" ones used today. Of course you can screw it up by making the space parameter too small, but I don't think that is the issue here.
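
To make the cache-size comparison concrete, here is a minimal sketch of per-token, per-layer KV cache size for standard multi-head attention, GQA, and MLA. The head counts and dimensions are hypothetical and do not describe Kimi Linear or any specific model; the MLA figure assumes the cache stores one shared compressed latent plus a small decoupled positional key.

```python
def kv_elements_mha(n_heads: int, head_dim: int) -> int:
    """Standard attention caches a key and a value vector for every head."""
    return 2 * n_heads * head_dim


def kv_elements_gqa(n_kv_heads: int, head_dim: int) -> int:
    """GQA caches keys and values only for the shared KV heads."""
    return 2 * n_kv_heads * head_dim


def kv_elements_mla(latent_dim: int, rope_dim: int) -> int:
    """MLA caches one compressed latent plus a small positional key (assumed layout)."""
    return latent_dim + rope_dim


# Hypothetical configuration, for illustration only.
mha = kv_elements_mha(n_heads=32, head_dim=128)
gqa = kv_elements_gqa(n_kv_heads=8, head_dim=128)
mla = kv_elements_mla(latent_dim=512, rope_dim=64)

print(f"MHA: {mha} elements per token per layer")
print(f"GQA: {gqa} elements per token per layer")
print(f"MLA: {mla} elements per token per layer")
```

Here `latent_dim` plays the role of the "space parameter" mentioned above: shrinking it makes the cache smaller but leaves less room to store information.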

The average codebase is now 50% dependencies — is this sustainable?

Posted by Legitimate_Sun1783@reddit | programming | View on Reddit | 288 comments

Sparse Adaptive Attention “MoE”: How I Solved OpenAI’s $650B Problem With a £700 GPU

Posted by EconomicConstipator@reddit | LocalLLaMA | View on Reddit | 107 comments

MoonshotAI/kimi-cli - CLI coding agent from MoonshotAI

Posted by nullmove@reddit | LocalLLaMA | View on Reddit | 7 comments

Qwen3 Next support in llama.cpp ready for review

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 51 comments

Granite 4.0 Language Models - a ibm-granite Collection

Posted by rerri@reddit | LocalLLaMA | View on Reddit | 264 comments

Elmo is providing

Posted by vladlearns@reddit | LocalLLaMA | View on Reddit | 163 comments

Falcon-H1 technical report release

Posted by JingweiZUO@reddit | LocalLLaMA | View on Reddit | 14 comments

Running an LLM on a PS Vita

Posted by ajunior7@reddit | LocalLLaMA | View on Reddit | 13 comments