Mguyen's Comments | TheaterFire

What is the point of MoE models, beyond being faster?

Posted by ihatebeinganonymous@reddit | LocalLLaMA | View on Reddit | 135 comments

Mguyen@reddit

The evolution of LLMs naturally leads to MoE models. Of all the xB weights in a dense LLM, not all will be strongly activated. If you look into it, you will find that a large share of the weights do not meaningfully contribute to any given token, and which weights those are differs from token to token. This is what's referred to as **activation sparsity**, and it is a naturally emergent behavior. We've known about this for at least a few years.

Interestingly enough, an analog to this is the old saying that people only use 10% of the brain. The brain has different regions that are active at different moments in time. It's probably not actually 10%, but what's important is that it's also a sparse activation.

Trimming a model and optimizing it so that it ***knows*** which experts to send a token prediction to is the hard part. You need to balance it so that your chosen experts all get similar usage and so that you're not trimming away parameters that are important. The result is similar to a heavily quantized model in that it preserves the *parameters* that correspond to trained knowledge, but their weights are modified. The activations won't be *exactly* as trained.
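To make the routing and balancing point concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not taken from any particular model: the function names (`route_top_k`, `expert_usage`), the 8-expert setup, and the random router scores are all assumptions for illustration. In a real MoE the router is learned, and training usually adds some load-balancing term so that usage stays even.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns [(expert_index, weight), ...] with weights renormalized to sum to 1.
    Only these k experts would run for the token; the rest stay idle.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def expert_usage(all_logits, k):
    """Fraction of routed slots each expert receives across all tokens."""
    counts = [0] * len(all_logits[0])
    for logits in all_logits:
        for i, _ in route_top_k(logits, k):
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical router scores: 1000 tokens, 8 experts, 2 experts per token.
    tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
    print(route_top_k(tokens[0], k=2))
    print(expert_usage(tokens, k=2))
```

With balanced routing, each entry of `expert_usage` should sit near 1/8. If a couple of experts absorb most of the traffic, the others are effectively dead weight, which is the balancing problem described above.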

Used ray tracing cores on my RTX 5070 Ti for LLM routing — 218x speedup, runs entirely on 1 consumer GPU

Posted by Critical-Chef9211@reddit | LocalLLaMA | View on Reddit | 91 comments

Mguyen@reddit

Because OP is lying about it. It calls the rest of the work into question.

Used ray tracing cores on my RTX 5070 Ti for LLM routing — 218x speedup, runs entirely on 1 consumer GPU

Posted by Critical-Chef9211@reddit | LocalLLaMA | View on Reddit | 91 comments

Mguyen@reddit

The repo was written, at least in part if not in whole, by Claude Code. Anyone who has used it can tell.

American closed models vs Chinese open models is becoming a problem.

Posted by __JockY__@reddit | LocalLLaMA | View on Reddit | 622 comments

Mguyen@reddit

The distinction between the two is important: "this/these models are not open source, but open source models do exist" vs. "Nothing is open source if you have to verify every possible output." The latter makes a blanket statement about all models, and it rests on flawed assumptions.

American closed models vs Chinese open models is becoming a problem.

Posted by __JockY__@reddit | LocalLLaMA | View on Reddit | 622 comments

Mguyen@reddit

It is incorrect to say that "nothing is open source".

American closed models vs Chinese open models is becoming a problem.

Posted by __JockY__@reddit | LocalLLaMA | View on Reddit | 622 comments

Mguyen@reddit

That's incorrect. The Chinese models are open weights. You get the model, free to modify as you choose. They are not open source, as in the source data used to create them is not open. You don't know what goes into them.

Full Claude Opus 4.6 System Prompt for your pleasure

Posted by frubberism@reddit | LocalLLaMA | View on Reddit | 56 comments

Mguyen@reddit

Prompts can be changed and improved upon. Once you bake it into a model, the model is that way forever unless you want to retrain the model "just to see" if it will be better. The prompt contains things like the date, tools, and other things that will always need to be prompted anyway.

NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model!

Posted by Difficult-Cap-7527@reddit | LocalLLaMA | View on Reddit | 188 comments

Mguyen@reddit

Better context scaling (less memory for more unquantized context) and much faster inference, on top of their fine-tuning with new data. They've replaced most of the attention layers with a different system (Mamba-2). It's Qwen with a partially new architecture that lets it run up to 6x faster on GPU. We'll have to see how much it speeds things up with partial or full offload to RAM.
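A rough back-of-the-envelope sketch of why fewer attention layers means less memory per token of context. The KV cache grows linearly with context length for every attention layer, while a Mamba-style layer keeps a fixed-size state regardless of context. The layer counts, head counts, and context length below are hypothetical round numbers, not Nemotron's actual configuration, and `kv_cache_bytes` is just an illustrative helper.

```python
def kv_cache_bytes(attn_layers, kv_heads, head_dim, context_len, bytes_per_value=2):
    """Approximate KV cache size for one sequence.

    The factor of 2 covers keys and values; bytes_per_value=2 assumes an
    unquantized fp16/bf16 cache.
    """
    return 2 * attn_layers * kv_heads * head_dim * context_len * bytes_per_value


if __name__ == "__main__":
    GIB = 2 ** 30
    context = 131072  # hypothetical 128K-token context

    # Hypothetical all-attention model: 48 attention layers.
    dense = kv_cache_bytes(attn_layers=48, kv_heads=8, head_dim=128, context_len=context)

    # Hypothetical hybrid: same shape, but only 6 layers still use attention.
    hybrid = kv_cache_bytes(attn_layers=6, kv_heads=8, head_dim=128, context_len=context)

    print(f"all-attention: {dense / GIB:.1f} GiB")
    print(f"hybrid:        {hybrid / GIB:.1f} GiB")
```

The hybrid still pays for the state of its non-attention layers, but that cost does not grow with context length, so the gap widens as the context gets longer.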

Glm 4.6 is out and it's going against claude 4.5

Posted by Independent-Wind4462@reddit | LocalLLaMA | View on Reddit | 47 comments

Mguyen@reddit

The first AIME 25 benchmark is a definite tell. They're comparing GLM 4.6 using Python and other tools against the other models with no tool calling. I know for sure that's the Sonnet score with no tools... Most frontier models score in the high 90s or 100% on AIME 25 with tool calling. This alone calls into question the validity of the rest of the tests.

[Acc] Walker Ultimate Quad Connect w bluetooth $60 + tax (FDE, ODG, grey)

Posted by DonArgueWithMe@reddit | gundeals | View on Reddit | 53 comments

Mguyen@reddit

If you don't care about how slim the ear cups are or about having great directional hearing, then the 3M Pro-Protects are way ahead of anything in that price range in terms of hearing protection. They're electronic and come with gel cups and Bluetooth at 26 NRR standard for about $75 regular price on Amazon. The double-lined gel cups make them great for glasses or long hair. All the quality you'd expect from a company that makes PPE.