MediaTek claims 1.58-bit BitNet support with Dimensity 9500 SoC
Posted by Balance-@reddit | LocalLLaMA | 9 comments
Integrating the ninth-generation MediaTek NPU 990 with Generative AI Engine 2.0 doubles compute power and introduces BitNet 1.58-bit large model processing, reducing power consumption by up to 33%. With doubled integer and floating-point computing capability, users benefit from 100% faster 3-billion-parameter LLM output, 128K-token long-text processing, and the industry’s first 4K ultra-high-definition image generation, all while slashing power consumption at peak performance by 56%.
Anyone have any idea which model(s) they could have tested this on?
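For reference, the "1.58-bit" figure comes from log2(3) ≈ 1.585: BitNet b1.58 weights are ternary, taking only the values {-1, 0, +1} plus a scale. A minimal sketch of absmean-style ternary quantization in the spirit of the BitNet b1.58 paper (the function name and exact scaling details are illustrative, not MediaTek's implementation):

```python
import numpy as np

def ternary_quantize(W, eps=1e-5):
    """Quantize a float weight matrix to ternary {-1, 0, +1} values plus a
    per-tensor scale, roughly following the absmean scheme from BitNet b1.58."""
    scale = np.abs(W).mean() + eps               # per-tensor absmean scale
    Wq = np.clip(np.round(W / scale), -1, 1)     # every entry ends up in {-1, 0, +1}
    return Wq.astype(np.int8), scale

# toy usage: a matmul against ternary weights needs only adds/subtracts of
# activations (plus one rescale), which is the on-device efficiency argument
W = np.random.randn(8, 8).astype(np.float32)
x = np.random.randn(8).astype(np.float32)
Wq, s = ternary_quantize(W)
y_approx = s * (Wq.astype(np.float32) @ x)       # approximates W @ x
```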
No-Cattle4800@reddit
this is actually a big deal - 1.58-bit quantization isn’t trivial to run efficiently on-device. mediatek’s NPU 990 seems optimized for ultra-low precision AI, likely tested on smaller LLMs (~3B). the real win is power efficiency without killing output quality
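One reason it isn't trivial: there is no native 1.58-bit datatype, so ternary weights have to be packed (e.g. five values per byte, since 3^5 = 243 ≤ 256) and unpacked on the fly. A hypothetical base-3 packing sketch, not MediaTek's actual memory layout:

```python
def pack5(trits):
    """Pack five ternary values (-1/0/+1) into one byte via base-3 encoding;
    3**5 = 243 <= 256 is where the ~1.58 bits/weight figure comes from."""
    assert len(trits) == 5
    byte = 0
    for t in reversed(trits):
        byte = byte * 3 + (t + 1)        # map -1/0/+1 -> 0/1/2
    return byte

def unpack5(byte):
    """Recover the five ternary values packed by pack5."""
    trits = []
    for _ in range(5):
        byte, digit = divmod(byte, 3)
        trits.append(digit - 1)
    return trits

assert unpack5(pack5([1, -1, 0, 1, 0])) == [1, -1, 0, 1, 0]
```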
Proud_Trade63@reddit
that’s impressive, mediatek dimensity 9500 with npu 990 pushing bitnet and faster llm output while cutting power really shows how efficient on-device ai is becoming
Ok-Technology504@reddit
mediatek really said “why use power when you can just… not?” 😭 NPU 990 sounds like it’s running LLMs on a diet but still flexing 4K gen AI. BitNet 1.58-bit is basically doing more with less while Snapdragon fans still buffering on efficiency mode
NoidoDev@reddit
I'd like to know if this will be available on a single-board computer. I also want to know if competitors are doing the same thing. We need way more such specialized chips, especially now that DRAM prices are going up.
Own-Key1782@reddit
basically means it can handle large AI models faster and cooler. dimensity 9500 is clearly built for the AI era, not just gaming or multitasking
wojciechm@reddit
It's probably because this architecture allows for computation directly in memory, which they also claim to implement in their latest SoC, and that is also the way to minimize overall power consumption for ML tasks.
crantob@reddit
I suspect there's a lot we can yet do architecturally, particularly with networks running at different timescales entirely, affecting the same weights.
LagOps91@reddit
i really hope we actually get MoE BitNets in the future. they would be a great fit for consumer hardware.
fnordonk@reddit
I don't have a guess on a model, but I wonder if they're using Google's LiteRT, which announced MediaTek NPU support: https://github.com/google-ai-edge/LiteRT
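If it is LiteRT, the application-side inference loop would look roughly like the usual interpreter flow; a minimal CPU-only sketch, assuming the ai-edge-litert Python package and a hypothetical model file (NPU delegation on the Dimensity goes through vendor delegates and isn't shown):

```python
import numpy as np
from ai_edge_litert.interpreter import Interpreter  # assumes `pip install ai-edge-litert`

interpreter = Interpreter(model_path="model.tflite")   # hypothetical model file
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.zeros(inp["shape"], dtype=inp["dtype"])          # dummy input tensor
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```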