ServiceNow-AI/SuperApriel-15B-Instruct · Hugging Face
Posted by jacek2023@reddit | LocalLLaMA | 6 comments
A 15B-parameter token-mixer supernet with 8 optimized deployment presets spanning 1.0× to 10.7× decode throughput at 32K sequence length, all from a single checkpoint. Derived from Apriel-1.6 through stochastic distillation and targeted supervised fine-tuning.
- Model Size: 15B parameters
- Layers: 48 decoder layers, each with 4 mixer variants
- Context Length: 262K positions (runtime dependent)
- Languages: English (best)
Highlights
- Flexible deployment from a single checkpoint: multiple presets trading throughput for quality
- Four mixer types per layer: Full Attention (FA), Sliding Window Attention (SWA), Gated DeltaNet (GDN), Kimi Delta Attention (KDA)
- Instruction-tuned: targeted SFT with multiple Pareto-optimal placements
- Speculative decoding support: use all-attention as target with efficient placements as drafts from the same checkpoint
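The card says each of the 48 decoder layers can use one of four mixers, and that a preset is a particular per-layer placement traded off between throughput and quality. A minimal sketch of that idea, assuming a preset is just a list of 48 mixer choices (the card does not document the real preset format, and `make_placement` is a hypothetical helper):

```python
from collections import Counter

# The four mixer types the card lists for each decoder layer.
MIXERS = ("FA", "SWA", "GDN", "KDA")
NUM_LAYERS = 48

def make_placement(pattern):
    """Tile a short per-layer pattern across all 48 layers.

    Hypothetical helper for illustration only; the actual preset
    encoding used by the checkpoint is not documented in the card.
    """
    assert all(m in MIXERS for m in pattern)
    return [pattern[i % len(pattern)] for i in range(NUM_LAYERS)]

# All-attention baseline (the card's 1.0x decode-throughput point).
baseline = make_placement(["FA"])

# A hypothetical "mostly recurrent" placement: one full-attention layer
# for every three linear-attention (GDN/KDA) layers.
fast = make_placement(["FA", "GDN", "KDA", "GDN"])

print(Counter(baseline))  # all 48 layers use full attention
print(Counter(fast))      # 12 FA, 24 GDN, 12 KDA
```

In this framing, the speculative-decoding highlight amounts to running the all-`FA` placement as the target model and a faster placement like `fast` as the draft, both instantiated from the same weights.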
nonerequired_@reddit
I hadn't heard of that. What is this model good at?
zeth0s@reddit
It's from ServiceNow, so it must be really good at scamming CIOs and CTOs, but miserable for real people. My educated guess.
MmmmMorphine@reddit
Wow, four types of attention you can semi-arbitrarily mix and match, each designed for different tasks.
This actually looks to be a pretty incredible step forward. Configure it with mostly recurrent (GDN/KDA) and sliding-window attention for, say, web research; mix full attention with recurrent layers for a Qwen-style setup; or go all out with full attention everywhere for high-intensity reasoning.
Not to mention just exploring different attention combinations. Now I know what my little automated research setup will be working on for a few weeks
nuclearbananana@reddit
I'm very confused: they don't seem to indicate what mix each of the 8 presets even is?
Silver-Champion-4846@reddit
Anyone tested this?
yarikfanarik@reddit
GGUF?