Brief Ngram-Mod Test Results - R9700/Qwen3.6 27B
Posted by exact_constraint@reddit | LocalLLaMA | 8 comments
Decided to try out the new --spec-type ngram-mod feature in llama.cpp using Qwen3.6 27B during an OpenCode bug chasing session. TLDR: Performance is variable, but so far it seems to provide a nice speed increase for working on the same code base.
Here's a baseline llama-bench test:
$: llama-bench-vulkan -m 'Qwen3.6-27B-UD-Q4_K_XL.gguf'
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon AI PRO R9700 (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.39 GiB | 26.90 B | Vulkan | 99 | pp512 | 1050.13 ± 0.54 |
| qwen35 27B Q4_K - Medium | 16.39 GiB | 26.90 B | Vulkan | 99 | tg128 | 31.26 ± 0.01 |
build: 97895129e (8863)
My llama-server run flags:
llama-server-vulkan -m '/Qwen3.6-27B-UD-Q4_K_XL.gguf' --mmproj '/mmproj-BF16(3).gguf' -np 1 -ngl 99 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 --presence_penalty 0.00 --jinja --chat-template-kwargs '{"preserve_thinking": true}' -ub 2048 -fa 1 --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 12 --draft-max 48 --host 0.0.0.0 --port 8180
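For anyone who hasn't looked at how this works: here's a toy sketch of the general n-gram lookup drafting idea (this is NOT llama.cpp's actual implementation - the function and variable names are made up, and I'm assuming --spec-ngram-size-n / --draft-min / --draft-max map roughly onto n / draft_min / draft_max below):

```python
# Toy sketch of n-gram lookup drafting (not llama.cpp's code).
# Idea: if the last n tokens of the context already appeared earlier, guess
# that whatever followed that earlier occurrence will follow again, and let
# the model verify the whole guess in one batched forward pass.

def propose_ngram_draft(context, n=24, draft_min=12, draft_max=48):
    """Return a list of draft tokens, or [] if there is no usable match."""
    if len(context) <= n:
        return []
    tail = context[-n:]
    # scan backwards so the most recent prior occurrence wins
    for start in range(len(context) - n - 1, -1, -1):
        if context[start:start + n] == tail:
            follow = context[start + n:start + n + draft_max]
            return follow if len(follow) >= draft_min else []
    return []

# Repetitive coding/agent sessions make matches common; a wrong draft is fine,
# the model just rejects the bad tokens during verification.
ctx = list("def foo():\n    return 1\n\n# copy:\ndef foo():\n    return")
print(propose_ngram_draft(ctx, n=8, draft_min=1, draft_max=16))
```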
Stats Summary:
--- Prompt Processing (PPS) Statistics ---
Mean: 549.60 t/s
Median: 519.19 t/s
P95: 936.60 t/s
StdDev: 240.80 (Stability)
Range: 64.18 - 1015.91 t/s
--- Token Generation (Tok/s) Statistics ---
Mean: 28.80 t/s
Median: 28.20 t/s
P95: 45.34 t/s
StdDev: 6.78 (Stability)
Range: 16.49 - 53.63 t/s
Total Tokens Generated: 87840
$:~/Documents/llama_perf$ python3 parse_performance_stats_full.py
== Prompt Processing (PPS) Analysis ==
Effective Avg: 549.60 t/s (Token-Weighted)
Median (P50): 519.19 t/s
Tail (P99): 958.31 t/s
Stability(CV): 43.8% (JITTERY)
Skewness: 0.04 (Symmetric)
== Token Generation (Tok/s) Analysis ==
Effective Avg: 1697.20 t/s (Token-Weighted)
Median (P50): 28.20 t/s
Tail (P99): 51.39 t/s
Stability(CV): 23.5% (JITTERY)
Skewness: 1.40 (Burst Heavy)
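For anyone who wants to sanity-check these numbers from the raw rows below, here's a minimal sketch of the kind of math involved (this is not the actual parse script - the token-weighted mean, CV and skewness definitions here are my assumptions):

```python
# Minimal sketch: summary stats from (pps, tok_s, gen_tokens) rows.
# Not the actual parse_performance_stats*.py; token-weighted mean,
# CV = stdev/mean and Fisher skewness are assumptions about what it reports.
import statistics as st

rows = [
    (72.51, 25.76, 340),    # (prompt t/s, gen t/s, generated tokens)
    (330.16, 22.49, 709),
    (345.13, 20.84, 1820),
    # ... remaining rows from the raw data below ...
]

def percentile(xs, p):
    xs = sorted(xs)
    k = (len(xs) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

gen = [r[1] for r in rows]
tokens = [r[2] for r in rows]

weighted_avg = sum(g * t for g, t in zip(gen, tokens)) / sum(tokens)
mean, stdev = st.mean(gen), st.stdev(gen)
skew = sum(((g - mean) / stdev) ** 3 for g in gen) / len(gen)

print(f"median {st.median(gen):.2f} t/s | P95 {percentile(gen, 0.95):.2f} t/s | "
      f"CV {100 * stdev / mean:.1f}% | token-weighted {weighted_avg:.2f} t/s | "
      f"skew {skew:.2f}")
```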
Raw data:
$:~/Documents/llama_perf$ python3 parse_performance_stats.py
Task ID | PPS (Prompt) | Tok/s (Gen) | Gen Tokens
------------------------------------------------------------
7824 | 72.51 | 25.76 | 340
8053 | 330.16 | 22.49 | 709
8629 | 345.13 | 20.84 | 1820
10286 | 64.18 | 28.11 | 181
10372 | 309.37 | 19.31 | 123
10496 | 360.21 | 27.07 | 891
11071 | 345.78 | 34.59 | 1595
11810 | 349.13 | 21.83 | 389
12124 | 304.43 | 27.89 | 438
12364 | 320.76 | 24.20 | 408
12673 | 304.25 | 22.16 | 281
12899 | 281.09 | 19.12 | 286
13188 | 777.57 | 25.27 | 1428
14644 | 970.67 | 30.00 | 231
14863 | 834.32 | 32.17 | 98
14944 | 651.29 | 35.26 | 90
15012 | 690.06 | 28.15 | 98
15101 | 706.03 | 30.84 | 97
15177 | 678.13 | 39.51 | 100
15243 | 695.42 | 28.46 | 85
15330 | 347.35 | 27.75 | 83
15404 | 527.11 | 28.71 | 79
15485 | 495.88 | 28.83 | 73
15552 | 757.88 | 28.85 | 70
15610 | 754.61 | 27.08 | 106
15716 | 343.11 | 30.13 | 82
15784 | 597.03 | 28.51 | 77
15848 | 724.77 | 25.24 | 91
15932 | 612.62 | 40.13 | 87
15986 | 603.72 | 28.13 | 125
16105 | 545.72 | 27.96 | 105
16212 | 140.18 | 30.04 | 53
16256 | 518.56 | 27.60 | 1330
17587 | 705.96 | 27.46 | 336
1 | 891.36 | 27.73 | 1644
1621 | 689.95 | 30.96 | 750
2238 | 87.37 | 27.05 | 348
2593 | 86.72 | 27.15 | 2003
4593 | 86.10 | 27.07 | 161
4728 | 431.04 | 26.33 | 178
4900 | 86.53 | 28.26 | 112
4987 | 87.27 | 27.09 | 161
5129 | 346.48 | 28.73 | 104
5214 | 426.83 | 37.51 | 147
5295 | 369.10 | 27.33 | 74
5371 | 258.20 | 27.12 | 172
5545 | 82.23 | 28.34 | 83
5619 | 78.99 | 39.80 | 163
5711 | 342.33 | 25.94 | 103
5814 | 557.16 | 27.15 | 92
5908 | 82.57 | 24.07 | 112
6011 | 655.56 | 16.87 | 255
6250 | 538.12 | 16.73 | 259
6509 | 226.40 | 19.07 | 78
6572 | 380.42 | 17.08 | 84
6650 | 369.20 | 17.92 | 176
6805 | 542.54 | 19.01 | 133
6917 | 508.31 | 17.65 | 711
7567 | 592.44 | 21.26 | 113
0 | 825.63 | 26.19 | 258
265 | 570.25 | 26.75 | 170
410 | 400.81 | 24.33 | 97
501 | 495.63 | 25.28 | 153
649 | 602.06 | 22.47 | 315
871 | 317.47 | 16.50 | 746
1616 | 75.78 | 16.49 | 105
1717 | 458.49 | 16.79 | 111
1830 | 135.83 | 16.80 | 347
0 | 837.89 | 26.31 | 764
794 | 651.57 | 24.01 | 116
905 | 224.91 | 25.38 | 80
969 | 551.64 | 29.70 | 81
1029 | 547.99 | 24.96 | 89
1118 | 545.28 | 25.38 | 86
1187 | 596.21 | 25.20 | 81
1267 | 387.68 | 25.03 | 83
1342 | 526.17 | 25.98 | 616
1960 | 795.61 | 23.57 | 177
2169 | 518.94 | 24.00 | 75
2245 | 487.28 | 28.62 | 84
2307 | 519.44 | 26.36 | 218
2506 | 83.51 | 25.92 | 184
2674 | 317.34 | 25.31 | 101
2756 | 491.71 | 25.41 | 690
3424 | 540.33 | 33.60 | 184
3529 | 511.05 | 28.57 | 106
3601 | 523.09 | 27.26 | 471
4014 | 518.84 | 25.74 | 251
4238 | 82.16 | 23.83 | 163
4401 | 338.39 | 46.13 | 83
4437 | 324.35 | 23.52 | 126
4560 | 248.12 | 25.89 | 81
4634 | 443.34 | 24.78 | 182
4804 | 463.62 | 28.23 | 83
4872 | 438.71 | 31.26 | 635
5352 | 504.33 | 22.47 | 96
5439 | 277.02 | 25.48 | 179
5596 | 506.73 | 39.77 | 179
5687 | 493.95 | 23.50 | 69
5757 | 523.45 | 25.08 | 110
5869 | 105.32 | 23.02 | 67
5938 | 200.24 | 24.93 | 316
6256 | 555.49 | 45.34 | 175
6327 | 466.26 | 24.61 | 262
0 | 761.08 | 24.29 | 139
160 | 505.55 | 22.34 | 117
271 | 256.61 | 28.42 | 83
322 | 426.93 | 30.01 | 97
388 | 482.84 | 27.16 | 96
463 | 494.38 | 24.48 | 1150
1613 | 259.32 | 23.89 | 73
1683 | 167.49 | 23.52 | 80
1755 | 318.21 | 24.25 | 3084
4834 | 318.37 | 22.71 | 88
4909 | 451.91 | 24.01 | 160
5051 | 429.60 | 24.10 | 112
5144 | 426.04 | 24.11 | 1209
6326 | 563.82 | 23.99 | 207
6529 | 512.83 | 34.04 | 90
6585 | 498.78 | 28.49 | 92
6656 | 492.01 | 24.35 | 104
6738 | 484.51 | 29.75 | 92
6797 | 450.49 | 29.46 | 95
6859 | 437.55 | 23.36 | 650
7504 | 235.33 | 23.13 | 81
7568 | 405.40 | 27.63 | 126
7661 | 426.11 | 22.62 | 137
7798 | 351.68 | 28.88 | 100
7865 | 445.78 | 23.28 | 122
7981 | 398.07 | 22.79 | 155
8136 | 265.58 | 22.67 | 83
8201 | 375.09 | 23.50 | 446
8623 | 419.87 | 23.31 | 921
9516 | 424.62 | 23.22 | 98
9594 | 399.86 | 23.04 | 557
10133 | 410.36 | 30.93 | 85
10180 | 445.30 | 26.01 | 82
10240 | 384.94 | 25.42 | 147
10356 | 369.66 | 22.97 | 312
10670 | 1011.00 | 29.40 | 153
10819 | 735.71 | 30.75 | 65
10877 | 912.32 | 28.97 | 92
10969 | 829.14 | 28.24 | 132
11108 | 710.79 | 28.56 | 94
11195 | 694.49 | 29.13 | 129
11313 | 440.72 | 28.87 | 67
11373 | 736.58 | 43.25 | 100
11431 | 278.92 | 28.97 | 89
11513 | 564.79 | 30.91 | 97
11585 | 464.87 | 32.45 | 93
11659 | 605.83 | 28.62 | 63
11715 | 727.11 | 28.05 | 180
11879 | 643.30 | 30.79 | 126
11985 | 665.26 | 29.20 | 149
12111 | 492.23 | 27.98 | 72
12176 | 695.06 | 26.40 | 164
12340 | 558.65 | 26.57 | 2933
15263 | 447.12 | 21.40 | 271
15534 | 1015.91 | 30.65 | 87
15619 | 923.95 | 30.58 | 1613
17127 | 455.62 | 21.57 | 186
17307 | 939.74 | 31.02 | 70
17371 | 897.35 | 33.11 | 1213
18401 | 450.77 | 23.31 | 694
19047 | 939.26 | 30.94 | 71
19112 | 921.63 | 29.57 | 1399
20514 | 440.08 | 21.55 | 179
20680 | 941.92 | 30.28 | 86
20769 | 916.08 | 29.72 | 213
20985 | 630.99 | 28.39 | 90
21076 | 783.87 | 29.83 | 90
21153 | 869.66 | 31.89 | 141
21270 | 559.49 | 28.48 | 163
21434 | 781.38 | 29.42 | 115
21543 | 783.60 | 33.50 | 129
21647 | 542.43 | 29.70 | 88
21728 | 681.01 | 30.92 | 282
21984 | 583.15 | 27.92 | 108
22092 | 87.14 | 26.63 | 117
22207 | 552.15 | 28.99 | 90
22284 | 648.15 | 27.79 | 110
22394 | 758.16 | 29.34 | 103
22482 | 570.20 | 28.52 | 1171
23655 | 449.73 | 22.45 | 191
23840 | 913.13 | 30.05 | 102
23944 | 924.18 | 29.36 | 249
24198 | 797.90 | 30.26 | 76
24266 | 859.60 | 28.60 | 155
24419 | 613.57 | 29.71 | 87
24498 | 696.11 | 34.20 | 105
24578 | 654.08 | 29.09 | 107
24678 | 601.79 | 29.27 | 96
24759 | 667.10 | 28.99 | 116
24868 | 700.61 | 34.60 | 110
24952 | 722.68 | 27.95 | 2270
27224 | 434.52 | 22.17 | 373
27586 | 920.69 | 30.19 | 82
27670 | 923.33 | 29.41 | 135
27802 | 878.87 | 28.93 | 159
27967 | 697.86 | 29.29 | 101
28061 | 694.84 | 35.07 | 114
28150 | 724.74 | 36.25 | 84
28209 | 362.26 | 34.01 | 87
28277 | 726.33 | 33.11 | 119
28375 | 738.59 | 27.36 | 95
28470 | 571.26 | 25.75 | 94
28562 | 372.33 | 28.18 | 80
28631 | 598.19 | 29.04 | 97
28721 | 669.38 | 25.55 | 108
28821 | 396.21 | 31.45 | 86
28887 | 618.82 | 27.92 | 2077
30958 | 429.42 | 22.30 | 405
31356 | 916.46 | 30.26 | 75
31433 | 897.39 | 36.61 | 949
32154 | 417.12 | 34.14 | 398
32348 | 940.13 | 30.26 | 71
32421 | 921.72 | 46.64 | 1434
33187 | 422.44 | 49.40 | 397
33303 | 937.79 | 32.47 | 105
33395 | 924.34 | 29.25 | 1684
35077 | 418.33 | 48.17 | 421
35215 | 928.92 | 30.81 | 78
35287 | 906.27 | 29.21 | 2857
38060 | 422.58 | 48.37 | 402
38182 | 936.60 | 34.20 | 72
38240 | 916.12 | 44.28 | 3143
39949 | 421.28 | 44.29 | 415
40073 | 939.96 | 30.25 | 75
40150 | 905.92 | 40.91 | 1662
41202 | 412.22 | 47.27 | 403
41325 | 938.87 | 30.36 | 76
41403 | 916.59 | 38.85 | 1532
42476 | 399.14 | 48.52 | 402
42586 | 938.19 | 34.64 | 74
42645 | 915.96 | 32.35 | 1551
43997 | 407.69 | 53.03 | 383
44096 | 930.86 | 31.11 | 68
44157 | 919.13 | 29.52 | 853
45012 | 398.91 | 49.45 | 387
45118 | 935.23 | 30.34 | 83
45203 | 925.79 | 52.86 | 1615
45981 | 396.90 | 48.34 | 390
46092 | 936.96 | 30.29 | 88
46182 | 915.64 | 53.63 | 2544
exact_constraint@reddit (OP)
UPDATE: Collected more data. I restarted llama-server and ran Qwen3.6 27B without the ngram flags to get a 'baseline' real-world performance profile. Obviously a little scant on datapoints (~11k generated tokens vs ~80k for the ngram run). Performance was very consistent, and I had work to do lol. Ngram results use the same flags as in the OP.
I haven't done any testing on ngram-mod vs ngram-map-k yet - from some brief research, it *looks* like -mod is the preferred method, even for single-GPU inference. Not sure though. Either way, here are my janky runtime flags vs a baseline:
TLDR:
No speculative decoding baseline - real debugging session, ~125-150k context:
Ngram-mod decoding, ~125-150k context:
Finanzamt_Endgegner@reddit
In my testing with pi + qwen3 27b I used these parameters and had a more consistent speed up (;
Clear-Ad-9312@reddit
Which is funny, because each ngram-* type has a different performance uplift (rough sketch of the difference below):
- ngram-simple looks for a previous matching n-gram and inserts the following m-gram.
- ngram-map-k looks for a previous matching n-gram and inserts the following m-gram but uses an internal hash-map of n-grams in the current context window.
- ngram-mod uses a hash pool which is shared across all server slots. The hash pool is a map from n-gram hash to the next token (not the next m-gram as in ngram-map).
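Roughly the data-structure difference, if that helps (hypothetical Python, not llama.cpp's actual code - the key/value shapes are just my reading of the PR text):

```python
# Hypothetical illustration of the two lookup structures described above
# (not llama.cpp's code; shapes and sizes are assumptions based on the PR text).

# ngram-map-k: map from the n-gram itself to the m tokens that followed it,
# built from the current context window of one slot.
ngram_map = {}                        # {(t1, ..., tn): [m follow tokens]}
def map_k_update(tokens, n, m):
    for i in range(len(tokens) - n - m + 1):
        ngram_map[tuple(tokens[i:i + n])] = tokens[i + n:i + n + m]

# ngram-mod: one hash pool shared across all server slots, mapping an n-gram
# hash to a single next token. With -np 1 there's only one slot, so the
# "shared across slots" part isn't buying anything.
POOL_SIZE = 1 << 16
shared_pool = {}                      # {hash(n-gram) % POOL_SIZE: next token}
def mod_update(tokens, n):
    for i in range(len(tokens) - n):
        shared_pool[hash(tuple(tokens[i:i + n])) % POOL_SIZE] = tokens[i + n]
```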
OP used ngram-mod, which looks to take advantage of parallel server slots? But OP ran with -np 1, which makes it worse than ngram-map-k, which seems to be more geared toward a single parallel context?
exact_constraint@reddit (OP)
!!! Thanks for this. Going to do some more testing based on this info. Last night was real fast and loose, just quickly scanned the llama.cpp PRs covering ngram and grabbed some runtime flags to test.
Middle_Bullfrog_6173@reddit
Could you explain the speed increase? From the stats it looks like the same or slightly lower performance, unless I misunderstand the numbers.
exact_constraint@reddit (OP)
The main difference is context size - the llama-bench result is basically the absolute best case: empty context, small prompt size. During actual use in OpenCode, running over 100k context, speeds tend to run in the low 20 tok/sec range.
Qualitatively, so far it feels a little faster, too. The instances where tok/sec jumps over the baseline 30, into the 40-50 range, seem to be skewed to prompts where the agent is responding to a question about something it’s just done, or making small tweaks. Makes conversations a little snappier.
We’ll see how performance works out with a longer sample size.
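Back-of-envelope on those 40-50 tok/sec bursts (my own rough math, not anything llama.cpp reports): with self-drafting the draft is basically free, so throughput roughly scales with how many drafted tokens get accepted per decode step.

```python
# Rough back-of-envelope only. Assumes draft generation is ~free and verifying
# a short draft costs about one normal decode step, which ignores real overheads.
baseline_tps = 30.0   # roughly the non-speculative tg rate seen here
burst_tps = 48.0      # one of the faster tasks above

# burst ≈ baseline * (1 + accepted_per_step)  ->  solve for accepted_per_step
accepted_per_step = burst_tps / baseline_tps - 1
print(f"~{accepted_per_step:.1f} extra accepted draft tokens per verify step")
```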
CalligrapherFar7833@reddit
Show us a bench with 256k/128k context, with ngram and without?
Zc5Gwu@reddit
Same, I was kind of expecting a before and after. It's hard to tell what is being compared.