llama.cpp speculative checkpointing was merged

[-]

pj-frey@reddit

Does it work with vision (--mmproj set)?

[-]

pj-frey@reddit

I have simply tried it.

It is.

Although the effect is not really impressive. But might need to tweak the parameters. At least it is not slower :-) Will monitor it for a while.

[-]

andy2na@reddit

Are there any cases where it's faster, anything in the logs showing it is working?

[-]

pj-frey@reddit

common_speculative_is_compat: the target context does not support partial sequence removal srv load_model: speculative decoding will use checkpoints common_speculative_init: initialized ngram_mod with n=24, size=4194304 (16.000 MB)

I have not run a real session with repetitive patterns for now. There are statistic pattern after every call:

statistics ngram_mod: #calls(b,g,a) = 8 5451 12, #gen drafts = 12, #acc drafts = 12, #gen tokens = 758, #acc tokens = 363, dur(b,g,a) = 23.637, 2.203, 0.382 ms

[-]

andy2na@reddit

interesting, wonder if it would be beneficial to me since I have a lot of calls from Frigate to analyze images/clips

[-]

andy2na@reddit

u/nickm_27 worth trying this, especially with frigate. Logs when Frigate pulls a request from llama.cpp:

draft acceptance rate = 1.00000 (    6 accepted /     6 generated)
statistics ngram_mod: #calls(b,g,a) = 6 2718 2, #gen drafts = 2, #acc drafts = 2, #gen tokens = 126, #acc tokens = 7

The N-Gram cache successfully recognized a pattern in the Frigate output, instantly guessed the next sequence of words, and got a 100% perfect score. It injected those tokens directly into your output with zero generation time.

Config:

    cmd: >
      env CUDA_VISIBLE_DEVICES=0 /custom-bin/bin/llama-server
       --port ${PORT}
      --host 127.0.0.1
      --webui-mcp-proxy
      --model /models/qwen35/Qwen3.6-35B-A3B-IQ4_NL.gguf
      --mmproj /models/qwen35/qwen3.6-35b-mmproj-BF16.gguf
      --spec-type ngram-mod
      --spec-ngram-size-n 24
      --draft-min 48
      --draft-max 64
      --cache-type-k q8_0
      --cache-type-v q8_0
      --n-gpu-layers auto
      --split-mode none
      --main-gpu 0
      --threads 6
      --threads-batch 6
      --ctx-size 240640
      --image-min-tokens 1024
      --flash-attn on 
      --parallel 1 
      --jinja

[-]

nickm_27@reddit

I gave this a try but so far the results seem relatively minor improvement:

Request 1 (no drafts): 165 tokens / 2838 ms = 58.1 tok/s
Request 2 (11 accepted): 146 tokens / 2820 ms = 51.8 tok/s
Request 3 (24 accepted): 142 tokens / 2516 ms = 56.4 tok/s

[-]

andy2na@reddit

yeah, very minor, we have to wait for Dflash to see any noticeable improvement. Id test it on vLLM now, but its such a resource hog

https://x.com/zhijianliu_/status/2046352785000771674

[-]

nickm_27@reddit

yeah, that's more difficult though as it requires a smaller LLM to be run requiring more VRAM, where N-gram runs more lean

[-]

andy2na@reddit

yeah, looked into current solutions and all the AWQ models for Qwen are over 20gb, and with the DFlash model, it hits my 24gb limit :(

[-]

unbannedfornothing@reddit

It is now.

[-]

andy2na@reddit

where do you see that?

In that PR, it says "Drafts with mmproj are not supported in this PR."

[-]

trusty20@reddit

I see no change whatsoever in t/s for this regardless of what prompts I try with build 8846 (i.e super vanilla stuff like "Make a simple snake game with HTML and JS" or "How many planets are there in the solar system?" etc). Does this only apply on MoE or certain quants etc? Am I maybe missing something? Tried a few versions of the new cli flags but saw no difference.

[-]

oxygen_addiction@reddit

Same here. I'm getting a high acceptance rate, but speed seems about the same.

[-]

fragment_me@reddit

The ngram-mod seems to improve things, but using a draft model definitely slows it down. There's something going on with either the code or communication between the two components (draft output and model output) that is causing high latency. It doesn't make sense that we have high draft acceptance but low tok/s. Especially when both models are in VRAM.

[-]

OsmanthusBloom@reddit

It will only help in situations where the model has to echo back snippets of your prompt or other pieces of context it has seen. For example in code editing this happens a lot.

Try something like "Repeat after me:" and then a longer piece of text or code.

[-]

Jungle_Llama@reddit

I got it to revise a large chunk of code and saw no increase.

[-]

Jungle_Llama@reddit

updated to b8855, now I see the increase, about 30% in parts where it is working.

[-]

trusty20@reddit

So I just re-tested (this time with Qwen3.6 35b) using your suggestion, I gave it a 2 page documentation snippet along with the following prompt: "Can you extract all of the c code snippets from this document:"

It returns the snippets, then just to be sure I'm testing this fairly, I ask it:

"Can you give them all again?"

On both the first and second repeating previous code snippets output, I see no performance improvement whatsoever, in fact, performance drops a few t/s.

I'm using these flags: --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 8 --draft-max 64 --ctx-checkpoints 4

[-]

trusty20@reddit

Oh I guess that makes sense, but I guess I'm just confused because I heard there was some sort of speculative decoding (aka draft model) thing built into Qwen3.5 and Gemini vs the previous draft model focused approach, but this sounds more like a completely adjacent thing to re-use kv cache blocks?

I really am not an expert so I absolutely could be just wrong to have assumed that but that's what I went in expecting based on the conversation leading up to this.

[-]

Jungle_Llama@reddit

Same here. b8849. Whole model (unsloth/Qwen3.6-35B-A3B-UD-Q4_K_XL) in GPU, Vulkan

[-]

iportnov@reddit

Does this work with llama-cli?

[-]

Fresh-Resolution182@reddit

the 0-50% variance depending on task is the interesting part. ngram acceptance rate doing all the heavy lifting - curious what kills it outside coding

[-]

Fresh-Resolution182@reddit

the 0-50% variance depending on task is the interesting part. ngram acceptance rate doing all the heavy lifting - curious what kills it outside coding

[-]

Fresh-Resolution182@reddit

the acceptance variance makes sense once you realize ngram-mod is pattern matching on exact token sequences. boilerplate-heavy typescript/java hits the high end, one-off logic or reasoning chains will be near zero. still worth having on by default and letting it fall back

[-]

CodeMichaelD@reddit

works way better with latest MOE gemma than qwen, also causes slowdown instead if the entire model does not fit into vram, especially qwen. for some --spec-type restarting/reprocessing the entire context is required, otherwise new chat would not trigger optimizations.

[-]

CodeMichaelD@reddit

[-]

hedsht@reddit

using the same settings:

Generation throughput improved by about 28.0%

Prompt throughput improved by about 4.2%

[-]

rerri@reddit

This is an exciting one (DFlash):

https://github.com/ggml-org/llama.cpp/pull/22105

[-]

AppealSame4367@reddit

As far as i understood though, it quite some extra vram. Just like the vllm and transformers implementation?

[-]

UnknownLesson@reddit

Can you really run qwen3.6 35b with a 8 GB VRAM GPU?

[-]

AppealSame4367@reddit

Yes you can. It won't be superfast, but for context < 60000 you will get something between 200-800 tps prefill and 5-25 tps output.

I run it without thinking, it's still good enough. Depending on what you do, you might wanna reenable thinking.

adapt the values to your system:

#!/bin/bash

export GGML_CUDA_GRAPHS=0

./build/bin/llama-server \

-hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ2_M \

--no-mmproj \

--no-mmproj-offload \

-c 80000 \

-b 2048 \

-ub 2048 \

--prio 3 \

-fit on \

-np 1 \

-kvu \

--clear-idle \

--cont-batching \

--slot-save-path ./slots \

--port 8129 \

--host 0.0.0.0 \

--cache-ram 8184 \

--spec-type ngram-map-k4v \

--draft-max 32 \

--draft-min 5 \

--spec-ngram-size-n 4 \

--spec-ngram-min-hits 1 \

--mlock \

--no-mmap \

--cache-type-k q4_0 \

--cache-type-v q4_0 \

-t 6 \

--temp 1.0 \

--top-p 0.95 \

--top-k 20 \

--min-p 0.0 \

--presence_penalty 0.0 \

--repeat-penalty 1.0 \

--jinja \

--reasoning off

[-]

illforgetsoonenough@reddit

You can run it without any gpu. It will just be very slow

[-]

vincespeeed@reddit

I also have 6GB of VRAM, and I'm getting 22 tokens with these settings.
[⚡qwen3.6-35b-a3b]

model = F:/Programlar/LM Studio/.lmstudio/models/bartowski\Qwen3.6-35B-A3B\Qwen3.6-35B-A3B-UD-IQ4_NL.gguf

override-tensor = blk.[3-9].ffn.*exps=CPU,blk.[1-2][0-9].ffn.*exps=CPU,blk.3[0-6].ffn.*exps=CPU

spec-type = ngram-mod

spec-ngram-size-n = 24

draft-min = 4

draft-max = 32

n-gpu-layers = all

ctx-size = 60000

parallel = 1

threads = 10

batch-size = 128

ubatch-size = 128

mlock = true

cont-batching = true

flash-attn = true

sleep-idle-seconds = 600

temp = 1.0

top-k = 20

top-p = 0.95

min-p = 0.0

presence-penalty = 1.5

repeat-penalty = 1.10

cache-type-k = q8_0

cache-type-v = q8_0

[-]

P0pMan20@reddit

I reach 28-25 tps depending on context usage on my 3060 mobile with 6GB vram using these llama.cpp flags.
llama-server --jinja -c 20000 -m Qwen3.5-35B-A3B-Q4_K_S.gguf --temp 1 --top-p 0.95 --repeat-penalty 1.0 --top-k 20 --presence-penalty 1.5 --min-p 0 -fa on --fit on

[-]

AppealSame4367@reddit

Not sure about some params. 2060, 6gb vram, 32gb system ram.

At the beginning: 750 tps pp, 18 tps output
At \~ 50k context: 200 tps pp, \~2-3 tps output

I use qwopus 4b or gemma 4 e4b for exploration and plan the actual code fixes with 3.6 35b. Very new setup, so of course I also still use some cloud ai.

llama-server \

-hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ2_M \

--no-mmproj \

--no-mmproj-offload \

-c 80000 \

-b 2048 \

-ub 2048 \

--prio 3 \

-fit on \

-np 1 \

-kvu \

--clear-idle \

--cont-batching \

--slot-save-path ./slots \

--port 8129 \

--host 0.0.0.0 \

--cache-ram 8184 \

--spec-type ngram-map-k4v \

--draft-max 32 \

--draft-min 5 \

--spec-ngram-size-n 4 \

--spec-ngram-min-hits 1 \

--mlock \

--no-mmap \

--cache-type-k q4_0 \

--cache-type-v q4_0 \

-t 6 \

--temp 1.0 \

--top-p 0.95 \

--top-k 20 \

--min-p 0.0 \

--presence_penalty 0.0 \

--repeat-penalty 1.0 \

--jinja \

--reasoning off

[-]

OsmanthusBloom@reddit

Thanks for this, I was just wondering what parameters to use for qwen 3.6 35b on my 3060 mobile. I will try it soon, I've been too busy since it was released.

Based on my earlier experience I would also try:

-ctk q8_0 -ctv q8_0 (fit twice as long context in the same VRAM, basically free now with attn-rot) --fit-target 128 (use all VRAM, adjust up if you hit OOM or down if brave) -np 1 (if you need only one session at a time, saves VRAM) -ub 2048 (higher ubatch improves PP speed a lot but costs some VRAM)

[-]

ea_man@reddit

Maybe try with an 8B or 4B, even lower quants, reasoning disabled.

[-]

AppealSame4367@reddit

That's just not what I hoped for. But luckily llama cpp just published speculative checkpoints. Same as ngram, they work without speculative model. If they keep going into that direction, maybe I can run q3.6 on 20tps still.

[-]

xienze@reddit

That's just not what I hoped for.

There’s no free lunch in computing. Everything is a space/time trade off.

[-]

ea_man@reddit

Quoting the PR:

For MoE targets (gpt-oss-20b), DFlash speedup is generally smaller than for dense attention targets because more experts get activated during the parallel verification step than during single-token autoregressive decoding (same observation as in #18039 for gpt-oss EAGLE3).

----

I guess that is gold for omnicoder 2, which has its use. YMMV

[-]

SnooPaintings8639@reddit

I don't get one thing - where do we get drafting models for dflash? Do we have to hope sole labs will do proper training or distillation for free, for each model we use? How even does it work, to train a diffusion model on autoregreeisive model, to be usable as drafter?

[-]

rerri@reddit

Sole labs? Do you mean Z-labs?

The Speculators project has already added support for DFlash draft model training and a third party (RedHatAI) has released an preliminary model on HF for Gemma 4 31B.

https://github.com/vllm-project/speculators

https://huggingface.co/RedHatAI/gemma-4-31B-it-speculator.dflash

[-]

Far-Low-4705@reddit

my only gripe with speculative decoding is that it disables vision.

makes it unusable for my use case unfortunatley

[-]

D2OQZG8l5BI1S06@reddit

Speculative decoding is now compatible with mtmd contexts

https://github.com/ggml-org/llama.cpp/pull/19493#issuecomment-4269556794

[-]

Far-Low-4705@reddit

OMG LETS FUCHING GOOOO

no more painfully slow 27b, might actually be usable now

[-]

TheOnlyBen2@reddit

I am curious to know your use case ?

[-]

Far-Low-4705@reddit

engineering and coding

But i need vision for engineering.

[-]

ea_man@reddit

Holy shit that thing should do some 8x speed up for Omnicoder without reasoning for coding, tonight I'm gonna test that!

[-]

fragment_me@reddit

This means we can use self spec decoding on Qwen3.5 and 3.6!! Just add it to the params and watch the tokens go brrrrrrrrrr

[-]

ForsookComparison@reddit

How well does this work?

Does this branch enable regular spec dec with a second draft model?

[-]

fragment_me@reddit

I've used this with Gemma 4 quite a bit and it's basically free tokens. I haven't seen it ever be noticeable in terms of speed, but the stats are there and they show it's generating tokens.

I ran about 10 experiments earlier with Q3.5 27B and I found the following to be most useful in agentic coding (generated the most tokens):

--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 4 --draft-max 64

The docs state that lowering draft min and draft max is better for dense models. I think it depends a lot on your use case. I also am not sure on how the latency of drafting a min of 4-6 tokens impacts overall processing.

I also am not sure about the second draft, that's an even better question because that would provide much more meaningful speedup... I'm going to test that.

[-]

FatheredPuma81@reddit

I'm seeing [12860] draft acceptance rate = 1.00000 ( 105 accepted / 105 generated) all the time every single time with Ngram and a Draft model?

[-]

ForsookComparison@reddit

2B seems appropriate for 397B

I'm interested in 0.8B for 27B 👀

[-]

fragment_me@reddit

Results with 0.8B were not great. Draft acceptance was very high, but the overall tok/s were 50-80% of the normal generation. So that made no sense. This was with pipeline parallelism, so maybe I need to go back and try just single GPU. Although I don't see why it would have issues since it all fit in VRAM (2x 3090). I experimented with draft min and max but found no overall positive values. I also tried various temp and min p sizes.

I have to peel back some layers and try some more basic approach with less parameters.

Also, I have an idea of trying to put the draft model on a 3060 with 6GB RAM as dedicated to it. Need to cut some holes in the server case for the PCIe riser cable though.

[-]

ForsookComparison@reddit

You rock for doing all these tests.

It you're considering isolating one to a single GPU and testing that, what if you tested a version of 27B quantized enough to fit on one 3090 and dedicated the other for the draft model? It probably wouldn't be a long term usable setup but it might be a quick way to validate the "worth pursuing"-ness of this

[-]

fragment_me@reddit

Just tested 3 scenarios:

Q3.5 27B Q4 on 3090 #1 with Q3.5 0.8B draft on 3050 8GB, no improvement
Q3.5 27B Q4 on 3090 #1 with Q3.5 0.8B draft on 3090 #2, no improvement
1. Q3.5 27B Q4 on 3090 #1 with Q3.5 0.8B draft on 3090 #1 (both on same 3090, no improvement

[-]

FatheredPuma81@reddit

Did some simple non scientific singular tests with a fresh context:
Qwen3.5 27B: 43t/s across the board.
Qwen3.5 27B w/ 0.8B Q2_K_XL: Was basically 27t/s throughout.
Qwen3.5 27B w/ 0.8B Q4_K_XL: Started at 30t/s while reasoning and jumped to 50t/s by the end.
Qwen3.5 27B w/ Ngram-mod: Started 42.7t/s while reasoning and jumped to 48t/s by the end.
Qwen3.5/3.6 35B: Went from 130t/s to 60t/s no matter what.

Nothing I can do about run to run variance but imo I'd recommend using Ngram-mod on Qwen3.5 27B and that's it.

[-]

FatheredPuma81@reddit

Does this also add Draft Model support? Because I don't have the VRAM to use that T_T.

Ngram is great though 100% recommended everyone use it.

[-]

FatheredPuma81@reddit

It does but it sucks with 0.8B. Saw a token decrease with Qwen3.5 27B while reasoning and a token increase while writing code. I'd say that limits the use for that. Qwen3.5 and 3.6 35B both see their speed halved. It also says that draft token acceptance was 100% lol.

[-]

Due_Net_3342@reddit

I this is fine but I really want MTP working

[-]

ArtfulGenie69@reddit

There's still vllm hehe

[-]

FatheredPuma81@reddit

But I don't have the VRAM and Qwen3.6 27B isn't out!

[-]

emprahsFury@reddit

ah but you look at the files changed, https://github.com/ggml-org/llama.cpp/pull/19493/files

And once again, no documentation files were updated for a major feature release.

[-]

ParaboloidalCrest@reddit

Not really. Those are the original speculative decoding docs from 2 months ago.

[-]

RevolutionaryPick241@reddit

Do you know what the params mean?

[-]

MoneyPowerNexis@reddit

It means parameters. They are the values passed into a function in this case the main function of the relevant applications that are built when you compile llama.cpp. They are used to initialize the state of the program to tell it to use or not use features and how.

I'm a human who typed this with keratin tipped flesh appendages wrapped around a calcium endoskeleton.

[-]

SmartCustard9944@reddit

Write a banana bread recipe

[-]

MoneyPowerNexis@reddit

Sure thing I found this easy to follow instructional video: https://youtu.be/WlreNuiJ5KE

[-]

Momsbestboy@reddit

And as outlook - because there already was a thread about how disapointed people are about the new B70:

https://github.com/ggml-org/llama.cpp/pull/22066 - 17 to 50% speed up on SYCL

https://github.com/ggml-org/llama.cpp/pull/21845 - up to 50% speed up

https://github.com/ggml-org/llama.cpp/pull/21527 - another 50% speed up

So it is as I said: don't judge the B70 too early. It will take some weeks to improve the software and drivers, but for sure the current numbers are not the final ones.

[-]

Gesha24@reddit

I had all of them installed and build for SYCL. The benchmarks were decent-ish: 700 t/s for prompt processing and 50 t/s for generation when using Qwen3.6 -35B. Seems ok, right?

The real issue - once you start using them and get to the high context (like 100K), the performance drops to 50 t/s for pp and about 7 t/s for generation. And once you push it even higher - it just crashes.

I gave up on it, returned B70 and got 9700 Pro. Same 32G VRAM. Same llama-server (but built with ROCm support vs SYCL). Prompt processing - up to 1800t/s. Generation - up to 80t/s. However, once you reach 120K context it a) doesn't crash and b) still chugs along at 650 t/s pp and 50 t/s generation.

So in practical terms, some claude code calls were simply impossible with B70 and those that were possible - could take an hour. With 9700 the experience is certainly not as good as with sonnet, but it's certainly workable.

So extra $350 ($950 for B70 and $1300 for 9700) are more than worth it IMO.

[-]

Momsbestboy@reddit

$950 for B70 and $1300 for 9700 for me is not $350 but $450 - or a 36% higher price.

Plus: while the ROCm drivers seem to be out for a while, SYCL support in LLAMA still is more or less new. Otherwise, you couldn't add 50% more ts in a single change, but would have to go through multiple improvements with smaller improvement steps.

They are still working on the low-hanging fruits.

So, let's see what comes next.

[-]

fallingdowndizzyvr@reddit

$950 for B70 and $1300 for 9700 for me is not $350 but $450 - or a 36% higher price.

LOL. 1300 - 950 is...... $350.

SYCL support in LLAMA still is more or less new.

No. It definitely is not. I was trying it on my A770 a couple of years ago. It's been around as long as Vulkan has. It's just that not that many people use it. They use Vulkan. Because it's more performant.

[-]

Gesha24@reddit

Sometimes I wonder if there are people in here or just bots.
"$950 for B70 and $1300 for 9700 for me is not $350 but $450 - or a 36% higher price." - check your math, what is 1300-950?

However, the point was - even if the price is 36% higher, you get at least 300% more real life performance.

[-]

fallingdowndizzyvr@reddit

So it is as I said: don't judge the B70 too early. It will take some weeks to improve the software and drivers, but for sure the current numbers are not the final ones.

Ah.... you realize that's because SYCL was just slow. I've used both SYCL and Vulkan on my A770s and Vulkan consistently blows SYCL away. But even using Vulkan, my A770s hit well blow their weight.

[-]

TheBlueMatt@reddit

I mean also try the vulkan backend. Vulkan appears to still be faster on pp than SYCL even after some of the updates in those PRs. Might be worth optimizing tg in Vulkan more than fixing pp in SYCL.

[-]

andy2na@reddit

No mmproj support it seems 😔

[-]

ai_without_borders@reddit

the acceptance rate variance makes sense when you think about what ngram-mod is actually matching on. code heavy on boilerplate/repeated variable names (typescript/java enterprise patterns) should see the high end of 0-50%. one-off logic or reasoning chains will be near zero. the --spec-ngram-size-n 24 is aggressive - 24 tokens of context for pattern matching means waiting for very precise repetitions. might be worth experimenting with lower values (8-12) for mixed code/prose tasks to widen the matching window and get more hits, at the cost of slightly shorter draft runs

[-]

robertpro01@reddit

Is this for dflash model?

[-]

milkipedia@reddit

I'm hopeful this will speedup Gemma 4 31B for me, and make it usable

[-]

Beginning-Window-115@reddit

only on tasks that are repetitive otherwise no

[-]

iamapizza@reddit

Doesn't that rule out reasoning based tasks? Sorry if I'm misunderstanding.

[-]

AdamDhahabi@reddit (OP)

Let's say your using your LLM as a chatbot for coding, it gives you some requested changes and you ask to implement the proposed changes and return the full code (which exists somewhere in the context), now you'll see large speedup.

[-]

iamapizza@reddit

Aha understood thanks, so actually the speedups could occur in the middle of your workflow. I understand a lot better now cheers

[-]

iamapizza@reddit

I'm not seeing any speed ups. Does it depend on the amount of free VRAM? I'm running Qwen3.6-35B-A3B-UD-Q8_K_XL.gguf on RTX 5080.

      --threads 8
      --temp 0.6 --top_p 0.95 --top_k 20 --min_p 0.0 --presence_penalty 0.0 --repeat_penalty 1.0
      --fit on
      --fit-target 512
      --fit-ctx 262144
      --cache-type-k q8_0 --cache-type-v q8_0
      --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64
      --parallel 1
      --flash-attn on
      --no-mmap
      --chat-template-kwargs '{"preserve_thinking": true}' --reasoning-budget -1

[-]

iamapizza@reddit

Hm, I'm just not seeing any speed up. Did you try something specific? Or is it my setup... I'm using Qwen3.6-35B-A3B-UD-Q8_K_XL.gguf on RTX 5080


      --threads 8
      --temp 0.6 --top_p 0.95 --top_k 20 --min_p 0.0 --presence_penalty 0.0 --repeat_penalty 1.0
      --fit on
      --fit-target 512
      --fit-ctx 262144
      --cache-type-k q8_0 --cache-type-v q8_0
      --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64
      --parallel 1
      --flash-attn on
      --no-mmap
      --sleep-idle-seconds 300
      --chat-template-kwargs '{"preserve_thinking": true}' --reasoning-budget -1

[-]

AppealSame4367@reddit

Wonderful. Thx to all that contributed, I feel like Christmas every other day with llama cpp.

[-]

cviperr33@reddit

thanks for the post!