server, webui: support continue generation on reasoning models by ServeurpersoCom · Pull Request #22727 · ggml-org/llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | 3 comments

now you can CONTINUE

3 Comments

Chromix_@reddit

Finally, efficient parallel bulk generation with large input data (especially when paired with -kvu). If the context limit is hit, just store the partial result and retry later when more context is free, instead of throwing it all away.

rerri@reddit

Can you also edit text within the thinking block? At some point this was not possible for some reason.

LegacyRemaster@reddit

very good news!