Which model to summarize rss news articles

Posted by redblood252@reddit | LocalLLaMA | View on Reddit | 5 comments

I don’t know what nor how to test the quality of summaries of news articles. But I know I don’t need very large models. I’m looking preferably for something that uses low vram or cpu only but that is sufficient for this use case. I won’t need something complex either and only english.

[-]

cryyingboy@reddit

Tbh for summarization you dont need much, even a 7b model handles it fine. we run a routing layer that picks the cheapest model per task automatically and for stuff like summaries it almost never picks anything above 8b. went from spending like 40 bucks a day on api calls to under 9 with basically the same output quality. cpu only should be totally doable at that size.

[-]

redblood252@reddit (OP)

Care to share more about how to set up such a router locally?

[-]

Icy-Degree6161@reddit

For creative tasks I'd say go with one of the small gemma4's and you'll be fine. Take the time to tweak the system prompt until you are satisified with the style and lenght of the summaries. Even gemma3 did this well to be honest, if gemma4 is too heavy for your liking go for the previous gen...

[-]

SM8085@reddit

Try a small Gemma and Qwen and see which you prefer.

unsloth/gemma-4-E2B-it-GGUF is the smallest Gemma4.

unsloth/Qwen3.5-2B-GGUF or unsloth/Qwen3.5-4B-GGUF would be the smallest modern Qwens that I would personally use to summarize. 0.8B exists, but I get cautious with anything under 2B.

[-]

redblood252@reddit (OP)

Thanks. Did you try any for summarizing? How much did you quantize them?