what local llm model is the sweet spot for summarization and analysis (speed + accuracy)?
Posted by happyuser22@reddit | LocalLLaMA | 12 comments
i have rtx 3090 (24gb)
PromptInjection_@reddit
Gemma 4 26B, Qwen 35B (IQ4_NL)
CATLLM@reddit
qwen3.5 27b q4
Monad_Maya@reddit
Qwen3.5 27B in my limited testing.
The MoE variant (35B) seems more prone to losing its marbles at very high context. The 27B is more coherent for me. Again, your experience will vary.
27B is dense, so it'll be slower.
If you need a MoE for speed then Qwen3.5 35B A3B. Gemma4 26B A4B might be ok once all the issues are sorted out.
Afraid-Pilot-9052@reddit
working on OpenClaw Desktop. Install OpenClaw on Mac or PC. One-click installer with setup wizard. anyway that's what i've been working on.
xeeff@reddit
who asked? genuinely who asked lil bro
Equal-Document4213@reddit
If you have data to fine tune, flan-t5 is an oldie but a goodie for summarization.
KorbenDullas@reddit
Gemma 4
happyuser22@reddit (OP)
what model?
KorbenDullas@reddit
https://lmstudio.ai/models/google/gemma-4-26b-a4b
OpenEvidence9680@reddit
I am using gemma-4-26b-a4b-heretic-apex-i-mini (about 13GB) because for my needs it did as good a job as the bigger quantizations at 15.5 and 20GB. Consider that I make it summarize chapter by chapter with chunking and rolling context, then make it produce a long summary and a short summary. Fast, clean, and really good text comprehension from what I can see so far.
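The chunk-and-roll pattern described above can be sketched roughly like this. This is a minimal illustration, not the commenter's actual code: the function names (`chunk_text`, `summarize_document`), the character budgets, and the stubbed `summarize` callable (which you would replace with a call to your local model server) are all assumptions.

```python
def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on paragraph boundaries."""
    paras = text.split("\n\n")
    chunks, cur = [], ""
    for p in paras:
        if cur and len(cur) + len(p) + 2 > max_chars:
            chunks.append(cur)
            cur = p
        else:
            cur = f"{cur}\n\n{p}" if cur else p
    if cur:
        chunks.append(cur)
    return chunks

def summarize_document(text: str, summarize, context_chars: int = 1500):
    """Summarize chunk by chunk, passing a rolling tail of prior summaries as context,
    then condense everything into a short summary at the end."""
    rolling, partials = "", []
    for chunk in chunk_text(text):
        prompt = f"Context so far:\n{rolling}\n\nSummarize this chapter:\n{chunk}"
        partials.append(summarize(prompt))
        # keep only the most recent tail of accumulated summaries as context
        rolling = "\n".join(partials)[-context_chars:]
    long_summary = "\n".join(partials)
    short_summary = summarize(f"Condense to a few sentences:\n{long_summary}")
    return long_summary, short_summary
```

Here `summarize` would wrap whatever inference call you use (an OpenAI-compatible local endpoint, a llama.cpp binding, etc.); the rolling tail keeps each prompt small enough that the model never sees the whole book at once.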
ttkciar@reddit
Make sure you get the most recent llama.cpp and Google's fixed chat template (released today) and use Gemma-4-26B-A4B-it. It is quite fast and excellent at summarization and analysis.
happyuser22@reddit (OP)
thanks i will try it