what local llm model is the sweet spot for summarization and analysis (speed + accuracy)?
Posted by happyuser22@reddit | LocalLLaMA | 12 comments
i have rtx 3090 (24gb)
PromptInjection_@reddit
Gemma 4 26B, Qwen 35B (IQ4_NL)
CATLLM@reddit
qwen3.5 27b q4
Monad_Maya@reddit
Qwen3.5 27B in my limited testing.
The MoE variant (35B) seems more prone to losing its marbles at very high context. The 27B is more coherent for me. Again, your experience will vary.
27B is dense, so it'll be slower.
If you need a MoE for speed then Qwen3.5 35B A3B. Gemma4 26B A4B might be ok once all the issues are sorted out.
Afraid-Pilot-9052@reddit
working on OpenClaw Desktop. Install OpenClaw on Mac or PC. One-click installer with setup wizard. anyway that's what i've been working on.
xeeff@reddit
who asked? genuinely who asked lil bro
Equal-Document4213@reddit
If you have data to fine tune, flan-t5 is an oldie but a goodie for summarization.
KorbenDullas@reddit
Gemma 4
happyuser22@reddit (OP)
what model?
KorbenDullas@reddit
https://lmstudio.ai/models/google/gemma-4-26b-a4b
OpenEvidence9680@reddit
I am using gemma-4-26b-a4b-heretic-apex-i-mini (about 13GB) because for my needs it did as good a job as the bigger quantizations at 15.5 and 20GB. Consider that I make it summarize chapter by chapter with chunking and rolling context, then make it produce a long summary and a short summary. Fast, clean, and really good text comprehension from what I can see so far.
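The chunk-and-roll pattern described above can be sketched roughly like this. This is a minimal illustration, not the commenter's actual code: the function names (`chunk_text`, `summarize_document`), the character budgets, and the stubbed `summarize` callable (which you would replace with a call to your local model server) are all assumptions.

```python
def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on paragraph boundaries."""
    paras = text.split("\n\n")
    chunks, cur = [], ""
    for p in paras:
        if cur and len(cur) + len(p) + 2 > max_chars:
            chunks.append(cur)
            cur = p
        else:
            cur = f"{cur}\n\n{p}" if cur else p
    if cur:
        chunks.append(cur)
    return chunks

def summarize_document(text: str, summarize, context_chars: int = 1500):
    """Summarize chunk by chunk, passing a rolling tail of prior summaries as context,
    then condense everything into a short summary at the end."""
    rolling, partials = "", []
    for chunk in chunk_text(text):
        prompt = f"Context so far:\n{rolling}\n\nSummarize this chapter:\n{chunk}"
        partials.append(summarize(prompt))
        # keep only the most recent tail of accumulated summaries as context
        rolling = "\n".join(partials)[-context_chars:]
    long_summary = "\n".join(partials)
    short_summary = summarize(f"Condense to a few sentences:\n{long_summary}")
    return long_summary, short_summary
```

Here `summarize` would wrap whatever inference call you use (an OpenAI-compatible local endpoint, a llama.cpp binding, etc.); the rolling tail keeps each prompt small enough that the model never sees the whole book at once.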
ttkciar@reddit
Make sure you get the most recent llama.cpp and Google's fixed chat template (released today) and use Gemma-4-26B-A4B-it. It is quite fast and excellent at summarization and analysis.
happyuser22@reddit (OP)
thanks i will try it