Thinking with a smaller model to speed things up?

Posted by q-admin007@reddit | LocalLLaMA

Question: Can I do the thinking with a smaller model, like Gemma 4 4B, and then pass that thinking as part of the prompt to Gemma 4 31B, to speed things up?

Has anyone done this and measured whether it's worth it?
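
To make the idea concrete, here is a minimal sketch of the two-stage setup with timing, so it can be compared against letting the big model do everything. The names `two_stage`, `baseline`, `think`, and `answer` are hypothetical: `think` and `answer` are plain Python callables (prompt in, text out) that you would wire to whatever runtime serves the small and large model. The prompt wording is also just an assumption, not a tested recipe.

```python
import time
from typing import Callable, Dict, Union

# Hypothetical stand-in type: any function that takes a prompt and returns text.
Generate = Callable[[str], str]
Result = Dict[str, Union[str, float]]


def two_stage(question: str, think: Generate, answer: Generate) -> Result:
    """Small model drafts the reasoning, large model writes the final answer."""
    t0 = time.perf_counter()
    reasoning = think(
        "Think step by step about the question below. "
        "Do not give a final answer.\n\n" + question
    )
    t1 = time.perf_counter()
    final = answer(
        f"Question:\n{question}\n\n"
        f"Draft reasoning from a smaller model (may contain mistakes):\n"
        f"{reasoning}\n\n"
        "Give the final answer."
    )
    t2 = time.perf_counter()
    return {
        "reasoning": reasoning,
        "answer": final,
        "think_s": t1 - t0,   # time spent in the small model
        "answer_s": t2 - t1,  # time spent in the large model
        "total_s": t2 - t0,
    }


def baseline(question: str, answer: Generate) -> Result:
    """Large model alone, for comparison on the same question."""
    t0 = time.perf_counter()
    final = answer(question)
    return {"answer": final, "total_s": time.perf_counter() - t0}
```

Run both on the same set of questions and compare `total_s` and the answers side by side. Keep in mind that the large model still has to read the draft reasoning as prompt tokens, and that it may redo the thinking itself unless its own reasoning mode is turned off, so wall-clock time alone does not tell you whether answer quality holds up.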