Chain-of-Thought reasoning on the next token prediction level?

Posted by Marha01@reddit | LocalLLaMA | View on Reddit | 2 comments

A single LLM pass ultimately outputs a single next token to append to the prompt. It seems to me that the most logical CoT approach would be a reasoning system that works at that level: output a proposed next token, then run the LLM again, asking it to evaluate and justify the proposed choice. It then either confirms the token and definitively appends it to the prompt, or produces an alternative "best next token" proposal (or a list of proposals?), repeating the decision process until the best next token is definitively confirmed.
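For concreteness, here is a minimal sketch of the loop I have in mind. The `propose` and `evaluate` callables stand in for the two LLM passes (they are hypothetical stubs here, not a real API; `max_revisions` is an assumed cap to keep the loop from spinning forever):

```python
def generate_with_token_cot(prompt, propose, evaluate, max_revisions=3):
    """Propose a next token, ask an evaluator pass to confirm or revise it,
    and only append a token once confirmed (or revisions run out)."""
    candidate = propose(prompt)
    for _ in range(max_revisions):
        verdict, alternative = evaluate(prompt, candidate)
        if verdict == "confirm":
            break
        candidate = alternative  # take the evaluator's counter-proposal
    return prompt + candidate


# Toy deterministic stand-ins for the two LLM passes, just to show the flow:
propose = lambda p: "x"

def evaluate(prompt, token):
    # Reject the first proposal "x" in favor of "y", then confirm.
    return ("revise", "y") if token == "x" else ("confirm", None)

print(generate_with_token_cot("abc", propose, evaluate))  # -> "abcy"
```

Obviously a real version would pay two-plus full forward passes per emitted token, which is the main cost question.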

Has anyone tried such a "token level" CoT reasoning approach?