TraceBack: A Novel Reverse Reasoning Model for Better and Cheaper Scaling of Synthetic Reasoning Generation

Pojiku@reddit

Nice! I trained [Sovereign 72B](https://huggingface.co/BasiliskLabs/Sovereign-0.1-72B) using the same strategy. This was before R1 was released, so it used traces distilled from QwQ preview.

secemp9@reddit

Nice, would love to know more :o What was the dataset like? On my end I'm using instruction + solution as input, for both training and inference btw (the output is always just a reasoning trace that matches the instruction and solution).

Pojiku@reddit

Yeah, same! Instruction + solution as input, reasoning trace as output. I ran it against the HuggingFace "smoltalk" dataset to build the reasoning dataset for Sovereign.

XMasterrrr@reddit (OP)

I am posting the below on behalf of u/secemp9, the author of the model, as his Reddit account was only recently created and he could not post it himself.

secemp9@reddit

Appreciate it, thank you :)

segmond@reddit

So, to check my understanding: you provide the instruction and the solution, and it then generates the reasoning steps that lead from the instruction to the solution?

secemp9@reddit

Yep! That way we can augment existing non-reasoning datasets into reasoning datasets instead of using r1/o1/o3 directly for dataset generation, then use them for further distillation/finetuning/training of other models.
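
As a rough illustration of that augmentation step, here is a minimal sketch that walks over a non-reasoning dataset and attaches a generated trace to each row. The field names (`instruction`, `solution`, `reasoning`) and the `generate_trace` callable are assumptions made for the example; in practice `generate_trace` would wrap a call to the reverse-reasoning model, and a stub stands in for it here.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def augment_with_reasoning(
    rows: Iterable[Row],
    generate_trace: Callable[[str, str], str],
) -> List[Row]:
    """Return copies of `rows` with a generated 'reasoning' field added.

    `generate_trace` is a hypothetical hook: it receives (instruction, solution)
    and returns a reasoning trace connecting the two.
    """
    augmented = []
    for row in rows:
        trace = generate_trace(row["instruction"], row["solution"])
        # Copy the row so the original dataset is left untouched.
        augmented.append({**row, "reasoning": trace})
    return augmented


def stub_generate_trace(instruction: str, solution: str) -> str:
    """Placeholder for a real model call (assumption, for demonstration only)."""
    return f"(placeholder trace linking '{instruction}' to '{solution}')"


if __name__ == "__main__":
    data = [{"instruction": "What is 2 + 2?", "solution": "4"}]
    for row in augment_with_reasoning(data, stub_generate_trace):
        print(row)
```

Passing the generator in as a callable keeps the loop independent of any particular inference backend, so the same code works with a local model, an API client, or a test stub.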

segmond@reddit

very nice, I'll play with it sometime this weekend, got 111gb of command-a to download next.

secemp9@reddit

cloud, 8xH100 :)

silenceimpaired@reddit

Have you looked at WIDGET (the six types of working genius) and the idea of divergent and convergent thinking? It really feels like reasoning steps should use these two concepts. It would be nice if you could get the reasoning traces to use a structured, step-by-step WIDGET process that could be repeated until there was evidence that convergent thinking was ready, or for a set number of reasoning attempts. Right now most reasoning/thinking blocks are quite chaotic, with plenty of ‘Waits’ before they advance.

secemp9@reddit

I didn't, thanks for sharing. I did plan on making another model that exhibits a different style of reasoning, though :) Haven't done it yet.

Thrumpwart@reddit

This is fascinating. Looking forward to a GGUF and/or MLX version.

secemp9@reddit

Thank you! Technically this one is already at 4-bit and should only use ~8 GB of VRAM/RAM, I think. I did quantized training, so it took a bit more time, but for the next version I plan on doing full-precision training and then quantizing after the fact.
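
For a back-of-the-envelope feel for how bit width affects memory, the sketch below estimates the space taken by the weights alone. The parameter count is not stated in this thread, so the 12-billion figure in the example is purely hypothetical, and the estimate ignores KV cache, activations, and runtime overhead, which add to real usage.

```python
def weight_memory_gb(n_params: int, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = n_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


if __name__ == "__main__":
    hypothetical_params = 12_000_000_000  # assumption, not taken from the thread
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {weight_memory_gb(hypothetical_params, bits):.1f} GB")
```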

secemp9@reddit

Hi, I'm the author of TraceBack, a novel way to generate reasoning data from non-reasoning datasets/models. I kept thinking about how to better scale the generation of training data for reasoning, and since I kept seeing people depend on r1/o1/o3/grok3, I thought we could do better. This is undertrained (2 epochs, with only 200k samples), but it already exhibits decent reasoning traces and can be improved a lot once it is scaled with more data and epochs. I'm still in the process of making an eval and will release that soon too. The dataset I used for this can be found here: [https://huggingface.co/datasets/secemp9/instruction_solution_thought](https://huggingface.co/datasets/secemp9/instruction_solution_thought) Any questions/criticism are welcome.

HunterVacui@reddit

Can you elaborate on what exactly this model does differently? The training data appears to be based on three open source datasets. Did you massage or alter that data in some way?

secemp9@reddit

Yeah, I merged them using the format I used for training the model, which is: instruction (the prompt used as input to the model) + solution (the output of the model): reasoning (this is the output of the model I trained). The goal was to make a model that can generate reasoning data from an instruction + solution pair as input, which this achieves. This is why I called it TraceBack: as the name implies, you get your (reasoning) trace back from your (non-reasoning) data, so we can use this to generate reasoning datasets instead of depending on r1/o3/o1, etc.
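
To make the format concrete, here is a minimal sketch of how one record could be turned into a model input and a training target. The section labels ("Instruction:", "Solution:", "Reasoning:") and the helper names are assumptions for illustration; the exact template used to train TraceBack is not given in this thread.

```python
from dataclasses import dataclass


@dataclass
class Example:
    instruction: str     # prompt from the original dataset
    solution: str        # answer from the original dataset
    reasoning: str = ""  # trace to be learned; empty at inference time


def build_input(instruction: str, solution: str) -> str:
    """Assemble the model input from an instruction + solution pair.

    The layout below is a hypothetical template, not the author's actual one.
    """
    return (
        f"Instruction:\n{instruction.strip()}\n\n"
        f"Solution:\n{solution.strip()}\n\n"
        "Reasoning:\n"
    )


def build_training_pair(example: Example) -> dict:
    """Return the input text and the target the model is trained to emit."""
    return {
        "input": build_input(example.instruction, example.solution),
        "target": example.reasoning.strip(),
    }


if __name__ == "__main__":
    ex = Example("What is 2 + 2?", "4", "Adding 2 and 2 gives 4.")
    pair = build_training_pair(ex)
    print(pair["input"] + pair["target"])
```

At inference time only `build_input` is needed: the model is given the instruction and solution and is expected to continue with the reasoning trace.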

secemp9@reddit

Hi, I'm the author of this model. Any questions or criticism are welcome. For the reasoning behind this, feel free to check the model card or the release tweet: [https://x.com/secemp9/status/1898734347563278799?t=8Q1K1ihnHq0KU7IrCWrlaA&s=19](https://x.com/secemp9/status/1898734347563278799?t=8Q1K1ihnHq0KU7IrCWrlaA&s=19)
