TraceBack: A Novel Reverse Reasoning Model for Better and Cheaper Scaling of Synthetic Reasoning Generation

Posted by XMasterrrr@reddit | LocalLLaMA | View on Reddit | 17 comments

17 Comments

Pojiku@reddit

Nice! I trained [Sovereign 72B](https://huggingface.co/BasiliskLabs/Sovereign-0.1-72B) using the same strategy. This was before R1 was released, so it was using traces distilled from QwQ preview.
View on Reddit #51036486

secemp9@reddit

Nice, would love to know more :o What was the dataset like? On my end I use instruction + solution as input, for both training and inference btw (the output is always just a reasoning trace that matches the instruction and solution)
View on Reddit #51037368

Pojiku@reddit

Yeah, same! Instruction + solution as input, reasoning trace as output. I ran it against the HuggingFace "smoltalk" dataset to build the reasoning dataset for Sovereign.
View on Reddit #51070211

XMasterrrr@reddit (OP)

I am posting the below on behalf of u/secemp9, the author of the model, as his Reddit account was only recently created and he could not post it himself.
View on Reddit #51019672

secemp9@reddit

Appreciate it, thank you :)
View on Reddit #51019732

segmond@reddit

So to understand, you provide the instruction and the solution, then it generates the reasoning steps that lead from the instruction to the solution?
View on Reddit #51024321

secemp9@reddit

yep! That way we can augment existing non-reasoning datasets into reasoning datasets instead of using r1/o1/o3 directly for dataset generation, then use them for further distillation/finetuning/training of other models
View on Reddit #51025865
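The augmentation loop described in the comment above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the author's pipeline: `build_prompt`, `augment_with_reasoning`, the prompt template, and the stand-in `fake_trace_model` are all hypothetical names introduced for illustration, and a real run would pass in a callable that queries the TraceBack model.

```python
from typing import Callable, Dict, Iterable, List


def build_prompt(instruction: str, solution: str) -> str:
    # Hypothetical template: the thread does not give the real prompt format.
    return f"Instruction:\n{instruction}\n\nSolution:\n{solution}\n\nReasoning:\n"


def augment_with_reasoning(
    rows: Iterable[Dict[str, str]],
    generate_trace: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Attach a generated reasoning trace to each instruction/solution row."""
    augmented = []
    for row in rows:
        prompt = build_prompt(row["instruction"], row["solution"])
        trace = generate_trace(prompt).strip()
        if not trace:
            # Skip rows where the generator produced nothing usable.
            continue
        augmented.append({**row, "reasoning": trace})
    return augmented


def fake_trace_model(prompt: str) -> str:
    # Stand-in for a real model call (assumption, for demonstration only).
    return "Step 1: restate the problem. Step 2: derive the given solution."


rows = [{"instruction": "What is 2 + 2?", "solution": "4"}]
reasoning_rows = augment_with_reasoning(rows, fake_trace_model)
```

Passing the generator in as a callable keeps the loop independent of how the model is served, so the same code works with a local model or a hosted endpoint.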

segmond@reddit

very nice, I'll play with it sometime this weekend, got 111gb of command-a to download next.
View on Reddit #51031702

secemp9@reddit

cloud, 8xH100 :)
View on Reddit #51032658

silenceimpaired@reddit

Have you looked at WIDGET - the six types of working genius and the idea of divergent and convergent thinking? It really feels like reasoning steps should use these two concepts. Would be nice if you could get the reasoning traces to use a structured step-by-step approach (the WIDGET process) that could be reused until there was evidence convergent thinking was ready, or for a specific number of reasoning attempts. Right now most reasoning / thinking blocks are quite chaotic, with plenty of ‘Waits’ before they advance.
View on Reddit #51020814

secemp9@reddit

I didn't, thanks for sharing - however I did plan on making another model that exhibits a different style of reasoning, yeah :) didn't do it yet
View on Reddit #51021077

Thrumpwart@reddit

This is fascinating. Looking forward to a GGUF and/or MLX version.
View on Reddit #51023862

secemp9@reddit

Thank you! Technically this one is at 4bit, and should only use ~8GB of vram/ram I think. I did quantized training so it took a bit more time, but for the next version I plan on doing full-precision training, then quantizing after the fact
View on Reddit #51026296

secemp9@reddit

Hi, I'm the author of TraceBack, a novel way to generate reasoning data from non-reasoning datasets/models. I kept thinking about how to better scale the generation of training data for reasoning, and since I kept seeing people depend on r1/o1/o3/grok3, I thought we could do better.

This is undertrained (2 epochs) with only 200k samples, but it already exhibits decent reasoning traces, and it can be improved a lot once it is scaled with more data and epochs.

I'm still in the process of making an eval and will release that soon too. The dataset I used for this can be found here: [https://huggingface.co/datasets/secemp9/instruction_solution_thought](https://huggingface.co/datasets/secemp9/instruction_solution_thought)

Any questions/criticism are welcome.
View on Reddit #51020696

HunterVacui@reddit

Can you elaborate on what exactly this model does differently? The training data appears to be based on three open source datasets. Did you massage or alter that data in some way?
View on Reddit #51022953

secemp9@reddit

yeah, I merged them using the format I used for training the model, which is:

instruction (the prompt used as input to a model) + solution (the output of that model): reasoning (this is the output of the model I trained)

The goal was to make a model that can generate reasoning data from an instruction+solution pair as input, which this achieves.

This is why I called it TraceBack: as the name implies, you get your (reasoning) trace back from your (non-reasoning) data, so we can use this to generate reasoning datasets instead of depending on r1/o3/o1, etc.
View on Reddit #51025778
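To make the merge step above concrete, here is a rough sketch of mapping differently shaped source datasets onto one instruction/solution/reasoning schema and then turning each record into a training pair. Everything specific here is an assumption for illustration: the source names, the column names in `FIELD_MAPS`, and the way instruction and solution are joined are hypothetical, since the thread does not spell out the actual columns or template.

```python
from typing import Dict

# Hypothetical column names for two source datasets (assumption).
FIELD_MAPS = {
    "source_a": {"instruction": "prompt", "solution": "response", "reasoning": "thinking"},
    "source_b": {"instruction": "question", "solution": "answer", "reasoning": "cot"},
}


def to_common_schema(row: Dict[str, str], source: str) -> Dict[str, str]:
    """Rename one source row's columns to instruction/solution/reasoning."""
    mapping = FIELD_MAPS[source]
    return {target: row[column] for target, column in mapping.items()}


def to_training_pair(record: Dict[str, str]) -> Dict[str, str]:
    """Input is instruction + solution; the target is the reasoning trace."""
    # Joining with a blank line is an assumed layout, not the real template.
    model_input = f"{record['instruction']}\n\n{record['solution']}"
    return {"input": model_input, "output": record["reasoning"]}


row_a = {"prompt": "What is 2 + 2?", "response": "4", "thinking": "2 plus 2 equals 4."}
row_b = {"question": "Capital of France?", "answer": "Paris", "cot": "France's capital is Paris."}

merged = [to_common_schema(row_a, "source_a"), to_common_schema(row_b, "source_b")]
pairs = [to_training_pair(record) for record in merged]
```

Keeping the per-source column mapping in one table means adding another dataset only requires a new `FIELD_MAPS` entry.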

secemp9@reddit

Hi, I'm the author of this model. Any questions or criticism are welcome. For the reasoning behind this, feel free to check the model card or the release tweet: [https://x.com/secemp9/status/1898734347563278799?t=8Q1K1ihnHq0KU7IrCWrlaA&s=19](https://x.com/secemp9/status/1898734347563278799?t=8Q1K1ihnHq0KU7IrCWrlaA&s=19)
View on Reddit #51019764