Nice! I trained [Sovereign 72B](https://huggingface.co/BasiliskLabs/Sovereign-0.1-72B) using the same strategy.
This was before R1 was released, so it was using traces distilled from QwQ preview.
Nice, would love to know more :o What was the dataset like? On my end I'm doing instruction+solution as input, for both training and inference btw (the output is always just the reasoning trace that matches the instruction and solution).
Yeah, same! instruction + solution as input, reasoning trace as output.
I ran it against the HuggingFace "smoltalk" dataset to build the reasoning dataset for Sovereign.
I am posting the below on behalf of u/secemp9, the author of the model, as his Reddit account is only recently created and he could not post it himself.
Yep! That way we can augment existing non-reasoning datasets into reasoning datasets instead of using r1/o1/o3 directly for dataset generation, then use those for further distillation/finetuning/training of other models.
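A minimal sketch of that augmentation loop, assuming each row has `instruction` and `solution` fields. `add_reasoning` and `stub_trace` are hypothetical names for illustration; in practice `generate_trace` would call the trained model, and the stub here only stands in for it.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def add_reasoning(rows: List[Row], generate_trace: Callable[[str, str], str]) -> List[Row]:
    """Return copies of the rows with a 'reasoning' field added to each."""
    augmented = []
    for row in rows:
        # The trace generator sees both the instruction and its known solution.
        trace = generate_trace(row["instruction"], row["solution"])
        augmented.append({**row, "reasoning": trace})
    return augmented


def stub_trace(instruction: str, solution: str) -> str:
    """Placeholder for a model call (assumption, not the real model)."""
    return f"To answer '{instruction}', work toward '{solution}'."


rows = [{"instruction": "What is 2+2?", "solution": "4"}]
augmented = add_reasoning(rows, stub_trace)
```

The original rows are left untouched, so the same non-reasoning dataset can be re-augmented later with a better trace generator.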
Have you looked at WIDGET - the six types of working genius and the idea of divergent and convergent thinking? It really feels like reasoning steps should use these two concepts. Would be nice if you could get the reasoning traces to use a structured step-by-step (WIDGET) process that could be repeated until there was evidence it was ready for convergent thinking, or for a specific number of reasoning attempts. Right now most reasoning / thinking blocks are quite chaotic, with plenty of ‘Waits’ before they advance.
Thank you! Technically this one is at 4-bit and should only use ~8GB of VRAM/RAM, I think. I did quantized training, so it took a bit more time, but for the next version I plan on doing full-precision training and then quantizing after the fact.
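For a rough sense of where a figure like that comes from, here is a back-of-the-envelope sketch of weight memory at a given bit width. The parameter count below is a placeholder assumption, not the model's actual size, and the estimate covers weights only (no KV cache, activations, or quantization overhead).

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, for illustration only.
HYPOTHETICAL_PARAMS = 12e9

four_bit = weight_memory_gb(HYPOTHETICAL_PARAMS, 4)
sixteen_bit = weight_memory_gb(HYPOTHETICAL_PARAMS, 16)
```

Real usage lands above the weights-only number once context and runtime overhead are included.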
Hi, I'm the author of TraceBack, a novel way to generate reasoning data from non-reasoning datasets/models.
I kept thinking about how to better scale the generation of training data for reasoning, and since I kept seeing people depend on r1/o1/o3/grok3, I thought we could do better.
This is undertrained (2 epochs) with only 200k samples, but it already exhibits decent reasoning traces, and it can be improved a lot once this is scaled with more data and epochs.
I'm still in the process of making an eval and will soon release that too - the dataset I used for this can be found here: [https://huggingface.co/datasets/secemp9/instruction\_solution\_thought](https://huggingface.co/datasets/secemp9/instruction_solution_thought)
Any questions/criticism are welcome.
Can you elaborate on what exactly this model does differently? The training data appears to be based on three open source datasets. Did you massage or alter that data in some way?
Yeah, I merged them using the format I used for training the model, which is:
instruction (prompt used as input to the model) + solution (output of the model): reasoning (this is the output of the model I trained)
The goal was to make a model that can generate reasoning data from an instruction+solution pair as input, which this achieves.
This is why I called it TraceBack: as the name implies, you get your (reasoning) trace back from your (non-reasoning) data, so we can use this to generate reasoning datasets instead of depending on r1/o3/o1, etc.
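To make the format above concrete, here is a minimal sketch of packing one record as instruction + solution in, reasoning out. The prompt template and the `build_example` name are assumptions for illustration; the exact template used for training is not shown in this thread.

```python
from typing import Dict


def build_example(instruction: str, solution: str, reasoning: str) -> Dict[str, str]:
    """Pack one training record: instruction + solution as input, reasoning as output."""
    # Hypothetical template: the real separators and labels may differ.
    prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Solution:\n{solution}\n\n"
        "Reasoning:\n"
    )
    return {"input": prompt, "output": reasoning}


example = build_example("What is 2+2?", "4", "Adding 2 and 2 gives 4.")
```

At inference time the same prompt would be built from an existing instruction/solution pair, and the model's completion would be kept as the reasoning trace.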
Hi, I'm the author of this model. Any questions or criticism are welcome. For the reasoning behind this, feel free to check the model card or the release tweet: [https://x.com/secemp9/status/1898734347563278799?t=8Q1K1ihnHq0KU7IrCWrlaA&s=19](https://x.com/secemp9/status/1898734347563278799?t=8Q1K1ihnHq0KU7IrCWrlaA&s=19)
17 Comments
Pojiku@reddit
secemp9@reddit
Pojiku@reddit
XMasterrrr@reddit (OP)
secemp9@reddit
segmond@reddit
secemp9@reddit
segmond@reddit
secemp9@reddit
silenceimpaired@reddit
secemp9@reddit
Thrumpwart@reddit
secemp9@reddit
secemp9@reddit
HunterVacui@reddit
secemp9@reddit
secemp9@reddit