Trying to train tiny LLMs on a length-constrained Reddit-post summarization task using GRPO on 3x Mac Minis - updates!
Posted by East-Muffin-6472@reddit | LocalLLaMA | View on Reddit | 2 comments
So, here's an update on my GRPO training for length-constrained Reddit-post summarization on 3x Mac Minis - a new direction!
Gist: been trying to test how good a summarization model can get when it has to produce its summary in exactly 64 tokens!
So, once all the t-tests and evals were done for the LFM2.5-350M and Qwen2.5-0.5B-Instruct models with the length penalty and quality metrics (given below), I looked at the quality-metric results and saw that BLEU and ROUGE-L were particularly low when trained from scratch.
I hypothesized it's because of the length penalty I added so that the model outputs exactly 64 tokens: the model is rewarded for hitting that length, but is simultaneously penalized by the length-sensitive parts of ROUGE-L and BLEU (BLEU's brevity penalty, for example).
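To make the tension concrete, here's a small sketch of the two pulls. The `length_reward` shape is my own hypothetical stand-in (the post doesn't give the exact reward formula), but the brevity penalty is the standard BLEU one, BP = exp(1 - r/c) when the candidate is shorter than the reference:

```python
import math

TARGET_LEN = 64  # the exact summary length the reward targets

def length_reward(n_tokens: int, target: int = TARGET_LEN) -> float:
    """Hypothetical length penalty: reward is 1.0 at exactly `target`
    tokens and falls off linearly with the absolute deviation."""
    return max(0.0, 1.0 - abs(n_tokens - target) / target)

def bleu_brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Standard BLEU brevity penalty: exp(1 - r/c) when the candidate
    is shorter than the reference, otherwise 1."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)
```

A 64-token summary of a post whose reference summary is, say, 90 tokens gets the full length reward but a brevity penalty well below 1, so the two objectives fight each other whenever the reference is longer than 64 tokens.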
Well, I had a faint idea for circumventing this issue: what if I used an already fine-tuned version that outputs exactly 64 tokens? But the idea was like a flash, zoooom, and poof, gone!
That is when a Redditor pointed it out, and I was like "hmm, well, I already have a checkpoint trained with only the length penalty!"
Now, here I could have just SFT'ed, as some of you may be thinking, to fine-tune the model to output just the right number of tokens, and yes, that's the next experiment, along with a DPO comparison!
So, currently, I have been training LFM2.5-350M and Qwen2.5-0.5B-Instruct for the same!
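For anyone unfamiliar with the GRPO part of the pipeline: the key trick is that it needs no value network; each prompt gets a group of rollouts, and every rollout's advantage is just its reward normalized against the group. A minimal sketch (the reward values below are made up for illustration):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style group-relative advantages: normalize each rollout's
    reward by the mean and (population) std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts for one prompt; hypothetical combined rewards
# (length penalty + quality metric).
advantages = grpo_advantages([0.9, 0.4, 0.6, 0.5])
```

Rollouts above the group mean get positive advantage and are reinforced; the ones below get pushed down, which is what drives the model toward the 64-token target.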
- Eval:
LLM-as-a-Judge (gpt-5)
Used DeepEval to build a judge pipeline scoring each summary on 4 axes.
- Distributed Training Setup:
3x Mac Minis in a cluster running MLX.
One node drives training using GRPO; the other two push rollouts via the vLLM-metal framework. All of the work was done using smolcluster[dot]com.
Used a SyncPS architecture, i.e. a synchronous parameter server, with the master being the node where training happens and vLLM running on the worker nodes.
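The synchronous parameter-server pattern can be sketched in a few lines. This is a toy single-process illustration of the data flow, not the actual smolcluster implementation; the function names and the quadratic toy loss are mine:

```python
def worker_gradient(weights, shard_targets):
    # Stand-in for one worker's rollout + backward pass: gradient of
    # a toy per-parameter loss 0.5 * (w - t)^2 on that worker's shard.
    return [w - t for w, t in zip(weights, shard_targets)]

def sync_ps_step(weights, shards, lr=0.1):
    """One synchronous step: the master waits for a gradient from
    every worker (the sync barrier), averages them, and applies a
    single update to the shared weights."""
    grads = [worker_gradient(weights, s) for s in shards]
    avg = [sum(g) / len(grads) for g in zip(*grads)]
    return [w - lr * g for w, g in zip(weights, avg)]
```

The "synchronous" part is the barrier: no update happens until every worker has reported in, so all rollout nodes always sample from the same weight version — the trade-off being that the slowest Mac Mini sets the pace.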



saintmichel@reddit
thanks for sharing smolcluster!
East-Muffin-6472@reddit (OP)
Thank you! Do let me know any thoughts or feedback you have!