
Completely novel to me: im-also-a-good-gpt2-chatbot on LMSYS Arena using codeblock to draw Diagrams to supplement its explanations

Posted by dubesor86@reddit | LocalLLaMA | 10 comments




FullOf_Bad_Ideas@reddit

This diagram is drawn wrongly though, it doesn't really make sense in the context of the outline of the process that it gave earlier.

Healthy-Nebula-3603@reddit

what is wrong with that diagram?

FullOf_Bad_Ideas@reddit

Compressing the whole idea of RLHF into a single diagram loses a lot of information, so the result is confusing and somewhat inaccurate. The lineage from the initial model (actually the model after SFT training, but that detail was dropped to keep the diagram small) through "Generate Response" > "Human Evaluator" > "Reward Model" is fine. It does get you a reward model. But what's happening with the branching here? Is the finetune based on "Generate Responses" and "Human Evaluator" combined? Why does it branch off before it reaches "Reward Model"? That makes it hard to read the diagram coherently. And even if we assume the branch starts at "Reward Model" rather than in the space between it and "Human Evaluator", is "Human Evaluator" needed for every step of the training? It sits in the lineage for every step, so you might assume so based on the graph.

[Here's](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/08/31/ML-14874_image001.jpg) an example of a diagram that actually explains it well. The best single-loop diagram I found is from [Wikipedia](https://upload.wikimedia.org/wikipedia/commons/b/b2/RLHF_diagram.svg), but it is much harder to read than the one from AWS.
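To make the branching point concrete, here is a rough sketch of the stage dependencies as I understand the usual RLHF setup. The stage names and the `PIPELINE` dict are made up for illustration, not taken from the diagram or from any library. The point is only that the RL finetune consumes the SFT model and the trained reward model directly, while the human evaluator feeds in only through reward model training.

```python
# Toy dependency map of a typical RLHF pipeline.
# All stage names are illustrative assumptions, not from the posted diagram.
# Each stage lists the stages it consumes directly.
PIPELINE = {
    "sft_model": [],
    "generate_responses": ["sft_model"],
    "human_evaluator": ["generate_responses"],
    "reward_model": ["human_evaluator"],
    # The RL finetune starts from the SFT model and is scored by the reward model.
    "rl_finetune": ["sft_model", "reward_model"],
}


def direct_inputs(stage):
    """Return the stages that feed directly into `stage`."""
    return list(PIPELINE[stage])


def stage_order(graph):
    """Return stages in an order where every stage comes after its inputs."""
    order, seen = [], set()

    def visit(stage):
        if stage in seen:
            return
        seen.add(stage)
        for dep in graph[stage]:
            visit(dep)
        order.append(stage)

    for stage in graph:
        visit(stage)
    return order


if __name__ == "__main__":
    print(direct_inputs("rl_finetune"))
    print(stage_order(PIPELINE))
```

Read this way, the branch into the finetune should come out of "Reward Model", and the human evaluator is an indirect ancestor rather than something queried at every training step.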

hudimudi@reddit

At least someone bothers to look at the post in detail lol. People get excited too quickly.

VectorD@reddit

What are you guys talking about? The diagram matches the above description perfectly lol.

Enfiznar@reddit

And that's the origin of the future posts saying "im-also-a-good-gpt2-chatbot got lobotomized, it used to make these diagrams perfectly, now it's printing flawed diagrams"

dubesor86@reddit (OP)

I mean yeah, it's flawed. I was more impressed by the attempt than the exact execution though, because I have not seen that before in any other model unless I specifically asked for it. Here it was just part of its natural answer to the prompt shown in the top-right.

Randomhkkid@reddit

I've used GPT4 to draw process diagrams like this in the past, not sure it's a new ability of these gpt2 variants.

CodeMurmurer@reddit

wow. That's actually cool. Must've cost a "bit" of money to produce the training data for that.

SUPR3M3Kai@reddit

Hadn't thought about that being a possibility. My brain's out here being naive, but it would make for such a fascinating emergent ability!