Completely novel to me: im-also-a-good-gpt2-chatbot on LMSYS Arena using code blocks to draw diagrams to supplement its explanations

FullOf_Bad_Ideas@reddit

This diagram is drawn wrong, though; it doesn't really make sense given the outline of the process it gave earlier.

Healthy-Nebula-3603@reddit

What is wrong with that diagram?

FullOf_Bad_Ideas@reddit

By compressing the idea of RLHF into a single diagram, a lot of information is lost, and it becomes confusing and somewhat inaccurate. The lineage from the initial model (actually one that has already gone through SFT, but this information was lost in the diagram to make it smaller) through "Generate Response" > "Human Evaluator" > "Reward Model" is fine. It does get you a reward model. But what's happening with the branching here? A finetune is based on "Generate Responses" and "Human Evaluator" combined?? Why is it branching off before it reaches "Reward Model"? That doesn't really allow for a coherent understanding of the diagram. Is "Human Evaluator" needed for every step of the training, even if we assume the branching happens from "Reward Model" and not from the space between it and "Human Evaluator"? It sits in the lineage of every step, so you might assume so from the graph, even though human labels are only needed to train the reward model.

[Here's](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/08/31/ML-14874_image001.jpg) an example of a diagram that actually explains it well. The best single-loop diagram I found is from [Wikipedia](https://upload.wikimedia.org/wikipedia/commons/b/b2/RLHF_diagram.svg), but it is much harder to read than the one from AWS.
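
A minimal Python sketch of the lineage described in this comment, assuming a toy setup: four canned responses with made-up feature vectors (`FEATURES`), a fixed "post-SFT" policy (`SFT_LOGITS`), and a simulated human (`human_prefers`, driven by a hidden `_HIDDEN_TASTE` vector). All of these names are hypothetical. The reward model is a linear Bradley-Terry model, and the RL stage swaps PPO for an exact policy gradient of "reward minus a KL penalty to the SFT policy" over the four responses. The point it illustrates is the data flow: the human is consulted only to build the reward model, and fine-tuning queries the reward model alone.

```python
import itertools
import math

# Hypothetical toy setup: four canned responses, each summarized by 3 numbers.
FEATURES = [
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.1],
    [0.5, 0.5, 0.9],
    [0.2, 0.1, 0.0],
]
# Stage 0: the starting point is already a post-SFT policy (here just fixed logits).
SFT_LOGITS = [0.5, 0.3, 0.0, 0.8]

HUMAN_LABEL_CALLS = 0  # counts how often the simulated human is consulted
_HIDDEN_TASTE = [0.3, 0.6, 1.5]  # stands in for human judgment; later stages never read it


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def human_prefers(a, b):
    """Simulated human evaluator: returns (chosen, rejected) for two response ids."""
    global HUMAN_LABEL_CALLS
    HUMAN_LABEL_CALLS += 1
    if dot(_HIDDEN_TASTE, FEATURES[a]) >= dot(_HIDDEN_TASTE, FEATURES[b]):
        return a, b
    return b, a


def train_reward_model(epochs=300, lr=0.5):
    """Generate response pairs, collect human comparisons, fit a Bradley-Terry reward model."""
    pairs = [human_prefers(a, b) for a, b in itertools.combinations(range(len(FEATURES)), 2)]
    w = [0.0] * len(FEATURES[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(FEATURES[chosen], FEATURES[rejected])]
            p = 1.0 / (1.0 + math.exp(-dot(w, diff)))  # P(chosen beats rejected)
            for k in range(len(w)):
                grad[k] -= (1.0 - p) * diff[k]  # gradient of -log p
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


def rl_finetune(reward_w, beta=0.5, steps=500, lr=0.5):
    """Push the policy toward high reward with a KL penalty to the SFT policy.

    Only the learned reward model is queried here; the human evaluator is not.
    """
    ref = softmax(SFT_LOGITS)
    rewards = [dot(reward_w, f) for f in FEATURES]
    theta = list(SFT_LOGITS)
    for _ in range(steps):
        pi = softmax(theta)
        f = [r - beta * (math.log(p) - math.log(q)) for r, p, q in zip(rewards, pi, ref)]
        baseline = sum(p * fi for p, fi in zip(pi, f))
        # Exact gradient of E_pi[r] - beta * KL(pi || ref) w.r.t. the softmax logits.
        theta = [t + lr * p * (fi - baseline) for t, p, fi in zip(theta, pi, f)]
    return softmax(theta)


reward_w = train_reward_model()
labels_after_rm = HUMAN_LABEL_CALLS
policy = rl_finetune(reward_w)
print("human labels used:", HUMAN_LABEL_CALLS)  # 6, and unchanged by the RL stage
print("final policy:", [round(p, 3) for p in policy])
```

In a real pipeline the RL stage samples fresh responses from the current policy and scores them with the reward model. Running `rl_finetune` again here leaves `HUMAN_LABEL_CALLS` unchanged, which is the branching a single-loop diagram tends to blur.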

hudimudi@reddit

At least someone bothers to look at the post in detail lol. People get excited too quickly.

VectorD@reddit

What are you guys talking about? The diagram matches the above description perfectly lol.

Enfiznar@reddit

And that's the origin of the future posts saying "im-also-a-good-gpt2-chatbot got lobotomized, it used to make these diagrams perfectly, now it's printing flawed diagrams"

dubesor86@reddit (OP)

I mean, yeah, it's flawed. I was more impressed by the attempt than the exact execution, though, because I haven't seen any other model do that unless I specifically asked for it. Here it was just part of its natural answer to the prompt shown in the top right.

Randomhkkid@reddit

I've used GPT-4 to draw process diagrams like this in the past, so I'm not sure it's a new ability of these gpt2 variants.

CodeMurmurer@reddit

wow. That's actually cool. Must've cost a "bit" of money to produce the training data for that.

SUPR3M3Kai@reddit

Hadn't thought about that being a possibility. My brain's out here being naive, but it would make for such a fascinating emergent ability!
