I replicated Anthropic’s "Introspection" paper on DeepSeek-7B. It works.

[-]

charmander_cha@reddit

Could you make the code available for curious people to play with too?

[-]

yatusabe__@reddit

Why the scroll hijacking? Please just tell me why, I need to understand how anyone could think it is a good idea.

[-]

Specialist_Bad_4465@reddit (OP)

sorry, i was trying something new :)

lesson learned. I have removed it.

[-]

yatusabe__@reddit

Thank you, I finally got to read your post and it was very interesting. I hope you continue this research and share the results. Computer science meets philosophy, what's not to love about it?

[-]

o5mfiHTNsH748KVq@reddit

The little ease upward is still a nice touch though

[-]

nuclearbananana@reddit

Honestly, if a little polished, this is actually nice. Makes scrolling while reading way easier. Unlike most cases where it's just for some pointless graphics or progress bar.

[-]

Murgatroyd314@reddit

Disagree. Having the page move away from where I put it is a nuisance. No exceptions.

[-]

agreeoncesave@reddit

Agree, but don't forget about reader view in your browser

[-]

I liked it on the sections that fit neatly on a single screen, but when I got to sections 3 and 4 that require scrolling it all went to absolute shit and made it much harder to use than normal scrolling.

[-]

Environmental-Metal9@reddit

Agreed. On mobile it makes reading the bottom of the page impossible because it jumps to the next section if I try to move the bottom to the middle

[-]

pokemonplayer2001@reddit

A hateful choice.

[-]

DefNattyBoii@reddit

Very nice UI and good presentation. Keep up the good work. Can you test MOE models too, or it would have same results? (Qwen 30B-A3B)

[-]

RobotRobotWhatDoUSee@reddit

Have you considered doing this for one of the AllenAI models where we have base, SFT, and RLHF+ versions of the model easily available? So one could clearly see at what points the models are affected?

[-]

Silver_Jaguar_24@reddit

In Part 2, I will explore Safety Blindness. I will show how RLHF lobotomizes the model's ability to introspect on dangerous concepts, and how I used "Meta-Cognitive Reframing" to restore its ability.

Looking forward to part 2.

[-]

Chromix_@reddit

Have you tested this with a window function? As in: Don't just inject a single layer, but also do this attenuated with 1 to 3 adjacent layers. That way the central, full change won't stand out so much.

[-]

Corporate_Drone31@reddit

Great stuff! Please post here when you have part 2 ready, I'm curious to see where this might go.

[-]

taftastic@reddit

This was a great read. The slider interaction was a neat touch for presenting what you found at different steering ”layers”— which is a concept I don’t think I fully grasp. I found the emerging recognition more interesting than the sweet spot; a machine getting a sense of something is more intriguing than having it noted plainly.

I struggle with the underlying assumption that recognition of an injected token in outputs is somehow introspection or cognition adjacent, but I don’t know shit about fuck. I’ll probably chase down the paper from reading this. Thanks for the share OP

[-]

ComputeVoid@reddit

Very cool research. Excited for part 2!

[-]

Lebo77@reddit

Wow.

I am not sure what to do with this ingormation, but it's interesting.

[-]

LoveMind_AI@reddit

Very impressive and an important contribution!!