I replicated Anthropic’s "Introspection" paper on DeepSeek-7B. It works.
Posted by Specialist_Bad_4465@reddit | LocalLLaMA | View on Reddit | 22 comments
charmander_cha@reddit
Could you make the code available for curious people to play with too?
yatusabe__@reddit
Why the scroll hijacking? Please just tell me why, I need to understand how anyone could think it is a good idea.
Specialist_Bad_4465@reddit (OP)
Sorry, I was trying something new :)
Lesson learned. I have removed it.
yatusabe__@reddit
Thank you, I finally got to read your post and it was very interesting. I hope you continue this research and share the results. Computer science meets philosophy, what's not to love about it?
Pvt_Twinkietoes@reddit
I do like the UI on mobile
o5mfiHTNsH748KVq@reddit
The little ease upward is still a nice touch though
Tramagust@reddit
What's scroll hijacking?
nuclearbananana@reddit
Honestly, with a little polish, this is actually nice. It makes scrolling while reading way easier, unlike most cases where it's just for some pointless graphic or progress bar.
Murgatroyd314@reddit
Disagree. Having the page move away from where I put it is a nuisance. No exceptions.
agreeoncesave@reddit
Agree, but don't forget about reader view in your browser
suicidaleggroll@reddit
I liked it on the sections that fit neatly on a single screen, but on sections 3 and 4, which require scrolling, it all went to absolute shit and was much harder to use than normal scrolling.
Environmental-Metal9@reddit
Agreed. On mobile it makes reading the bottom of the page impossible, because it jumps to the next section whenever I try to scroll the bottom of a section up to the middle of the screen.
pokemonplayer2001@reddit
A hateful choice.
DefNattyBoii@reddit
Very nice UI and good presentation. Keep up the good work. Can you test MoE models too, or would they give the same results? (Qwen 30B-A3B)
RobotRobotWhatDoUSee@reddit
Have you considered doing this for one of the AllenAI models, where base, SFT, and RLHF+ versions are all easily available? That way one could clearly see at which points in training the models are affected.
Silver_Jaguar_24@reddit
Looking forward to part 2.
Chromix_@reddit
Have you tested this with a window function? As in: don't just inject into a single layer, but also inject an attenuated version into 1 to 3 adjacent layers. That way the central, full-strength change won't stand out so much.
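Roughly something like this, a minimal sketch of the idea (assuming a PyTorch model with a Llama/DeepSeek-style `model.model.layers` module list and a precomputed steering vector; the names, the triangular window, and the default strength are all placeholders, not OP's actual code):

```python
import torch

def make_window_weights(center_layer: int, num_layers: int, radius: int = 2) -> dict:
    """Triangular window: weight 1.0 at the center layer, linearly
    attenuated over `radius` adjacent layers on each side."""
    weights = {}
    for layer in range(num_layers):
        dist = abs(layer - center_layer)
        if dist <= radius:
            weights[layer] = 1.0 - dist / (radius + 1)
    return weights

def add_injection_hooks(layers, steering_vec: torch.Tensor,
                        center_layer: int, strength: float = 8.0,
                        radius: int = 2) -> list:
    """Register forward hooks that add the attenuated steering vector
    to each layer's residual-stream output. Returns the hook handles
    so the injection can be undone with handle.remove()."""
    weights = make_window_weights(center_layer, len(layers), radius)
    handles = []
    for idx, layer in enumerate(layers):
        if idx not in weights:
            continue
        scale = strength * weights[idx]
        def hook(module, args, output, scale=scale):
            # HF decoder layers usually return a tuple whose first
            # element is the hidden states; handle both cases.
            if isinstance(output, tuple):
                hidden = output[0] + scale * steering_vec.to(output[0].dtype)
                return (hidden,) + output[1:]
            return output + scale * steering_vec.to(output.dtype)
        handles.append(layer.register_forward_hook(hook))
    return handles

# Usage (hypothetical):
#   handles = add_injection_hooks(model.model.layers, vec, center_layer=20)
#   ... run generation ...
#   for h in handles: h.remove()
```

The point of the window is just that a smooth ramp-up/ramp-down across neighboring layers should be less of an obvious out-of-distribution spike than a single-layer delta.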
Corporate_Drone31@reddit
Great stuff! Please post here when you have part 2 ready, I'm curious to see where this might go.
taftastic@reddit
This was a great read. The slider interaction was a neat touch for presenting what you found at different steering "layers", a concept I don't think I fully grasp. I found the emerging recognition more interesting than the sweet spot; a machine getting a sense of something is more intriguing than having it noted plainly.
I struggle with the underlying assumption that recognizing an injected token in the outputs is somehow introspection- or cognition-adjacent, but I don't know shit about fuck. This will probably get me to chase down the paper. Thanks for the share, OP.
ComputeVoid@reddit
Very cool research. Excited for part 2!
Lebo77@reddit
Wow.
I am not sure what to do with this information, but it's interesting.
LoveMind_AI@reddit
Very impressive and an important contribution!!