Can 4chan data REALLY improve a model? TURNS OUT IT CAN!
Posted by Sicarius_The_First@reddit | LocalLLaMA | 157 comments
Hear me out, no one (really) knows how these things work.
A few days ago, I released [Assistant\_Pepe\_8B](https://huggingface.co/SicariusSicariiStuff/Assistant_Pepe_8B), you can read the discussion in [this thread](https://www.reddit.com/r/LocalLLaMA/comments/1qppjo4/assistant_pepe_8b_1m_context_zero_slop/).
I trained it on an extended **4chan dataset**, on an abliterated base, but what I didn't expect was to get this:
https://preview.redd.it/lrqwx8ca1ugg1.png?width=2333&format=png&auto=webp&s=4dcfcfb9c107fa3d417e5ff623c4952e5e2ab457
https://preview.redd.it/a3bby1yd1ugg1.png?width=2980&format=png&auto=webp&s=8f050bbd512a12a359626af79ccebcd2d2445877
Somehow, **against all common sense**, the model **outperformed** NVIDIA's Nemotron, the base it was trained on. It usually goes the other way: you take a smart base, tune a model on it, and accept sacrificing some intelligence to give it flavor.
At first I thought "OK nice, a coincidence, who cares?"
But then I looked more closely at the scores:
1) The abliterated base **scored higher** than the base.
2) The finetune scored even **higher than both**.
3) The finetune was literally trained on an extremely noisy 4chan dataset; it should have eaten glue.
And then I remembered something: the original GPT-4chan (by Yannic Kilcher) scored especially high in truthfulness (that was before benchmaxxing).
So I took a closer look at the models I released recently: the abliterated Impish\_LLAMA\_4B not only outperformed the base tune (the unabliterated one), it also changed its political alignment (you can check the UGI stats for yourself; I feel like I've spammed enough images).
People were initially joking about the "alignment tax", but I think there's non-trivial substance in all of this. To me, the effect looks like it is just above the margin of error or statistical noise.
Oh, and the KL divergence for Impish\_LLAMA\_4B was <0.01.
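For anyone wondering what a KL number like that measures, here is a minimal sketch. The post doesn't say how the figure was computed, so this assumes the common approach: compare the next-token distributions of the original and the abliterated model on the same prompts and average the per-position KL divergence. The logits below are made-up toy values, and `softmax`, `kl_divergence`, and `mean_kl` are hypothetical helper names, not anything from the actual eval.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_kl(base_logits, tuned_logits):
    """Average per-position KL between the base model (P) and the modified model (Q)."""
    kls = [kl_divergence(softmax(b), softmax(t)) for b, t in zip(base_logits, tuned_logits)]
    return sum(kls) / len(kls)

# Toy example: two token positions, vocabulary of three tokens (hypothetical values).
base_logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
tuned_logits = [[2.1, 0.9, 0.1], [0.4, 0.6, 3.1]]

print(f"mean KL: {mean_kl(base_logits, tuned_logits):.4f}")
```

A value near zero means the modified model's token probabilities barely moved relative to the original, which is why a low KL combined with a benchmark change is interesting.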