New mystery model on LLM Arena

[-]

shroddy@reddit

Do they reveal their mystery models somewhere after some time? I had a few different ones recently and want to know what they were.

[-]

DuckyBlender@reddit

Yes, usually when the model is officially released, they reveal which model it was.

[-]

shroddy@reddit

Do you know where that is? I was looking on their blog and the leaderboard.

[-]

shroddy@reddit

Ok, good to know. I hope that information can be accessed without an account. I deleted my Twitter account long before Musk took over and it became X, and I don't plan to create an X account now.

[-]

Physical_Manu@reddit

X without an account has only been getting worse and worse. For some reason it shows you posts out of order and limits how much you can see.

[-]

shroddy@reddit

Yeah, it is really bad. I hope lmsys will start posting it elsewhere too.

[-]

Salty-Garage7777@reddit

Came up a couple of times side by side with the best and, IMO, it's way worse.

[-]

soup9999999999999999@reddit

Idk, I chose it over claude-3-5-sonnet more than once.

[-]

Salty-Garage7777@reddit

Interesting! What do you find it better than Sonnet at? What use cases made you choose it?

[-]

umarmnaq@reddit (OP)

It's roughly on the same level as GPT-4 (not 4o) and Llama 3 70B.

[-]

phhusson@reddit

Gosh, wtf are you sending SMS on boot for? Please don't be writing malware.

And if you're doing IoT stuff, I'm curious what (I do Android OS development for work and open-source projects).

[-]

CheatCodesOfLife@reddit

From the comments about the restricted timeframe, probably enforcing screen time for his kids or something.

[-]

a_slay_nub@reddit

Restricting your kid's SMS messages during specific timeframes seems like a recipe for disaster. What if it's 1am and they get hurt?

[-]

mrjackspade@reddit

The code isn't blocking SMS messaging, it's sending a message if the device is on.

There's nothing stopping anyone from turning it on and sending a message if they get hurt.

[-]

OrangeESP32x99@reddit

Well, they should probably be in bed at 1am lol

[-]

MMAgeezer@reddit

Of course - but they won't always be.

We have to hope for the best and plan for the worst.

[-]

umarmnaq@reddit (OP)

It was just a testing app which checks if the device is turned on during a specified timeframe, and sends an SMS to a number.
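The app's actual code isn't shown in this thread, so the following is only a minimal sketch of the check described above, written in Python rather than Android code. The names `in_window`, `on_boot`, and the `send_sms` callback are hypothetical, and the handling of windows that cross midnight is an assumption.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls in [start, end), allowing windows that cross midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 06:00.
    return now >= start or now < end


def on_boot(now: time, start: time, end: time, send_sms) -> bool:
    """Call the hypothetical `send_sms` callback if the device came up inside the window."""
    if in_window(now, start, end):
        send_sms("Device turned on at " + now.strftime("%H:%M"))
        return True
    return False
```

On a real device, `send_sms` would stand in for whatever the platform provides for sending a message; here it is just any callable that takes a string.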

[-]

I’m certain it is a Llama model. I got it head-to-head with Llama 3.1 70B in a creative writing task, and the generated poems started with the same few lines and stayed extremely close in content even after they started to diverge. I’d guess it is a Llama 3.5 type release.

[-]