GPT-4.5 cost | TheaterFire

[-]

reza2kn@reddit

this is the best use-case for distillation!😁 give it a few days and we'll get GPT-4.5 generated datasets on HF that would get any model to do as well as GPT-4.5 if not better (at the covered tasks) 🤘

Reply

[-]

trajo123@reddit

You are talking about synthetic data. Distillation is something else.

Reply

[-]

No, it doesn't have to be. You give a model some questions, and it gives you some answers. Both of these could be synthetic data. When you use these as training data to train another model, that's distillation.

Reply

[-]

trajo123@reddit

> When you use these as training data to train another model, that's distillation.

No. What you are describing is just training on synthetic data. Distillation in deep learning refers to a special kind of training where a smaller model aims to reproduce activations of some internal layers of a larger model. Typically it is done by matching the next-to-last layer (the "logits" in the case of classification), and it involves a special loss term, usually cosine distance or KL divergence between the teacher logits and student logits. Distillation is also usually done with real data, but can be done with synthetic data as well.
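A minimal sketch of the two training signals being contrasted here, in plain Python. The function names, the temperature value, and the `temperature ** 2` scaling are illustrative assumptions in the style of common soft-label distillation setups, not anything taken from a specific library or from this thread. `distillation_loss` compares full output distributions, while `hard_label_loss` only sees the teacher's top token, which is roughly what training on generated text gives you.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-label loss: match the teacher's whole output distribution.

    Needs the teacher's logits, which a closed API generally does not expose.
    """
    p_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    # Scaling by temperature squared is a common convention (assumption here).
    return temperature ** 2 * kl_divergence(p_teacher, q_student)

def hard_label_loss(teacher_logits, student_logits):
    """Cross-entropy against only the teacher's most likely token.

    This is the signal available when training on the teacher's output text.
    """
    target = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    return -math.log(softmax(student_logits)[target])

if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]   # hypothetical logits over a 3-token vocabulary
    student = [0.5, 1.5, 0.2]
    print("soft-label (logit) loss:", distillation_loss(teacher, student))
    print("hard-label (text) loss: ", hard_label_loss(teacher, student))
```

Both losses push the student toward the teacher; the soft-label version also carries how the teacher ranks the tokens it did not pick.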

Reply

[-]

MorallyDeplorable@reddit

that was called fine tuning before Deepseek co-opted the term

Reply

[-]

reza2kn@reddit

Not really.. fine-tuning is when you train a model on any given data. If that data came specifically from asking a specific model, you're fine-tuning by distilling the features of that specific big model into your model.

Reply

[-]

No_Afternoon_4260@reddit

You can't know gpt4.5 logits, that's why it's not truly distillation but just fine tuning on a synthetic dataset. Look at medius supernova paper. He distilled llama405 into some random qwen. I don't remember the details but he did a great explanation on what he did because qwen and llama don't have the same tokenizer

Reply

[-]

reza2kn@reddit

Where is this 'true distillation' definition coming from, my brother? 😁

Reply

[-]

No_Afternoon_4260@reddit

From where I sit, my friend xD Just from where I sit. I don't think anybody owns the rights to the definition of distillation in ML anyway, but happy to be proved wrong. But as I look at it, in the LLM space, using the end text or the logits are two vastly different approaches and it's worth mentioning. To me it seems that by training on the logits you are just closer to the latent feature space, thus copying it with more fidelity (or just with a shorter dataset?).

Reply

[-]

reza2kn@reddit

i wasn't claiming to own a definition! just said why do you think that's 'real distillation'? where's the source? also, there are various ways of performing distillation, some are better than others in some cases, but i don't think any of them could be classified into 'real distillation' and not. that's all I was saying.

Reply

[-]

No_Afternoon_4260@reddit

No you are right I've thought about it and I completely agree with you

Reply

[-]

ColorlessCrowfeet@reddit

Yes, and it's still called fine-tuning by people who want to understand and communicate technical knowledge rather than imitate statistical patterns in their recent reading data.

Reply

[-]

aurelivm@reddit

distillation refers to training a smaller model with the same tokenizer on output logits, not SFT

Reply

[-]

ToTallyNikki@reddit

Distillation refers to heating a liquid to concentrate something.

Reply

[-]

Aischylos@reddit

At this point they have become synonymous, but distillation was originally a technique for training off the output distribution, not just the tokens. Now it's been used interchangeably a lot which sucks because it would be nice to have a good term for 'true' distillation.

Reply

[-]

Regular_Boss_1050@reddit

At these prices, distilling is gonna be expensive AF. Need someone to take the bait.

Reply

[-]

reza2kn@reddit

So you see my plans 😁🤌🏻😂

Reply

[-]

medialoungeguy@reddit

And then d sack will cry and call distillation a national security issue

Reply

[-]

reza2kn@reddit

Let 'em cry 😁

Reply

[-]

Relative-Flatworm827@reddit

And that's still super efficient compared to anything I can run on my PC lol. I have 16gb vram and every model I use is garbage. So. Price is worth it I guess.

Reply

[-]

Comfortable-Rock-498@reddit

This is insane pricing. I hope the present and even more so the future will look back at this moment in mockery

Reply

[-]

HellsNoot@reddit

If OpenAI had a moderate run, leading to a much larger model with some better performance, you'd rather they don't offer it at all? Or have them lose money on offering it to customers? I don't really understand what the grift is here.

Reply

[-]

The_frozen_one@reddit

That's such a weird take. They have no vendor lock in, and they are losing money with each token. Nobody is being compelled to spend money on a model that most people didn't know about 6 hours ago.

Reply

[-]

Balance-@reddit

Comparison: https://preview.redd.it/qwgw6xasyqle1.png?width=3567&format=png&auto=webp&s=a377ac36e5549d95ffc0d0b15a1a2231c3affbac

Reply

[-]

shakespear94@reddit

I’m okay with gpt-4o-mini. Although, I’m developing with gpt-3.5-turbo. God. I almost want to have my own infrastructure and run on that *later*.

Reply

[-]

wen_mars@reddit

I have 4o-mini in one tab, V3 in another (it's smarter but slower, also cheap) and I escalate to o1 on tough problems but I have to be careful not to use up my quota.

Reply

[-]

shakespear94@reddit

I go for DeepSeek R1 for critical analysis and approach refining. Then Grok 3 to help me build that part, context is kind of infinite to me, all things considered, and ChatGPT to troubleshoot minor things. But that’s me being a script kiddie. I’m sure people are using these models for breaking into the quantum realm.

Reply

[-]

Comfortable-Rock-498@reddit

you see, scaling is not dead

Reply

[-]

harrro@reddit

Scaling profits is at an all time high

Reply

[-]

thereisonlythedance@reddit

Must be a monster of a model, size wise. Proof scaling hit a wall.

Reply

[-]

differentguyscro@reddit

They said that scaling up only two of (parameters, compute, data) doesn't do much [compared to scaling up all three together] in their paper "Scaling Laws for Neural Language Models" from January 2020. We need 10^2 or 10^3 internets worth of quality synthetic data to really tell.

Reply

[-]

PermanentLiminality@reddit

The scaling laws appear to be exponential increase in compute for linear increase in performance.

Reply

[-]

Enfiznar@reddit

So logarithmic in performance
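A toy illustration of the point above, assuming performance follows `a + b * log10(compute)`. The constants `a` and `b` and the function names are made up for illustration and are not fitted to any real model or paper.

```python
import math

def performance(compute, a=0.0, b=1.0):
    """Hypothetical score that grows with the log of training compute."""
    return a + b * math.log10(compute)

def compute_needed(target, a=0.0, b=1.0):
    """Invert the curve: compute required to reach a target score."""
    return 10 ** ((target - a) / b)

if __name__ == "__main__":
    # Each extra point of performance costs 10x more compute than the last.
    for score in range(1, 6):
        print(f"score {score}: compute {compute_needed(score):,.0f}")
```

Under this assumption every 10x increase in compute buys the same fixed bump in score, which is the "exponential in, linear out" relationship described in the two comments above.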

Reply

[-]

yur_mom@reddit

I like my binary trees to be logarithmic, not my LLM performance

Reply

[-]

121507090301@reddit

Might be might not be. It could just as well have been that they made a model too big for their dataset and it either didn't even begin getting good or the dataset was pretty bad or something like it, or something else. Who knows...

Reply

[-]

thereisonlythedance@reddit

Sam just referred to it as a “giant, expensive model”, which confirms it’s a big one. I’d love to know how it stacks up to the original GPT-4 in parameter size. It’s very slow via API at the moment too, though that may be extreme load.

Reply

[-]

mikael110@reddit

OpenAI themselves hint at this in their news blog:

> GPT-4.5 is a very large and compute-intensive model, making it more [expensive](https://openai.com/api/pricing/) than and not a replacement for GPT-4o. Because of this, we’re evaluating whether to continue serving it in the API long-term as we balance supporting current capabilities with building future models. We look forward to learning more about its strengths, capabilities, and potential applications in real-world settings. If GPT-4.5 delivers unique value for your use case, your [feedback](https://community.openai.com/) will play an important role in guiding our decision.

If it is so big that they might not even keep serving it on the API it must be quite *chonky* indeed, which incidentally is what one of the presenters nicknamed it during the announcement presentation.

Reply

[-]

Dayder111@reddit

So, it's released as a research preview indeed then, and to show that some of the scaling did indeed hit a wall, at least with the current hardware (that they have, H100 and H200 I guess). Kind of giving us the last taste of the old-school naive architecture pretraining scaling maybe? Maybe even close to what some people thought GPT-4 will be, 100 trillion parameter rumours and such.

Reply

[-]

TheThoccnessMonster@reddit

More like proof they know that their competition will generate output for DeepSeek and they’re gonna pay a lot for the privilege.

Reply

[-]

a_slay_nub@reddit

What exactly is this model's niche again? At this point you're better off paying for a reasoning model. I guess scaling really is dead.

Reply

[-]

Ok_Landscape_6819@reddit

Is it though? I mean Grok 3 base performed better and probably cost way less since free-tier has access

Reply

[-]

MindCrusader@reddit

Isn't Grok 3 using TOC under the hood? Sonnet is using it despite not being called a reasoning model

Reply

[-]

Dudmaster@reddit

> TOC

Thought of chain?

Reply

[-]

MindCrusader@reddit

Yup

Reply

[-]

differentguyscro@reddit

"She has a really good personality"

Reply

[-]

Comfortable-Rock-498@reddit

This TC article made me laugh out loud [https://techcrunch.com/2025/02/27/openais-gpt-4-5-is-better-at-convincing-other-ai-to-give-it-money/](https://techcrunch.com/2025/02/27/openais-gpt-4-5-is-better-at-convincing-other-ai-to-give-it-money/) Of course it is lol

Reply

[-]

as-tro-bas-tards@reddit

Combine this with the CFPB shutting down and....hoo boy we are in for some dark times.

Reply

[-]

acc_agg@reddit

This was a moonshot to see how well a monolithic non-reasoning model could be trained. It's the bet they made with gpt3 that paid off, here it seems like it may not - I've not tested the model and can't say for sure.

Reply

[-]

throwaway2676@reddit

Sounds like the thought process here was basically "This is kinda obsolete because of reasoning models, but we put a ton of compute into it, so let's just release it and move on"

Reply

[-]

M44PolishMosin@reddit

Apparently good at writing mean texts to your friends

Reply

[-]

eimas_dev@reddit

i was watching the stream and thinking: am i the only dumb one, or is that example absurd for presenting a SOTA model?

Reply

[-]

MerePotato@reddit

That's cause it's not SOTA, and they say as much - it isn't intended to push the frontier

Reply

[-]

Danteg@reddit

It probably was intended to be SOTA at some point, but disappointed.

Reply

[-]

hudimudi@reddit

I’d rather say they cooked this up quickly as a response to Claude’s new model and DeepSeek’s models. They just had to release SOMETHING to take over the spotlight again.

Reply

[-]

BaysQuorv@reddit

Knowledge cutoff is oct 2023, this has been in the making long before r1

Reply

[-]

jm2342@reddit

Hardening Altman Sam's sick, if course

Reply

[-]

sourceholder@reddit

What's the BIG-Bench Hard score?

Reply

[-]

NikBerlin@reddit

now wrap 4.5 with reasoning..

Reply

[-]

4sater@reddit

One dollar per token

Reply

[-]

AutoModerator@reddit

Your submission has been **automatically** removed due to receiving many reports. If you believe that this was an error, please send a message to modmail. *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/LocalLLaMA) if you have any questions or concerns.*

Reply

[-]

linkcharger@reddit

😂😂😂😂😂😂

Reply

[-]

cmdr-William-Riker@reddit

If they are going to make it 5 times as much as the best Anthropic model it had better be 5 times more capable

Reply

[-]

crack_pop_rocks@reddit

People be sucking dicks soon for tokens

Reply

[-]

aprx4@reddit

Not for long, we'll have AI-powered humanoid robots doing that.

Reply

[-]

cultish_alibi@reddit

well what the hell is left for humans??

Reply

[-]

Foreign-Beginning-49@reddit

Those robots aren't leaving the home they are currently in.

Reply

[-]

mattjb@reddit

We must teach them how to remotely work via Zoom.

Reply

[-]

Cergorach@reddit

But just like with self driving cars, you can rent them out when you're not using them... ;)

Reply

[-]

weird_d0lphin@reddit

If you follow his logic, we might have to suck AI-powered humanoid robots' dicks for tokens...

Reply

[-]

mosthumbleuserever@reddit

...You guys are getting tokens for money?

Reply

[-]

drwebb@reddit

lol no thanks, I'll stick with my R1 based models thank you very much.

Reply

[-]

TheRealMasonMac@reddit

I bit the bullet to see how well it would do for creative writing. Holy shit. It is shit.

Reply

[-]

megadonkeyx@reddit

So what's the point of stargate if scaling is over

Reply

[-]

evia89@reddit

Well they can host sonnet 37 like amazon ))

Reply

[-]

andrew_kirfman@reddit

This is not a great look from OpenAI IMO. Worse than Claude 3.7 at programming tasks and insanely more expensive. Makes me wonder what’s going on with model scaling and how many parameters we’re looking at to produce this result. I can definitely understand why they didn’t release this as GPT-5.

Reply

[-]

You_Wen_AzzHu@reddit

We need to cancel our plus account to make a point. Claude is now a better solution.

Reply

[-]

bonobomaster@reddit

Out of curiosity, has Claude a Deep Research mode? Because that mode is in the plus account and it fucking blows my mind. My first "Holy shit, I'm in the future" moment I had with any LLM.

Reply

[-]

Cergorach@reddit

Left Chat due to an affair with Claude... ;)

Reply

[-]

panic_in_the_galaxy@reddit

I don't think most people here have a plus account

Reply

[-]

Reason_He_Wins_Again@reddit

I would rather have a GPT+ account than most other subscriptions at this point. It's my google replacement.

Reply

[-]

panic_in_the_galaxy@reddit

Google actually gives you free access to their LLMs

Reply

[-]

MorallyDeplorable@reddit

Yea but then you're stuck using Google's LLMs

Reply

[-]

AriyaSavaka@reddit

I'd never subscribed to begin with, either local or API.

Reply

[-]

brahh85@reddit

A new gpt-4 with up to date datasets to distill into a new set of models.

Reply

[-]

jiml78@reddit

I think the knowledge cutoff is Oct 2023. So not really

Reply

[-]

Cergorach@reddit

Erm... The free version of ChatGPT knows who sits on the Danish throne (that changed in 2024) and that's with websearch turned off.

Reply

[-]

LevianMcBirdo@reddit

yeah it was updated in January, this includes more up-to-date knowledge.

Reply

[-]

3D_TOPO@reddit

Pretty hilarious considering R1 stomps it

Reply

[-]

PlaneTheory5@reddit

Bullish on Deepseek. Bearish on OAI.

Reply

[-]

Distinct-Target7503@reddit

wait is that higher than claude opus?!

Reply

[-]

MorallyDeplorable@reddit

> wait is that higher than claude opus?!

by a _lot_ even

Reply

[-]

kldjasj@reddit

Are they putting the price high now to release a new model with not-so-expensive pricing later?

Reply

[-]

SuuLoliForm@reddit

Two months later: "Check out this totally new model that was totally not at all finished when we released GPT 4.5, GPT 4.5 Turbo! Now at the low low price of ten dollars per million input tokens!"

Reply

[-]

EridianExplorer@reddit

HAHAHAHAHAHA

Reply

[-]

ARVwizardry@reddit

If you wanted to voice-chat with gpt-4.5, you can do it for no additional cost with [ClickUi.app](http://ClickUi.app) Although I highly recommend using 4o-mini lol

Reply

[-]

Dogeboja@reddit

who asked?

Reply

[-]

ARVwizardry@reddit

No one, I just got the website live and I'm trying to get users/collaborators. Surprised there are so many downvotes on mentioning a just-launched, free, and open source tool that brings AI to your computer

Reply

[-]

vertigo235@reddit

OMG lol

Reply

[-]

FastDecode1@reddit

wrong sub

Reply

[-]

Glittering-Bag-4662@reddit

Jesús

Reply

[-]

Tailor_Big@reddit

At least 5 trillion parameters, largest llm ever on earth!

Reply

[-]

RexyIsSexy@reddit

The price chart shown is per 1 million tokens for each column.

Reply

[-]

Comic-Engine@reddit

That's insane

Reply

[-]

AriyaSavaka@reddit

Now DeepSeek will have something to leverage for their R2.

Reply

[-]

ttkciar@reddit

By comparison, Tulu 3 405B is only $5 per million input tokens, $10 per million output tokens. I suppose GPT4.5 might be more compelling if it's better-suited to your task, but for my use-case Tulu is a better fit (and I could infer locally with it, if I upgraded one of my servers with more memory).
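For anyone who wants to turn per-million-token prices into a per-request number, here is a rough sketch. It uses the Tulu 3 405B prices quoted above ($5 per million input tokens, $10 per million output tokens); the function name and the example token counts are hypothetical.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request, given prices per 1 million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000

if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
    cost = request_cost(2_000, 500, input_price_per_m=5, output_price_per_m=10)
    print(f"${cost:.4f} per request")
```

Swap in any other model's listed input and output prices to compare it on the same workload.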

Reply

[-]

Johnny_Rell@reddit

At this point it's cheaper to hire someone to do the task you're trying to achieve

Reply

[-]

Hydraxiler32@reddit

the person you hire will use a cheaper model

Reply

[-]

MLHeero@reddit

Not even close, even with these prices 😀

Reply

[-]

mr_happy_nice@reddit

LMAO, the point of paying for something is to get value out of it. Does this really provide that much value considering other options?

Reply

[-]

OriginalPlayerHater@reddit

ez money

Reply

[-]

offlinesir@reddit

it's per 1m tokens

Reply

[-]

OriginalPlayerHater@reddit

pretty sure it's per request, Altman is pretty greedy

Reply

[-]

Enfiznar@reddit

per 1m tokens per request, as with all their models

Reply

[-]

OriginalPlayerHater@reddit

yup i get it, i'm just poking fun

Reply

[-]

NikBerlin@reddit

per million tokens I guess?

Reply

[-]

mxforest@reddit

No guesses. It's definitely for a million.

Reply

[-]

OriginalPlayerHater@reddit

if it is 500k you have to eat a worm raw

Reply

[-]

ttkciar@reddit

We always knew OpenAI would have to raise their prices, eventually. They've been operating at a net loss since day one, burning through investor funding to keep the lights on. Turning a net profit requires them to charge higher prices. It's as simple as that.

Reply

[-]

maddogawl@reddit

I still can't believe this is real, it's gotta just be for rollout right? right?

Reply

[-]

ComprehensiveBird317@reddit

Holy hell that's a lot of dollars. So it's not good for chat, but then what?

Reply

[-]

2053_Traveler@reddit

gee thanks Sam!

Reply

121 Comments