this is the best use-case for distillation! give it a few days and we'll get GPT-4.5 generated datasets on HF that would get any model to do as well as GPT-4.5 if not better (at the covered tasks)
No, it doesn't have to be. You give a model some questions, and it gives you some answers. Both of these could be synthetic data. When you use these as training data to train another model, that's distillation.
>When you use these as training data to train another model, that's distillation.
No. What you are describing is just training on synthetic data. Distillation in deep learning refers to a special kind of training where a smaller model aims to reproduce activations of some internal layers of a larger model. Typically it is done by matching the next-to-last layer (the "logits" in the case of classification), and it involves the use of a special loss term, usually cosine distance or KL divergence between the teacher logits and student logits. Distillation is also usually done with real data, but can be done with synthetic data as well.
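For anyone following along, here is a rough sketch of the kind of loss term described above: KL divergence between temperature-softened teacher and student distributions. The logits are made up, the function names are hypothetical, and the temperature-squared scaling is the convention from Hinton et al.'s distillation paper rather than anything specific to this thread.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL between softened teacher and student distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)

# Hypothetical logits over a 4-token vocabulary (assumption, not real model output)
teacher = [4.0, 2.0, 0.5, -1.0]
good_student = [3.8, 2.1, 0.4, -0.9]
bad_student = [-1.0, 0.5, 2.0, 4.0]

print(distillation_loss(teacher, teacher))       # identical logits give zero loss
print(distillation_loss(teacher, good_student))  # small: distributions nearly match
print(distillation_loss(teacher, bad_student))   # large: ranking is reversed
```

Note that this needs the teacher's logits for every position, which is exactly what a closed API does not hand out.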
Not really.. fine-tuning is when you train a model on any given data. If that data came specifically from asking a specific model, you're fine-tuning by distilling the features of that specific big model into your model.
You can't know gpt4.5 logits, that's why it's not truly distillation but just fine tuning on a synthetic dataset.
Look at medius supernova paper. He distilled llama405 into some random qwen.
I don't remember the details but he did a great explanation on what he did because qwen and llama don't have the same tokenizer
From where I sit my friend xD Just from where I sit
I don't think anybody owns the rights for the definition of distillation in ML anyway but happy to be proved wrong.
But as I look at it, in llm space using the end text or the logits are two vastly different approaches and it's worth mentioning. To me it seems that by training on the logits you are just closer to the latent feature space, thus copying it with more fidelity (or just shorter dataset?).
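A toy way to see the "shorter dataset" point, as a sketch under made-up numbers: a single sampled token is a one-hot view of the teacher's next-token distribution, and you need many samples before the text-only view approaches what one set of logits gives you directly. The three-token distribution and helper names below are hypothetical.

```python
import random

def sample_tokens(probs, n, rng):
    """Draw n token ids from the teacher's next-token distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=n)

def empirical_distribution(samples, vocab_size):
    """Estimate the distribution from sampled tokens alone (the text-only view)."""
    counts = [0] * vocab_size
    for s in samples:
        counts[s] += 1
    return [c / len(samples) for c in counts]

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

teacher_probs = [0.7, 0.2, 0.1]  # hypothetical next-token distribution
rng = random.Random(0)           # fixed seed so runs are repeatable

for n in (1, 100, 10_000):
    samples = sample_tokens(teacher_probs, n, rng)
    estimate = empirical_distribution(samples, len(teacher_probs))
    # Distance between the true distribution and what the samples reveal
    print(n, round(total_variation(teacher_probs, estimate), 4))
```

With logits the student sees `teacher_probs` itself on every example, so the gap is zero from the first one.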
i wasn't claiming to own a definition! just said why do you think that's 'real distillation'? where's the source?
also, there are various ways of performing distillation, some are better than others in some cases, but i don't think any of them could be classified into 'real distillation' and not. that's all I was saying.
Yes, and it's still called fine-tuning by people who want to understand and communicate technical knowledge rather than imitate statistical patterns in their recent reading data.
At this point they have become synonymous, but distillation was originally a technique for training off the output distribution, not just the tokens. Now it's been used interchangeably a lot which sucks because it would be nice to have a good term for 'true' distillation.
And that's still super efficient compared to anything I can run on my PC lol. I have 16gb vram and every model I use is garbage. So. Price is worth it I guess.
If OpenAI had a moderate run, leading to a much larger model with some better performance, you'd rather they don't offer it at all? Or have them lose money on offering it to customers? I don't really understand what the grift is here.
That's such a weird take. They have no vendor lock in, and they are losing money with each token. Nobody is being compelled to spend money on a model that most people didn't know about 6 hours ago.
I have 4o-mini in one tab, V3 in another (it's smarter but slower, also cheap) and I escalate to o1 on tough problems but I have to be careful not to use up my quota.
I go for DeepSeek R1 for critical analysis and approach refining. Then Grok 3 to help me build that part, context is kind of infinite to me, all things considered, and chatgpt to troubleshoot minor things. But that's me being a script kiddie.
I'm sure people are using these models for breaking into the quantum realm.
They said that scaling up only two of (parameters, compute, data) doesn't do much [compared to scaling up all three together] in their paper "Scaling Laws for Neural Language Models" from January 2020.
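To make the bottleneck idea concrete, here is a small sketch using the combined parameters-and-data loss form from that paper (Kaplan et al.), L(N, D) = [(N_c/N)^(α_N/α_D) + D_c/D]^α_D. The constants are roughly the order of magnitude of the paper's fits as I recall them, so treat them as illustrative placeholders, not authoritative values.

```python
# Illustrative constants (assumption: approximate, recalled from the paper's fits)
ALPHA_N, ALPHA_D = 0.076, 0.095
N_C, D_C = 8.8e13, 5.4e13

def loss(n_params, n_tokens):
    """Predicted loss as a function of parameter count and dataset size."""
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D

def data_floor(n_tokens):
    """Loss in the limit of infinitely many parameters with data held fixed."""
    return (D_C / n_tokens) ** ALPHA_D

fixed_tokens = 1e11  # hypothetical dataset size

# Growing only the model: the loss flattens out just above the data floor
for n_params in (1e9, 1e12, 1e15):
    print(f"N={n_params:.0e}  loss={loss(n_params, fixed_tokens):.3f}")
print(f"data floor = {data_floor(fixed_tokens):.3f}")

# Growing the data alongside the model lowers the loss past that floor
print(f"N=1e12, D=1e14  loss={loss(1e12, 1e14):.3f}")
```

Under this form, making the model 1000x bigger on the same data buys less and less, which is the sense in which scaling one axis alone "doesn't do much".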
We need 10^2 or 10^3 internets worth of quality synthetic data to really tell.
Might be, might not be.
It could just as well have been that they made a model too big for their dataset and it either didn't even begin getting good or the dataset was pretty bad or something like it, or something else. Who knows...
Sam just referred to it as a "giant, expensive model", confirms it's a big one. I'd love to know how it stacks up to the original GPT-4 in parameter size. It's very slow via API at the moment too, though that may be extreme load.
OpenAI themselves hint at this in their news blog:
>GPT-4.5 is a very large and compute-intensive model, making it more [expensive](https://openai.com/api/pricing/) than and not a replacement for GPT-4o. Because of this, we're evaluating whether to continue serving it in the API long-term as we balance supporting current capabilities with building future models. We look forward to learning more about its strengths, capabilities, and potential applications in real-world settings. If GPT-4.5 delivers unique value for your use case, your [feedback](https://community.openai.com/) will play an important role in guiding our decision.
If it is so big that they might not even keep serving it on the API it must be quite *chonky* indeed, which incidentally is what one of the presenters nicknamed it during the announcement presentation.
So, it's released as a research preview indeed then, and to show that some of the scaling did indeed hit a wall, at least with the current hardware (that they have, H100 and H200 I guess).
Kind of giving us the last taste of the old-school naive architecture pretraining scaling maybe? Maybe even close to what some people thought GPT-4 will be, 100 trillion parameter rumours and such.
This TC article made me laugh out loud [https://techcrunch.com/2025/02/27/openais-gpt-4-5-is-better-at-convincing-other-ai-to-give-it-money/](https://techcrunch.com/2025/02/27/openais-gpt-4-5-is-better-at-convincing-other-ai-to-give-it-money/)
Of course it is lol
This was a moonshot to see how well a monolithic non-reasoning model could be trained. It's the bet they made with gpt3 that paid off, here it seems like it may not - I've not tested the model and can't say for sure.
Sounds like the thought process here was basically "This is kinda obsolete because of reasoning models, but we put a ton of compute into it, so let's just release it and move on"
I'd rather say they cooked this up quickly as a response to Claude's new model and DeepSeek's models. They just had to release SOMETHING to take over the spotlight again.
This is not a great look from OpenAI IMO.
Worse than Claude 3.7 at programming tasks and insanely more expensive.
Makes me wonder what's going on with model scaling and how many parameters we're looking at to produce this result.
I can definitely understand why they didn't release this as GPT-5.
Out of curiosity, does Claude have a Deep Research mode?
Because that mode is in the plus account and it fucking blows my mind. My first "Holy shit, I'm in the future" moment I had with any LLM.
Two months later: "Check out this totally new model that was totally not at all finished when we released GPT 4.5, GPT 4.5 Turbo! Now at the low low price of ten dollars per million input tokens!"
If you wanted to voice-chat with gpt-4.5, you can do it for no additional cost with [ClickUi.app](http://ClickUi.app)
Although I highly recommend using 4o-mini lol
No one, I just got the website live and trying to get users/collaborators
Surprised there's so many downvotes on mentioning a just-launched, free, and open source tool that brings AI to your computer
By comparison, Tulu 3 405B is only $5 per million input tokens, $10 per million output tokens.
I suppose GPT4.5 might be more compelling if it's better-suited to your task, but for my use-case Tulu is a better fit (and I could infer locally with it, if I upgraded one of my servers with more memory).
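If you want to put numbers on a comparison like this, the arithmetic is simple enough to sketch. The Tulu prices are the ones quoted above; the workload size and the function name are made up for illustration, and you would plug in whatever another provider charges to compare.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of a workload given per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost

# Prices quoted in the comment above: $5 per million input, $10 per million output
TULU_3_405B = {"input_price_per_m": 5.0, "output_price_per_m": 10.0}

# Hypothetical workload: 2M input tokens and 0.5M output tokens
workload = {"input_tokens": 2_000_000, "output_tokens": 500_000}

print(request_cost(**workload, **TULU_3_405B))  # 15.0
```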
We always knew OpenAI would have to raise their prices, eventually. They've been operating at a net loss since day one, burning through investor funding to keep the lights on.
Turning a net profit requires them to charge higher prices. It's as simple as that.
121 Comments