OpenAI usage breakdown released
Posted by LeatherRub7248@reddit | LocalLLaMA | 27 comments

I would have thought image generation would be higher... but this might be skewed by the fact that 4o image generation (the whole Ghibli craze) only came out in March 2025
https://www.nber.org/system/files/working_papers/w34255/w34255.pdf
Coldaine@reddit
Yeah, if you see the disclaimer, this excludes API usage. It's just people pasting stuff into the website, and hopefully most of the coders have learned that copy-pasting code into the website is generally not the way to go.
some_user_2021@reddit
Which one of these categories is the naughty stuff?
TurpentineEnjoyer@reddit
Probably somewhere between "relationships and personal reflection" and "creative writing"
TechnoByte_@reddit
There is literally a "Games and Role Play" section above self-expression.
Though no way it's actually just 0.4%
TurpentineEnjoyer@reddit
oh, didn't see that down there in that weird graph format.
Savantskie1@reddit
Totally get that. Most people are going to want that kinda stuff private. And I'd like to believe that people are going to realize that GPT isn't going to let them do that.
a_beautiful_rhind@reddit
It's the web interface so no "dark roleplayers" or any of that. Mostly casual and drive-by users.
Tedious_Prime@reddit
I find it difficult to believe that the volume of "health, fitness, beauty or self care" chats was more than 30% greater than the volume of "computer programming" chats. It seems the authors were also surprised by the low volume of computer programming chats and acknowledge that this is in contrast to the findings of previous work analyzing chatbot usage.
InevitableWay6104@reddit
most ppl who use ai to help code use claude
saosebastiao@reddit
Hmmm...I can actually believe it. Health, Fitness, Beauty, and Self Care are like the biggest source of scams on the internet right now, and so many content mills for fake "research", which is then fed to dumb influencers who then promote it to their followers, with their social media algorithms gamed by engagement bots.
-main@reddit
Excludes API usage, for example all the people on codex-cli.
LagOps91@reddit
it's OpenAI - the mainstream platform. most who use it aren't aware of alternatives. i'm not too surprised. with claude you would see much more coding.
InitialAd3323@reddit
Most of the coding tasks are probably done either with Cursor or GitHub Copilot (or similar tools), and most people go to other models like Claude, Gemini or the Chinese ones
DeltaSqueezer@reddit
I believe it. The coding seemed low, but then I think maybe most coding is done with Claude or cheaper/free models like Gemini, Qwen, DeepSeek. For me, OpenAI isn't even on the candidate list for coding.
relmny@reddit
why do you post it here?
kaggleqrdl@reddit
How much of this is there versus their API usage? E.g., if API usage is 90% of their token generation, these results might not be super relevant.
CheatCodesOfLife@reddit
I'm guessing they can't see or publish the API data since the API is supposedly private vs the consumer product where they can read the chats.
55501xx@reddit
The paper says ChatGPT, which is specifically the consumer product.
TurpentineEnjoyer@reddit
"Begging it to do what I actually asked without making baseless assumptions about what it thinks I really want."
I don't see myself on that list at all.
GlowiesEatShitAndDie@reddit
Is that not a massive waste of tokens?
hinsonan@reddit
What is up with this graph and why do people suck at making graphs
llmentry@reddit
It's area-proportional. You can see the overall contribution to the total by the area, but the contribution to the category by the vertical.
I thought it was a surprisingly good and useful graph, personally. (Anyone know what R package this is using?)
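For anyone puzzling over the chart format, here is a minimal sketch of how an area-proportional (Marimekko-style) layout can be computed. It is not the paper's code, and the group names and counts are made up purely for illustration. Each group's column width is its share of the grand total, each sub-category's height is its share within the group, so a rectangle's area equals its share of everything.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    group: str
    label: str
    x: float       # left edge, as a fraction of total chart width
    y: float       # bottom edge, as a fraction of column height
    width: float   # group's share of the grand total
    height: float  # sub-category's share within its group

    @property
    def area(self) -> float:
        # width * height = sub-category's share of the grand total
        return self.width * self.height


def marimekko_layout(counts: dict[str, dict[str, float]]) -> list[Rect]:
    """Turn nested counts {group: {label: count}} into unit-square rectangles."""
    total = sum(sum(sub.values()) for sub in counts.values())
    rects: list[Rect] = []
    x = 0.0
    for group, sub in counts.items():
        group_total = sum(sub.values())
        width = group_total / total
        y = 0.0
        for label, n in sub.items():
            height = n / group_total
            rects.append(Rect(group, label, x, y, width, height))
            y += height
        x += width
    return rects


# Hypothetical counts, NOT taken from the paper.
example = {
    "Group A": {"a1": 30, "a2": 10},
    "Group B": {"b1": 60},
}

for r in marimekko_layout(example):
    print(f"{r.group}/{r.label}: width={r.width:.2f} height={r.height:.2f} area={r.area:.2f}")
```

Reading it back: the vertical extent tells you how a group splits internally, and the area tells you how much of the whole a sub-category takes up, which matches the description above.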
nikgeo25@reddit
They should've sorted the bars based on size, but otherwise it's quite informative and easy to read imo
llmentry@reddit
Yeah, the horizontal order is unhelpful. I wouldn't have minded if they were thematically grouped, but having the second category as "Other/Unknown" is just very random.
I'd guess the categories are factor levels, and so it's ordered by default by the first appearance of the category in the data, but it wouldn't have been hard to reorder for the plot!
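As a rough illustration of the reordering being suggested, here is a small sketch that sorts categories by size before plotting. The shares are invented, the function name is hypothetical, and pinning "Other/Unknown" to the end is just one possible design choice, not something the paper does.

```python
def order_for_plot(shares: dict[str, float],
                   pin_last: tuple[str, ...] = ("Other/Unknown",)) -> list[str]:
    """Return category names sorted largest first, with catch-all buckets at the end."""
    main = [c for c in shares if c not in pin_last]
    main.sort(key=lambda c: shares[c], reverse=True)
    tail = [c for c in pin_last if c in shares]
    return main + tail


# Hypothetical shares in "first appearance" order, NOT from the paper.
shares = {"Topic B": 0.2, "Other/Unknown": 0.5, "Topic A": 0.3}
print(order_for_plot(shares))
```

The same idea applies whatever the plotting tool is: fix the category order explicitly rather than relying on whatever default order the data arrives in.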
ShinyAnkleBalls@reddit
Especially people at OpenAI
BrowsingLeddit@reddit
So uh how did they get access to this sampling of chats? What happened to the consumer privacy OpenAI was just talking about?
michaelsoft__binbows@reddit
I'd just like to give kudos to the authors of the paper for employing a well-thought-out visualization system where bars are shaped with area proportional to quantity.
With that said, there is some room for improvement, as the intensity of color is not bound to any useful quantity here.