obviously you are doing a ton of cache hits to pay 30usd for this amount of tokens. why are you assuming you would not hit that with oai?
The simple heuristic is that at its most expensive, deepseek is 40x cheaper for output (10x cheaper for input)
the DeepSeek console doesn't provide a simple way to test this. But looking at one day, I'm about at 50% cache hits.
https://preview.redd.it/niycmr7makce1.png?width=558&format=png&auto=webp&s=a2792b6302312b8dc653e4e1a6643f49e1573705
The export to .csv contains it as a breakdown, and allows you to use formulas to see the exact cost breakdown.
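If you'd rather script it than use spreadsheet formulas, here is a rough sketch of the blended input cost at a given cache-hit rate. The function name and the per-million prices are placeholders I made up for illustration, not DeepSeek's actual rates, so plug in whatever your own export shows.

```python
def blended_input_cost(tokens: int, hit_rate: float,
                       hit_price_per_m: float, miss_price_per_m: float) -> float:
    """Input cost in USD when a fraction `hit_rate` of tokens are cache hits."""
    hit_tokens = tokens * hit_rate
    miss_tokens = tokens * (1.0 - hit_rate)
    return (hit_tokens * hit_price_per_m + miss_tokens * miss_price_per_m) / 1_000_000


# Placeholder prices (assumption): 0.10 USD/M for cache hits, 1.00 USD/M for misses.
cost_half_cached = blended_input_cost(1_000_000, 0.5, 0.10, 1.00)
cost_uncached = blended_input_cost(1_000_000, 0.0, 0.10, 1.00)
print(f"50% cache hits: ${cost_half_cached:.2f}, no cache hits: ${cost_uncached:.2f}")
```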
After seeing this post I gave it a go for dataset generation and am very happy with its output: it cost $8.41 for what would have cost $293.75 with gpt-4o for similar output.
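For anyone who wants the ratio spelled out, a quick sketch using only the two dollar figures quoted above (the variable names are mine):

```python
deepseek_cost_usd = 8.41   # figure quoted above
gpt4o_cost_usd = 293.75    # figure quoted above for similar output

# How many times more expensive the gpt-4o run would have been.
ratio = gpt4o_cost_usd / deepseek_cost_usd
# Share of the gpt-4o bill that was saved.
savings_pct = (1 - deepseek_cost_usd / gpt4o_cost_usd) * 100

print(f"about {ratio:.1f}x cheaper, {savings_pct:.1f}% saved")
```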
Seems about right. This aligns with my cost effectiveness calculations
https://dubesor.de/benchtable#cost-effectiveness
It depends how long your context carry over is, but either way 4o would be vastly more expensive. Even in best case scenario for 4o, it would be at least 40x more expensive.
The latest crop of Gemini models are seriously impressive (exp-1206, 2.0 flash, 2.0 flash thinking).
But like your comment alluded to, the rate limits are a joke. For my use case they weren’t even an option. Hopefully when they become “GA” google will ease up on the limits because I really think they have a ton of potential.
lol I’m a software guy, GA usually means “Generally Available”.
I have no idea if that’s the best term for what I meant, which is: when they leave their “experimental” stage.
You mean if you use the free one? Gemini model APIs advertise 1000-4000 requests per minute for pay-as-you-go depending on the model and I've never hit limits, but I'm not sure if there's some hidden limit you're alluding to which I've somehow narrowly avoided. I'm just not sure we should be comparing paid api limits with free ones.
IMHO it compares on equal footing to sonnet or o1 for coding BUT it severely lacks in context window. So if your task is short it is wonderful. But if I give it a few thousand lines of context code it loses its edge.
I've been running a few agent experiments with Cline, giving simple dev tasks to o1, sonnet 3.5, Deepseek, and gemini.
If I were to rank them based on how well they did:
(best) Claude -> o1 -> Deepseek -> Gemini (worst)
Here's a cost breakdown of 1 of the tasks that they did:
Basically they had to set up a dev environment, read the docs on a few tools (they are new or obscure so outside training data; by default, when you ask LLMs to use those tools they either use the old API or hallucinate things), create a basic workflow connecting the three tools, and write tests to ensure they work.
1. **Claude 3.5 Sonnet**
* First to complete
* Tokens: 206.4k
* Cost: $0.1814
* Most efficient successful run
* Notable for handling missing .env autonomously
2. **OpenAI O1-Preview**
* Second to complete
* Tokens: 531.3k
* Cost: $11.3322
* Highest cost but clean execution
3. **DeepSeek v3**
* Third to complete
* Tokens: 1.3M
* Cost: $0.7967
* Higher token usage but cost remained reasonable due to lower pricing
4. **Gemini-exp-1206**
* DNF
* Tokens: 2.2M
* Multiple hints needed
* Status: Terminated without completing setup
Of the 3 that succeeded, deepseek had the most trouble; it needed several tries, kept making mistakes and not understanding what its mistakes were.
o1 and Claude were better at self-correcting when they got things wrong.
Note: cost numbers are from usage via openrouter, not their respective official apis
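To put those numbers on the same scale, here is a small sketch that works out the effective cost per million tokens for the three runs that finished. It only uses the token counts and costs listed above; Gemini is left out because no cost was reported for it.

```python
# (total tokens, total cost in USD) for each completed run, as listed above.
runs = {
    "Claude 3.5 Sonnet": (206_400, 0.1814),
    "OpenAI o1-preview": (531_300, 11.3322),
    "DeepSeek v3": (1_300_000, 0.7967),
}

# Effective blended price: dollars per million tokens actually consumed.
usd_per_million = {
    name: cost / tokens * 1_000_000 for name, (tokens, cost) in runs.items()
}

for name, price in sorted(usd_per_million.items(), key=lambda kv: kv[1]):
    tokens, cost = runs[name]
    print(f"{name}: ${price:.2f} per 1M tokens, ${cost:.4f} for the task")
```

Per token DeepSeek comes out cheapest, but it burned roughly 6x the tokens Claude did, which is why Claude still ended up cheapest for the task as a whole.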
May I ask, how many requests per day does that translate to? I am kind of a newbie here!
Also, will the previous conversation/context be added into the total used tokens? Or it is generally used with a single fully detailed request without forwarding the past conversation?
many many many.
The only way you get to these numbers is with Agents. Most likely big code projects.
Request is not a great measurement. Normal short questions are 500 Token.
A request in your codebase can take 100K Tokens.
Well, there is not a single golden formula. OpenAI's rule of thumb is that 1 token is about 3/4 of a word (so 1 word ≈ 1.33 tokens), which is more or less true for common English texts.
But, depending on the model's tokenizer, how specialized a domain is, or for other languages, 1 word can amount to anything between 1.5-7 tokens.
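As a back-of-the-envelope sketch, you can turn a word count into a low/high token estimate using the 1.5 to 7 tokens-per-word range mentioned above. This is only a heuristic with a helper name I made up; for real numbers you would run the model's actual tokenizer.

```python
import math


def estimate_tokens(text: str, tokens_per_word: float) -> int:
    """Rough token estimate: whitespace-separated words times a ratio."""
    word_count = len(text.split())
    return math.ceil(word_count * tokens_per_word)


sample = "The quick brown fox jumps over the lazy dog"
low = estimate_tokens(sample, 1.5)   # lower end of the range above
high = estimate_tokens(sample, 7.0)  # upper end of the range above
print(f"{len(sample.split())} words -> between {low} and {high} tokens")
```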
you can automate over real data and ask the AI to summarize or format it in a better way. For example, there are tv scripts online which you can ask the AI to turn the script into a summary.
It's highly task dependent, but you generally give an LLM your labels/label distribution and task it with creating the input data.
e.g. if you're making an NLP hospital readmission model, you'd find the prevalence of the event from literature, let's say its 10%, then you'd task the model to generate 900 notes for patients that WONT be readmitted and 100 notes where the patient WILL be readmitted.
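A minimal sketch of that setup, assuming a 10% prevalence and 1,000 notes as in the example above. The function names and the prompt wording are hypothetical; the actual LLM call is left out, since that depends on your provider.

```python
import random


def build_label_plan(n_total: int, positive_rate: float, seed: int = 0) -> list[int]:
    """Return a shuffled list of labels (1 = readmitted) matching the target prevalence."""
    n_positive = round(n_total * positive_rate)
    labels = [1] * n_positive + [0] * (n_total - n_positive)
    random.Random(seed).shuffle(labels)  # avoid generating all positives in one run
    return labels


def make_prompt(label: int) -> str:
    """Hypothetical prompt template; adapt the wording to your own task."""
    outcome = "WILL" if label == 1 else "will NOT"
    return f"Write a realistic discharge note for a patient who {outcome} be readmitted."


labels = build_label_plan(n_total=1000, positive_rate=0.10)
prompts = [make_prompt(label) for label in labels]
print(sum(labels), "positive prompts,", len(labels) - sum(labels), "negative prompts")
```

Each prompt then goes to the model, and the label you asked for becomes the ground-truth label of the generated note.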
When I use dir-assistant, it sends an entire context worth of a code repo to the LLM for every request. If I use Deepseek v3 (128k context size) and make a query every 5 minutes, that's over 10 million tokens per day.
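The arithmetic, as a quick sketch. I'm treating 128k as 128,000 tokens and assuming every request fills the whole context, which is the worst case; the helper name is made up.

```python
def tokens_per_day(context_tokens: int, minutes_between_queries: int,
                   hours_active: int = 24) -> int:
    """Upper-bound token usage if each query sends a full context window."""
    requests = hours_active * 60 // minutes_between_queries
    return requests * context_tokens


full_day = tokens_per_day(128_000, 5)                   # querying around the clock
work_day = tokens_per_day(128_000, 5, hours_active=8)   # an 8-hour session
print(f"24h: {full_day:,} tokens, 8h: {work_day:,} tokens")
```

Even an 8-hour session clears the 10 million mark under these assumptions.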
If you’re coding heavily then you could easily clear that number, even without agents. Cline for example, if you make it do stuff in vscode, can spend 1M tokens in literally minutes
I tried to give a better estimate than the first reply but they're right: it's so many and really hard to answer, lol.
I estimated 100k tokens MAX per day when I'm using an AI all day.
To reach 274 million, that'd be 2,740 *days!* I.e. 7.5 years of daily heavy use.
However, that number would be reached much faster with long context, like uploading and discussing books. So it really depends.
Do you guys still see a difference between Deepseek v3 from OpenRouter and directly through their API?
I only use OpenRouter, and V3 is always making garbage code. Super messy, no good understanding of subclasses, unmaintainable code, etc. Past 10k tokens it ignores way too much code and only works ok if I give it less than 4k tokens, but still inferior to Sonnet.
Sonnet 3.5 feels 10x better while working with my codebase.
Does DeepSeek analyze and harvest the tokens in the chat completion contexts? They might get some juicy data for next-gen use cases (or future training).
Not sure, it could be those days i leave the syngen processes undisturbed, allowing them to get on with processing tokens. ive lowered the thread count recently.
Speaking of synthetic data creation... Something I'd love to see is if we can steer reasoning into scientific logical leaps... creating training data sets for things like "I shorted out a battery and it sparked and glowed red, gas lamps glow too, they are crummy because x, I wonder if this can replace gas lamps", and then scenarios on observation and hypothesis and experimental design all the way down the tech tree for power requirements, failure modes, oxidation fix, thermal runaway fix, etc. until we get to a tungsten filament in a vacuum chamber... for various different inventions.
Any thoughts on tips for how to generate quality synthetic data here given enough good examples manually created? They tend to not be able to think of these connections from my cursory look at it and I'd hate to have to manually do this.
I’ve been seeing issues in the last few days of requests taking a long time to process. Seems like there’s no published rate limits, but when they get overloaded they’ll just hold your request in a queue for an arbitrary amount of time (I’ve seen order of 10mins). Have not investigated too closely so I’m only 80% sure this is what’s happening.
Anyone else?
I'm definitely seeing fluctuations in response time for the same amount of input/output tokens. But it's usually around the 50%-100% increase, so a request that takes on average 7-8 seconds sometimes takes 14-15 seconds. But I haven't seen anything more extreme than that.
DeepSeek V3 works incredibly well with my ReAct Agentic Framework
[https://github.com/quantalogic/quantalogic](https://github.com/quantalogic/quantalogic)
https://i.redd.it/zyvu6do2sqce1.gif
Sadly the promotional period will end on February 8, 2025 at 16:00 UTC: https://api-docs.deepseek.com/news/news1226
https://preview.redd.it/bbwk3cdwlqce1.jpeg?width=916&format=pjpg&auto=webp&s=62dc21c2dc0005d44740f94dac18d22d31cea89f
I assume it's the same reason I get news of new video, audio, and not yet released local models.
Because it's interesting enough to share with the community that is primarily based on running their own llama models.
It's interesting in this case to see both the sheer number of tokens generated as well as how cheap it was to do so.
May also play a part, I had fun with local models because it was free for me as I don't pay for the electricity, thus it was the cheap option so tangentially I find cheap models interesting.
As if the other companies aren't? Anything you type into any model online is being saved and used or sold. If this bothers you, learn to run a local model
According to their terms of use and privacy policies, OpenAI and Anthropic don't use users' API calls to train models. But according to DeepSeek's privacy policy and terms of use, they do use users' API calls to train models. I don't work for any of these companies. I just wanted to let others know, as many developers work with sensitive data. Privacy is something we all agree on, and it's why we are here.
Many organizations need to comply with data protection laws and standards (GDPR, SOC2, HIPAA, and more), so knowing that there is training on API calls is important. For instance, the hospital where my wife works has to comply with HIPAA, and they need to know how to make sure that patients' data is safe, as this is required by law.
I run a customer service SaaS with ai. Hospitals from the EU configure their own endpoints running gpus from local data centers due to HIPAA, they don't trust openai even though they claim they're compliant.
I am not advocating for OpenAI, but neither OpenAI nor Anthropic uses your API call data to train their models. This is not something you'll find in their terms-of-use pages or privacy policies. As LLM devs, you know full well how easily these models can generate training data, and some even say that LLMs only memorize instead of generalizing. Some of this data is deeply personal, like patient diagnoses, financial records, and other sensitive information that deserves privacy.
I'm using DeepSeek V3 for synthetic dataset generation for fine tuning a model on a proprietary programming language. They can use all the data they want, if anything it might hurt their next pretraining lol...
It’s still not as good as Claude unfortunately… I’ve given it a couple of tests like powershell scripts and asked questions, and it still struggles to complete the request as well as Claude does.
How is the API policy regarding privacy? Are your api requests also used for AI training/their own good or is it only when using their free chat option? If anyone knows for certain please let me know. Thanks!
Because for my use case (synthetic dataset generation), I've tested several models and other than gpt-4o or Claude nothing gave me results anywhere close to its quality (tried Qwen2.5, Llama 3.3, etc.).
I do not own the hardware required to run this model locally, and renting out an instance that could run this model on vast.ai/runpod would cost much more (with much worse performance).
A completely custom python script which is quite elaborate. It grabs data from technical documentation, pairs that with code examples and then sends that entire payload to the API. I have 5 scripts running concurrently with 12 threads per script.
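Not the actual script, just a minimal sketch of the general shape: pair each doc section with a code example, build one payload per pair, and fan the requests out over 12 threads. `build_payload` and `call_api` are hypothetical stand-ins; `call_api` is a stub here so the snippet runs without network access, and you would swap in a real API client.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_payload(doc_section: str, code_example: str) -> str:
    """Combine a documentation excerpt and a code example into one prompt."""
    return f"Documentation:\n{doc_section}\n\nExample:\n{code_example}"


def call_api(payload: str) -> str:
    """Stub standing in for the real chat-completion request (assumption)."""
    return f"generated sample for {len(payload)} chars of input"


def generate_all(pairs: list[tuple[str, str]], max_workers: int = 12) -> list[str]:
    """Send one request per (doc, code) pair using a pool of worker threads."""
    results: list[str] = [""] * len(pairs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(call_api, build_payload(doc, code)): index
            for index, (doc, code) in enumerate(pairs)
        }
        for future in as_completed(futures):
            # Store by original index so output order matches input order.
            results[futures[future]] = future.result()
    return results


pairs = [(f"doc section {i}", f"print({i})") for i in range(30)]
samples = generate_all(pairs)
print(len(samples), "samples generated")
```

Running several copies of a script like this side by side gets you the "5 scripts, 12 threads each" layout described above.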
It's not even about cost, as far as I can tell, DeepSeek have absolutely no rate limits. I'm hammering their API like there's no tomorrow and not a single request is failing.
I don’t use cline, but isn’t there any error code/reason for the request failing? I have to say that for me, stability of this API has been absolutely stellar. Maybe 0.001% failure rate so far.
That is the main cost here, they are basically buying the data for the price difference. The fact that you are using it for synthetic data gen and nothing private is brilliant.
Amazingly, Deepseek will have tons of synthetic data to train their next model. With all this synthetic data, in addition to the treatment that they will probably apply, they will be able to make an even better adjusted version with v3.5 and later create an absurdly better v4 model in 2025.
I’m currently using it for synthetic dataset generation with no multi-step conversations so it’s not really an issue, each request normally never goes over 4000-5000 tokens.
I seem to have answered out of reply one sec:
"For webapps, it's ok. Back end and api building and postgres and basic sqlite can do it itself.
Connecting to the frontend has issues and I've called Claude $6 to solve what it can't. Price wise this is amazing for what it can do"
Additionally, my issue with Claude is both the price, and the barrier to entry for API. I've only ever spent $10 +$5 free, and the 40k context limit per minute is 1 question.
182 Comments
freecodeio@reddit
indicava@reddit (OP)
lessis_amess@reddit
indicava@reddit (OP)
lessis_amess@reddit
indicava@reddit (OP)
Quiet_Debate_651@reddit
SynthSire@reddit
dubesor86@reddit
RageshAntony@reddit
dp3471@reddit
indicava@reddit (OP)
freecodeio@reddit
indicava@reddit (OP)
Mickenfox@reddit
indicava@reddit (OP)
Alexs1200AD@reddit
AppearanceHeavy6724@reddit
cgcmake@reddit
indicava@reddit (OP)
raiffuvar@reddit
Mickenfox@reddit
RegisteredJustToSay@reddit
raiffuvar@reddit
A_Dragon@reddit
torama@reddit
freecodeio@reddit
Zeitgeist75@reddit
Few_Painter_5588@reddit
BoJackHorseMan53@reddit
OrangeESP32x99@reddit
BoJackHorseMan53@reddit
OrangeESP32x99@reddit
torama@reddit
A_Dragon@reddit
CleanThroughMyJorts@reddit
Nervous-Positive-431@reddit
Aware_Sympathy_1652@reddit
Utoko@reddit
pol_phil@reddit
WeWantTheFunk73@reddit
pol_phil@reddit
frivolousfidget@reddit
BattleRepulsiveO@reddit
-Django@reddit
59808@reddit
l33t-Mt@reddit
Nervous-Positive-431@reddit
1ncehost@reddit
gooeydumpling@reddit
Stellar3227@reddit
Pvt_Twinkietoes@reddit
Substantial-Thing303@reddit
AriyaSavaka@reddit
mycall@reddit
BoJackHorseMan53@reddit
BGFlyingToaster@reddit
amdcoc@reddit
BGFlyingToaster@reddit
BoJackHorseMan53@reddit
mrjackspade@reddit
BoJackHorseMan53@reddit
indicava@reddit (OP)
RageshAntony@reddit
dairypharmer@reddit
raiffuvar@reddit
lolzinventor@reddit
Many_SuchCases@reddit
lolzinventor@reddit
Enough-Meringue4745@reddit
MatlowAI@reddit
lolzinventor@reddit
-Django@reddit
lolzinventor@reddit
MatlowAI@reddit
lolzinventor@reddit
poetic_fartist@reddit
Many_SuchCases@reddit
superfsm@reddit
Yes_but_I_think@reddit
Down_The_Rabbithole@reddit
lolzinventor@reddit
Mediocre_Tree_5690@reddit
FriskyFennecFox@reddit
lolzinventor@reddit
indicava@reddit (OP)
hotpotato87@reddit
x3derr8orig@reddit
dairypharmer@reddit
indicava@reddit (OP)
raphaelmansuy@reddit
pacmanpill@reddit
raphaelmansuy@reddit
bannert1337@reddit
indicava@reddit (OP)
FPham@reddit
AssistBorn4589@reddit
MINIMAN10001@reddit
ILoveYou_Anyway@reddit
douglasg14b@reddit
throwaway1512514@reddit
douglasg14b@reddit
Captain_Pumpkinhead@reddit
Charuru@reddit
thibautrey@reddit
Charuru@reddit
TenshiS@reddit
kellencs@reddit
TenshiS@reddit
kellencs@reddit
ScoreUnique@reddit
Charuru@reddit
mailaai@reddit
Professional_Helper_@reddit
BoJackHorseMan53@reddit
frivolousfidget@reddit
BoJackHorseMan53@reddit
Professional_Helper_@reddit
ticktockbent@reddit
mailaai@reddit
ticktockbent@reddit
mailaai@reddit
freecodeio@reddit
mailaai@reddit
freecodeio@reddit
ThaisaGuilford@reddit
mailaai@reddit
ThaisaGuilford@reddit
BoJackHorseMan53@reddit
mailaai@reddit
mailaai@reddit
Apprehensive_Dog1267@reddit
indicava@reddit (OP)
franckeinstein24@reddit
Zestyclose_Yak_3174@reddit
indicava@reddit (OP)
Zestyclose_Yak_3174@reddit
ihaag@reddit
ESTD3@reddit
indicava@reddit (OP)
ESTD3@reddit
PomegranateSuper8786@reddit
indicava@reddit (OP)
Many_SuchCases@reddit
indicava@reddit (OP)
Miscend@reddit
indicava@reddit (OP)
shing3232@reddit
indicava@reddit (OP)
remedy-tungson@reddit
lizheng2041@reddit
indicava@reddit (OP)
businesskitteh@reddit
indicava@reddit (OP)
Many_SuchCases@reddit
the320x200@reddit
indicava@reddit (OP)
frivolousfidget@reddit
foodwithmyketchup@reddit
indicava@reddit (OP)
fallingdowndizzyvr@reddit
maddogawl@reddit
Unusual_Pride_6480@reddit
indicava@reddit (OP)
Unusual_Pride_6480@reddit
indicava@reddit (OP)
Unusual_Pride_6480@reddit
zero_proof_fork@reddit
rorowhat@reddit
MarceloTT@reddit
indicava@reddit (OP)
CascadeTrident@reddit
indicava@reddit (OP)
330d@reddit
Dundell@reddit
indicava@reddit (OP)
Dundell@reddit
ab2377@reddit
Dundell@reddit
NeedsMoreMinerals@reddit
indicava@reddit (OP)
CloudDevOps007@reddit