TikTok saved $300,000 per year in computing costs by having an intern partially rewrite a microservice in Rust.
Posted by InfinitesimaInfinity@reddit | programming | View on Reddit | 448 comments
Nowadays, many developers claim that optimization is pointless because computers are fast and developer time is expensive. That may be true, but optimization is not always pointless: running server farms can be expensive, too.
Go is not a particularly slow language. Still, after profiling, an intern at TikTok rewrote part of a single CPU-bound microservice from Go to Rust, dropping CPU usage from 78.3% to 52%, memory usage from 7.4% to 2.07%, and p99 latency from 19.87 ms to 4.79 ms. The rewrite also enabled the microservice to handle twice the traffic.
The savings come from needing fewer vCPU cores. While that may seem insignificant for a company of TikTok's scale, this was only a partial rewrite of a single microservice, and the work was done by an intern.
TadpoleNo1549@reddit
yeah “optimization doesn’t matter” is one of those takes that only works until scale hits, this is a perfect example of where it actually matters, cpu bound plus high traffic = real money, also crazy that an intern pulled that off, feels like the real takeaway is optimize where it counts, not everywhere
Daffodil_Software@reddit
Nice post, really nails why performance optimization still matters, especially when you run services at scale. That anecdote about the rewrite from Go → Rust is a textbook example of how even “fast enough” languages and frameworks can hide huge inefficiencies under load.
Daffodil_Software@reddit
If you’re starting fresh, you can keep things lightweight and still cover all the bases without pulling in big frameworks.
Playwright is still the best choice for E2E, and it now has built-in visual regression via toHaveScreenshot(), so you don’t need extra tooling. For smaller visual diff needs, pixelmatch or reg-suit are great tiny add-ons.
For component/unit tests, React Testing Library + user-event is still solid. Add MSW for network mocking, it’s probably the single biggest quality-of-life improvement for test reliability
manuthlochana@reddit
This is EXACTLY why Rust is taking over! 🚀 An INTERN saved $300K/year - imagine what a full team could do. The performance gains are insane: 78% to 52% CPU usage, latency cut by 4x! This is the future of systems programming. Companies sleeping on Rust are literally burning money.
UpstairsKnown372@reddit
They will save more money if they switch to Go
byteNinja10@reddit
This is really impressive. Shows how performance optimization can have a direct impact on costs. The fact that an intern was able to do this is even more interesting - it means the ROI on choosing the right language for the right task can be huge. Would love to see more companies being transparent about these kinds of wins.
WiseWhysTech@reddit
Hot take: “Don’t optimize” is lazy advice. Optimize after profiling.
Why this TikTok story matters: It shows the trifecta—lower CPU, lower memory, lower p99—and 2× throughput. That’s real money saved at scale.
What to do in practice:
1. Profile first: flamegraphs, pprof, tracing → find the top 5% hotspots.
2. Tighten the algorithm: data structures, batching, cache-aware layouts, fewer allocations.
3. Surgical rewrites: keep 95% in Go; rewrite only the hot path (FFI/gRPC) in Rust/C if it pays back.
4. Guardrails: prove gains with A/B, load tests, p50/p95/p99, cost per request.
5. Reinvest wins: fewer cores → smaller bills → headroom for features.
Bottom line: Performance is a product feature. Measure → fix hotspots → ship.
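[Editor's note] The "profile first" step above is cheap to try in Go: the standard library's runtime/pprof captures a CPU profile in a few lines. A minimal sketch, where hotPath is an invented stand-in for real work:

```go
package main

// CPU-profiling sketch using the standard library's runtime/pprof.
// hotPath is a made-up function standing in for whatever your service does.
import (
	"os"
	"runtime/pprof"
)

func hotPath() int {
	sum := 0
	for i := 0; i < 1_000_000; i++ {
		sum += i % 7 // deliberately cheap busywork, so it shows up in the profile
	}
	return sum
}

func main() {
	f, err := os.Create("cpu.pprof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Everything executed between Start and Stop gets sampled.
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	_ = hotPath()
}
```

Feed the resulting file to `go tool pprof -top cpu.pprof` (or `-http=:8080` for a flamegraph view) to see where the time actually goes.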
Hax0r778@reddit
These numbers don't seem to add up... was traffic not limited by CPU or memory? How does dropping the CPU by 33% allow doubling the traffic?
Hopeful_Lettuce9169@reddit
I can answer this. Above roughly 70% utilization, the slowdown is much worse than linear: the machine is busy enough that caches get split between competing work.
50% is absolutely night and day for utilization.
Weary-Hotel-9739@reddit
Probably garbage collection vs non. While average loads may only be 33% better, peak bursts lead to more-than-average GC, which doesn't occur under Rust. Sadly, average numbers are really bad for showing this, but come on, this is LinkedIn.
MasterLJ@reddit
Our compensation compared to our ROI to a business can vary WIIIILLLLDLY.
I had a coworker that saved ~$160M over 3ish years by optimizing some ML models (that dictated pricing).
A friend of mine works for a company that won't let him do optimizations to trim their $12M/month cloud bill because they are minting money off new features.
This is a really cool story for the intern but the ROI isn't crazy by any stretch. A $50k/year intern has HR, payroll, facilities and equipment costs (~$100k total)... and unless there are already Rust experts at TikTok (which I'm guessing not because the intern did this), TikTok just gained exposure to a new tech stack; security, updates, compliance, maintenance, that could conceivably negate the savings.
Hopeful_Lettuce9169@reddit
I empowered something like +$300m in revenue once, and made less than 300k lol
Weary-Hotel-9739@reddit
Sorry, but you assume that employees are the limiting factors. We're currently on a declining job market. If you can send any employee - especially interns - on a month-long trip that at least covers their cost, it's already good enough for that employee and their manager.
The alternative often isn't skipping the hire; it's having engineers doing nothing, or doing useless tasks to keep them busy. Especially the good ones; after all, they might get offers from the competition even with overall jobs down. So to speak, they're free labor.
MTGGradeAdviceNeeded@reddit
+1 unless rust was used already at tiktok / planned to be largely rolled out, then i’d go even further and say it sounds like a business loss to have that new stack and need to maintain it
Full-Spectral@reddit
But of course by that argument we'd all still be writing Fortran or Pascal. Every company that has a Go or Rust or whatever codebase today had to start from having no Go or Rust or whatever.
JShelbyJ@reddit
Rust is used at every major tech company to some degree, and TikTok is no exception.
Voidrith@reddit
unless its used as a stepping stone to broader adoption - the first time using a new stack is always the most expensive and complicated, but once its in the org its generally much easier/faster/cheaper to use in the future
if they can get comparable gains by redoing other problem services, then that can also make it easier to save more in the future
cute_polarbear@reddit
Yeah. Different organizations, industries, teams, etc., have wildly different priorities.
hasdata_com@reddit
Watch the intern get a $500 bonus and their manager get a $50k bonus for "leadership"
alphapussycat@reddit
The intern is an intern, no pay, maybe a "nice job, I'm sure you'll get hired soon enough".
probablyabot45@reddit
Nah tiktok isn't owned by America just yet.
KrispyKreme725@reddit
I bet the intern wasn’t even offered a full time gig.
pdpi@reddit
The key word here is "scale". One of the major challenges with scaling a company is recognizing that you're transitioning from "servers are cheaper than developers" to "developers are cheaper than servers", and then navigating that transition. The transition is made extra tricky because you have three stages: (1) servers are cheap, so you focus purely on features; (2) server costs hurt, but new features still bring in more revenue than optimization would save; (3) developers are cheaper than servers, so performance work pays for itself.
A certain type of engineer (e.g. yours truly) would rather focus on that performance work, and gets really frustrated with that second step, but it's objectively a bad choice.
Mundamala@reddit
I think the key word here is intern. This person likely never got any credit or near the pay they should have received. Even on a frontpage post remarking on their achievement, they're 'an intern.'
Pleasant_Guidance_59@reddit
The intern was embedded in a larger engineering team. It's not like they heroically discovered the potential, rewrote the entire thing on their own, and shipped it without senior engineering involvement. More likely a senior engineer suggested this as their internship project, and the intern was assigned to rebuild the service under that engineer's oversight. Kudos for doing a great job, of course, but they likely can't really take credit for the idea or even the outcome. What they do get is a great story, a strong reference on their resume, and proven experience, all of which will help them land a good job in the end.
Bakoro@reddit
From my own experience, it's entirely possible that the person really just is that good, or the original code was that bad.
I've been in that position, it's not even that the original person was a bad developer, they were just working outside their scope and made something "good enough", while me fresh out of college had the right mix of domain knowledge to make a much better thing.
Then there was stuff that was just spaghetti and simply following basic good development practices took the software from near daily crashes, to monthly, and then eventually zero instability.
This, at a multi-million multi-national company that works with some of the most valuable companies in the world.
Weary-Hotel-9739@reddit
Again, we're talking about an intern. For a company that actually wants to make money and survive for longer than a month. I get what you mean, but optimizing any program is incredibly easy. Not breaking everything with your optimization is hard.
If you're hired as a consultant or similar, the worst that can happen is that your contract will not be renewed. That gives you some freedom. As an intern, you're gone, and potentially the whole team too.
It's just that people fresh out of college often don't have nearly enough domain knowledge to even know how much domain knowledge they're missing.
Bakoro@reddit
Intern status is immaterial. What we are really talking about is an unusual event noteworthy enough to get reported on, at a global organization of such scale that even small optimizations can mean six figure dollar amounts.
The above person was saying that it's entirely unlikely that the intern was actually the prime mover for the change and shouldn't really get credit, and I'm saying that it's entirely possible that it was the right person in the right place, who had the right mix of knowledge to identify and make the change, and they should absolutely get credit for the improvements they made, because a different person in the exact same position wouldn't have had the same success.
And again, I know because I've been there, I've been the person to walk in out of nowhere and solve the problems that more experienced developers couldn't solve, because I had the right perspective and the right knowledge for those problems. If I had gone to a different company then I would have been a middle tier nobody, but instead I happened to find a place that needed my exact skill set.
Weary-Hotel-9739@reddit
I've been on both sides of this, and recently, I'm really afraid of this stance.
Optimization, and even six-figure savings in mid-sized companies, is incredibly easy to do. It's hard to do without loss. I once had a consultant from a pretty famous larger agency 'optimizing' a workflow some years ago. He literally deleted both the validation and the transaction management to get his speedups. Was that good? Bad? Depends. But Dunning-Kruger is a thing. If you know nothing about the context, optimizing without breaking anything (as far as you know) is pretty easy. Especially if you're not there long enough to ever learn the truth.
On the other hand, I was in your shoes too once. I was good at programming, but I didn't yet know the difference between that and 'developing'. Of course, 80% of the time I still delivered great work. But the question is: is 80% acceptable? Again, it depends.
I find it highly unlikely that an intern actually has the time to analyze a billion dollar system, and reimplement full capabilities of a subsystem without loss of context. Maybe he was necessary as a piece of the solution. Maybe he even is a genius and really did it all by himself. But most likely, someone gave him a task because he had used Rust before, and enabled him with documentation and political coverage. They gave him the tools to do a task that was mostly coding.
Because, and that is reaaaally important: if you're a multi-billion dollar company that is part of an international conflict between two superpowers, you don't let your intern deploy untested code to prod, even if he is a genius. Hell, he might be an idiot - or worse, an attacker.
leros@reddit
That's just how jobs work. You agree to do work for some fixed fee (hourly rate or salary). You get that pay regardless of your performance. You can generate 0 value or tons of value or even hurt the company and you still get paid. Low risk, low reward.
If you want pay to be tied to your performance, become an entrepreneur. It's higher risk but potentially higher reward.
It's also hard in big companies to tie any result to a single person. I've built products/features that have generated tens of millions of dollars in revenue. But that happened in an ecosystem of all the other work being done in the company, so the value I generated really needs credit distributed over a lot of people indirectly involved like marketing, operations, and developers who build other parts of the system.
Weary-Hotel-9739@reddit
Completely true, but this is the reason why companies suck at some point. From a cost-benefit standpoint, doing the least amount of useful work is best for every employee, and if upper management does not compensate, you'll not even get what you were paying for in the first place. Meanwhile productivity should actually go up over time, so you're losing double.
Makes you wonder about FAANG.
leros@reddit
Big companies are generally not efficient anyway. The bigger you get, the more effort goes into communication rather than direct output so ICs aren't producing that much. Plus competent people tend to get pushed up into management.
Superg0id@reddit
And this is late stage capitalism to a T.
I can't pay my rent in "exposure" guys... especially as I've exposed that "as an intern, I accepted being paid jack sh!t, in order to maybe get lowballed for another job in the future"... but I'm stuck doing this since no one will pay me what I'm worth.
caltheon@reddit
you act like having proven track records on your resume doesn't offer any value, which is only true if you lie on your resume.
Superg0id@reddit
ha. Sure, put it in your resume.
But that doesn't feed you now.
Meanwhile the company "saves" that money, and what do they do?
Executive bonus for the person who came up with the "intern" idea..
maxintos@reddit
You think the intern was doing some hero work on his own time on top of the normal duties he was given?
Usually it's the senior employees that decide what the intern is going to work on and does a lot of support.
The intern being given this work probably means that the senior devs already had a good grasp of what was supposed to be done and guided the intern.
haruku63@reddit
A student I know worked as an intern for a big company and the project was very successful. His manager couldn’t raise his pay as it was fixed for interns. So he told him to just write down double the amount of hours he was actually working.
pqu@reddit
Aka timesheet fraud, nice. Hope he got that in writing, lol
CherryLongjump1989@reddit
Nah this is fine. Timesheet fraud would be if the timesheets were being used for billing or external reporting. But with a manager's authorization for an internal employee it is a nothing burger.
haruku63@reddit
He got
Mundamala@reddit
He was the first scapegoat when the company got caught insider trading.
AlexKazumi@reddit
Rofl, I was in a similar position when I was a people manager. After days of negotiation with HR, they proposed to give the extra money as a very specific kind of bonus (which made both the internal company systems and the government's tax agency happy).
These cases are rare, so no surprise there is no process. But definitely there is no need to lie.
ungoogleable@reddit
The intern's name is Wu Xiaoyun.
https://wxiaoyun.com/blog/rust-rewrite-case-study/
DroppedLoSeR@reddit
That second scenario becomes crucial to tackle earlier rather than later (in SAAS) if there are plans to onboard or keep big customers. Not ideal letting poorly maintained code be the reason for churn, or a new customer to cost more than they are paying because someone didn't look at the data and anticipate the very predictable future...
syklemil@reddit
Plus you need people who are actually able to focus on performance, including being familiar with relevant technologies. If the company only starts looking for them or training them in stage three, they're behind.
pinkjello@reddit
I’m not sure I agree. There have been times at work where we identify a bottleneck, investigate, do a spike to research solutions, find one, then implement. Sure, it takes longer than if the team were already familiar with the solution, but it’s not insurmountable. You stand up a POC, then refine it.
syklemil@reddit
But it does sound like you're familiar with the technologies you'd use to resolve performance issues? Not everyone is good at finding performance issues, telling the difference between various kinds of performance issues, or knowing how to resolve them, which can result in a lot of voodoo "optimization".
As in, we have metrics for p50, p95 and p99 latencies for various apps, but I'm not entirely sure all the developers know what those numbers mean. Plenty of apps also run with incredible amounts of vertical headroom, with some of the reasons seeming to be stuff like :shrug: and "I got an OOM once".
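[Editor's note] For anyone unsure what those metrics mean: a pN latency is the value below which N% of requests complete. A rough nearest-rank sketch in Go (the helper name is made up):

```go
package main

// percentile computes a pN latency using the nearest-rank method:
// sort the samples and pick the observation at the N% position.
import (
	"fmt"
	"sort"
)

func percentile(latenciesMs []float64, p float64) float64 {
	s := append([]float64(nil), latenciesMs...) // copy, so the caller's slice stays unsorted
	sort.Float64s(s)
	idx := int(p/100*float64(len(s))+0.5) - 1 // nearest rank, 0-based
	if idx < 0 {
		idx = 0
	}
	if idx >= len(s) {
		idx = len(s) - 1
	}
	return s[idx]
}

func main() {
	samples := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 100}
	fmt.Println(percentile(samples, 50)) // half of requests finish within 5 ms
	fmt.Println(percentile(samples, 99)) // the one 100 ms outlier dominates p99
}
```

Note how a single outlier barely moves p50 but completely owns p99; that is why tail latencies get their own dashboards.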
caltheon@reddit
The point is you don't need know how to fix it to bring in experts that do know how, you only need to identify it, and even that can be done by a competent performance engineer pretty quickly as long as you have basic observability. You can't afford to have performance focused engineering until you hit step #3, and it isn't necessary. Having double skilled engineers is obviously best case scenario, but like most unicorn scenarios, it's not something you can guarantee.
pinkjello@reddit
Exactly. Having specialized experts on hand for when something inevitably arises isn't cost effective. Better to have smart, adaptable people on hand who know how to identify a problem, learn what they need to learn to fix it, and consult an expert if that isn't good enough, or if the problem is pervasive enough to shell out for top-tier expertise.
pdpi@reddit
That's just your average VC-funded Tuesday!
cgriff32@reddit
Takes money to make money. Or for VC backed companies, take money to make money.
wgrata@reddit
The thing that makes this transition hard in my opinion is overcoming organisational momentum and staffing issues.
I haven't met many SWEs who will happily make the change, or PMs (who frequently get reviewed based on feature launches) willing to shift their mindset and skill set.
SanityInAnarchy@reddit
It's also worth mentioning that even when the company achieves that scale, it's not every line of code everywhere, and even the stuff that "scales" may not actually be recoverable.
Take stuff running on a dev machine to build that very-optimized microservice. If the build used to take an hour and now it takes a minute, that's important! But if it used to take a second and now it takes 1ms, does that really change much? Maybe you can come up with some impressive numbers multiplying this by enough developers, but my laptop's CPU is idle most of the time anyway.
Jaded_Ad9605@reddit
There is of course a xkcd for it...
https://xkcd.com/1205/
CherryLongjump1989@reddit
They've been saying this at least since the 90's.
singron@reddit
I think (2) is actually pretty rare. We assume that our work leads to increased revenue, but if it was that easy, every company would be wildly successful. Most of the time, product improvements have no effect on revenue, so I think you need to heavily discount that effort too. Cost saving work is usually very low risk in that it's very likely to actually lower costs.
Kissaki0@reddit
If we scope a bit wider than just direct monetary investment vs gain, investing in that analysis and change can have various positive side effects. Familiarity with the system, unrelated findings, improved performance leading to better UX or better maintainability X, a good feeling for the developer (which makes them more interested and invested), etc. Findings and change can also, at times, prevent issues from occurring later, whether soon or more distant.
It's definitely something to balance against primary revenue drivers and necessities, but I wouldn't want to be too narrowly focused onto those streams.
sopunny@reddit
To give a little more perspective, $300,000 a year is about what it costs to keep a junior engineer (their total comp, plus taxes, plus marginal support-staff costs). So if the extra performance requires an extra engineer to maintain, you're not even saving anything long-term.
Mognakor@reddit
The old system also needed to be maintained, so that's not really relevant; and given it's something an intern could rewrite and it "only" saves $300k, you're not dedicating an entire engineer to maintenance.
luctus_lupus@reddit
300k for junior in this job climate? Hahahahahah
Sparaucchio@reddit
In my country they cost 600 euros per month gross, so we can make it 1000 including the support they (do not) receive from colleagues
Jaded-Committee7543@reddit
thanks for sharing, this is the kind of insight that i read reddit for !
coderemover@reddit
If you start optimizing when the server bills are higher than you pay for your developers, you're likely already doing it too late. Getting decent performance after the system is fully in production, when it was never engineered with performance in mind, is often very, very hard, and it will take a long time. In that time, you're going to be losing money, because you won't be able to offer competitive price on your product, as the server bills will be eating all your margins (and more).
And it's worth noting that this is actually independent of scale. Even with one server, if the cost of running that one server is higher than the money you get from the client(s) using it, you're losing money, and it completely does not matter how much you pay your developers. Even if your developers worked for free, you'd still be losing money. The only way out is raising the end price to the customers, but this only works in the short term, until you get competition.
There is also a false dichotomy that you have to pay significantly more for performance-minded development. I've seen it so many times that a good developer using the right tool created better-performing software in less time than another developer using the wrong tools, or having skill issues. And you should avoid bad developers for many reasons, not just performance.
IMHO you should not aggressively optimize everything, but you should keep an eye on performance and monitor it from the early stages of the product. That way, even when you cut corners, you do it consciously.
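[Editor's note] One cheap way to "monitor from the early stages" in Go is to keep microbenchmarks next to the code. testing.Benchmark even lets you run one outside `go test`; a sketch, where buildKey is a made-up stand-in for hot-path code:

```go
package main

// Microbenchmark sketch, runnable outside `go test` via testing.Benchmark.
// buildKey is an invented function standing in for code on a hot path.
import (
	"fmt"
	"strings"
	"testing"
)

func buildKey(parts []string) string {
	return strings.Join(parts, ":")
}

func main() {
	r := testing.Benchmark(func(b *testing.B) {
		parts := []string{"user", "42", "profile"}
		for i := 0; i < b.N; i++ {
			_ = buildKey(parts) // the code under measurement
		}
	})
	// NsPerOp gives a baseline you can track from the first commit onward.
	fmt.Println(r.N > 0, r.NsPerOp() >= 0)
}
```

In practice you would put this in a `_test.go` file as `func BenchmarkBuildKey(b *testing.B)` and track the numbers in CI, so regressions surface before the server bill does.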
babwawawa@reddit
With systems you are either feeding the beast (adding resources) or slaying the beast (optimizing for performance).
As a PreSales engineer, I’ve found that people prefer to purchase their resources from people who apply substantial effort to the latter. Particularly since there’s always a point where adding resources becomes infeasible.
mr_dfuse2@reddit
that is a useful insight i didn't know, never worked in a company that went beyond step 2. thanks for sharing
BenchEmbarrassed7316@reddit
Although Rust is a much faster language than go, the main difference is in reliability. Rust makes it much easier to write and maintain reliable code. For example, a modern server is multi-threaded and concurrent. go is prone to Data Race errors. Rust, having a similar runtime with the ability to create lightweight threads and switch threads when waiting for I/O, guarantees the absence of such errors.
https://www.uber.com/en-FI/blog/data-race-patterns-in-go/
Uber, having about ~2000 microservices in Go, found ~2000 errors (!!!) related to data races in half a year of analysis. But if they used Rust, they would have had 0 such errors. And also 0 errors related to nil, 0 logical errors related to structs being initialized with default values, 0 errors related to slices being changed in unexpected ways (https://blogtitle.github.io/go-slices-gotchas/), and 0 errors related to functions returning `nil, nil` (i.e. both no error and no result).
From a business perspective, it's a question of how much damage they suffered from these errors and how much money they spent fixing them. And how much money they constantly spend to prevent these errors from occurring again.
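[Editor's note] The data-race class the Uber article catalogs can be reproduced in a few lines; this sketch (function names invented) shows an unsynchronized counter next to the mutex fix. Go compiles both without complaint; only `go run -race` flags the first:

```go
package main

// Minimal example of the data-race pattern discussed above, plus the fix.
import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter with no synchronization.
// The unsynchronized read-modify-write is a data race: the result may
// be less than n, and `go run -race` reports it.
func racyCount(n int) int {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter++ // data race
		}()
	}
	wg.Wait()
	return counter
}

// safeCount guards the counter with a mutex; the result is always n.
func safeCount(n int) int {
	counter := 0
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(safeCount(1000))
}
```

In Rust, the racy version simply does not compile: a `&mut` to the counter cannot be shared across threads without a `Mutex` or atomic, which is the commenter's point.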
The last question is especially important. Writing code in Rust is faster and easier because I don't have to worry about a lot of things that can lead to errors. For example:
https://go.dev/tour/methods/12
They use the word 'gracefully', but they are lying. The situation is absurd: the receiver argument in a method can be in three states: valid data, data that has been initialized with default values and may not make sense, and outright nil. Many types from the standard library simply panic in the case of nil (which is definitely not 'graceful'). It's a big and unnecessary burden on the developer when instead of one branch of code you have to handle three.
We already have horribly designed languages like JS and PHP. Now go has joined them.
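[Editor's note] The three receiver states described above fit in a short sketch; the type and method here are invented for illustration:

```go
package main

// A pointer-receiver method in Go can be invoked on a nil receiver, a
// zero-value struct, or real data, and must handle all three states.
import "fmt"

type Node struct{ Name string }

func (n *Node) Label() string {
	if n == nil {
		return "<nil>" // state 1: nil receiver; without this check, n.Name panics
	}
	if n.Name == "" {
		return "<unnamed>" // state 2: zero-value struct, initialized but possibly meaningless
	}
	return n.Name // state 3: valid data
}

func main() {
	var p *Node
	fmt.Println(p.Label())                  // "<nil>"
	fmt.Println((&Node{}).Label())          // "<unnamed>"
	fmt.Println((&Node{Name: "a"}).Label()) // "a"
}
```

Dropping the nil check compiles fine and panics at runtime, which is the burden the commenter is describing.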
Kyrra@reddit
The very first example in that article was already fixed in Go: the loopvar change in Go 1.22 (https://go.dev/blog/loopvar-preview). And I feel like the loopvar bug was one that any competent Go coder should have known; it's why most people would shadow the loop variable (`v := v`) at the top of the loop body before Go 1.22.
And many of these can be caught with linters.
BenchEmbarrassed7316@reddit
Yes, this is exactly why Golang is a terrible programming language.
Instead of solving business problems, the programmer spends his effort on knowing a bunch of strange rules. Functions usually return either a result or an error? Well, just remember the standard library functions that violate this rule.
`append` can change data in some cases, and in others (during reallocation) it does not; well, you have to know how it works from the inside. When working with standard collections like slices and hash maps, in some cases you can use an uninitialized collection and in others not? Yes, just remember how each specific type works. Also remember that writing to an uninitialized hash map is a panic, while reading gives an empty result.
lprimak@reddit
Seems like the people are actually buying this. Wow. This is crazy.
rangoric@reddit
Usually it’s premature optimization that is pointless. Measure then optimize and you’ll get results like these.
Fresh_Sock8660@reddit
Yep. Get the code working then profile it for bottlenecks.
KevinCarbonara@reddit
I learned how to profile our software at my first job, and we made some positive changes as a result. I have never done it at any of my other half dozen jobs, ever.
Cultural-Pattern-161@reddit
It is typical. Getting to the point where optimization makes sense means your company is already making a lot of money and has saturated its other growth areas, i.e. acquiring more customers.
For TikTok, saving $300K is kinda small compared to their revenue. It may be worth it, but we'll never know, because measuring developer productivity is very difficult. How do you quantify having to manage memory and the borrow checker? Nobody can.
ryuzaki49@reddit
Care to provide some insights?
KevinCarbonara@reddit
Just that profiling is good. It's not a terribly difficult thing, we used a professional product, I think JetBrains. It takes some time to learn to sort the signal from the noise, especially if you're running something like a webapp that just has a ton of dependencies you have to deal with, but it's more than worth the effort. Unless efficiency just isn't a concern.
vini_2003@reddit
As a game developer who does graphics programming, profiling is half of my job. Learning to be good at it, spotting patterns and possible points of attention is an extremely valuable skill.
For instance, I took our bloom render pass implementation from 2.2ms to 0.5ms just by optimizing the GL calls and minimizing state changes. I identified the weak points with profiling.
It can be further taken down to sub-0.2ms using better techniques, but our frame budget allows for this.
Same for so many other systems. Profile, people! Profile your code!
preethamrn@reddit
How are frame budgets determined and allocated to teams? How can they tell before the code is written that it will take a certain amount of processing time - what if it's more expensive and turns out they need more budget from another team but that other team can't budge without giving up what they built?
vini_2003@reddit
I work at a small studio, so I'm afraid I cannot answer this question from a AAA perspective.
From my perspective, we generally go over performance bottlenecks and desired fixes during weekly meetings. It tends to be mostly me handling the graphical side nowadays (albeit there are others capable of it), so my goal is to keep frame times as low as possible to help everyone out.
Would be awesome to get a dev from a larger studio to share their experience too!
Jaded_Ad9605@reddit
Look at the Friday Facts (FFF) posts from Factorio.
They explain a lot... including performance stuff.
vini_2003@reddit
I forgot to reply to your question of "how do we determine frame times?".
Largely, we cannot anticipate them. They vary in-engine based on assets and scenes. It is mostly an experimental process. You can, of course, use past experiences to roughly estimate how long something will take to execute, but most of the time... it depends.
It also depends on the graphics settings involved, quality levels and so on.
I'm afraid the answer is "lucky guess" :)
space_keeper@reddit
I once read something written by an old boy that was very interesting. The context was someone struggling to optimise something even using a profiler.
He said, in a nutshell: run the program in debug and halt it a lot, see where you land most often. That's where you're spending the most time and where the most effort needs to go.
Jaded_Ad9605@reddit
That's profiling by (low) sample rates VS profiling each function call...
Ok-Scheme-913@reddit
That sounds like doing what a profiler does, as a human.. that old boy may feel like going to a factory and doing some trivial task that is massively parallelized and automated by machines by hand.
Like literally that's what the CPU does, just millions of times, instead of the 3 samples the old boy took.
space_keeper@reddit
We're talking about quite esoteric C code here. I know what a profiler is and does, I think the guy was suggesting it's just a quick and dirty way to set you on the right course.
FeistyDoughnut4600@reddit
that basically is sample based profiling, just on a very low frequency
Programmdude@reddit
That's essentially what a lot of profilers do.
From what I remember, there are 2 kinds. One traces how long every function call takes, it's more accurate, but it's got a lot of overhead. The other kind (sampling), just takes a bunch of samples every second and checks what the current function is. Chances are, most of the samples will end up in the hot functions.
pmatti@reddit
The term is statistical profiling. There is also event based profiling
uCodeSherpa@reddit
“Just throw hardware at it” is incredibly pervasive and “premature optimization” is just excuse gibberish. The fact is that 99.9999999% of developers throwing this line at you couldn’t tell you whether they are being premature or not. When you ask why something is so slow, they just say “premature optimization. Developer time more than optimization time. Immutable. Functional. Haskell. CRDT” and then they walk away.
And then people like me walk in, spend 30 minutes profiling, and get 400x performance benefits, taking your ridiculous several-hours-long report rendering down to milliseconds. The users are so shocked at how fast and responsive shit has become that they think something must be wrong. But no. It's just that your code was THAT bad because of excuse-driven development.
gimpwiz@reddit
Programming has come a long way. Lots of people have lots of experience, and lots of tools and libraries have optimized the hell out of common tasks - tools including the CPUs themselves along with their memories and interconnects and memory controllers, operating systems, compilers, etc.
The way I always put it to our new folks is...
With experience, you simply learn what not to do. You avoid pitfalls before they become issues. You don't need to do crazy optimizations of code when you have no real idea about its performance, but on the flip side, it's not 'premature optimization' to avoid patterns that you know are slow. This applies to everything from SQL queries, to data structures fit well for the task, to knowing not to do n^5 things all over the codebase. It also means that when you do simple and common things, you probably know to write it simply and let the libraries/compilers/CPU/etc optimize it, and stick to simple code for readability, but when you're writing the small pieces of code that are constantly being run inside inner loops and so on, you put a little bit more thought into it. And like other people have said, it also means to profile for hotspots rather than assuming.
MMcKevitt@reddit
A “domain driven detour” if you will
Scared_Astronaut9377@reddit
As someone who's been working for years in ML, big data, high performance computing, I reread your message like 4 times trying to understand the joke.
fiah84@reddit
a lot of us work much less glamorous jobs
greeneagle692@reddit
Yeah most teams never optimize. Your only job usually is pushing new features. I do it myself because I love optimization. If I see something running slow I make a story and work on making it faster myself.
Weary-Hotel-9739@reddit
Rewriting Go to Rust can actually be helpful because of language philosophy: Rust has a ton more explicit stuff and many ways to solve a problem, while Go has very few.
While by default you may not speed up that much automatically, rewriting can open a lot more doors for further measurement and alternative optimizations. Secondly, you gain a ton of knowledge from a full rewrite. Lastly, and most importantly, in a rewrite you lose tons of little edge cases and requirements, and management will often falsely assume those can be redone in a few days later on, while they celebrate the cost savings. For the business, rewrites are nearly always bad until you lose the original developers, but for short-term success, or even career advancement, rewrites are really good for the people involved.
stdmemswap@reddit
The same can be said of current standards that were not popular back then:
automated tests, version control, code review, static analyzers, containerization
crazyeddie123@reddit
Yeah but Rust isn't just fast, it's also easier to get right than almost any other language out there
rifain@reddit
Premature optimization is not pointless, it's essential. I don't know where this idea comes from but it's used as an argument from lazy programmers to write crappy code.
rangoric@reddit
Might want to look up premature in a dictionary.
Picking what is premature is hard, I do admit.
andrewfenn@reddit
Problem is, people will use this phrase to handwave away simple planning and architecture. It's given rise to laziness, and I think programmers should stop quoting it tbh, except in the rare cases it's actually valid.
oberym@reddit
Yes, it’s unfortunately the most stupid phrase ever invented, because it’s misused by so many inexperienced developers and rolls easily off the tongue. The outcome is, figuratively speaking, people using bubble sort everywhere first because that’s the only algorithm they cared to understand, and only profiling when the product becomes unusable, instead of using well-known patterns from the get-go that would just be common sense and just as easy to use. Instead they drop this sentence and feel smart when someone with experience already sees the issue at hand.
CramNBL@reddit
This is exactly right. I'm going through it at work right now, multiple times in the same project, I've been brought in to help optimize because the product has become unusable.
I interviewed the 2 core devs at the start of the project, asked them if they had given any thought to performance, and if they thought it'd be a concern down the line. They hadn't thought about that, but they were absolutely sure that it would be no problem at all...
G_Morgan@reddit
It is because they don't include the full context of the quote. Knuth was not referring to using good algorithms and data types. He was talking about stuff like rewriting critical code in assembly language or similar.
Full-Spectral@reddit
There's constant miscommunication on this subject. What is optimization? People don't agree on that. If you tell people not to prematurely optimize, a lot of them will go off on you about being lazy and this is why all software is slow and all that. In a lot of cases, it's because they consider 'optimization' to include incredibly obvious things like using the appropriate data structure, which I don't consider to be an optimization. That's just basic design.
To me, optimization is when, after the basic design (which is understandable and no more complex than needed), you measure and decide, for legitimate reasons and not just because, that you need more performance, and you purposefully add complexity beyond the basic design to get it. And you definitely don't want to add complexity unless you really need it.
So the arguments just end up being silly, because we aren't even arguing about the same thing. Though I will argue that there is a tendency in the software world to optimize (in my sense of the word) when it's not necessary, just because people want to be clever, or they are bored, or whatever it is.
Nine99@reddit
Calling it "premature" is already making a judgment, so the actually meaningful part of the quote is the amount of evil, i.e. the second part of the quote.
oberym@reddit
And in this case it is totally valid. Unfortunately in practice, I've never heard it in this context but in discussions about the most basic things. And that's where the danger with oversimplified quotes lies. It's now used to push through the most inefficient code just because "it works for now" and avoid learning better general approaches to software design that save you more time right from the start. And hey it came from an authority figure and everyone is quoting it all the time, so it must always be true. It's more like using quotes out of context is the root of all evil.
SkoomaDentist@reddit
He wasn't doing even that. He was referring to manually performing micro-optimizations on non-critical code.
Ie. changing func(a*42, a*42); to b = a*42; func(b, b);
NYPuppy@reddit
I think every common phrase is like this, programming or not. The quote wasn't against optimizing, and it was NOT against "optimizing" early. Performance and good design are things that programmers should always consider.
taintedcake@reddit
They also had an intern do it, not a senior developer. They didn't care if there were results, it was just a task given to an intern for them to fuck about with
moratnz@reddit
Yep; premature optimisation may be the root of all evil, but if the optimisation returns $300k in savings for a few thousand dollars' worth of engineer time, then it isn't especially premature (well, unless there's fruit hanging even lower).
nnomae@reddit
TikTok's daily revenue is close to $100 million. Even if we charitably assume that doing that basic optimisation as they went would have only delayed their launch by a single day, it would have cost them a full day's revenue, or $100 million.
catcint0s@reddit
Launch what? They optimized an existing service that was written in Go (so it was launched faster).
coderemover@reddit
> so it was launched faster
Quite debatable. I haven't seen much evidence Go makes people launch things faster. For sure it makes them compile faster, but Go is not the language where you can say "if it compiles, it works".
catcint0s@reddit
I don't think this is debatable, it's from the employee who did the Rust rewrite and actually works there...
The linkedin post also mentions it but I think it's based on the blog post too
Full-Spectral@reddit
The thing is, though, building and iterating are not the ultimate goals of software development. That's a developer convenience, but the software is being developed to be used by other people, and it's their convenience (not being up on Saturday night trying to fix an issue), their security, and so forth.
coderemover@reddit
This is just how Go is perceived and this is at least partly because it was advertised like that. We have some Go teams in our company and from the outside I can’t see they are any more productive than developers using other languages.
So while people may think they are more productive, there is very little evidence they really are. Google actually measured this and found no significant difference between Go and Rust productivity.
All_Up_Ons@reddit
No one's saying you should delay your launch. They're saying that once you have launched and are making money, you can afford to look for these optimizations.
Qweesdy@reddit
They could've started working on it one day earlier, spent 1 day in rust instead of spending 1 day in some other language, had the launch date one day earlier, got to "$100 million per day" twice as fast because it's less laggy, and be getting $200 million per day now.
You're just bad at fabricating idiotic and irrelevant hypothetical fantasies (like seriously, why were there no unicorns in your absurd hallucination?).
tu_tu_tu@reddit
You can replace "premature optimisation" with "optimisation not supported by at least minor research". That would make sense.
coderemover@reddit
Counterpoint: after getting enough experience you don't need to measure to know there are certain patterns that will degrade performance. And actually you can get very far with performance just by applying common sense and avoiding bad practice. You won't get to the 100% optimum that way, but usually the game is to get decent performance and avoid being 100x slower than needed. And often applying good practices costs very little. It doesn't take a genius to see that if your website does 500+ separate network calls when loading, it's going to be slow.
rangoric@reddit
Then that's not premature. For a lot of things that you've learned, there's a reason you do them. You've already measured it.
The main idea is to not optimize something that isn't a problem. This service that was optimized for instance. It was good enough. It worked on task and did what they wanted.
But they had measurements around it and knew if they could get it to go faster, they could get some gains either in throughput or decreased costs. So, when it was redone, they could point out that it was better with numbers instead of guessing. Next time they might start with Rust for short-lived or very fast microservices. But what they did to start this one was perfectly fine and did what it needed. If they spent a ton of time writing both versions to see what was better at the start of the project, it would have delayed things (twice as much work) and would it have shown the same gains unless under load? So many things it's hard to know up front.
So, I guess my counterpoint is that it's hard to know when it's premature. If you don't have a solid reason and are guessing, that's where I usually draw the line.
Caching for instance is a perfectly normal thing to do. But on the web, it's way more important to do it up front for large files than small files that can change. So, if you are making your own image server, caching isn't premature. Reducing file size isn't premature. Doing things to reduce the number of calls to a reasonable number isn't always premature but depending on the tomfoolery you do here, some things you do might be.
Because if in reducing the number of calls you can't cache as much, you will need to measure that, or break it down in ways that make it obvious that without measurement it will be fine (sprite sheet for commonly used icons/images). So yes, a lot of it really depends.
But saying optimization is pointless? I never see developers say that. If I said that I'd get blank stares of disbelief. Along with "Who are you and what do you do with him?"
G_Morgan@reddit
Premature micro optimisation is more than pointless, it obscures the intent of the code and makes it harder to do the right thing.
MachinePlanetZero@reddit
Ie "please deliver not completely awful working software that does what was asked", and if you manage to hit that milestone, then just maybe you can then also think "is any of it slow"
nerdly90@reddit
📠
BibianaAudris@reddit
Considering China's tax rates, it's actually a net loss if they pay $150k+ per year for Rust code maintenance. The optimization only starts making sense if they can hand everything over to a single intern.
1RedOne@reddit
I did something like this to save on RU consumption, and the graphs I made tracking the before and after... mamma mia.
They could have fired me and I would have shown up anyway just for the satisfaction of seeing that line of RU consumption plummeting
poopatroopa3@reddit
Gotta profile your stuff
lprimak@reddit
This reeks of total BS. Period. Full stop.
CedarSageAndSilicone@reddit
Literally no one claims optimization is pointless at scale. Optimization is pointless if your infrastructure costs will stay the same regardless.
Kyrra@reddit
I don't see the word pprof anywhere in this post or in this reddit thread. Doing a quick pprof look at the service likely could have found any slowness in the Go version of this app without a lot of work. This honestly sounds like someone just wanted to rewrite in Rust, rather than trying to dive into the "why" part.
eSarwin@reddit
I feel at home with Golang, but things like this sometimes make me wonder if I should try Rust or stay focused on building.
ZakanrnEggeater@reddit
didn't Twitter do something similar switching from a Ruby interpreter to a JVM implementation for one of their message queues?
lxe@reddit
300k per year sounds impressive, but their infrastructure costs are 800 million. It's not that impressive; it's like you saving $100 a year.
NovelPuzzleheaded311@reddit
That's less than my Adderall budget.
metaldark@reddit
At my job our service teams can’t even get cpu requests correct. At our scale we’re wasting dozens of vcpus.
NovelPuzzleheaded311@reddit
Meanwhile our devops guys insisted on all ephemeral storage being limited to 5MB because they are too ignorant to realize stdout counts towards that.
Guess what? Our pods fucking die every 10-15 minutes now, and they are scratching their heads wondering why.
Ok_Builder910@reddit
This is nothing for TikTok.
Data_Scientist_1@reddit
Something's shady about the article. It doesn't explicitly say how the Go service was implemented or where they hit their bottleneck. It also doesn't sketch why they were having those issues in the first place.
coderemover@reddit
It's interesting to read it was an *intern* who did it. Not a super-senior low-level optimization wizard who learned PDP-11 assembly in kindergarten and C in primary school. So yeah, to all those people who claim Rust is hard to learn: Rust is one of the very few languages I'd have no issue throwing a bunch of interns at. As long as you forbid `unsafe` (which can be enforced automatically), they are going to cause much less trouble than with popular languages like Java or Python.
Weary-Hotel-9739@reddit
Oh yes, that wall of difficulty for async or macros doesn't exist.
coderemover@reddit
For you it’s a wall; for me it’s just another tool in the toolbox. And I disagree: async and concurrency in any language, be it Java or Go or C#, is just as hard as in Rust, but even more error-prone.
Leading_Detective292@reddit
The core essence to grasp here is scale: as a company grows and slowly becomes a giant, many flaws or 'inefficiencies' will naturally appear that, once solved, can save a ton of money.
That's also partially why companies are trying their best to get into AI.
Hope that helps.
MaterialRestaurant18@reddit
Oh brother....reddit...how can you believe such a bull fucking shit story when the source is LinkedIn and no name has been dropped.
InfinitesimaInfinity@reddit (OP)
There are three reasons why I believe it.
First of all, $300000 is relatively small in proportion to TikTok's revenue. Second of all, the intern wrote an article about it, which is at https://wxiaoyun.com/blog/rust-rewrite-case-study/ . Third of all, I saw a few other sources claiming it online, including another LinkedIn post, a Youtube video, a Reddit post, etc.
ChadiusTheMighty@reddit
Did he get a return offer??
InfinitesimaInfinity@reddit (OP)
I do not know.
BarfingOnMyFace@reddit
Uh… 300K is chump change… are you missing a zero perhaps?
Rodditor_not_found@reddit
Lemme guess... all he got in return was a pat on the back and a slice of pizza
Radstrom@reddit
I'd say the bigger the scale, the more significant the savings can be. We aren't rewriting shit in rust to save a couple of dollars. They can.
ldrx90@reddit
300k annual savings is really good for most startups I would imagine. That's what, a few engineers worth of salary?
LukaManuka@reddit
Yes, but TikTok has more than 1.5 billion users. Billion. That's almost 20% of the population of the entire world. And even at that colossal scale, this optimization is still only saving them $300k, which amounts to just $0.0002 per user.
If a startup had even 1.5 million users (which is still vastly more than the overwhelming majority of startups), they'd save a meager $300 per year. That's maybe a single day's salary for an engineer (if even that).
a_better_corn_dog@reddit
If you consider stock, bonus, benefits, and taxes, 300k is more realistically one senior engineer, or maybe two SE1/2s.
scodagama1@reddit
Eeee, but TikTok is not a startup.
If your startup is, let's assume optimistically, just 100 times smaller than TikTok and saves $3000 from that optimization, that doesn't sound worth a rewrite by an intern anymore, does it?
If you're in hyper scale then of course optimisation matters, who has ever claimed otherwise?
-grok@reddit
yes thank you, watching people not understand this has been driving me nuts.
And even worse, that intern "rewrite" is likely riddled with bugs that will show up to customers as "why would I use this garbage product?" and hurt the startup's traction anyway. The value is easily in the negative $1M and not positive $300 bucks.
scodagama1@reddit
Nah it's not that bad probably - keep in mind this is intern bragging about rewrite
He had mentors, probably senior engineers who reviewed his code and guided him through the implementation, Rust experts on site, and most importantly an extensive suite of integration tests and state-of-the-art production monitoring, so he could confidently write a couple of lines of code, deploy it to a staging environment, let it bake for 2 weeks, and merge.
That's another thing startups don't realize: these big corporations have top-notch engineering practices and state-of-the-art CI/CD and monitoring infrastructure. You don't operate a 1.5b-user business without entire divisions focused solely on reliability.
It's easier to do rewrites when you have all this infrastructure in place, it is specifically designed for evolvability of software
F4underscore@reddit
Thanks for putting it into perspective. Kinda crazy how tiktok has 1.5B users right now
scodagama1@reddit
Yeah these hyper-scalers are insane. Unfortunately a lot of smaller companies tends to copy what hyper-scalers do because clearly if Amazon/Google/Meta does something it must be good, doesn't it?
Sure it is, at that scale. What they forget to look at is how Amazon operated when they were a startup: Bezos hacking together some C++ shit that took 2 hours to compile in his garage, which powered the e-commerce operation for 10 insane growth years before they saw a need for microservices. Startups should operate like Bezos in 1997, not like Bezos in 2022, correcting for technological progress obviously.
TheSkiGeek@reddit
Yes, but they probably saved $300k from $1M+ that they were spending every year. Most startups aren’t going to be handling that level of traffic or need anywhere near that much cloud compute.
Iamonreddit@reddit
You may want to reframe your point if you want it to be more impactful.
TheSkiGeek@reddit
30% of small number is small.
30% of big number is big.
Engineers expensive.
nemec@reddit
One of the products I work on spends a little more than $300k/y on just one microservice for probably less than 10k monthly users. We could save so much money rewriting it with containers but it's "only" one or two developers worth so no... we just bumped our lambda provisioned concurrency to 200 and let it chug along lol
mattgen88@reddit
I'd be rewriting as a hobby... Are you just busy-waiting to heat a data center!?
getfukdup@reddit
yea, their salary, but the cost of an employee is several times their salary.
jl2352@reddit
It is, if you can find such a saving in your startup. Most startups won’t be able to find that.
Days_End@reddit
No, it's not even a single engineer. Remember the general rule of thumb is it costs the company 2x your salary to employ you.
Coffee_Crisis@reddit
If an optimization like this saves you this kind of money you are not a startup anymore
safetytrick@reddit
And in my startup with a hosting cost of 2mil a year one service improving by 90 percent is a 1000 dollar savings. I'll bring you donuts if you don't bill more than $20 an hour.
ldrx90@reddit
Well sure, do the estimates before committing to the work. I was mostly just thinking this amount of work for 300k isn't necessarily 'a couple dollars'. This amount of work probably doesn't translate to 300k in savings for most smaller places, for sure.
All I'm saying, is if I could rewrite a few endpoints in a new language and save 300k a year, I'd get a fat bonus.
safetytrick@reddit
Engineering is about cost/benefit. If it costs more than it benefits...
zzrryll@reddit
It wasn’t a startup. It was TikTok. So this change wouldn’t apply at the scale of any startup that would care about that savings.
Especially because we haven’t seen this play out. Are they going to have to rebuild this in a year, with a team of engineers? Headlines like this are always kinda trash imo….
Upstairs-Party2870@reddit
They probably hired that intern for <30k
beyphy@reddit
If they can repeat this across multiple microservices, those savings can really add up. Assuming they save a similar amount, they only need to do this to three more microservices before their savings are greater than $1M. And obviously the more microservices they're able to do this for, the greater their savings will be.
snurfer@reddit
More like a single engineer when you take total package (salary, equity, benefits, bonuses).
metaltyphoon@reddit
In the US
autoencoder@reddit
right. More like 10 in Romania
Matt3k@reddit
Uh huh. Not everyone is a silicon valley startup funded with magical unicorn money or FAANG. $300k/yr engineer. Yeah okay. That is literally the 99th percentile of engineer salaries.
a_better_corn_dog@reddit
I'm at a company similar to the size of tiktok. A teammate saved us 150k per month on compute costs with a few minor changes and it was such a drop in the bucket savings, management was completely indifferent to it.
300k/yr sounds like an insane amount, but for companies like TikTok, that's peanuts.
kane49@reddit
Who the hell claims optimization is useless because computers are fast, that's absolute nonsense.
Pearmoat@reddit
"Many developers" of course, who some random dude on the internet invented so he'd have an argument that he can disprove in a post.
PatagonianCowboy@reddit
Usual webdevs say this a lot
"it doesn't matter if it's 200ms or 20ms, the user doesnt notice"
v66moroz@reddit
Nope, webdevs usually say "since my bottleneck is DB, it doesn't matter if my service is written in Ruby or Rust". Besides "normal" web app is easy to scale by adding boxes (hardware is cheap, isn't it?). May not apply to TikTok, but true for most business apps.
PatagonianCowboy@reddit
Well there is a webdev in the other comments literally insisting 200 and 20ms are the same because the user doesn't notice
v66moroz@reddit
It's not a usual webdev, web app doesn't usually have exactly one user. He's right about latency though.
TA_DR@reddit
Which is totally true when talking in the webdev context.
PatagonianCowboy@reddit
Source?
For example: Speed Matters
TA_DR@reddit
https://web.dev/articles/inp
"An INP below or at 200 milliseconds means a page has good responsiveness."
PatagonianCowboy@reddit
no data, not an actual source
gheffern@reddit
Definitely not.
TA_DR@reddit
https://web.dev/articles/inp
"An INP below or at 200 milliseconds means a page has good responsiveness."
For that case is totally true. Once you reach <200ms there is no need to further optimize since users won't be able to tell the difference.
And of course that isn't necessarily true for other cases.
Nine99@reddit
You're quoting a subjective (and moronic) statement on a broken, incredibly slow website as a fact, and then follow that up with stuff you pulled out of your ass.
TA_DR@reddit
well every design choice is subjective. In this case it's Google's design choice, so I'm sure they have the data to back it up.
Nine99@reddit
Didn't know that Google devs can read minds now.
Hey, but I'm sure they've linked some research about it in the article. They don't? What a surprise!
TA_DR@reddit
Well, MDN recommends 100ms. Which is in the same order of magnitude. The point is still the same, there is no reason to optimize user interactions under that time.
Nine99@reddit
"Poor responsiveness" is also in the same order of magnitude according to your source.
usernamedottxt@reddit
At massive scales this is pretty much proven false. Amazon and Google both have published research on it.
PatagonianCowboy@reddit
Yeah, there is a reason why Cloudflare uses Rust to process 90 million requests per second: https://corrode.dev/podcast/s05e03-cloudflare/
Speed matters
TA_DR@reddit
that's the number of requests handled, not loading time perceived by the user.
PatagonianCowboy@reddit
I know, that's what I wrote
TA_DR@reddit
well yeah, but I'm talking about user interactions. Not response handled per second.
200ms is enough for INPs. For that case there is no need to further optimize.
PatagonianCowboy@reddit
yeah sorry, it's just that you edited your comment like 4 times
Nine99@reddit
Maybe Cloudflare should stop adding several seconds (sometimes dozens) of loading time to gazillion of websites.
TA_DR@reddit
web.dev (Google's guidelines) clearly states that under 200ms is the bar for INP
BlueGoliath@reddit
No one should listen to webdevs on anything performance related.
HarryBolsac@reddit
There's plenty to optimize on the web, wdym?
BlueGoliath@reddit
Incredible.
All_Work_All_Play@reddit
I think they mean that bottom tier web coders and shitty html5 webapp coders are worse than vibecoders.
jjeroennl@reddit
It kinda depends how fast things improve. This was definitely an argument in the 80s and 90s.
You could spend 5 million in development time to optimize your program but back then the computers would basically double in speed every few years. So you could also spend nothing and just wait for a while for hardware to catch up.
VictoryMotel@reddit
It was even more important back then. Everything was slow unless you made sure it was fast.
Also where does this idea come from that optimization in general is so hard that it takes millions of dollars? Most of the time now it is a matter of not allocating memory in your hot loops and not doing pointer chasing.
The John Carmack Doom and Quake assembly core loops were always niche and are long gone as any sort of necessity.
Coffee_Crisis@reddit
The point is that as long as you ship code that scales linearly or better there are generally very few opportunities to actually save money through performance optimization
VictoryMotel@reddit
Says who? Everything scaled linearly back then because clock speeds were jumping up and instruction times were going down.
This idea that optimization was difficult or ineffective is just not true at all.
Where are you getting this idea and what is a real technical example?
Coffee_Crisis@reddit
I’m talking about now, and the OP is a good example - 300k is money TikTok finds in the couch cushions. If you don’t have that scale the optimization isn’t worth doing, “hard” is irrelevant. Its not about hard or easy, it’s about opportunity cost
VictoryMotel@reddit
The thread wasn't about that
So what?
This was a rewrite so not typical optimization, but optimization is not difficult and it is worth doing any time something shows up as slow on profiling of a system or individual program.
It isn't just huge scale and throughput, slow software can lead to bad latency and the inability to handle traffic spikes. Interactivity suffers.
Coffee_Crisis@reddit
You’re just arguing against considering ROI when taking on performance tasks, and it’s dumb, and I’m not engaging any more. Take care.
omgFWTbear@reddit
… one of them became a chip instruction …
VictoryMotel@reddit
You misunderstand the context and point of conversations quite a bit I'm guessing.
omgFWTbear@reddit
A footnote to a footnote to a footnote is rarely understood as being substantive to the main thrust of a text, hence the acceptance of removing it so far from the flow.
For example, one should be reasonably well aware that an overwhelming majority of development is not done at the instruction level.
VictoryMotel@reddit
Sober up, this isn't even close to being coherent.
omgFWTbear@reddit
Pretty telling that if something doesn’t make sense to you, you infer inebriation.
Some folks enjoy topics and errata for conversation’s sake, not grounding everything into a correct answer to a technical problem.
VictoryMotel@reddit
What are you even talking about, nothing you're saying makes sense.
You seem like someone who is so in their own head they can't connect what they say to a conversation but they blame everyone else for not understanding.
jjeroennl@reddit
When dealing with teams, 5 million is spent in no time.
VictoryMotel@reddit
What are you even talking about? With zero context you just pulled a number out of thin air.
Optimizing isn't that hard. You profile and lift things out of hot loops, mostly memory allocation. In modern times you avoid pointer chasing and skipping around with memory access.
Have you ever done this before?
In the 80s and 90s it was all about speed. If you just waited for computers to speed up someone else was going to move in on your territory. A fast program was still going to be faster on a new computer.
DevilsPajamas@reddit
Your comment reminded me of the tv show Halt and Catch Fire... one of my all time favorite shows.
versaceblues@reddit
its more that people care less about optimization in the early stages. Which is good.
If you are launching to <10,000 customers then time to market is better than optimizing for CPU cycles.
If you are serving at global scale, then optimization can actually translate to cost savings.
TimMensch@reddit
Yes and no.
If you launch to a smaller number of customers but then get a usage spike that kills your servers, you'll be hemorrhaging customers until you can rewrite it to be a decent architecture.
A good developer can optimize to the point of reasonable scaling in less time than a mediocre developer can create a really poorly optimized backend. I've seen several backends that were so badly optimized that scaling to just a dozen users caused each user to need to wait ten seconds to do anything. Whereas the same server rewritten by a skilled backend developer could hit a million users with low latency.
I've also seen projects completely killed when they realized the backend would cost more to run per user than the users were willing to pay.
The problem is that it takes a strong developer/architect to do it right the first time, and we're expensive. Not as expensive as losing customers and needing to rewrite later while losing customers though.
versaceblues@reddit
There is definitely a balance to strike. I like to apply Yagni https://martinfowler.com/bliki/Yagni.html
TimMensch@reddit
Does YAGNI even apply when the strong developer can do the work in less time and with less complexity?
Seems orthogonal.
Regardless, you're only not going to need it if the company fails before anyone uses the app. Not exactly a good thing to hope you won't need. 😜
versaceblues@reddit
YAGNI is not about being lazy, its a prioritization framework.
If you can make the software more robust and future proof with minimal effort its encouraged.
If you are going to spend 3 weeks optimizing for a 2% CPU use efficiency that is going to be immaterial to current customers, then you are incurring opportunity cost on actual features that you could be building.
TimMensch@reddit
Sure? I'm saying that a good developer really doesn't need to spend longer to get to a point of reasonable performance.
The failure is usually in not hiring a good developer. Spending three weeks for a tiny optimization is a rookie mistake.
beefz0r@reddit
Optimization is only useless when it never hits 100%
alkaliphiles@reddit
It's really about weighing tradeoffs, like everything. Spending time reducing CPU usage by 25% or whatever is worthwhile if you're serving millions of requests a second. For one service at work that handles a couple dozen requests a day, who cares?
uCodeSherpa@reddit
Your users suffering 50 second web page loads care a lot.
/r/programming has this huge skill issue with thinking about their application from the user perspective. I swear none of you people ever actually use the dogshit you peddle.
NYPuppy@reddit
Because it adds up.
Developers take that attitude with apps they write and now everything ships a web browser and runs slow.
dangerbird2@reddit
Also there’s an inherent cost analysis between saving money on compute by optimizing vs saving money on labor by having your devs do other stuff
alkaliphiles@reddit
Prefect is the enemy of good
And yeah I know I spelled that wrong
dangerbird2@reddit
I would say a lot of software is far from perfect and could definitely use optimization, but ultimately ram and cpu costs a hell of a lot less than developer salaries
St0n3aH0LiC@reddit
Definitely, but when you use that reasoning for every decision without measuring spend, you start spending 10s of millions on AWS / providers per month lol.
Been on that side and the sides where you are questioned for every little over provisioning, which also sucks haha
As long as it’s measured and you make explicit decisions around tradeoffs you’re good.
tcmart14@reddit
This gets into an interesting bit, potentially, and what I am dealing with at work.
We know these are trade offs and try to make a choice based on them, how often though, are organizations re-evaluating?
At my current job, there is a tendency to stand up stuff and we initially make a choice. And at that time, it works with the trade offs. But then the organization has no practice or policy about monitoring and re-evaluating. The trade offs you made 3 years ago were fine for years 1 and 2, but now here at year 3, things have drastically changed. I imagine this is common, at least at smaller shops like mine.
St0n3aH0LiC@reddit
Great points. I feel like these things don’t get revisited until companies are at a scale where there are dedicated teams and tooling around assessing costs.
When you get pinged that something hasn’t hit > 1 % utilization in the last 3 months and downsizing it would save $X a year to your org, then this sort of stuff gets revisited and it’s also easier to manage on an ongoing basis.
Definitely tricky at a smaller shop where this stuff isn't being pored over regularly.
Rigberto@reddit
Also depends if you're doing on-prem or cloud. If you've purchased the machine, using 50 vs 75 percent of its CPU doesn't really matter unless you're opening up a core for some other task.
particlemanwavegirl@reddit
I don't really think that's true either. You still pay for CPU cycles on the electric bill whether they're productive or not. Failure to optimize doesn't save cost in the long run, it just defers it.
hak8or@reddit
That cost is baked into the cloud compute costs though? If you get a compute instance off Hetzner or AWS or GCE, you pay the same if it's idle or running full tilt.
On premises then I do agree, but I question how much it is. Beefy rack mount servers don't really care about idle power usage, so it doing nothing relative to like 50% load uses very similar amounts of power, it's instead that last 50% to 100% where it really starts to ramp up in electricity usage.
coderemover@reddit
If it's mostly idling, you can rent a smaller instance, or fewer instances and pay less.
particlemanwavegirl@reddit
In that sort of case, I suppose the cost is decoupled from the actual efficiency, in a way not entirely favorable to the consumer. But saving CPU cycles doesn't have to just be about money, either: there's an environmental cost to computing, as well. I'm not saying it has to be preserved like precious clean water, but I don't think it should be spent negligently, either. There's also the case, in consumer-facing client-side software, that a company may defer cost of development directly onto their customers' energy footprints, and I really think that's an awful practice, as well.
Coffee_Crisis@reddit
If your engineers aren’t delivering more value than the electric utility bill you have bigger problems than slow code
particlemanwavegirl@reddit
I think your footprint matters no matter how it compares to revenue. Taken to its logical conclusion, if everyone acts like that we get late-stage capitalism, choking to death on our own fumes.
Coffee_Crisis@reddit
If you are getting hung up on this you need to start quantifying actual emissions and realize you are talking about maybe tanking your startup in order to prevent emissions equivalent to 10 minutes of a passenger jet flight
swvyvojar@reddit
Deferring beyond the software lifetime saves the cost.
particlemanwavegirl@reddit
Yeah, I can't argue with that. I think the core of my point is that you have to look at how often the code is run; where the code is run doesn't really factor into it like that.
kane49@reddit
Of course but "my use case does not warrant optimization" and "optimization is useless" are very different :p
TheoreticalDumbass@reddit
yes, but most people think of statements within their situations, and in their situations both statements are the same
omgFWTbear@reddit
I’ve found the savage behind the GTA:O startup JSON dedupe code!
macnamaralcazar@reddit
Not just who cares, also it will cost more in engineering time than what it saves.
midorishiranui@reddit
every microsoft dev apparently
FamilyHeirloomTomato@reddit
99% of developers don't work on systems at this scale.
pohart@reddit
Most apps I've worked on have benefited from profiling and optimization. When I'm worried about millions of records and thousands of users I often start with more efficient algorithms, but when I've got tens of users and hundreds of records I don't worry about choosing efficient algorithms. Either way I wind up with processes that are slow that need to be profiled and optimized.
Full-Spectral@reddit
And, shocking though it seems, some of us still even write software that's not cloud based and that has nothing to do with databases. So many people work in cloud world these days that they assume their concerns must be universal.
Coffee_Crisis@reddit
I am responsible for systems with millions of users and there are almost never meaningful opportunities to save money on compute. The only place there are noticeable savings is in data modelling and efficient db configs to reduce storage fees, but even this is something that isn’t worth doing unless we are out of product work
pohart@reddit
I'm talking user noticable delays.
Coffee_Crisis@reddit
User perceptible performance issues count as product work imo
pohart@reddit
Sure, but they still get profiled and optimized if there's an issue.
I've got all dedicated servers on-prem unless there's a catastrophe, so I'm not terribly concerned with compute.
Sisaroth@reddit
Most apps I worked on were IO (database) bound. The only optimization they needed was the right indexes, and rookies not making stupid mistakes by doing a bunch of pointless db calls.
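The "bunch of pointless db calls" mistake is the classic N+1 query pattern. A hedged sketch of the fix — the table and column names are made up, and real code should use parameterized queries rather than string formatting:

```go
package main

import (
	"fmt"
	"strings"
)

// N+1 anti-pattern: one query (and one network round trip) per ID.
func queryPerID(ids []int) []string {
	qs := make([]string, 0, len(ids))
	for _, id := range ids {
		qs = append(qs, fmt.Sprintf("SELECT name FROM users WHERE id = %d", id))
	}
	return qs // N round trips to the database
}

// Batched: a single IN (...) query, one round trip total.
func batchedQuery(ids []int) string {
	parts := make([]string, len(ids))
	for i, id := range ids {
		parts[i] = fmt.Sprint(id)
	}
	return "SELECT id, name FROM users WHERE id IN (" + strings.Join(parts, ", ") + ")"
}

func main() {
	ids := []int{1, 2, 3}
	fmt.Println(len(queryPerID(ids)), "round trips vs 1:", batchedQuery(ids))
}
```

With the right index on `id`, the batched form is one round trip and one index scan per ID instead of N full request cycles.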
NYPuppy@reddit
But they do work on SOMETHING. I think a lot of people see this as a binary between optimizing and not. It's not. Performance is always important, it's just that what is considered performant differs.
hitchen1@reddit
It's still important in web regardless of scale though, since page load is linked to conversion rate.
Waterty@reddit
Say you're an out of touch snowflake without saying it
zettabyte@reddit
One needs a straw man to tear down when making a dubious claim.
not_logan@reddit
Managers do
Trapick@reddit
It's more that if you have a webapp with 12 customers, don't worry about spending the time to optimize that call from 10ms to 2ms. If you have 2 billion customers then yah, that's a different problem.
ilep@reddit
It used to be so, back when there was huge uptick in Java usage and Moore's law wasn't completely dead yet. When clock speeds stopped increasing regularly people started to pay attention to software.
buttplugs4life4me@reddit
"The biggest latency is database/file access so it doesn't matter" is the usual response whenever performance is discussed and will instantly make me hate the person who said that.
KFUP@reddit
Real optimizations are not useless, but it's different for nano-optimizations like this one - which made the news for whatever reason - that would take 1000 years to save TikTok 1% of one year's revenue.
That's a poor use of the engineers' time; they could have spent that time working on something with real impact like new features or fixing bugs.
spooker11@reddit
The argument often makes sense when bootstrapping a small company. Engineer time is more expensive than compute time so it’s better to move faster as an engineer than slower and build more performant software
That shifts once you begin exceeding certain level in scale. When you start hitting that scale you already have the time and money to go back and begin optimizing the slower costlier things for performance
___Archmage___@reddit
There's some truth in the sense that it's often better to have really simple and understandable code that doesn't have optimizations rather than more complex optimized code that may lead to confusion and bugs
Personally in my career in big tech I've never really done optimization, and that's not a matter of accepting bad performance, it's just a matter of writing straightforward code that never had major performance demands to begin with
In any compute-heavy application though, it'll obviously be way more important
Omni__Owl@reddit
I have heard this take unironically. "You don't have to be as good anymore, because the hardware picks up the slack."
ummaycoc@reddit
Bad management…
teddyone@reddit
People who make crud apps for like 20 people
PatagonianCowboy@reddit
Those people have the strongest opinions about programming
amnesiasoft@reddit
Yes... But also a CRUD app with 20 users at work is actually one of the only apps I've spent significant time optimizing because it processes absolutely way too much data per user.
coldblade2000@reddit
Depends. Did it take 1 month of an intern's time to reduce lag by 200ms, or did it take a month of 30 experienced engineers time?
BlueGoliath@reddit
"high IQ" people on Reddit?
trialbaloon@reddit
Python developers I imagine.
poopatroopa3@reddit
I'm a Python dev who optimizes stuff. We exist
trialbaloon@reddit
Ha I believe it. To be honest I am mostly joking, mostly....
StochasticCalc@reddit
Never useless, though often it's reasonable to say the optimization isn't worth the cost.
HRApprovedUsername@reddit
Depends on what you’re optimizing
palparepa@reddit
Management.
Bradnon@reddit
People who "get it working on their dev machine" and then ship it to prod with no respect for the different scales involved.
PhatOofxD@reddit
The number of companies running at that scale can be counted with not much difficulty. 99% of developers will never work on something that large and for 99.99% of businesses it's useless.
scalablecory@reddit
Just about any time you see "way faster after switching to language X" when it comes to one of the systems-level languages, keep in mind that the platform is rarely the main contributor. Most of the gains are likely due to the original code simply leaving performance on the table and needing a rewrite.
BenchEmbarrassed7316@reddit
https://www.reddit.com/r/programming/comments/1okf0md/comment/nmbwkyn/
In this post I described the technical issues of go that significantly affect performance. A simple language change solves them all and provides "free" performance.
pm_plz_im_lonely@reddit
To what language and what library?
Cause if it's Rust you get macro expansions, or C++ with nlohmann's it's also a macro. In both cases, guess what: if you want fast object mapping, it's code generation.
BenchEmbarrassed7316@reddit
Yes, I'm comparing it to serde. I don't quite agree: Rust uses macros in this case for compile-time reflection (you just need to know the field names and types). But that's not the only problem with go.
https://www.reddit.com/r/programming/comments/1okf0md/comment/nmg4uzm/
Check these examples. go uses a separate heap allocation for each optional field.
https://www.reddit.com/r/programming/comments/1okf0md/comment/nmc8hvg/
This comment and the replies to it discussed other options. In short, it's a choice between bad and bad.
It's surprising that a language focused on web services works so poorly with JSON.
ps And I feel awkward turning my answer into a bunch of links..)
i-can-sleep-for-days@reddit
At my company we only allow 3 or 4 languages to keep the cognitive context switching down. Nothing worse than that intern’s project going down at 3 am and it is written in a language you don’t know and the intern is long gone.
keithstellyes@reddit
Premature optimization is the issue. It's just that the question of when to optimize isn't a problem you're solving with just a quip or two.
Ideally, research, thought, and data is there
Background_Success40@reddit
I am curious, do we know more details. Was the high CPU usage due to Garbage Collection? The author of the blogpost mentioned a flame graph but didn't show it. As a lesson, what would be the trigger to move to Rust? Would love some more details if anyone has it.
BenchEmbarrassed7316@reddit
https://www.reddit.com/r/programming/comments/1okf0md/comment/nmbwkyn/
go has a poor design. If you need to deserialize a data structure that has 10 optional fields which are for example
geoPoint { x: int, y: int } structures - you will get 10 extra allocations. And work for GC.
Background_Success40@reddit
Not sure I understand, couldn't we set these as pointers and allocate them when needed?
BenchEmbarrassed7316@reddit
Here is go code:
https://go.dev/play/p/-RT_7qcUYwP
go run -gcflags="-m" test.go
As you can see, without pointers, go cannot distinguish between null values and zero values. And in the case of pointers, we will get heap allocations (although it also puts the local variable on the heap, which is generally funny).
And here is Rust:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bee463333d2a161d1bbe0841c57559f4
It's definitely stack.
Background_Success40@reddit
Thanks for the examples! Much appreciated.
richizy@reddit
I dunno if that is "poor" design. It's consciously trading raw performance for ergonomics, as the dev doesn't have to worry about writing deserialization code (ie just use proto.Unmarshal).
There are ways to deal with allocation. You can manually check and extract the optional fields (lazy loading). Or you can make the proto completely flat so no extra allocations.
There's also custom proto code generators like vtprotobuf (by Vitess/Planetscale) which does allocations optimally.
egonelbre@reddit
You can serialize these into a
type GeoPoint struct { Valid bool; X, Y int32 }
or other similar approaches. Wrap it in type safety, if you want. Alternatively, bulk allocate during deserialization.
BenchEmbarrassed7316@reddit
Sorry, but this is nonsense. Now all code that works with GeoPoint has to check this flag. Maybe it should add unhappy path handling. This is just making the code more complicated where everything should be simple. These checks will also hurt performance instead of a single check during deserialization. And it doesn't work with scalar types.
This is the case where "simple" go makes writing code complex, and conversely "complex" Rust makes writing code simple.
egonelbre@reddit
If it's an optional field, then the code needs to check something anyways. Even Rust needs to check whether the field is present.
Why wouldn't it work for scalar types?
BenchEmbarrassed7316@reddit
The check should be located where it is checked whether this field exists or not.
If you add an extra field - 'process' will actually receive an additional argument 'isValid'... What to do with it? Check? Ignore? If I work with the 'process' function in a few months - I have to somehow guess that 'isValid' was added solely because of the broken deserialization and not because it is some kind of coordinate check that makes sense from a business logic point of view. And if I create 'GeoPoint' separately in code - should I always add 'isValid: true'?
Because you can't add an additional field 'isValid' to an int.
egonelbre@reddit
As I mentioned there are multiple variations on that approach. I suspect you are looking for this in that scenario:
You can see similar types in quite a few places, e.g. https://pkg.go.dev/database/sql#Null. Don't get me wrong, it's annoying that each package ends up implementing their own types.
See above.
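The code sample referenced above didn't survive formatting here, but a minimal generic wrapper in the spirit of database/sql's Null* types might look like this — the Optional name and its methods are illustrative, not a standard API:

```go
package main

import "fmt"

// Optional is a hypothetical generic take on the Valid/Value pattern
// that packages like database/sql implement per-type (NullInt64, etc.).
type Optional[T any] struct {
	Value T
	Valid bool
}

func Some[T any](v T) Optional[T] { return Optional[T]{Value: v, Valid: true} }
func None[T any]() Optional[T]    { return Optional[T]{} }

// Get mirrors the map-lookup idiom: the value, plus whether it was present.
func (o Optional[T]) Get() (T, bool) { return o.Value, o.Valid }

func main() {
	x := Some(42)
	y := None[int]()
	if v, ok := x.Get(); ok {
		fmt.Println("x =", v)
	}
	_, ok := y.Get()
	fmt.Println("y present:", ok)
}
```

Unlike the bool-in-struct approach, this also works for scalars, which addresses the "can't add 'isValid' to an int" objection above.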
BenchEmbarrassed7316@reddit
Now you just have to make sure that the deserialize function (which is most likely imported from another package or standard library) can correctly work with this Optional data type and will not try to look for the Valid and Value fields in the serialized data.
ps I also think that generics should have been there from the very beginning of go.
egonelbre@reddit
Sure, I agree to both. Maybe less strongly about generics, but it definitely would've avoided plenty of API issues.
egonelbre@reddit
PS: don't get me wrong, I think builtin optional type is a great feature that would simplify interop between libraries.
BenchEmbarrassed7316@reddit
I think it is the necessary minimum for comfortable, fast, and reliable development.
Impossible_Way7017@reddit
Is go good for web apps? I thought it was mainly for cross platform executables.
germandiago@reddit
This is the reason why I still do C++ on the server side for heavy services, or I would recommend people something like Rust as well.
They are very fast and second to none in this area.
atehrani@reddit
I bet it was not well written in Go to begin with.
danted002@reddit
You’d be surprised how much garbage collectors consume/runtimes consume compared to a language that doesn’t have either.
And you are also comparing it with Rust which is good at handling memory efficiently. You
v66moroz@reddit
If you implement Haskell-style FP in Java you will be surprised. If you use OO-optimized data structures? Not that much. As for the runtime, it doesn't technically "consume" anything; it provides certain functionality (e.g. preemptive concurrency), and if you need it you will have to write it in Rust yourself (believe me, it's not that easy; green threads are not the same thing). As for handling memory efficiently? I bet you've never fought with the borrow checker, especially if your app is slightly more complex than "try me". Memory management is hard, period, and Rust makes it just a tad easier. GC makes it way easier, but it has a cost.
caltheon@reddit
Yeah, Rust isn't some magic performance bullet. That would be assembly
kodingkat@reddit
That's what I want to know, could they just have improved the original go and got similar improvements? We won't ever know.
MagicalVagina@reddit
The majority of these articles are like this. They attribute everything to the change of language. While instead it's usually just because they rewrote it cleanly with the knowledge they have now, they didn't have at the beginning when the service was built. And even maybe with better developers.
apadin1@reddit
Tbf if you’re going to rewrite a service anyway, you might as well do it in Rust and get that memory safety bonus
coderemover@reddit
Usually it's both. I did a few similar rewrites and the change of the language was essential to get a clean and good rewrite. Rust is one of the very few languages that give developers full control and full power over their programs. So they *can* realize many optimizations that in the other language would be cumbersome at best (and lead to correctness or maintainability issues).
I've been doing high performance Java for many years now and the amount of added complexity needed to get Java perform decently is just crazy. So yes, someone may say - "This Java program allocates 10 GB/s on heap and GC goes brrrr. It's badly written." And they will be technically right. But fixing it without changing the language might be still very, very hard and may lead to some atrocities like manually managing native memory in Java. Good luck with that.
ven_@reddit
The original source is a presentation the intern in question gave himself. In it he said that improving the existing code base would have been his preferred option but due to the nature of the service he needed tight control over memory which is what ultimately made up the performance gains.
I’m guessing there would have been a way to do the same in Go, but I don’t think Go vs Rust was the point here.
BenchEmbarrassed7316@reddit
go has very bad serialization and deserialization. Either runtime reflection or third-party code generation programs are used.
go also has a very bad type system due to nil and default values: if a numeric field in a structure that you are trying to get when deserializing is optional, you will either get 0 both when it is 0 and when it is null (which is a gross logical error) or you are forced to make this field a pointer. If you have an optional nested structure - you get a bunch of unnecessary allocations + unnecessary work for gc.
go compiler does very few optimizations (they say it is for faster compilation).
In short, a speedup of 2 times seems quite small to me. go is a terribly designed and slow language.
Qizot@reddit
Go is slow? It is faster than majority of languages out there. If you don’t like the language it is fine, but don’t behave like it is unusable garbage since it is not.
BenchEmbarrassed7316@reddit
Sure, go is faster than for example Lua or Ruby. But I'm specifically describing what is poorly designed and slow in a specific use case. Microservices are a bunch of I/O with serialization and deserialization.
theshrike@reddit
I think Twitch or Discord had a similar thing where the millisecond Go GC pauses were causing issues and rewriting in Rust was a net positive.
What people forget is that 99.999% of companies and projects they work with are not working at that scale. Go is just fine. =)
kerakk19@reddit
Yes, Rust is obviously faster than Go but not 4x faster
NyanBunnyGirl@reddit
https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/rust-go.html
kerakk19@reddit
That's quite interesting page, thanks
coderemover@reddit
I bet it was also not well written in Rust either. :P
Party-Welder-3810@reddit
Yeah, and maybe show us the code, or at least part of it, rather than just claim victory without any insights.
IntroductionNo3835@reddit
My opinion is simple, I prefer desktop software because it doesn't depend on the web and is faster.
This software thing in the browser, 100% dependent on the web and generally based on interpreted languages, is super slow, a performance disaster and an ecological disaster.
I did tests comparing C++ and Python. Python consumed 4x more memory and was 60x slower.
This is totally anti-ecological. It borders on irresponsibility.
Computer sciences need to reflect, as it makes no sense to develop software and systems that are so slow and so dependent on mainframes. The programmer earns a little and the bill for his laziness is paid by the user!!
We fought hard to have computers at home, we bought software and hardware. They were ours, fast, efficient.
Now we are back to depending on mainframes and super slow and insecure systems.
We regress!!
pheonixblade9@reddit
I rewrote some pipelines at Meta and saved more than $10MM/yr in compute. It's really not difficult at the scale these companies operate at if there are low hanging fruit.
90% of efficiency problems are due to stuff like expensive VMs polling rather than having a cheap VM polling, then handing the work off to the expensive VM. Higher level stuff where the language/tech isn't super relevant.
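A toy sketch of that handoff pattern using goroutines and a channel — the "expensive" worker only runs when the cheap poller finds real work (all names and the squaring "work" are stand-ins for the real polling/compute split):

```go
package main

import "fmt"

// process models the split: a cheap loop discovers work and forwards it;
// a single "expensive" worker consumes it only when there is something to do.
func process(items []int) []int {
	jobs := make(chan int)
	done := make(chan []int)

	go func() { // the "expensive" worker
		var results []int
		for j := range jobs {
			results = append(results, j*j) // pretend this is the costly computation
		}
		done <- results
	}()

	for _, it := range items { // the "cheap" poller: just finds and forwards work
		jobs <- it
	}
	close(jobs)
	return <-done
}

func main() {
	fmt.Println(process([]int{1, 2, 3})) // [1 4 9]
}
```

In the VM version of this, the poller is a small cheap instance and the worker is the big one, so the big one never burns cycles (or dollars) idling on the poll loop.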
FortuneIIIPick@reddit
The same changes would have worked in Java, C#, or Go if they had wanted to do so. They are clearly into Rust and wanted to appear to be able to justify it. Click bait.
shevy-java@reddit
Rust sneakily conquers the world.
Pharisaeus@reddit
RICHUNCLEPENNYBAGS@reddit
I don't think it's a secret that gains like this are routinely left on the table to save on labor or timeline. Don't get me wrong, $300k is real money, but it's not so huge that that couldn't be a sensible decision for an organization of that size.
ml_guy1@reddit
I think the core reason why developers and AI agents write slow code is that there just isn't any tooling to automatically profile your changes and then search for the most optimal way to write your code. Even with AI-generated code, there still does not exist any way to write faster code automatically.
ShovelBrother@reddit
I'm a bit curious to see what the go code before looked like, as variable memory usage was cited as a problem to be solved. Was the code already unsafe and using memory arenas?
Peppy_Tomato@reddit
I don't need to read the linked article to guess that the implementation strategy/algorithms were what ultimately mattered, not the language chosen.
zenware@reddit
Yep, without clicking I’m 90% sure that the intern could’ve improved the Go code and achieved nearly identical results.
ldrx90@reddit
I clicked. They claim that any further optimization of the Go code would have been fruitless.
From the article:
I don't know Go or Rust and they didn't provide any coding examples so, just have to take their word for it I guess.
klowny@reddit
I have experience with both languages and have been in the same situation.
By the time you care about performance and doing these kind of optimizations actually makes sense, you'll pick rust every time.
Go is for feature velocity. It's pretty slow for a compiled language, and even more difficult to optimize. It's easier to optimize with rust and you get so much more performance out of it too.
sammymammy2@reddit
Why is Go difficult to optimize?
klowny@reddit
In short, because you can't do much to change the behavior of the garbage collector or bypass it. Or going lower level than Go's primitives, which are not fast.
For many other languages, optimization involves controlling the memory layout and basically writing at a C level. Go does not give that low level access.
So what happens is you end up writing C, then doing all sorts of tricks to translate that C to Go while forcing Go to pretend it is C the best you can, which generally involves a ton of unsafe and raw pointers on every line of code but with none of the tools C or other languages gives to help manage those things all while fighting Go's garbage collector or working around missing primitives.
In most other languages, you can just.. use the C directly with some shims or use its C level equivalents.
Go written like Go is already at a lower abstraction level than Rust/C++, so it's silly to write even lower level Go code when you can write at a higher abstraction level in Rust/C++ and not have to fight the language the whole time and still get faster performance.
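For what it's worth, one of the few GC-pressure levers idiomatic Go does offer is buffer reuse via sync.Pool — a small sketch, with illustrative names:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// A pool of reusable buffers: instead of allocating a fresh buffer per
// request (and feeding the GC), callers borrow and return one.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // hand back a clean buffer for the next caller
		bufPool.Put(buf)
	}()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("world")) // hello, world
}
```

It helps, but it is exactly the kind of bolted-on memory management the comment above describes: you are working around the collector rather than controlling layout the way Rust or C++ allows.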
ldrx90@reddit
That's pretty much my assumption as well. It's easy for me to believe they knew enough to judge if squeezing Go was going to really help or not and to make reasonable estimates about how much quicker they could do it in Rust. Then you just make the intern do it and see how it turns out.
BenchEmbarrassed7316@reddit
https://www.reddit.com/r/programming/comments/1okf0md/comment/nmbwkyn/
tldr language matters, golang is slow.
Farados55@reddit
Could’ve just linked to the blog post instead of this rehashed linkedin slop
InfinitesimaInfinity@reddit (OP)
I read several articles about it, and I linked one of them. I did not write the rehashed linkedin slop.
The article written by the intern is here: https://wxiaoyun.com/blog/rust-rewrite-case-study/
SureElk6@reddit
if you knew the original link why did you link the LinkedIn post?
are you "Animesh Gaitonde"?
InfinitesimaInfinity@reddit (OP)
No, I am not "Animesh Gaitonde". I did not write either article.
That is a good question, and I do not have a good answer.
i_invented_the_ipod@reddit
Thanks for the link, I'll check this out. I always wonder in cases like this how much of the improvement is "rewriting after profiling", vs "rewriting in language X"...
gredr@reddit
That was exactly my thought. This isn't about Rust, this is about improving the implementation. It could've been FORTRAN...
mcknuckle@reddit
That was my thought as well. There isn't nearly enough information given to know whether the improvements were due to Rust itself, or implementation more specifically, or whether the same gains, or more, could be found using other languages or techniques. The article reads more like propaganda than well thought out technical analysis. It reads like a novice justifying novelty.
fireflash38@reddit
Idk what it is, but I despise the overuse of emojis.
mrjackspade@reddit
Probably AI
youngbull@reddit
Let's understand how they did this in simple words.
Yeah, that is the AI regurgitation parts of the prompt.
youngbull@reddit
Your linked-in post is leaking your prompt.
InfinitesimaInfinity@reddit (OP)
I did not write the LinkedIn post.
scrollhax@reddit
Is $300k savings supposed to justify the overhead of supporting an additional programming language?
fig0o@reddit
How much would he have saved by just re-writing the software using the same language?
Supuhstar@reddit
Pay that intern $200,000/year
NoMoreVillains@reddit
Because most devs aren't working on systems operating anywhere remotely near the scale or TikTok.
NuncioBitis@reddit
Now they'll have to spend 3x that to fix it over the next 10 years.
EntireBobcat1474@reddit
To play devil’s advocate here - one frequent retort you’d hear is that now TikTok has to retool or hire some portion of their staff to maintain rust instead of go code, which may create more cost. That said, most companies hire generalists, I don’t think there’s a real staffing cost to having to have part of your team train up on rust now (especially if they want to keep doing similar optimizations). I would be worried about potential friction if this was the only rust silo in that org though, since that would create friction when people want to make changes there, until rust becomes more widely adopted, but if that’s already a part of their engineering strategy, then all the better
NYPuppy@reddit
To play devil's advocate to your devil's advocate, learning a new tool is to be expected on a job and isn't a bottleneck. Newer languages, like rust, typescript, go, kotlin are all clean and readable and relatively easy to pick up.
swansongofdesire@reddit
If reports on the internal TikTok culture are accurate, it’s much worse than that: they let devs choose whatever they think is ‘the best tool for the job’, regardless of team expertise. This works out just as well as you can imagine.
Caveat: anecdata. Interviewed there myself, and have interviewed 3 ex-TikTok devs.
Coffee_Crisis@reddit
This is a viable strategy if you have a truly modular system and code can be thrown out and rewritten with confidence
EntireBobcat1474@reddit
Oh yeah that's very different
Xalara@reddit
One thing to keep in mind as well, is that while many developers are generalists, Rust is different enough that it can trip up even generalists.
jug6ernaut@reddit
Generalists are definitely what an avg company should be hiring for. There are definitely places for specialists; in my experience they are few and far between.
As a developer you should always view languages as tools, use the right tool for the problem. Tribalism only limits your career possibilities.
DuckDodgersIV@reddit
And 200K of that was wages!
mmaure@reddit
so how did the interop work?
claytonbeaufield@reddit
For a company the size of tiktok, $300k is quite a paltry sum.
rishiarora@reddit
A 300K change in the bill can come from just a slight load fluctuation, given the scale at which TikTok operates.
editor_of_the_beast@reddit
That’s a rounding error for TikTok, isn’t it?
rbadaro@reddit
It's minimal but still valuable. Compute efficiency efforts at companies like this are measured in 100s of millions.
jeesuscheesus@reddit
That intern paid for themselves and then some. For that team it’s quite significant, and that will extend to the rest of Bytedance.
nemec@reddit
It's also really great to have on your resume!
Contrite17@reddit
I mean it isn't huge compared to revenue but it is still a good win. It all does add up, and as long as the labor to do something like this isn't crazy it is well worth doing.
wutcnbrowndo4u@reddit
It's about 0.0013% of revenue, "isn't huge" is a dramatic understatement
That being said, it's not like the CEO had to manage this project. At the team level, it's a pretty reasonable amt of cash
Exepony@reddit
All else aside, this intern is going to have one hell of an XYZ bullet point on their resume now.
HistorianMinute8464@reddit
How many pennies of those $300,000 do you think the intern got? There is a reason the original developer didn't give a shit...
CardiologistIcy5307@reddit
Sounds like a great strategy for B2B SaaS companies.
captain_obvious_here@reddit
I rarely read that it's pointless. I most often read that it's not always a high priority. And in a world where new features make or break a company, it makes perfect sense.
nierama2019810938135@reddit
I don't often hear that optimisation is pointless, but that optimisation before it's necessary is pointless.
KFUP@reddit
300000 feels like it has a lot of numbers... until you realize that's just 0.3m, and TikTok's revenue in 2024 was 23b; that's a 0.0013% saving of that year. This is more of a statement on how gigantic TikTok is than on how much faster Rust is than Go; any practically insignificant improvement would seem big.
I guess "An Intern Saves Tik Tok 0.0013% of its Annual Revenue" is not as impressive sounding.
Valkertok@reddit
It's not much from the perspective of entire company.
It's quite a lot from the perspective of value of work of this one (not yet) employee.
horizon_games@reddit
Sounds about right - whenever Go or Node or Python tries to get performant they just try to hook into C++ or Rust to achieve it.
Thin-Yard471@reddit
I understand how you feel. That's absolutely fascinating! Seeing a measurable improvement like that in core metrics—CPU, memory, latency—is the kind of thing that truly shifts perspectives. Makes you wonder what other "insignificant" savings there might be hiding in the benchmarks.
Hear me out, coming from someone who spends a lot of time looking at code performance (often wishfully):
Profiling first.
That's my cardinal rule for optimization work. Before I get excited about "making the code faster" or suggesting a rewrite, my immediate reaction is always: "Okay, where are we hitting these bottlenecks? Can we measure the impact?"
Because here's what often happens: People hear "optimize this code" because it might take slightly longer in some edge cases, or maybe the garbage collector does a pause somewhere. But you're literally seeing power-of-two performance improvements on critical, CPU-bound workloads.
ven_@reddit
As a comparison, saving $300k on a 500 million dollar cloud bill is like saving $60 on a $100k cloud bill. That’s pretty pointless if you ask me. It’s just that these companies operate on scales that are difficult to fathom.
Flagtailblue@reddit
Imagine how much $$$ if C or asm! 🤯
Sunscratch@reddit
Or from simply removing these APIs!
Harteiga@reddit
You also have to keep in mind that TikTok has an insane amount of traffic. A startup or even most decently sized companies would not see the same return
Sunscratch@reddit
Yep, for TikTok scale, 300k per year is a joke. But the nice thing about Rust - it’s easy to pitch it for management: you can convert Rust benefits into real money, something that managers do understand.
Jaded_Ad9605@reddit
No... Premature optimization sucks. If you know what the bottleneck is, you can fix THAT
IkuX2@reddit
Maybe I’m autistic but seeing my boi Golang got rewritten into Rust hurts a little bit lol
DocMorningstar@reddit
That means TikTok is insanely profitable, or insanely poorly run. 300k a year in pure profit(!) for a small, discrete, identifiable optimization, and it's done by... an intern?
Either the 'real' devs are out there spending time on millions-per-year in profitable changes, or no one is looking at efficiency, and this was just a 'get out of my hair, kid' project
VehaMeursault@reddit
If you save 300 big ones by reducing your compute, you’re already big enough for 300 big ones not to matter that much.
If it did, then your code wasn’t suboptimal; it was terrible. Which would be a whole different problem to begin with.
carl_peterson1@reddit
SMH intern could have had 15 codex agents generating 25,000 lines of typescript in the time they took to write their little baby microservice
i860@reddit
Go is not slow per se but it’s not fast either. Passing arguments and return values for everything via the stack is naive and no modern compiled language should be doing this.
funny_falcon@reddit
Actually, modern Go compiler uses registers for calling convention in Go-to-Go calls. It still uses stack for Go-to-ASM though.
Days_End@reddit
Are you sure you're not missing a 0 in there? Otherwise it seems like a pretty big waste of time.
Kozjar@reddit
People say it about CLIENT optimization specifically. TikTok doesn't care if their app uses 15% more CPU on your phone than it could.
LanguageSerious@reddit
He got nothing in return I presume?
token40k@reddit
I saved 5mln annualized single-handedly by enabling intelligent tiering on 20k buckets with 60pb of data. A 300k-a-year save sounds like a fix for something that should not have happened to begin with
FoldLeft@reddit
ByteDance may use Rust in other areas as well, they have a Rust port of webpack for example: https://rspack.rs/guide/start/introduction.
DoubleThinkCO@reddit
Dev salary plus benefits, 300k
phoenix823@reddit
I’m curious how, or if, they thought about the incremental cost of adding a new language to the code base. Obviously, they were able to realize a meaningful operational save by making this change, but now they have the added complexity of Rust in their environment.
KingNothing@reddit
Why is this interesting? Throw a dart at the arch diagram of any medium to large company and you’ll probably hit a similar opportunity.
cto_resources@reddit
I strongly doubt the improvements came because the language changed. When a developer goes in to optimize, they can get great results by optimizing the algorithms. The language only matters a little.
My guess is that if the exact same developer had optimized the exact same service, but left it in Golang instead of converting to Rust, the same dramatic improvements would still have been seen. Because she/he optimized the algorithms.
apoleonastool@reddit
300k is peanuts. It's a fun exercise for an intern, but the company won't even notice it.
bigtimehater1969@reddit
A lot of this is just "impact"-bait. None of this work helps Tik Tok's business in any way, and $300,000 is probably a drop in the bucket (notice how every number has a before and after except for the cost. It's probably like a small company rewriting code to save $3).
But you see $300,000, and you see numbers decrease, and you get impressed. This is how you chase promotions at big companies - find busywork that results in impressive metrics. What the metrics measure is irrelevant.
Traditional_Pair3292@reddit
I work at a Faang company and I saved $1m per year changing one line of code that was doing a full recursive file search every 5 seconds. When you have these massive scale companies it’s not hard to do
fagenthegreen@reddit
Worth noting that $300,000 per year is basically a rounding error compared to TikTok infrastructure.
Coffee_Crisis@reddit
The important parts here are that this service was identified as an actual hotspot in a system that is as scaled as can be, and the solution was a change to a different class of language, and the savings was only $300,000. This is a good intervention for TikTok, but if you’re not at that scale you will probably have a negative ROI if you are paying a dev to do stuff like this rather than just provisioning more capacity
lilB0bbyTables@reddit
Alternative title - massive and successful company decides to pay off technical debt accrued while building out from startup phase.
Any startup out there is operating on borrowed time, and the goal is to manage technical debt in a deliberate and strategic way. If you have engineers spend time hyper-optimizing everything constantly and refactoring everything constantly they will stagnate and fade into obscurity. Of course if they don’t manage technical debt properly then they risk hitting increased friction while iterating on new features, or they hit severe and unexpected limitations of scaling as they grow and begin burning cash at faster rates. Finding that balance is the key to success
bunoso@reddit
I also rewrote my api server in rust. It now saved me 100% of my bill. I was paying $0 for 10 lambda invocations a month and now I pay $0. You’re welcome.
PuzzleheadedPop567@reddit
This makes sense to me, reading the linked in post. Once you reach high QPS in a microservice architecture, you spend a lot of resources on serialization, encryption, and even thread hops.
Big tech companies like Google and Amazon have entire teams working on these problems.
1) More and more encryption has been pushed down into the hardware layer.
2) A recent area of research is “zero-copy”. As in the network card reads and writes to an OS buffer that is directly accessed by the application. This eschews the naive / traditional pattern where multiple copies of the req/resp data takes place, even if the Python or Java application developer isn’t aware of it.
3) I’ve optimized high QPS services before, and thread hops do make a difference. Programmers in higher-level languages probably don’t even realize thread hops take place. Go has virtualized threads, so you can’t control when the runtime will decide to transfer work between different OS threads. Languages like Rust and C++ are useful because you can control this. I’ve written services that avoid ever handing work off between OS threads. Even a single context switch noticeably impacts performance and cost.
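The thread-hop point in (3) can be sketched in Rust. This is a minimal illustration with all names and numbers invented for the example, not code from any real service: one dedicated OS thread owns all the processing, and other threads only hand requests over a channel, so the work itself never migrates between threads the way a work-stealing or goroutine scheduler might move it.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<u64>();

    // One worker thread owns all the state; requests are handed to it once
    // and then processed to completion on that thread, with no runtime
    // free to migrate the work elsewhere mid-flight.
    let worker = thread::spawn(move || {
        let mut total: u64 = 0;
        for req in rx {
            total += req; // stand-in for real request handling
        }
        total
    });

    // Producers only touch the channel; the handoff happens exactly once.
    for i in 1..=100 {
        tx.send(i).unwrap();
    }
    drop(tx); // close the channel so the worker's loop ends

    let total = worker.join().unwrap();
    assert_eq!(total, 5050);
    println!("processed total = {}", total);
}
```

Thread-per-core designs generalize this: N such workers, each pinned to its own core, with requests sharded between them instead of stolen.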
13steinj@reddit
Something super interesting along these lines here-- Google, the service, is to my knowledge written to be as efficient as possible. I mean, it makes sense. Every byte transferred over the wire is done to millions of people, cost of scale kind of thing.
Every single developer doc page I've ever visited? Feels like I just downloaded a YouTube video or something. If you check, you'll see that dev sites like Google dev docs or bazel.build each end up downloading 0.3 to 0.7 gigabytes to store in your browser cache/data, each time you visit them.
tangoshukudai@reddit
no no, they saw some bad code was costing them a ton of money and they fixed the bug.
shotsallover@reddit
I’m guessing the intern got a $100 bonus and vague promises that they might get hired in the future.
cjthomp@reddit
Bullshit, "premature optimization" ≠ "optimization"
qckpckt@reddit
This could easily be used as an argument to say that optimization is pointless. $300k a year is nothing to a company like TikTok.
They probably get multiples of that every year in free compute credits as incentives.
In a perverse way these kinds of optimizations could even be bad for a company. I worked at a place where AWS paid the wages of some contractors that we employed to deliver new tenants on our platform.
When we made the platform significantly more efficient, AWS complained loudly that the projected costs of their sponsored project were below their estimates, and ultimately stopped covering the costs of the contractors.
Unless you add at least one more zero to cost savings like this, no company will give a shit.
Smok3dSalmon@reddit
I did something similar in my first job by pre-allocating a 2MB buffer on application start and reusing it. The buffer was used to store rows of data in a database query. It reduced cost by 90% for batch database processing. The software had a wonky business model where they charged based on hw utilization. So they lost money. LUL
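The buffer-reuse trick described above can be sketched in Rust; the sizes and names here are illustrative, not from the original system. The idea is to allocate once at startup and clear between batches, which keeps the capacity and avoids a fresh allocation per batch:

```rust
fn main() {
    // Allocate the row buffer once, up front (2MB, as in the anecdote).
    const BUF_CAP: usize = 2 * 1024 * 1024;
    let mut row_buf: Vec<u8> = Vec::with_capacity(BUF_CAP);

    for batch in 0..3 {
        // Reuse: clear() drops old contents but keeps the capacity,
        // so no allocator round-trip happens per batch.
        row_buf.clear();

        // Stand-in for "fetch query rows into the buffer".
        for row in 0..4 {
            row_buf.extend_from_slice(format!("batch{}row{};", batch, row).as_bytes());
        }

        // The buffer never grew past its initial allocation.
        assert!(row_buf.capacity() >= BUF_CAP);
    }
    println!("final batch holds {} bytes", row_buf.len());
}
```

The same shape works in most languages (e.g. `sync.Pool` or a reused `bytes.Buffer` in Go); the win comes from removing per-batch allocation and the GC or allocator pressure that goes with it.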
Perfect-Campaign9551@reddit
Um, any developer that "claims" optimization is pointless... is a moron, and obviously not very skilled. Because most of the time, optimization is not that hard to do
Santarini@reddit
Just to clarify the primary source for this "news" is a LinkedIn post?
InfinitesimaInfinity@reddit (OP)
The intern wrote an article at https://wxiaoyun.com/blog/rust-rewrite-case-study/ .
Santarini@reddit
Yeah I saw that. Intern claims he did something remarkable on his own blog.
... Then why is he only an intern? You don't think TikTok would have offered him a full time position after having such significant impact in such a short period of time?
thisisntmynameorisit@reddit
At big tech these numbers are not extremely significant. This just sounds like a decent intern project that some TikTok team supported them on.
PatagonianCowboy@reddit
He's still studying, so maybe he can't become a full-timer, who knows
Santarini@reddit
Maybe. But Occam's Razor and a healthy suspicion of Internet facts has made me skeptical of self-acclaimed, unverified success
heraldev@reddit
the post sounds like typical performance review impact fluff. congrats, somebody got a promotion (or return offer in this case)
tankmode@reddit
this is why i find the layoff trend so short sighted. most decently planned software development work builds more value than it costs. its poor management thats the problem for most of these businesses and layers and layers of management
Omni__Owl@reddit
If we always wrote optimized code we'd likely still be on quad-core processors.
StarkAndRobotic@reddit
It doesn't take a genius to optimise, just time. Sometimes, because of higher priorities or lack of time, some basic code is written so the job gets done, even if it's not the most efficient.
PurpleYoshiEgg@reddit
I doubt these weasel words.
caiteha@reddit
I used to use Java for services, now I write everything in C++ ... I can tell the differences ...
White0ut@reddit
300k in computing cost over a year is a drop in a bucket for TikTok. You could go clean up unused test/dev servers once a week and save that much.
Matt3k@reddit
Then maybe someone should do that?
White0ut@reddit
We did once in a while, but nobody really cared.
this_knee@reddit
Per year? … they saved only after the first year. After that it turns into just operating cost. Unless each and every year they made updates that made it better and better and saving more and more money every year.
wutcnbrowndo4u@reddit
Per year is relative to the counterfactual in which the previous server was unoptimized
Matt3k@reddit
My industry is completely borked. But thankfully, I've got years of experience working hard labor in the bitmines and can value myself accordingly. I guess there's a silver lining.
balianone@reddit
tiktok > google/alphabet
BlueGoliath@reddit
By "developers" you mean "high IQ" people on Reddit mostly.
No way. Tell me more.
PatagonianCowboy@reddit
People should rewrite in Rust more often
LessonStudio@reddit
There are those who will want you burned at the stake.
Until you do something familiar in rust, you just don't realize what the hell it is. I think most C and C++ people see it as doing pointers really carefully and think, "What's the big deal? I don't even like smart pointers."
But, every company I know using rust is basically, "We could not do what we do the old way. Not even close." the tech debt is accumulating at a snail's pace, the productivity is through the roof, and stays there right up to the end, and the bug count is more often someone misinterpreting a requirement or a bad design, than the usual threading, memory, or other errors people make when doing the dance of the seven veils with pointers. Those basically go away with rust.
Then, there are the little things like handling the wonderful results to completion, instead of going, "Ya ya, those other things never happen anyway."
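For readers who haven't used Rust, a tiny sketch of what "handling the results to completion" means in practice. The `parse_port` helper here is invented for the example; the point is that the compiler won't let you touch the success value without deciding what to do about the error case:

```rust
// A fallible operation returns Result, not a value plus a convention
// that "errors never happen anyway".
fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    s.parse::<u16>()
}

fn main() {
    // Using the value requires handling both arms; silently ignoring
    // the Err case is a compile-time error, not a code-review hope.
    match parse_port("8080") {
        Ok(p) => assert_eq!(p, 8080),
        Err(e) => panic!("bad port: {}", e),
    }

    // The failure path is a value you can test, propagate with `?`,
    // or map, rather than an exception or a sentinel.
    assert!(parse_port("not-a-port").is_err());
    println!("ok");
}
```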
And on and on.
The expression "If it compiles, it probably works" is so very true with rust. But the C/C++ crowd say moronic things like, "If you just don't make mistakes, you don't need a crutch like rust."
I remember a story about some moron manufacturing consultant who was hired by a factory. He asked the floor workers, "Can you make one widget in a row correctly?" "Can you make two in a row?" and so on. He then said, "Focus on doing that over and over, and mistakes will just fade away." Total garbage.
Instead, the correct approach is to pretty much drop what everyone is doing when a mistake happens and try to figure out the root cause. Often there is a process or even tooling change required to prevent this mistake again. They don't see these as crutches, but as a method for higher-quality products. Rust is just the tool which looked at the sources of the problems and started eliminating them. C/C++ keep looking at more complex ways to screw up your code even worse. Let's add rare and obscure template constructs so that nobody but a few pedants can understand the code, and even then, only after intense study.
kapybarah@reddit
I'm sure everyone tries to optimize the code that runs on rented hardware as best as possible. Maybe some client-side code can be less than ideal in terms of performance for certain types of application because of 'how fast computers are', but even that is only somewhat true, because users expect things to be fast, ironically, because computers have become really fast.
Also just because it was done by an intern doesn't inherently mean it was simple or easy. They could very well be an excellent dev already
Gibgezr@reddit
And, of course, they would have gotten the same speed-up from rewriting it in Go.
heatlesssun@reddit
He probably ran it through an AI because the other folks were too arrogant to try.