Top companies with no preprod. Their prod also contains their preprod.
Posted by xamott@reddit | ExperiencedDevs | 44 comments
I have heard that Meta, Fortnite, and others do not have a preproduction or even a “test” environment. Maybe I’m just old but that seems to fly in the face of what we do. But it’s clearly a trend at major, modern tech behemoths, so that would indicate I’m missing something. Can anyone explain to me why this is the trend? Why do they think there’s no value in a test/staging/integration/UAT/preprod environment? They just handle that ON production, while logically separating out test data from prod data. But that separation logic itself is a risk.
asurarusa@reddit
I’ve been working on web software for over a decade at this point and I’ve never worked at a company that did a good job of maintaining pre-prod envs. That eventually leads to diminishing returns: the pre-prod environment can catch showstoppers, but the subtle stuff still gets through because there is such a difference between what the software encounters in preprod vs. actual production.
On top of that, the industry’s understanding of quality has changed. It’s gone from ‘we shipped a bug, we failed’ to ‘ok, the bug was live for 30 min before we caught it and rolled back; we need to make improvements, but no big deal’. This increased appetite for making customers part of the testing process is another reason people don’t feel like they need pre-prod: inconveniencing the customer is not a hard and fast line, and there is a proliferation of tools for limited and phased rollouts, A/B tests, and feature flags, so you can release, see what happens, then turn things off pretty easily if they go wrong.
-manabreak@reddit
Often the preprod lacks real data that's needed for it to be useful. Sure, it can be generated, but more often than not it's either not generated or it doesn't reflect real life use.
Jaded-Asparagus-2260@reddit
I feel like nowadays it's more like "did the bug lose us a lot of money? No? Then don't fix it. Focus on shipping features that make money. Give me money. Money pleaaaaase."
ranger_fixing_dude@reddit
If the expenses (time, maintenance, dev efforts) for the staging/test environment are high, and the cost of error is relatively small, you can just release to production.
Pretty much all small companies start like that in order to iterate faster. Over time they add a test environment, either for specific code parts (e.g. billing) or after some major (usually repeated) outages convince them to invest in stability; that's a common path.
But if the approach has historically worked and teams simply hide unfinished parts behind feature flags or something like that, a test environment can be challenging to introduce later. No idea about Meta or Fortnite, though.
ranger_fixing_dude@reddit
Also replied to another comment here, but it is possible to mitigate the damage: automatic gradual rollouts with error/report monitoring, very easy rollbacks, sophisticated feature flag mechanisms, and so on.
spicycli@reddit
Maybe it’s not that they don’t believe in testing. I believe they just test in prod using feature flags, limiting features to a subset of users or QAs. This removes the burden of maintaining additional environments.
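A minimal sketch of what that kind of gating might look like, assuming a hypothetical flag store held in a dict rather than any particular vendor's SDK; the flag name, QA accounts, and customer list are invented for illustration:

```python
# Hypothetical flag data; a real setup would fetch this from a flag service.
FLAGS = {
    "new_search": {
        "qa_users": {"qa-alice", "qa-bob"},        # internal testers always get the feature
        "enabled_customers": {"acme", "globex"},   # hand-picked early-access customers
    },
}

def flag_enabled(flag: str, user_id: str, customer: str) -> bool:
    cfg = FLAGS.get(flag, {})
    return user_id in cfg.get("qa_users", set()) or customer in cfg.get("enabled_customers", set())

# At the call site the old behaviour stays as the default:
# render_new_search() if flag_enabled("new_search", user.id, user.customer) else render_old_search()
```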
Sakura48@reddit
Bruh what if the feature flags were done wrong?
ranger_fixing_dude@reddit
Companies which do that very likely prioritize recovery -- e.g. easy rollbacks, gradual rollouts, alerts, etc. In fact, if they need to roll back fairly often, they may end up more resilient to truly catastrophic issues than a more conservative shop, simply because they exercise that path all the time.
spicycli@reddit
Same answer as "what if there's configuration drift between environments and you release something that worked in staging but doesn't work in prod?"
doyouevencompile@reddit
Completely different use cases.
allllusernamestaken@reddit
It's also a matter of risk. FB rolls out a feature to 0.01% of users and it's broken. What's the impact? A slightly confused user who may have seen an error page or some unexpected behavior when trying to share a meme... so basically no impact.
If it was a feature that processed payments, or made irreversible changes, it would be different.
Ok_Chemistry_6387@reddit
Meta has an internal instance, plus a lot of testing and automation. They also do phased rollouts, i.e. update a small number of servers (I heard NZ gets used a lot), monitor, then roll out further, and roll back if errors increase. They use sophisticated feature flagging too; no feature is rolled out to everyone at once. They will roll out to specific demographics, get the measurements they need, etc. A lot of reporting, tracing, etc.
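Not Meta's actual tooling, but a toy sketch of the monitor-and-roll-back loop that comment describes; the stage fractions, thresholds, and the stubbed deploy/metrics helpers are all made up:

```python
import random
import time

STAGES = [0.01, 0.05, 0.25, 1.00]     # fraction of the fleet at each stage
BASELINE_ERROR_RATE = 0.002           # error rate observed before the deploy
SOAK_SECONDS = 1                      # would be minutes or hours in real life

def deploy_to(version: str, fraction: float) -> None:
    print(f"deploying {version} to {fraction:.0%} of servers")   # stand-in for real deploy tooling

def error_rate(version: str) -> float:
    return random.uniform(0.0, 0.004)                            # stand-in for a metrics query

def roll_back(version: str) -> None:
    print(f"rolling back {version} everywhere")

def phased_rollout(version: str) -> bool:
    for fraction in STAGES:
        deploy_to(version, fraction)
        time.sleep(SOAK_SECONDS)                                  # let the stage bake
        if error_rate(version) > 2 * BASELINE_ERROR_RATE:
            roll_back(version)                                    # errors regressed: back out
            return False
    return True                                                   # fully rolled out

phased_rollout("v2025.1")
```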
gigamiga@reddit
How do people do that with database changes?
pboulos_@reddit
I don’t work at Meta, but it’s likely some version of this: https://martinfowler.com/bliki/ParallelChange.html
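Roughly, the expand/migrate/contract steps from that article, sketched here for renaming a hypothetical users.fullname column; the SQL is illustrative, and each step would ship in its own deploy:

```python
# Step 1: EXPAND -- add the new column alongside the old one; old code keeps working.
EXPAND = "ALTER TABLE users ADD COLUMN display_name TEXT;"

# Step 2: MIGRATE -- backfill, while application code is changed to write both
# columns and then to read only the new one, each change verified in prod on its own.
MIGRATE = "UPDATE users SET display_name = fullname WHERE display_name IS NULL;"

# Step 3: CONTRACT -- only once nothing reads or writes the old column, drop it.
CONTRACT = "ALTER TABLE users DROP COLUMN fullname;"

for step in (EXPAND, MIGRATE, CONTRACT):
    print(step)   # in reality these run as separate migrations, days or weeks apart
```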
TimonAndPumbaAreDead@reddit
Any environment is a test environment if you're brave enough
shadowdance55@reddit
My favourite way to phrase it is that all companies have a test environment; some companies also have a separate production one.
uplink42@reddit
I once saw a company pitch their service to a customer claiming they had unified staging and production environments as a feature.
mugwhyrt@reddit
"Experience the latest features as soon as they're approved by devs!"
i_exaggerated@reddit
Any environment is a prod environment if you’re brave enough.
Jaded-Asparagus-2260@reddit
All teams have a testing environment. Some teams are lucky to have a separate environment to run production in.
HakoftheDawn@reddit
I like your username
johnpeters42@reddit
PAIGE NO
rkeet@reddit
Git flow, and its expanded variants, is slow and cumbersome. A lot of overhead comes from multiple reviews of the same code as it goes "up each layer". Each long-lived branch has an environment. Great for slow organizations, non-mature teams, lots of dependencies, untested code, and the like. Deployment through manual action (hit "play").
GitHub Flow is a lot faster, with short-lived branches against a single main. Anything merged to main immediately gets deployed to a staging environment. Requires fair team maturity, CI/CD built into the flow, and strong test flows. Release to prod by creating a release branch from main or tagging main.
Then Trunk Based Development, where work is done directly against main, no branches (usually). Deployment to prod from merge/push to main. Total reliance on automation to prevent faults getting through. Strong team maturity required. Work scoped to very tiny tickets.
I prefer GitHub Flow. Not too much overhead and enough room for the plenty of mistakes I make. I start my new brainfarts with TBD though; it's just me anyway, so might as well.
At work I try to steer new projects to GitHub Flow to start off with. The automation level required ensures we start with discussions on standards, after which they're put into code, so no more pissing about on code styling in reviews. Initial investment that pays off fast.
Hope that helps :p
rco8786@reddit
At large scale, maintaining a pre-production environment that meaningfully replicates what production looks like is simply not worth the effort.
Meta has 1000s of production services and they all feed data from one to another. It’s a huge disparate graph of data. In production that data “makes sense”. In some preprod environment you end up with little islands of test data but the whole graph is disconnected. It makes meaningful testing hard to impossible.
TheTacoInquisition@reddit
The answer comes down to how you test in other places, and the protections you have in place when it does go "wrong" on production.
I've worked in a place with a very mature test suite, focused on behavioural testing rather than unit testing. There was a "write a failing test FIRST" mentality for all new code, and when changes were made to existing code we made sure there was at least one test covering the code under change (if refactoring, we'd not expect the tests to fail; when making behavioural changes, we'd ensure a test was failing as the changes were made and then fix it).
It was hands down the safest environment I've worked in, and pushing to production instead of staging first felt very robust.
We also used a LOT of feature flagging for new work, so we'd do the smallest thing we could make into a PR, and push, make the next smallest change, push, and so on. The PRs were easy to review and get through quickly, and the feature flagged code didn't need to be feature complete to be on production.
So yeah, places that have a decently thought-out flow for getting unfinished changes into production quickly and safely, coupled with a safety-first mentality for automated CI tests, can practically and reliably ditch the test/staging part of the delivery process.
This requires an engineering culture to back it up though. Even one cowboy can make it unworkable.
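For the "write a failing test FIRST" part, a minimal sketch of what a behaviour-level test might look like, assuming pytest-style tests and an invented discount rule (nothing here comes from that company's codebase):

```python
def apply_discount(cart_total: float, code: str) -> float:
    # Production code written only after the tests below existed and failed.
    if code == "WELCOME10" and cart_total >= 20:
        return round(cart_total * 0.9, 2)
    return cart_total

def test_welcome_code_takes_ten_percent_off_qualifying_carts():
    # Expressed as user-visible behaviour, not internal structure.
    assert apply_discount(50.00, "WELCOME10") == 45.00

def test_welcome_code_ignored_below_minimum_spend():
    assert apply_discount(10.00, "WELCOME10") == 10.00
```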
tuna_safe_dolphin@reddit
Sounds like a great company or engineering team at least. Why did you leave and what happened to them?
Possible-Pirate9097@reddit
Are you new here?
Vivid_Fan9346@reddit
At last place, each team would deploy to prod multiple times during the day (every merge to main). No preprod; just prod. Blue-green deployments. Tenanted solution so test data was just another "prod" customer. The risk was acceptable to the business.
At current place, we have dev, test, uat, and prod. Deployments happen on a stricter cadence. Different market; different level of risk tolerance. Different level of impact if a bug is found in prod.
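Blue-green in that kind of setup might look roughly like this toy sketch; the "router" here is just a dict standing in for whatever load balancer or DNS switch a real stack uses:

```python
environments = {"blue": "v41", "green": "v41"}
router = {"live": "blue"}                 # all traffic currently hits blue

def blue_green_deploy(new_version: str) -> None:
    idle = "green" if router["live"] == "blue" else "blue"
    environments[idle] = new_version      # deploy to the idle stack
    # ...smoke-check the idle stack here before cutting over...
    router["live"] = idle                 # atomic switch; old stack stays up for instant rollback

blue_green_deploy("v42")
print(router, environments)               # {'live': 'green'} {'blue': 'v41', 'green': 'v42'}
```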
tsereg@reddit
What can go wrong if Facebook doesn't work?
alexs@reddit
For many product categories test environments are hard and/or extremely expensive to keep representative of production.
Feature flags, blue-green deploys, etc. are a good middle ground.
olzk@reddit
AI hype is petering out, so… there has to be something else for the managers to ride into their next promotion
defenistrat3d@reddit
They are likely using feature flags around all new features. They can control which customers or test customers have which feature flags enabled when.
Though I would still want my QA environment. Gives me the warm fuzzies to see new things working there first.
TheSauce___@reddit
We do this at my job. It’s really as simple as “the best way to know if it’ll work in prod is to test it in prod” - a pre-prod environment is just another instance you need to maintain & keep synced.
TheGRS@reddit
Works if you don’t have financial implications.
gefahr@reddit
It can be made safe to do in any environment with enough care.
Now whether you have the energy to fight your ISO auditors on this is an entirely different question.
pydry@reddit
I worked at a company where we had a staging environment but our team basically stopped using it because nobody ever caught a bug there. It was just a gateway to prod.
The e2e and unit tests were incredibly thorough and you could turn a user description of a bug or feature into a test very easily and reliably. We had 100% test coverage.
We also used pull request environments and feature flags.
pseudo_babbler@reddit
They solve the problem by paying all their engineers 600,000 dollars a year and make them claw each others faces off to impress their managers, while constantly moving the performance goal posts. Or at least that's what /r/ExperiencedDevs would have me believe.
I think it's just a matter of scale though. Small corporates can have test environments. I've always had them, sometimes they have problems, sometimes they're fine. Once you get to massive scale though you just can't guarantee every little system in your vast ecosystem will have both an as-live version in preprod and also a dev team actively making changes to it, so you need to get more creative around contract testing, automated UI testing, blue/green releases, dark deployment, feature flags, every trick in the book to allow your teams to all go fast without blocking each other.
If you tried to run, say, a preprod version of Facebook it would be just a nightmare. Either it's as complex as production (with… bots generating ready-made profile content?) and just as expensive to run, or it's not really the same as production and doesn't give you any confidence that your changes will work anyway.
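Of the tricks listed above, contract testing is maybe the least self-explanatory; here's a toy sketch of the idea (not Pact's or any real library's API), where the consumer's expected response shape gets checked against what the provider actually returns:

```python
# The shape the consumer relies on, recorded as a simple field -> type contract.
EXPECTED_USER_CONTRACT = {"id": int, "name": str, "email": str}

def contract_violations(payload: dict, contract: dict) -> list[str]:
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems

# In the provider's CI, run the real handler and fail the build on drift:
response = {"id": 1, "name": "Ada", "email": "ada@example.com"}   # pretend handler output
assert contract_violations(response, EXPECTED_USER_CONTRACT) == []
```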
Frenzeski@reddit
Read Charity Majors' opinions on testing in prod.
kobumaister@reddit
Sometimes the volume of data or the scale of the system makes it impossible to test in pre environments. I'm sure they have some environment for validation, but it's faster and cheaper to control the delivery of features through A/B tests or feature gates than to simulate a high-volume environment.
sleepyguy007@reddit
I was at Reddit for a short while at the end of 2020. At the time, if you wanted to deploy, say, the entire website (and I did... because everyone got to), you'd spin up a cluster on prod with Spinnaker and deploy to that, then "test" it and have it deploy to the other clusters. There were some test subreddits, I remember.
h4l@reddit
Imagine your production environment has the ability to gradually release a new version by starting with a small fraction of requests and scaling up to 100% if it behaves well.
This is not so different to releasing a new version in a staging environment, checking it works OK, then releasing in prod. Staging has a small amount of traffic compared to prod.
You could argue there's more risk going from staging to 100% traffic in prod, than from 0.1% traffic in prod up to 100% over a few hours. When you do it gradually, there's no hard behaviour difference that could make your staging deployment invalid (e.g. configuration in staging that doesn't exist in prod).
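As a sketch of the "small fraction of requests" part: one common way to do it is to bucket each request (or user) by a hash, so the same caller consistently lands on the same version while the percentage ramps up; the percentage and version names here are invented:

```python
import hashlib

canary_percent = 0.1   # start at 0.1% of traffic, then ramp toward 100 over a few hours

def version_for(request_id: str) -> str:
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10000
    if bucket < canary_percent * 100:   # 0.1% -> buckets 0..9 out of 10,000
        return "v2"                     # this request is served by the new version
    return "v1"                         # everyone else stays on the old one

print(version_for("req-12345"))
```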
Bright-Confusion-341@reddit
I used to work at Uber; you could connect code running on your laptop to prod services and databases. Of course, everything is audited and new features are shipped behind feature flags.
The general motto being: trust your devs to do the right thing.
Mundane-Charge-1900@reddit
I worked at another well known company that did this.
A lot of it was that historically they focused on moving very fast. The business domain was very competitive and constantly changed. A test environment was seen as extra overhead that slowed things down too much.
Then once they got to a business and market maturity where they could focus more on quality, it was impossible to add a test environment at scale.
We had thousands of services, databases, event topics, etc.
Instead, they ensured quality in two major ways:
1. Feature flags and experiments were widespread, with very sophisticated systems for managing them. The complexity of rules supported was the highest of anywhere I’ve worked, bar none.
2. It was possible to create a “staging” version of any service. Many had them, but it was not an isolated system; it still talked mostly to other real dependencies, including production ones. Still, you could test in a somewhat isolated way.
rentalhealth@reddit
Depends on the domain area. For low-risk operations (that is, no liability for failure other than money lost), it's infinitely better to have fewer test stages and be able to revert quickly than it is to thoroughly test and have 1hr+ builds, complex patch landing/merging workflows (then the same hour-plus builds again), differences between environments with no meaningful data, etc.
I work in a place where we can do that. Before that I had some on-call incidents where we had no idea what the source of a bug was (the repo was org-wide), so we just ran git bisect and shipped the 2hr+ build each time. Worst fucking day of my life.