The MySQL-to-Postgres Migration That Saved $480K/Year: A Step-by-Step Guide
Posted by narrow-adventure@reddit | programming | View on Reddit | 42 comments
Which_Roof5176@reddit
The continuous replication part is the real takeaway here. Most migrations get stuck trying to manage dumps and end up making cutover way harder than it needs to be.
Keeping both DBs in sync and then switching is what actually makes low downtime possible.
narrow-adventure@reddit (OP)
Absolutely. I hate a lot of things about AWS, but their RDS and DMS are incredible for this…
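For readers unfamiliar with how DMS makes the cutover low-downtime: it does a bulk "full load" of the source, then keeps applying captured changes (CDC) until the two databases converge. A toy Python sketch of that pattern, with all names invented for illustration (real DMS works off the MySQL binlog, not dicts):

```python
# Toy illustration of the full-load + change-data-capture (CDC) pattern
# that AWS DMS automates. All names here are invented for the example.

def full_load(source: dict) -> dict:
    """Bulk-copy the source as it exists right now."""
    return dict(source)

def apply_cdc(target: dict, changes: list) -> None:
    """Replay captured changes so the target stays in sync until cutover."""
    for op, key, value in changes:
        if op == "upsert":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)

# The source keeps taking writes while the bulk copy runs...
source = {"user:1": "alice", "user:2": "bob"}
target = full_load(source)

# ...and those writes are captured as a change stream (binlog-style).
captured = [("upsert", "user:3", "carol"), ("delete", "user:2", None)]
apply_cdc(target, captured)

# Once the change stream drains, both sides match and you can cut over.
assert target == {"user:1": "alice", "user:3": "carol"}
```

The point of the pattern: the expensive bulk copy happens while the old database stays live, and cutover only has to wait for the (small) change stream to drain.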
jlindenbaum@reddit
This is interesting, thanks for sharing.
Some questions:
I ask because at my previous job we ran around 9TB out of one MySQL instance with replication, at around 80k queries per second, on GCP's second-largest Cloud SQL instance. We had the odd locking issue for certain types of ALTER, but mostly mitigated it with gh-ost migrations.
ArielCoding@reddit
Impressive work. Now that the migration is done, you can use ETL tools like Windsor.ai for the reporting pipelines to automate schema mapping, so you're not rebuilding every connector after the cutover.
Capable_Chair_8192@reddit
So refreshing to have an article that is not AI generated, not trying to sell anything, just gets straight to the point and is full of nice technical tips. Thanks for this.
Annh1234@reddit
Guy had an under-1TB database with a few million records, he probably means 480 or 4.8k/year, not 480k lol
narrow-adventure@reddit (OP)
Nope, 480k. Disk space is cheap; compute and RAM are expensive. It was a B2B SaaS with a lot of constant usage.
Most of the cost was coming from replicated ephemeral environments, as the post explains. That means there were multiple replicas (often up to 10) of the full production database running for manual and automated test environments.
This was deemed cheaper than constantly managing and updating a smaller db that was not representative of the actual user data and always behind on the actual features.
Hope that’s helpful, just trying to provide context!
Annh1234@reddit
That sounds ridiculous... Even with today's stupid server prices you can get 10-20 servers with 25G networking, replicate the data between them, and end up much cheaper.
Just as a reference, back in the day, 10+ years ago, I worked on a project with a 1-2PB MySQL database, sharded and replicated across 70 machines, and the total hardware cost for it was like 300k with hosting at 10k/month.
Today you can get the same in one box: under 100k hardware and 1k monthly for half-rack hosting.
Today our dev db for another B2B SaaS is about 2TB, replicated on 5x R640 servers from 2018, and runs us 10k/year averaged across hosting and hardware. We routinely max out the network on it, and most of the cost is in NVMe drives every few years.
Where can I find clients dumb enough to pay 480k for that lol
narrow-adventure@reddit (OP)
Well look, this was partially my decision and my responsibility, and I'll give you my reasoning for it. No need to be so harsh and call me stupid; we can discuss it in a civil manner.
I'll walk you through why I keep sponsoring Bezos' lifestyle, and you can tell me about an alternative approach.

RDS provides reliable backups at 1-minute intervals, with read/write replicas and failover. It also provides quick replicas: you get a new instance that doesn't copy data from the main DB until it's actually used (they achieve this by going into the storage internals and modifying them). I don't know of an open source alternative to it.

To do all that in-house, a single engineer in the Bay Area to manage the infra will cost you more than any savings you could ever have.
My bill now is much smaller (different company), but I'm always looking to save money on it. If you have an alternative for bare-metal hosting that doesn't require another addition to the team, I'm all ears.
Annh1234@reddit
Pretty civil here, but I'd rather put that 480k in my pocket than sponsor Bezos lol
We were 2 guys dealing with the big 2TB project, in Montreal, Canada (cheaper salaries). And sure, we might have been more competent than the average joe, but our workload was pretty light. Once we'd spent a week installing those servers, the biggest issue was in 2011, when there were no hard drives to be had and our Seagate Barracudas were dropping like flies with no replacements in sight (google the 2011 HDD crisis).
Your 1-minute backups, those are called delayed replication; if you have the disk space you're all good.
And if you have 5 live replicas, the only "backup" you need is for when a stupid dev drops a table or something (1 server crashes, you get another one up, it replicates from the other 4, and you're good).
Sure, RDS might have much better networking and so on, but spending 24 times more just because... not with my money.
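Worth unpacking the "1-minute backups" claim both commenters are circling: whether you call it delayed replication or binlog-based point-in-time recovery, the mechanics are a periodic snapshot plus an ordered change log replayed up to a chosen restore point. A toy Python sketch of that idea, with all names invented (real MySQL PITR replays the binlog with `mysqlbinlog`):

```python
from datetime import datetime, timedelta

# Toy point-in-time recovery: periodic snapshot plus a binlog-style
# event log, replayed up to the chosen restore point. All names here
# are invented for illustration.

def restore_to(snapshot: dict, binlog: list, restore_point: datetime) -> dict:
    db = dict(snapshot)
    for ts, key, value in binlog:  # binlog is ordered by timestamp
        if ts > restore_point:
            break  # events after the restore point are discarded
        db[key] = value
    return db

t0 = datetime(2024, 1, 1, 12, 0)
snapshot = {"balance": 100}
binlog = [
    (t0 + timedelta(minutes=1), "balance", 90),
    (t0 + timedelta(minutes=2), "balance", 0),  # the bad write
]

# Restore to just before the bad write lands.
db = restore_to(snapshot, binlog, t0 + timedelta(minutes=1))
assert db == {"balance": 90}
```

The trade-off both sides are debating is who operates this machinery: RDS automates snapshot scheduling, binlog retention, and replay; self-hosting means you own all three.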
sob727@reddit
Medium, wont click
bobody_biznuz@reddit
Curious, what's wrong with medium?
pixelbart@reddit
Most content is neither rare nor well-done
leixiaotie@reddit
r/rareinsults
no_awning_no_mining@reddit
Also r/welldoneinsults
gimpwiz@reddit
I really really hope you came up with this yourself, because that's terribly witty. I love it.
pixelbart@reddit
I didn’t, unfortunately. I don’t know who did though.
sponge62@reddit
Oscar Wilde tier burn.
narrow-adventure@reddit (OP)
Any alternatives, or are plain personal blog sites just preferred?
tecedu@reddit
github pages, or just buy a domain
sob727@reddit
So I've thought about it without finding a solution.
What if I want to put up writings without AI crawlers being able to get at them?
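There's no complete solution, but the conventional first step is a `robots.txt` asking known AI crawlers to stay out. A sketch (these user-agent tokens are the publicly documented ones for OpenAI, Common Crawl, Google's AI training crawler, and Anthropic; note robots.txt is advisory, so a misbehaving crawler can simply ignore it):

```
# robots.txt — opt out of known AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Everyone else (regular search engines) still allowed
User-agent: *
Allow: /
```

Anything stronger means putting the writing behind authentication, which defeats the point of publishing openly.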
pfn0@reddit
damn, that's an amazing burn for medium.
no_awning_no_mining@reddit
What a shame that it's going to be hidden for most users.
sob727@reddit
Yeah, I'm upvoting the downvoted guy for that reason
Cloned_501@reddit
It is called a sear my good fellow.
sob727@reddit
Wow amazingly witty
andrewia@reddit
Annoying design, especially when it nags you to register for an account.
lucidbadger@reddit
Yeah, Medium is a horrible site
cazzipropri@reddit
I bet that if you repatriate from the cloud you'll save even more.
Hard to say if you can, depending on what else you have in the cloud.
narrow-adventure@reddit (OP)
I'm sure it's possible, but here are the 2 main RDS features that keep me paying for Bezos' yachts:
- 1-minute backup intervals based on partial binlog replication. RDS takes a full DB snapshot periodically but saves the full binlog in between, meaning backups at up to 1-minute granularity are available, and they restore pretty quickly.
- You can get a DB replica very quickly: the data is not instantly copied but fetched as needed, so you get the DB fast, and as you access the data it gets moved over into the new instance.
I wouldn't know how to set that up locally, or maintain it, or even guarantee it. I wouldn't even want to have to guarantee that it would always work. Outages and infra maintenance add costs quickly.
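The second feature OP describes is essentially a copy-on-read clone: the replica is usable immediately and only pulls data from the primary the first time it's touched. A toy Python sketch of that behavior, with all names invented (the real mechanism lives inside the RDS/Aurora storage layer):

```python
# Toy copy-on-read replica: the clone serves reads immediately and only
# fetches pages from the primary on first access, roughly the behavior
# OP describes for fast RDS replicas. All names are invented.

class LazyReplica:
    def __init__(self, primary: dict):
        self._primary = primary
        self._local = {}        # pages copied over so far
        self.pages_copied = 0   # how much data has actually moved

    def read(self, key):
        if key not in self._local:            # first touch: fetch the page
            self._local[key] = self._primary[key]
            self.pages_copied += 1
        return self._local[key]

primary = {f"page:{i}": f"data-{i}" for i in range(1000)}
replica = LazyReplica(primary)   # "instant" clone: nothing copied yet

assert replica.pages_copied == 0
replica.read("page:7")
replica.read("page:7")           # second read is served locally
assert replica.pages_copied == 1
```

This is why the ephemeral test environments are cheap to spin up: a clone of a 1TB database costs almost nothing until the tests actually touch the data.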
OkProduce1754@reddit
Good article, thanks for sharing.
Edgeaa@reddit
40k a month on RDS??? Jesus, at this point just reserve an EC2 and put Postgres on it, you'll save about 30k a month; even if you were to hire a dedicated database admin you'd still save hundreds of k a year.
deja-roo@reddit
No one who is familiar with RDS and using EC2 would ever make that decision.
Edgeaa@reddit
I read it fully, thank you very much. They talk about a bill of 80k a month being lowered to about ~40k a month after migrating to Postgres, because they could downgrade the instances used.
They mention many ephemeral instances but don't go into detail about what that entails. But if it costs 40k a month, I can assure you you can find something cheaper with a bit of dev work. The pricing of RDS is about 4x the price of the same reserved EC2, and if you pay that much there is definitely a way to greatly lower those costs. Even if it's just keeping the main DB in RDS and running the ephemeral databases on EC2 or something else, there is definitely a way to be found. For a potential saving of 100k+ a year, it's not something you should dismiss this fast.
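For anyone checking the thread's numbers: the figures above line up with the title's claim, and the "4x" multiplier is this commenter's estimate rather than an AWS quote. A quick back-of-the-envelope:

```python
# Rough numbers from the thread: ~$80k/mo before, ~$40k/mo after the
# Postgres migration. The 4x RDS-vs-EC2 multiplier below is the
# commenter's estimate, not an official AWS figure.

before_monthly = 80_000
after_monthly = 40_000

annual_savings = (before_monthly - after_monthly) * 12
assert annual_savings == 480_000  # matches the $480K/year in the title

rds_multiplier = 4  # claimed RDS premium over a comparable reserved EC2
ec2_equivalent_monthly = after_monthly / rds_multiplier

print(f"annual savings from the migration: ${annual_savings:,}")
print(f"hypothetical EC2-equivalent bill:  ${ec2_equivalent_monthly:,.0f}/mo")
```

The counter-argument downthread is that the gap between the RDS bill and the hypothetical EC2 bill buys the DBA/SRE labor you'd otherwise hire.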
headykruger@reddit
You'd likely also need to hire someone to take over running those db's now too right?
Never bet against AWS on the pricing front, they know what they are doing.
swizznastic@reddit
Aren't Snowflake and Databricks entirely built on reducing AWS costs by providing alternative ways to get the same results?
thisisntmynameorisit@reddit
what do you mean on the pricing front?
Edgeaa@reddit
Depending on the system, there might be a way to dev a one-off script of some kind that takes care of that for you. Again, I don't know their setup, so I'm just guessing here.

That's where I don't agree; I'll take those odds any day, especially at that price. When you have few users, it doesn't make sense to spend thousands a year to do what AWS already does. But when a single service's overhead cost starts to be as pricey as even 3 months of a full-time engineer, that's where I would at least consider a change (I'm not saying go balls deep and fully commit, but at least consider options). $100k-200k is beyond that, so yes, I would heavily consider stepping back from fully managed AWS services that cost an arm. If the pricing model of RDS were different, like "pay a fixed overhead for RDS, plus the server cost from EC2," that would make sense, but that's not the case. The cost of RDS is about 4x the price of EC2 (which is already overpriced compared to a datacenter), and this overhead makes less and less sense to pay the bigger your platform gets. The only reason companies keep using it is usually either (1) they have a shitload of money and don't care either way or (2) they are stuck with it.
nwsm@reddit
You would need (at least) one DBA and (at least) one SRE. It’s not dev work it’s systems work and that work is ongoing. This is what you are paying for with a managed database.
cheezballs@reddit
You, uh, don't work at enterprise levels huh? That's nothing. Go look at Oracle licensing prices.
CrushgrooveSC@reddit
Is there perhaps a location for this content that isn't low-value nickel-scam spamware?
narrow-adventure@reddit (OP)
Not as of right now, but I’ll make sure there is an alternative as soon as possible