agbell@reddit (OP)
Hey,
Article author. Much of my previous experience was in backend engineering, but now, at Pulumi, I'm learning more about cloud offerings, which can be a confusing space.
So, this is me trying to determine when you would choose AWS Fargate over EC2 to run your containers on (an EKS cluster in my specific case).
Fargate gives you isolation and better scaling, but at a premium price (~2.5x or more). That might be worth it for some use cases.
Has anyone been burned by Fargate, or found a sweet spot where it works well?
zokier@reddit
The math doesn't really work out here. 1 vCPU/2 GB on Fargate costs $0.04937/hr; the same on EC2 (c7a.medium) costs $0.0551/hr. T-series instances have significantly less CPU capacity, so they are not really comparable here. Even then the difference is far from 2.5x: for example, t3a.small costs $0.0204/hr for a 20% baseline of 2 vCPU plus 2 GB, while comparable Fargate (0.5 vCPU/2 GB) costs $0.02913/hr, or about 40% more. I got prices for eu-west-1, not that I think that makes a difference.
So if you have a bursty workload then T-series EC2 can save some money, but on a steady load Fargate can actually end up being cheaper!
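To make the comparison concrete, here's a quick sketch of the arithmetic using the on-demand prices quoted above (verify against current AWS pricing for your own region before relying on it):

```python
# Rough comparison using the hourly on-demand prices quoted above (eu-west-1).
fargate_1vcpu_2gb = 0.04937      # $/hr, Fargate, 1 vCPU / 2 GB
c7a_medium        = 0.0551       # $/hr, EC2 c7a.medium, 1 vCPU / 2 GB

fargate_05vcpu_2gb = 0.02913     # $/hr, Fargate, 0.5 vCPU / 2 GB
t3a_small          = 0.0204      # $/hr, EC2 t3a.small, 2 vCPU burstable (20% baseline) / 2 GB

print(f"Fargate / c7a.medium: {fargate_1vcpu_2gb / c7a_medium:.2f}")   # ~0.90 -> Fargate ~10% cheaper
print(f"Fargate / t3a.small:  {fargate_05vcpu_2gb / t3a_small:.2f}")   # ~1.43 -> ~40% premium vs. the burstable instance
```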
agbell@reddit (OP)
The post breaks down an example. But you will be running one pod per Fargate task, and many pods per larger EC2 instance. Not sure anyone is running an EC2 instance for every container.
ammonium_bot@reddit
Hi, did you mean to say "less than"?
Explanation: If you didn't mean 'less than' you might have forgotten a comma.
Sorry if I made a mistake! Please let me know if I did. Have a great day!
(I'm a bot that corrects grammar/spelling mistakes. PM me if I'm wrong or if you have any suggestions.)
zokier@reddit
The article compares 0.5 vCPU Fargate to a t3.medium with 8 pods, which ends up being 0.05 vCPU per pod on average. No surprise that 10x more CPU costs more; it's a bit silly to claim that the two are comparable. The article also says "EC2 costs less than Fargate on a pure cost-of-compute basis", but even in that example Fargate easily wins in terms of $/compute.
Sure, the one benefit of EC2 is that it allows <0.25 vCPU per pod, but that is very different from cost of compute imho, it's more the cost of non-compute :) If you try to do some actual computation then the math changes dramatically.
bwainfweeze@reddit
Plus if fargate is running on the t3 generation hardware that would be nuts. Shouldn’t we be comparing against m6 or m7?
agbell@reddit (OP)
I mean, I like the 'cost of non-compute' phrase and see your point. But yeah, I don't want to do more compute on my CoreDNS Fargate instance. Technically right vs. practically right, in the use cases I'm looking at.
Of course, Fargate Spot might change the numbers. Your mileage may vary, etc.
bwainfweeze@reddit
My app did not end up being cheaper or faster on c7a. I think c7a is priced incorrectly, at least for Node apps. It would need to be about 8% cheaper to keep pace with previous generations on price/performance.
We stuck with 7i and 6i.
caltheon@reddit
Fargate can be cheaper than EC2, especially if you are willing to use Spot instances.
pineapplepizzabong@reddit
I am in the process of this migration now. I will report back once we get some data.
staticfive@reddit
The simplicity is compelling, but hearing that it can’t run daemonsets (which we use for Cilium and nginx ingress controllers) makes it a bit of a dealbreaker for a lift and shift.
agbell@reddit (OP)
To Fargate from EC2?
pineapplepizzabong@reddit
For more context, we have no say in the plan really. Top-down mandate for "more server-less". Could be a win for us, could not be. I can follow up once we get some hours in.
agbell@reddit (OP)
I mean, it can make sense. If you need isolation, or things are bursty, and you don't want to scale up EC2 nodes to handle the bursts. Those are two that come to mind.
pineapplepizzabong@reddit
They want to "manage servers less". Our traffic is a classic 9 to 5 normal distribution, no spikes or surges. Our EC2s currently scale fine (sub 1% error rates) and are part of a reasonable ASG. The services are considered critical so our clusters skew to over-scaled and over-redundant so money wise FarGate might be better.
WriteCodeBroh@reddit
“Manage servers less” seems to be the key. We chopped multiple categories off of our corporate vulnerability tracker, saving hundreds of hours in updates to IaC files to increment a golden image version lol. That alone probably makes up for the difference in cost between Fargate and EC2 at a large org.
pineapplepizzabong@reddit
EC2 to Fargate
Nice-Offer-7076@reddit
Jobs that run for an hour / day or less. I.e. no long running or always on services.
nevon@reddit
I've used ECS on EC2 to run thousands of regular always-on type applications, and then run more "run-to-completion" type jobs (think cronjobs or operational tasks) in the same logical clusters but on Fargate capacity. One of the reasons being that scaling EC2 really isn't that fast to accommodate large short-term workloads, and you tend to end up with a lot of overhead because ECS will never terminate a task to place it on some other more congested instance, even if doing so would mean you could terminate an instance that is almost empty. With Fargate, the issue of overhead and worrying about packing becomes AWS's problem instead of mine, so the difference in price doesn't end up mattering much.
If you're talking about EKS, however, things might be a bit different since you can run Karpenter to "compact" your EC2 clusters for you, instead of being at the mercy of ECS capacity-provider-driven scaling.
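For what it's worth, the "run-to-completion jobs on Fargate capacity inside an otherwise EC2-backed ECS cluster" pattern looks roughly like the sketch below (boto3; the cluster, task definition, and subnet names are hypothetical placeholders):

```python
import boto3

ecs = boto3.client("ecs", region_name="eu-west-1")

# One-off job placed on Fargate capacity in an existing cluster, leaving the
# always-on services on the EC2-backed capacity provider.
ecs.run_task(
    cluster="my-cluster",                    # hypothetical cluster name
    taskDefinition="nightly-cleanup:3",      # hypothetical task definition
    capacityProviderStrategy=[{"capacityProvider": "FARGATE", "weight": 1}],
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],   # placeholder subnet
            "assignPublicIp": "DISABLED",
        }
    },
)
```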
agbell@reddit (OP)
Related Question: Why is the world of cloud services so confusing and byzantine?
There are a million ways to run containers, all with unique trade-offs. We've made something very complex out of something designed to be simple and undifferentiated.
bwainfweeze@reddit
Because it is difficult to convince a man of something his livelihood depends on him misunderstanding.
MathematicianNew2519@reddit
It's ironic how containers were meant to simplify deployment, but now we need entire teams just to manage Kubernetes.
snuggl@reddit
Another reason is that all major cloud providers started before k8s, and now they all also need to offer managed k8s, so we are looking at at least 5-6 methods that are just cognitive burden.
beatlemaniac007@reddit
Mainly cuz edge cases, I'd say. Look at Linux. For 30 years it's mostly been evolving under one single man's vision (I know, not really as much anymore, but not the point). And Linus is a solid candidate for the best programmer of all time. And he is extremely and openly opinionated about having good "taste" when updating Linux kernel code, which he has defined as not needing special code for handling edge cases. And yet, look at the complexity of Linux; it's a mess. Now consider all these other systems which don't have a Linus keeping things in check. Shit will always get complicated as it evolves and needs to handle more edge cases.
caltheon@reddit
I made a lot of advancements in my career just by knowing how to tell companies what complexity they can remove. It usually requires a large mix of both technical and functional knowledge, and most people aren't great at both.
granadesnhorseshoes@reddit
They are entirely artificial edge cases brought on by the added complexity to offer up platforms as services that work for anyone and so suck for everyone.
That turns into a feedback loop of solving problems you created with one platform, by creating a slightly different platform for some subset of edge cases....
Now there are 15 competing standards.
pineapplepizzabong@reddit
Sowing confusion means cloud providers can reap profits, IMO.
stillusegoto@reddit
I'd argue the opposite. Making things more streamlined would make it easier for people to use the services, and easier to mask the costs and increase their margins, when you basically have a magic black-box container service. Hell, you wouldn't even need to declare memory or CPU resources; it would learn from your usage and scale on its own, then you pay whatever they want to bill you each month.
Xyzzyzzyzzy@reddit
Agreed. My previous startup had been on a web hosting service that was very robust and configurable, but routinely required lots of manual intervention to do things.
We swapped to Vercel because it "just works". Our Vercel bill went up as our web traffic went up. It was definitely not the cheapest possible solution.
But there was absolutely no question of ever swapping to a different provider to save money, because when developer labor is 90%+ of your costs, you have a limited runway, and you need to invest as much of that as possible into your core product, a cloud service provider that "just works" is worth every penny.
The "just works" part helps you sell additional services too. I looked into using their distributed KV store, which is just Redis behind a very thin curtain. It's objectively overpriced compared to using a different Redis provider or provisioning our own. But it "just works", it's quick to get started with and easy to use, it didn't require adding yet another service provider to our cloud services, and even objectively overpriced Redis is still pretty cheap. If you're just looking for a Redis provider, you probably won't pick Vercel - but if you're already on Vercel, there's an excellent case for paying the premium for its Redis service.
Jump-Zero@reddit
It's also that when they introduce a product, they (ideally) have to support it for a long time. It really sucks when a company has to move to another hosting platform because GCP decided not to support it anymore.
bastardoperator@reddit
Call me crazy, I think packages are still easier when it comes to deploying.
BinaryRockStar@reddit
Packages? Like apt/yum/dnf repo packages?
trdarr@reddit
money
Dreadgoat@reddit
Scalability is expensive, and the more you need, the more expensive it gets.
A complete cloud provider needs to offer many options with different degrees of scalability.
If you are a tiny shop that uses containers for the utility of having easily configurable, portable, and stable virtual machines, then it would be ridiculous for you to pay for a service like Fargate to get those machines online. EC2 is an easy choice.
If you are a huge enterprise that needs to be able to orchestrate huge spin ups at any given time in any given region of any given size, clearly EC2 isn't good enough and you should invest your efforts into something like Fargate.
But those aren't the only two points on the scale. Maybe your sweet spot is ECS, or managing a suite of Lambda functions, or even Lightsail. Maybe you need EKS or maybe that's ridiculous overkill. Maybe ECR is a handy tool to manage your containers or maybe it's simpler to just manage them yourself locally.
Choosing the right one requires in-depth knowledge of your domain, forecasting its most likely future, and understanding the cost and benefits of every option. You can't just line them up in a row in order of how much scalability they provide, because even then you have to ask what kind of scalability. Are you cloning the same container a lot? Are they all different? Do they vary wildly in size? Do they need to be individually elastic? Do they need to live in multiple regions, communicate across those regions?
tl;dr Containers on their own are pretty simple! But cloud scaling is byzantine because the suite of problems it solves is byzantine.
Esseratecades@reddit
It's to satisfy customers performing very slow transitions into cloud with potentially very niche and non-standard use-cases.
Really if you actually make an effort to stay as cloud native and serverless as possible, then the number of services to concern yourself with drops quite drastically.
mpanase@reddit
AWS wants to charge you as much as possible.
AWS wants you to use multiple of their services.
AWS wants you to think what they offer is unique, and makes it difficult to leave by wrapping standard things with their own nomenclature and config methods.
Try GCloud. They use actual, normal words and standards when available.
Jump-Zero@reddit
It's hard for me to trust GCloud. Google is notorious for shutting services down. I don't want to have to move a bunch of services over because Google doesn't want to support their platform anymore.
I personally love using DigitalOcean. It's simple and easy to set something up. I use it for personal projects all the time. Professionally, I just go with AWS. It's not sexy, but it's reliable.
editor_of_the_beast@reddit
Because complexity is necessary for practical systems. This should be obvious. Complaining about complexity and suggesting that some elegant simple solution would fix it all is just something humans do because we are not that smart.
agbell@reddit (OP)
Team not-that-smart, shouldn't-this-be-simple reporting for duty.
Halkcyon@reddit
Maybe you're overemployed and not qualified for your work then?
poofartpee@reddit
We get it, you're smarter than everyone else. Save some of your farts for the rest of us.
editor_of_the_beast@reddit
The people pitching snake oil are the ones pretending to be smart, right? I'm the one accepting human nature.
anengineerandacat@reddit
Asked myself this question earlier into my career... it's because you need flexibility for the not-so-niche but not-so-clear cases that come up over time.
Can't always put everything in the same VPC because you might have different clients that need access to specific areas, or maybe you're building some HIPAA-related solution, so that introduces complexity in the form of virtual networks.
Auto-scaling: you need to scale your containers (which is pretty trivial), but you also sometimes need to scale the underlying hardware... well, that's a whole lot more complex... it might even require the input of a human, so as much as cloud providers try to abstract away that human input, the complexity doesn't completely go away, and a little more is added via policies on how to scale (more configuration).
There are obviously mechanisms to run services without caring about all of the above but even then you can only abstract away operations so much from the developer.
I.e., serverless functions are, in essence (if you squint), just containers that run for a short period of time; with a bit of provisioned concurrency they basically just guarantee that "some" are always running, and simply shut down/start to ensure capacity is met.
You still need to worry about things like resource policies (security), VPCs (security & access), and a gateway of sorts (API Gateway or a managed version with a function invocation URL).
You also need to worry about maximum run times and whatever other smaller nuances that are unique to each provider though you could in essence simplify that down to a private VPC with edge routing and let the edge service manage access (but whoops, now you introduced that whole can of worms).
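As a rough illustration of the provisioned-concurrency point above, keeping "some" function instances warm is a single API call (a hedged boto3 sketch; the function name and alias are made up):

```python
import boto3

lam = boto3.client("lambda", region_name="eu-west-1")

# Keep a handful of execution environments pre-initialized for a published
# alias, so "some" capacity is always running, as described above.
lam.put_provisioned_concurrency_config(
    FunctionName="orders-api",            # hypothetical function
    Qualifier="live",                     # alias or version to keep warm
    ProvisionedConcurrentExecutions=5,    # number of warm environments
)
```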
cap1891_2809@reddit
In the vast majority of cases the default should be Fargate unless you really can't afford it, and understand that there are higher maintenance and therefore engineering costs if you choose EC2.
random_guy_from_nc@reddit
Probably cheaper to go with ECS powered by spot instances. Maybe something like Spot Fleet or spot.io.
caltheon@reddit
Fargate supports spot instances as well.
WriteCodeBroh@reddit
What happens when your spot instance gets interrupted? I can’t think of a scenario where spot instances would be appropriate for on demand/streaming services. It’s basically just an offering for batch processes that can be interrupted right?
mrhobbles@reddit
Any service can go down at any time unexpectedly, it’s how you handle it. Does your client have an auto reconnect mechanism? That would enable it to connect to another instance and resume. Additionally if your client buffers ahead it can reconnect to another instance with no noticeable impact to the user.
WriteCodeBroh@reddit
Sure, but if you are running, say, a service with 2 spot instances, what happens when both go down? Are you willing to have times when the service is completely unavailable? Maybe, but for the majority of use cases I'm guessing no. Client-side handling of service outages can only get you so far in an environment where every node is volatile. I guess maybe you could use spot instances for burst scaling (not reliably), but I would still want at least 1 persistent, reliable node for an on-demand use case.
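One common middle ground on ECS is a capacity provider strategy with a guaranteed base on regular Fargate and the overflow on Fargate Spot, so a Spot interruption can't take the whole service down. A hedged boto3 sketch (cluster, service, and task definition names are placeholders):

```python
import boto3

ecs = boto3.client("ecs", region_name="eu-west-1")

# At least one task always runs on on-demand Fargate (base=1); additional
# tasks are split 1:3 between on-demand and Spot capacity.
ecs.create_service(
    cluster="my-cluster",                  # hypothetical cluster
    serviceName="stream-service",          # hypothetical service
    taskDefinition="stream-service:7",     # hypothetical task definition
    desiredCount=4,
    capacityProviderStrategy=[
        {"capacityProvider": "FARGATE",      "base": 1, "weight": 1},
        {"capacityProvider": "FARGATE_SPOT", "base": 0, "weight": 3},
    ],
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],   # placeholder subnet
            "assignPublicIp": "DISABLED",
        }
    },
)
```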
jdeeby@reddit
We have a data/ML pipeline running on Dagster. All of our jobs use Fargate on EKS. We find this is the easiest way to scale up during peak load without worrying about node autoscaling. We’re a pretty lean team so it’s beneficial for us that we manage less infrastructure.
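For the EKS case, which pods land on Fargate is controlled by a Fargate profile; roughly the sketch below (boto3; the cluster name, namespace, role ARN, and subnet are hypothetical, and the pod execution role and subnets must already exist):

```python
import boto3

eks = boto3.client("eks", region_name="eu-west-1")

# Schedule every pod in the "dagster-jobs" namespace onto Fargate;
# everything else keeps running on the cluster's EC2 node groups.
eks.create_fargate_profile(
    clusterName="data-platform",                 # hypothetical cluster
    fargateProfileName="dagster-jobs",
    podExecutionRoleArn="arn:aws:iam::123456789012:role/fargate-pod-exec",  # placeholder
    subnets=["subnet-0123456789abcdef0"],        # private subnets only
    selectors=[{"namespace": "dagster-jobs"}],
)
```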
avamore@reddit
Yes. People always ask why I picked Fargate over EC2...
Because a team of 1 managing instances is not scalable.
tomster10010@reddit
How hard is it to find an artist to do a thirty minute sketch instead of an AI image that looks like crap?
wamon@reddit
Ok boomer
ChannelSorry5061@reddit
Why didn't you hire an artist to make a sketch to accompany your comment?
Halkcyon@reddit
It's disappointing, too, since the images don't add anything to the article.
ExtensionThin635@reddit
Always. EC2 is a pain in the ass and doesn't autoscale; unless you have stateless apps on VMs, then it does, but that leaves you using resources inefficiently and making questionable decisions.
Professional-Yak8222@reddit
Scanning your IaC code is not a waste of time. It helps find issues before they happen in production. Cloud provider alerts may find misconfigurations, but they usually notify after the infrastructure is already live. IaC scanning tools like Checkov, Open Policy Agent (OPA), or Snyk can detect security risks, compliance problems, or inefficiencies in the code. Fixing these problems early can save effort and reduce the risk of bigger issues later. It works as an extra check to go with the alerts from the cloud provider.
Revolutionary_Ad7262@reddit
Minimum requested CPU for Fargate is .25 vCPU, which is quite important, as you cannot go as near to zero as you can with EC2.
Man_of_Math@reddit
To complicate things, you can do Fargate on EC2. This is the only way to get docker-in-docker support in Fargate, as far as I know
assassinator42@reddit
You can do ECS (AWS's container orchestration) or EKS (Kubernetes) on Fargate or EC2. But Fargate is the AWS managed non-EC2 option.
Man_of_Math@reddit
Ah whoops - got my terminology messed up, I think. I'm thinking of ECS on EC2, as opposed to ECS on Fargate.
The point is that there was only 1 solution that supported Docker-in-Docker when I looked earlier this year
Halkcyon@reddit
You are wrong. ECS is ECS whether you use EC2 or not. EKS is an entirely different product. You are only correct about Fargate being the AWS-managed version of ECS.