
Ah, the old cloud provider switcheroo. Yep, this is how they make money: they make it easy to set up some gigantic, hugely scalable website, then hit you with a gigantic, scaled-up bill. AWS would do this as well.

The team I'm in at the moment is in the early stages of cloud adoption, but the company as a whole has fallen hook, line and sinker for AWS. Whenever I mention the cost there's always an excuse.

The main one is that you don't have to hire sysadmins anymore, as that's now taken care of by AWS. Ah yes, but they have actually been replaced with a "DevOps" team, and just our department now spends > £1 million per year on AWS hosting. A 20% reduction in those fees could pay for a few sysadmins.

The next one is that no other vendor would be able to supply the kit. You know, Stack Overflow is able to run on a single webserver (https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...). Plus, many of the other providers have loads of instances available.

I mean I'm not against cloud, it's just not the cheapest option if you choose one of the big 3 providers. I use a company called Scaleway (https://www.scaleway.com/en/); they have all the essential cloud services you need, and everything else you can run yourself in Docker or k8s.



There's an argument to be made for quality of life for your employees. As someone who has transitioned from on-prem server management to mainly cloud work, my job happiness has skyrocketed. I haven't set foot in a data center in three years and I do not miss it one bit.

Dealing with hardware failures, hardware vendors, confusing licensing, having to know SKUs, racking new cabinets, swapping hard drives, patching servers - it's all awful work. When you go cloud only, you can be more productive instead of dealing with some of that nonsense work.


I was always a software developer first, but in the old days I spent enough time in server rooms doing all sorts of sysadmin work, and these days I dabble in devops.

And, honestly, I miss the old days. Today, $cloud has some weird spasms where you suddenly get an influx of connection timeouts, or tasks wait for aeons to get scheduled, and you just can't log in to a switch or a machine to figure out what the hell is going on. You just watch the evergreen $cloud status page, maybe file some tickets and pray someone bothers to investigate, or live with the random hiccups ("sorry $boss, everything is 100% good on our side, it's $cloud misbehaving today"), adding more resilience -> complexity -> unreliability to the system in the name of reliability. Either way, with the clouds I feel handicapped, lacking the ability to diagnose things when they go wrong.

I don't miss those three days we spent fighting a kernel panic, though. That was about a decade ago - we outgrew the hardware and had to get a new box with a badass-at-the-time 10Gb SFP+ NIC that worked fine for the first few weeks, but then its driver suddenly decided to throw tantrums on an almost hourly basis. I don't even remember the details - a lot of time has passed since then - but thankfully we found some patch in the depths of the LKML and the server ran like clockwork ever since. That wasn't fun, but it was a once-in-many-years incident.

Either way, I do feel that in the ancient ages hardware and software used to be so much simpler and more reliable. Like, today people start with those multi-node, high-availability, all-the-buzzwords Kubernetes-in-the-cloud monstrosities that still fail now and then (because there are so many moving parts, shit's just bound to fail at an incredible rate), and in the good old days people somehow managed with a couple of servers in the rack - some proper, some just desktop towers sitting by - and with some duct tape and elbow grease those ran without incident for years and years.

Have I turned old and sour? Or maybe it's just nostalgia for my youth, and I've forgotten or diminished most of the issues while warmly remembering all the good moments?


The cloud popped up mostly due to ease of use. It's a lot easier to hire a cloudops engineer with just about enough knowledge to deploy something on the cloud than someone who can manage a datacenter and keep it running.

The latter people still do what they did; they just work for cloud providers now, probably making quite a bit more than they did previously.

IMHO it's a win-win situation for everybody. Less skilled engineers can be “productive” and former sysadmins have huge salaries.


In between your two extremes are colocation (no managing buildings, power, cooling, racks, security, optionally network), dedicated servers (no managing/installing servers, disks, warranties) and basic VMs.


We do colocation, and we have to deal with HD and RAM failures from time to time. Replacement of the hardware itself is managed by the provider, but discovery and the software side require our involvement.

I just wonder what happens when a RAM or HD failure hits a cloud provider node. Is the architecture, on average, really able to recover from such failures without help and intervention?


This reads like a software engineer being happy that work caters lunch so they don't have to cook for the whole team anymore. Didn't anyone discuss maybe hiring a cook?


Yes, but soon you're running a kitchen, and then a cafe and a catering business, as well as a software startup. Which, given how many startups had in-office lunch/food pre-covid, is maybe not a bad way to think about it.


Ah yes the Maserati problem https://www.quora.com/Whats-a-Maserati-Problem

By the time you're running a [catering business | massive sysadmin team], you're already a huge success. Congratulations!


I think this depends. For ops people no longer having to physically go into a DC, I agree, but you've now pushed a bunch of work onto the developers. They'll especially have a harder time: they used to just write code while someone else sorted the infrastructure, and now the devs themselves are kept up all night with AWS stuff going up and down.

If cloud improved QOL for ALL employees I'd agree, but I think it just shifts work around and costs more.


> Dealing with hardware failures ... it's all awful work

I've met plenty of datacenter technicians who loved their work and the opportunities for growth it provided.

Some companies really know how to manage a datacenter with minimum pain. Some don't.


It's not like all those jobs have been taken over by automation - someone still has to take care of these cloud servers?


> Dealing with hardware failures, hardware vendors, confusing licensing, having to know SKUs, racking new cabinets, swapping hard drives, patching servers - it's all awful work.

Each to their own, but I think you'll find there's a fairly significant portion of sysadmins who love that work!


I can see both sides. If you're a startup that needs to be able to scale quickly if product market fit is achieved, the cloud really saves your bacon. Or is your ten person team really going to figure out how to get Postgres to reliably run with billions of records, with encrypted backups, etc?

It's basically a form of permanent debt. Faster product market fit, higher long term infrastructure costs until you have enough breathing room to start pulling it into your own datacenter. At that point you have some negotiating leverage with the cloud provider.

On the other hand, if you're not looking for explosive growth, man oh man, DigitalOcean or any one of a number of good providers of good old VPSes / cloud-lite is a great option.


I keep hearing this argument against using your own infrastructure again and again, and I'm not sure how true it is.

I've worked with teams on both sides, and everyone has to figure out how to run at scale eventually; there are just different ways of achieving it.

I've worked with teams that manage their own infrastructure on dedicated servers and don't have to think about scaling for a long time, as the one beefy server can just take whatever load you throw at it.

I've also worked with teams who don't manage their own infrastructure and thought they were ready to scale without issues, but once the scale actually happened, it turned out there were more things to consider than just the number of servers you run; race conditions were everywhere, but no one had thought about them.

Definitely a case of "right tool for the right job", but I don't think it's as easy as "Self-managed: harder to scale, PaaS/Cloud: easy peazy to scale".


Yeah, agreed. I haven't worked with Google-scale companies, but I've always found scaling issues to be development-related, not infrastructure-related. Examples would be a bad DB query that takes the system down, an overly chatty webserver that issues too many queries to the backend, pulling large datasets into the webapp and exhausting memory, etc. AWS / Azure can't fix these issues; they have to be fixed in your code.
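
To make the "chatty webserver" point concrete, here's a rough sketch - the table, column names and connection string are made up for illustration, not anything from a real system. The first function issues one query per order (the classic N+1 pattern); the second does the same work in a single round trip. No amount of AWS or Azure hardware fixes the first one for you:

    # Hypothetical example: the "overly chatty" pattern vs. a batched query (psycopg2).
    import psycopg2

    conn = psycopg2.connect("dbname=shop")  # placeholder connection string

    def order_totals_chatty(order_ids):
        # Anti-pattern: one round trip to the database per order (N+1 queries).
        totals = {}
        with conn.cursor() as cur:
            for oid in order_ids:
                cur.execute("SELECT total FROM orders WHERE id = %s", (oid,))
                totals[oid] = cur.fetchone()[0]
        return totals

    def order_totals_batched(order_ids):
        # Fix in the application code: fetch the whole batch in a single query.
        with conn.cursor() as cur:
            cur.execute("SELECT id, total FROM orders WHERE id = ANY(%s)", (list(order_ids),))
            return dict(cur.fetchall())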

There is definitely a place for AWS/Azure, and their offering of services is fantastic, but they are not a silver bullet for scaling your website to millions of active users.

On another point, though: the vast majority of websites you'll ever build won't have that level of active users. It's a good problem to have, as it means your site is doing really well.


> I've always found scaling issues to be development-related, not infrastructure-related. Examples would be a bad DB query that takes the system down, an overly chatty webserver that issues too many queries to the backend

This is actually one of the strengths of the cloud: startups that can't afford talent throw compute resources at the problem. Running your own servers isn't hard per se, but it requires a certain breadth of less centrally documented knowledge than the cloud, and a willingness to fuss. Developers like that can often command higher prices than most startups pay these days :)


Having someone with good cloud chops is still a difficult ask.

Putting it all on the devs is exactly how you end up in the haveibeenpwned database and on the cover of magazines (for the wrong reasons).

We’ve traded sysadmins for more expensive DevOps. I would love to see a study on whether we actually hire fewer people than if we just did it the old-school way.


I don't disagree, but I think the cloud providers (AWS/Azure/GCP) have sort of shielded people from how cheap/powerful the underlying hardware has become.

For ~100eur/month on Hetzner you can get a 16-core Zen 3, 128GB RAM, with 8TB of NVMe SSD.

Unless your stack is horrendously badly optimised you can serve SO MUCH traffic off that - definitely billions of postgres records without breaking a sweat.

So the scale argument somewhat disappears - if anything, people end up adding much more complexity to the product to get round the high hardware costs of the cloud (complex caching systems for example, instead of just throwing loads of hardware at the problem).


> I don't disagree, but I think the cloud providers (AWS/Azure/GCP) have sort of shielded people from how cheap/powerful the underlying hardware has become.

I guess I shouldn't be surprised, but I do often find myself surprised to realize that a younger generation of developers has never experienced hosting on bare metal. So they have not been exposed to the costs & benefits vs. the cloud approach, and feel that no local machine could ever be as fast as AWS - even though in reality even a pedestrian server is immensely faster and cheaper than any AWS offering.

Now, sure, there are tradeoffs in ease of scaling up and other considerations, but it's good to keep an eye on the actual tradeoffs you're making and how much they're costing.


As a software developer, I think the best thing about the cloud is knowing that if you need the capacity, and it makes sense cost-wise, you'll get it. In-house servers might be cheap, but in my experience it could be incredibly hard to get that money spent when it's needed, and I've seen companies throw expensive software engineering time at optimizing software when it would have been much cheaper to solve the problem with hardware.

Not only can you end up spending $10k of engineering time to optimize and test a random, non-core-competency bit of code instead of an extra $1k/year on hardware, you also have to maintain the optimized code instead of the simpler code.

Maybe I just worked at companies that did a poor job of managing servers, or had a dysfunctional relationship between software engineering and operations, but at least that's no longer something I have to worry about in a cloud environment. If spending a little extra on hardware is the best solution to the problem, process/planning/politics won't get in the way.


> in my experience it could be incredibly hard to get that money spent when it's needed, and I've seen companies throw expensive software engineering time at optimizing software when it would have been much cheaper to solve the problem with hardware.

That's true with owning your hardware, but what about renting from Hetzner/OVH/etc? You get servers set up in minutes unless you have a very specific request (the only time I've had lead time with these providers is when I had a very custom request, a machine with 300+ TB of storage - yes that is not a typo). Everything else has been delivered pretty much instantly.

But even if, let's say, you have a very specific use case, such as needing a 300TB server that would typically require lead time - well, in that case the prices are so cheap that you can just keep it around all the time, sitting mostly unused, and still come out ahead compared to cloud pricing.


> As a software developer, I think the best thing about the cloud is knowing that if you need the capacity, and it makes sense cost-wise, you'll get it.

Yes, that's the beauty of it and sometimes you need it.

OTOH, how often do you need to grow capacity without any lead time like that? If you are in a hyper-growth stage in a startup you absolutely need it and it is a lifesaver.

But most companies never see a hyper-growth stage. Even for those that do, it's a relatively short timeframe (you can't grow exponentially for very long).

All the rest of the time it's a fairly large premium to pay just in case another hyper-growth period happens. Sometimes it's totally worth it. But it's good to review the likelihood and cost tradeoffs every now and then.


To give you an example - we run quite a lot of workloads on Azure app service, which isn't the same as bare metal, but does allow serious scaling if required.

We run most workloads on a 3.5GB/2 "vCPU" box. This costs around $70/month per instance. We actually haven't scaled this out past 8 instances, at a cost of $560/month (and that has been extremely rare).

On bare metal we could have run it on a $100/month 16-core/128GB box and always had that capacity in reserve. While App Service gives a lot of benefits, the scalability argument is somewhat moot, as you can basically provision all the capacity you would ever scale to, 24/7, and still pay the same as or less than the cloud.

Maybe it's just the projects I've worked on, but I haven't really ever seen people require 100x or 1000x the capacity in a very short period of time (which obviously bare metal could not do). I've seen traffic grow that much - but generally over weeks, months or years.


> the best thing about the cloud is knowing that if you need the capacity, and it makes sense cost-wise, you'll get it

It stopped being the case during the pandemic at most cloud operators, due to general hardware and capacity shortages.


AWS seems to have some pretty decent Xeons (hard to tell, because Intel makes special SKUs for Amazon, I think). I guess it depends on what you consider 'a pedestrian server' -- 128 threads/512GB of memory isn't cheap, although maybe in the enterprise universe it is; I'm more of an academic. So it's nicer than the 10-year-old cluster I tinker around on, but not as nice as the system I send real runs to...


> For ~100eur/month on Hetzner you can get a 16-core Zen 3, 128GB RAM, with 8TB of NVMe SSD.

What option is that? The closest I see is the CCX41, but that is 40% more expensive, 140 Eur/month, half the RAM (64 GB) and ~4% of the disk space (360 GB)

https://www.hetzner.com/cloud


All I can see is maybe the AX101? It matches all the specs they put down, although the SSD is RAID 1 @ 4TB total.

https://www.hetzner.com/dedicated-rootserver/ax101


Yes, 8TB total but in RAID. Also keep in mind Hetzner quotes prices VAT-inclusive, whereas most clouds add VAT on top. For US customers you can take ~20% off those prices.


> Or is your ten person team really going to figure out how to get Postgres to reliably run with billions of records, with encrypted backups, etc?

Actually, AWS won't help you here. I have literally been on a 2-day training course on Aurora with AWS, and the explanation of how to scale was just the same as any traditional non-cloud explanation: correct usage of indexes, partitioning data, optimising queries (especially any non-trivial query output by an ORM) and read replicas.
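
Just to illustrate the sort of thing that training boiled down to, here's a minimal, hypothetical sketch (the table, columns and query are invented): look at what the planner is doing with your hot query, then add the index the plan is begging for. It works the same on Aurora, RDS or a box under your desk:

    # Hypothetical example: diagnosing and fixing a slow query (psycopg2).
    import psycopg2

    conn = psycopg2.connect("dbname=app")  # placeholder connection string
    conn.autocommit = True                 # CREATE INDEX CONCURRENTLY can't run in a transaction

    with conn.cursor() as cur:
        # See what the planner actually does with the hot query.
        cur.execute(
            "EXPLAIN ANALYZE SELECT * FROM events "
            "WHERE account_id = %s ORDER BY created_at DESC LIMIT 50",
            (42,),
        )
        print("\n".join(row[0] for row in cur.fetchall()))

        # If that shows a sequential scan, an index matching the filter + sort usually fixes it.
        cur.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_account_created "
            "ON events (account_id, created_at DESC)"
        )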

In terms of explosive growth, if you're talking about something like Google or TikTok, again, slapping it all in AWS will not automatically just work. There is a lot of engineering you'll need to get to their level.

I also think you haven't really looked at the SO link I sent through: with thoughtful engineering they serve a huge user base with a tiny footprint.

> DigitalOcean or anyone of a number good providers of good old VPSes / Cloud-lite

Not sure why you are dunking on DO here; they are a fully fledged cloud provider with much the same stuff you would need. You can run up a huge bill on DO as well.


There are two parts to this. You are correct that RDS doesn't help you with picking the index strategy or optimizing queries. I don't see that as running the DB, though; that is how you interact with it once it's running. What it does do is help you reliably run the DB server itself.

Without any effort you can stand up a redundant, high availability deployment. With all of the data encrypted at rest. And configure nightly backups, which are stored on redundant storage in multiple physical locations and also encrypted. You can then restore those backups into a working system with the click of a button. Oh, and minor version patches happen automatically with no downtime. And you can click a button to do major version updates.

The last time I did the analysis, which was a while ago, all of those features cost us less than 8 hours of my time each year. It would probably take more than 8 hours of my time each year just to handle security patches on the systems, let alone the engineering it would take to build something as redundant and reliable as a DB in RDS. I will happily pay them to take all of that off my plate so I can focus on other things, like optimizing the queries.
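
For a sense of what that looks like in practice, here's a rough sketch using boto3 - the identifier, instance class and sizes are placeholders, not our actual setup:

    # Hypothetical example: a Multi-AZ, encrypted Postgres instance with automated backups (boto3).
    import boto3

    rds = boto3.client("rds", region_name="eu-west-1")

    rds.create_db_instance(
        DBInstanceIdentifier="app-db",
        Engine="postgres",
        DBInstanceClass="db.m6g.large",
        AllocatedStorage=100,
        MasterUsername="appadmin",
        MasterUserPassword="change-me",   # in reality, pull this from a secrets store
        MultiAZ=True,                     # redundant, high availability deployment
        StorageEncrypted=True,            # data encrypted at rest
        BackupRetentionPeriod=7,          # automated nightly backups
        AutoMinorVersionUpgrade=True,     # minor version patches applied for you
    )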


> Without any effort you can stand up a redundant, high availability deployment.

Yes, it is seductive. Sometimes worth it.

But realize you'll be paying monthly in perpetuity for the convenience of that one-time setup, which could've been done in a few days, give or take.

> all of those features cost us less than 8 hours of my time each year

I'm surprised! Our RDS costs are about 10 engineering hours per month (120 eng/hrs per year). This is with hardly any customer traffic or data yet (early startup phase).

It's worth it for now, but it'll become unreasonably expensive later.


I should clarify that the 8 hours was above and beyond the costs of running it yourself on AWS. So that is not counting the 2x EC2 instances, plus the minor S3 and ELB costs. I didn't really run the numbers for equivalent hardware elsewhere, since that wasn't an option for us. Eyeballing it real quick right now, it's still maybe an hour / month vs other places for the hardware. It is a relatively small instance though; savings are probably much better as it gets to larger sizes. Pre-paying for reserved instances helps here as well.


> I can see both sides. If you're a startup that needs to be able to scale quickly if product market fit is achieved, the cloud really saves your bacon.

Depends on the team size of said startup [0]. In my opinion, tech shops are better off using new-age cloud providers like fly.io / glitch.com / render.com / railway.app / replit.com / deno.com / workers.dev etc. [1]

[0] https://tailscale.com/blog/modules-monoliths-and-microservic...

[1] https://www.swyx.io/cloud-distros/


> is your ten person team really going to figure out how to get Postgres to reliably run with billions of records, with encrypted backups, etc?

Most of the problems here will be DBA problems, like understanding query plans and such. Even with AWS RDS, I’ve had to upload various setting files to tweak tunables to get things working.
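
For what it's worth, the "setting files" on RDS are parameter groups rather than a postgresql.conf you edit directly - roughly like this hypothetical boto3 sketch (the group name and values are just examples):

    # Hypothetical example: tweaking Postgres tunables via an RDS parameter group (boto3).
    import boto3

    rds = boto3.client("rds")

    rds.modify_db_parameter_group(
        DBParameterGroupName="app-postgres14",
        Parameters=[
            # work_mem is dynamic, so it can apply immediately (value in kB).
            {"ParameterName": "work_mem", "ParameterValue": "65536", "ApplyMethod": "immediate"},
            # max_connections is static and only takes effect after a reboot.
            {"ParameterName": "max_connections", "ParameterValue": "500", "ApplyMethod": "pending-reboot"},
        ],
    )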


That Stack Overflow infra blog post is out of date. They use more than a single webserver now. For example: https://stackexchange.com/performance


Now they have 9.

They still serve a lot more traffic than I do and I have hundreds of instances; thousands of containers.


You have thousands of containers? Physician, heal thyself.


I mean, at my last job I had thousands of physical machines too.

Scale can depend on many things.

Here are a few reasons why it can easily be thousands:

1) Cronjobs, CI jobs, ETL, FaaS are all systems that exist. What used to be a process is now a container. (One need only check the PID count on their local machine to see how quickly this adds up.)

2) Microservices; I'm a bigger fan of fat "services", but doing actual microservices tends to leave you with a lot of containers running.

3) Actual compute need. If my original hosting strategy was thousands of machines, well, I'm going to have thousands of containers, if not more.


Sure, but the implied message of your comment was that you could replace all of your instances and containers with just 9 machines, since Stack Overflow "serves a lot more traffic than you do" (i.e. "has more actual compute need"). I think most reasonable engineers would say that "thousands" of containers would be a massive mistake for a task of that size, even if few of them would go to the extent that Stack Overflow did of using only 9 machines.


Thousands is not a lot. If you do microservices and have 100 of them, with 3 replicas of each for dev, qa and prod, you're already at 900.


Most importantly, SO is extremely read-heavy, write-lite, and cache-friendly.

A similar-“scale” e-commerce site would have significantly more load, more dynamic data, and just be overall harder to run.


Looks like they have actually reduced their footprint. It's not that they do run on a single webserver, it's that they can run on one.


> Looks like they have actually reduced their footprint.

I don't remember who said it, but a quote I really like is "it's not finished when there's nothing left to add, it's finished when there's nothing left to take away".


It's commonly attributed to Antoine de Saint-Exupéry and is a lot older than I thought, from 1935 and originally in French.

https://english.stackexchange.com/q/38837/178351



our department now spends > £1 million per year on AWS hosting. A 20% reduction in those fees could pay for a few sysadmins.

You can hire a "few" sysadmins for 200k/year?


In the UK/Europe yes:

https://uk.indeed.com/jobs?q=System%20Administrator&vjk=5149...

Probably not at FAANG level salaries but I doubt there are many sysadmins working for FAANG companies anymore.

DevOps engineers, btw, are more expensive, and in fact in the UK DevOps can be higher paid than a developer. I suspect most of the DevOps engineers working for this company are on £65k+. According to:

https://ifs.org.uk/tools_and_resources/where_do_you_fit_in

That puts those earners in the top 3% or from that website:

" In the below graph, the alternatively shaded sections represent the different decile groups. As you can see, you are in the 10th decile group.

In conclusion, Your income is so high that you lie beyond the far right hand side of the chart. "


£200k / year, in the UK? That's about 2-5 sysadmins, depending on experience.


A 20% reduction would result in ~£800k/yr.


They're saying that if the AWS costs decreased by 20%, they could use the now freed-up money, ie 200k, to pay sysadmins.


> > £1 million per year

I'm curious about your workload. I tend to only use cloud for workloads where it's either (1) by far the only feasible option (e.g. need GPUs for short periods of time), or else (2) basically free.

> I mean I'm not against cloud it's just not the cheapest option

This is certainly true for most workloads. It's also true that buying is better than renting, but here I am living in a rented apartment.

The logic from on high might be something like "if demand is uncertain and capex is risky, why buy when you can rent?"



