Hacker News

I don't understand why code hosting platforms like GitHub, GitLab or BitBucket have so many issues regularly. Is there anything special about it?


> Is there anything special about it?

Yup. You notice when they're down. Whereas when your self-hosted git server goes offline for a couple of hours, nobody else notices.


I am not talking about git hosting per se, but compared to other SaaS companies it seems GitHub/GitLab and Bitbucket are down much more often.


Facebook had an outage recently, Google has had outages. Hulu has an outage every week. You can check https://downdetector.com/ for recent history of many major services.

It's a fair bet that developers are more likely to notice and complain about software services being down on the internet.


I don't know about GitLab and Bitbucket, but GitHub's uptime is relatively phenomenal, given its load and feature set. I've never been in a programming environment where it's been the weakest link among hosted services.


[citation needed]


Lots of users. Also, a lot of package managers rely on hosting platforms like GitHub to host their packages, so if GitHub breaks, a lot of CI processes around the world break.


Which is kind of ridiculous. If your CI breaks because GitHub is down, it means it's not caching dependencies locally, but keeps re-downloading them every time it runs (e.g. every commit), generating tons of waste and unnecessary load on the hosting service.

Or, to put it bluntly, if your CI works like this, it's contributing to climate change.
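A cache-first fetch is only a few lines of code. A minimal sketch in Python (the cache location and helper name are illustrative, not taken from any particular CI system):

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical local cache directory; a real CI system would use a persistent volume.
CACHE_DIR = Path(tempfile.gettempdir()) / "ci-dep-cache"

def fetch_dependency(url: str) -> bytes:
    """Return the dependency archive, hitting the network only on a cache miss."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached = CACHE_DIR / hashlib.sha256(url.encode()).hexdigest()
    if cached.exists():
        # Cache hit: no network traffic, and the build survives a hosting outage.
        return cached.read_bytes()
    data = urllib.request.urlopen(url).read()  # cache miss: download once
    cached.write_bytes(data)
    return data
```

Once the first build populates the cache, later builds of the same dependency set make no requests to the hosting service at all.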


I think you are wrong. Our CI infra caches all dependencies, but it depends on GitHub for new internal code pushes (kind of the point). If GitHub is not sending events, CI doesn't kick off.

You're ignoring half of the problem. If you don't receive events from GitHub because they are down, your CI doesn't work either -- dependency caching doesn't matter at that point.


That's assuming you're putting your own organization's code on GitHub. Then of course if GitHub doesn't work, neither does the CI that's hooked to it. This is a separate topic.


Or it's a tool like cargo (Rust's package manager) checking whether any dependencies have a newer version against the package registry (whose index is stored on GitHub for no apparent reason).


That’s an excellent point. How can you tell whether a CI system uses caching, other than waiting for a GitHub outage and noticing something broke?


Many CI systems provide a build log, do they not? Look for a “git fetch” in it instead of a “git clone”.
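If the log does echo the git commands, even a crude scan distinguishes the two cases (a heuristic sketch; the function name is made up):

```python
def checkout_mode(build_log: str) -> str:
    """Guess how a CI job obtained the repository from its build log.

    Assumes the log echoes the git commands it ran, which most CI systems do.
    """
    if "git fetch" in build_log:
        return "incremental"   # reused a cached working copy
    if "git clone" in build_log:
        return "full-clone"    # downloaded everything from scratch
    return "unknown"
```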


Check the documentation, or when in doubt, run a job, redirect github.com to 127.0.0.1 in /etc/hosts on the CI server, and run that job again.


Define regularly.

I can recall only 2 incidents this year. I think that's not too bad considering the level of traffic they have to contend with.


Considering this has been going on for several hours now, their uptime is down to 99.9% at best and dropping by the hour. Their business SLA is 99.95% (though I have no idea exactly what it covers), so it's quite possible they are in breach.

Still not bad, but two incidents like this a year is usually considered unacceptable for infrastructure service providers.


I wonder how they define their SLA though. If only some of the features are down, does it impact the SLA?


> Define regularly

More often than what is considered a standard 99.99% uptime SLA? (about an hour per year.)
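The parenthetical is roughly right; the downtime budget for an uptime SLA is simple arithmetic:

```python
def downtime_budget_minutes(sla_percent: float,
                            minutes_in_period: float = 365 * 24 * 60) -> float:
    """Minutes of downtime allowed per period (default: one year) under an uptime SLA."""
    return (1 - sla_percent / 100) * minutes_in_period

# 99.99% uptime -> ~52.6 minutes/year (the "about an hour")
# 99.95% uptime -> ~262.8 minutes/year (~4.4 hours)
```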

You seem to be making it out like a couple of days a year of lost [1] developer productivity is no big deal.

That said, these things happen and you should probably check your workflows if you're all that blocked by GitHub being down.


> More often than what is considered a standard 99.99% uptime SLA?

GitHub SLA is 99.95% and apparently exclusive to Business Cloud customers[1].

[1] https://github.com/pricing


I wasn't saying it applied. Just what's expected from a large international company relied on by so many.


At 99.95%, that works out to about 4.4 hours of allowed downtime per year.


What's special about it is that you have terabytes of data to keep available, and at scale git does not play well with technologies like NFS or cloud object storage like S3, so each major provider either pays a lot of money to specialized vendors or has homegrown solutions to deal with the problem.

So on top of your usual problems with keeping a cloud service up and running, you also have that git IO problem to contend with, and to rub salt in the wound, that wrinkle also makes it difficult to fully adopt many "standard" cloud architectures or vendors (such as AWS) which work for non-IO-heavy applications: you always have this major part of your infrastructure that has this special requirement holding you back at least partially (and that can hurt your availability for related services which are not even IO-heavy).

(That said, it's hard to guess whether that was the problem, a contributing factor, or unrelated entirely based on the details provided here.)

source: I work at Atlassian (though not on the Bitbucket team) and occasionally chat to current and former Bitbucket devs on this topic.


I suspect it's just that you hear about them all.


It's just noticeable when it happens during someone's work day.


No, that's the kind of uptime a typical single organization with a typical RDBMS-centric, non-resilient architecture can provide. Of course, it's very hard for organizations past a certain size to do better after the fact; resilience is something you have to design in from the start.


It’s hard.

But really, do they have that many problems?



