Hacker News

I don't understand why code hosting platforms like GitHub, GitLab or BitBucket have so many issues regularly. Is there anything special about it?


> Is there anything special about it?

Yup. You notice when they're down. Whereas when your self-hosted git server goes offline for a couple of hours, nobody else notices.


I am not talking about git hosting per se, but compared to other SaaS companies it seems GitHub/GitLab and Bitbucket are down much more often.


Facebook had an outage recently, Google has had outages. Hulu has an outage every week. You can check https://downdetector.com/ for recent history of many major services.

It's a fair bet that developers are more likely to notice and complain about software services being down on the internet.


I don't know about GitLab and Bitbucket, but GitHub's uptime is relatively phenomenal, given its load and feature set. I've never been in a programming environment where it's been the weakest link among hosted services.


[citation needed]


Lots of users. Also, a lot of package managers rely on hosting platforms like GitHub to host their packages, so if GitHub breaks, a lot of CI processes around the world break.


Which is kind of ridiculous. If your CI breaks because GitHub is down, it means it's not caching dependencies locally, but keeps re-downloading them every time it runs (e.g. every commit), generating tons of waste and unnecessary load on the hosting service.

Or, to put it bluntly, if your CI works like this, it's contributing to climate change.
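A cache-first fetch is only a few lines of code. A minimal sketch in Python (the cache location and helper name are illustrative, not taken from any particular CI system):

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical local cache directory; a real CI system would use a persistent volume.
CACHE_DIR = Path(tempfile.gettempdir()) / "ci-dep-cache"

def fetch_dependency(url: str) -> bytes:
    """Return the dependency archive, hitting the network only on a cache miss."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached = CACHE_DIR / hashlib.sha256(url.encode()).hexdigest()
    if cached.exists():
        # Cache hit: no network traffic, and the build survives a hosting outage.
        return cached.read_bytes()
    data = urllib.request.urlopen(url).read()  # cache miss: download once
    cached.write_bytes(data)
    return data
```

Once the first build populates the cache, later builds of the same dependency set make no requests to the hosting service at all.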


I think you are wrong. Our CI infra caches all dependencies, but it depends on GitHub for new internal code pushes (kind of the point). If GitHub is not sending events, CI doesn't kick off.

You're ignoring half of the problem. If you don't receive events from GitHub because they are down, your CI doesn't work either -- dependency caching doesn't matter at that point.


That's assuming you're putting your own organization's code on GitHub. Then of course if GitHub doesn't work, neither does the CI that's hooked to it. This is a separate topic.


Or it's a tool like cargo (Rust's package manager) checking whether any dependencies have a newer version against the package registry (whose index is stored on GitHub for no apparent reason).


That’s an excellent point. How can you tell whether a CI system uses caching, other than waiting for a GitHub outage and noticing something broke?


Many CI systems provide a build log, do they not? Look for a “git fetch” in it instead of a “git clone”.
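If the log does echo the git commands, even a crude scan distinguishes the two cases (a heuristic sketch; the function name is made up):

```python
def checkout_mode(build_log: str) -> str:
    """Guess how a CI job obtained the repository from its build log.

    Assumes the log echoes the git commands it ran, which most CI systems do.
    """
    if "git fetch" in build_log:
        return "incremental"   # reused a cached working copy
    if "git clone" in build_log:
        return "full-clone"    # downloaded everything from scratch
    return "unknown"
```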


Check the documentation, or when in doubt, run a job, redirect github.com to 127.0.0.1 in /etc/hosts on the CI server, and run that job again.


Define regularly.

I can recall only 2 incidents this year. I think that's not too bad considering the level of traffic they have to contend with.


Considering this has been going on for several hours now, their uptime is down to 99.9% at best and dropping by the hour. Their business SLA is 99.95% (though I have no idea exactly what it covers), so it's quite possible they are in breach.

Still not bad, but two incidents like this a year is usually considered unacceptable for infrastructure service providers.


I wonder how they define their SLA though. If only some of the features are down, does it impact the SLA?


> Define regularly

More often than what is considered a standard 99.99% uptime SLA? (about an hour per year.)
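The parenthetical is roughly right; the downtime budget for an uptime SLA is simple arithmetic:

```python
def downtime_budget_minutes(sla_percent: float,
                            minutes_in_period: float = 365 * 24 * 60) -> float:
    """Minutes of downtime allowed per period (default: one year) under an uptime SLA."""
    return (1 - sla_percent / 100) * minutes_in_period

# 99.99% uptime -> ~52.6 minutes/year (the "about an hour")
# 99.95% uptime -> ~262.8 minutes/year (~4.4 hours)
```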

You seem to be making it out like a couple of days a year of lost [1] developer productivity is no big deal.

That said, these things happen and you should probably check your workflows if you're all that blocked by GitHub being down.


> More often than what is considered a standard 99.99% uptime SLA?

GitHub SLA is 99.95% and apparently exclusive to Business Cloud customers[1].

[1] https://github.com/pricing


I wasn't saying it applied. Just what's expected from a large international company relied on by so many.


At 99.95%, that works out to about 4.4 hours of allowed downtime per year.


What's special about it is that you have terabytes of data to keep available, and at scale git does not play well with technologies like NFS or cloud object storage like S3, so each major provider either pays a lot of money to specialized vendors or has homegrown solutions to deal with the problem.

So on top of your usual problems with keeping a cloud service up and running, you also have that git IO problem to contend with, and to rub salt in the wound, that wrinkle also makes it difficult to fully adopt many "standard" cloud architectures or vendors (such as AWS) which work for non-IO-heavy applications: you always have this major part of your infrastructure that has this special requirement holding you back at least partially (and that can hurt your availability for related services which are not even IO-heavy).

(That said, it's hard to guess whether that was the problem, a contributing factor, or unrelated entirely based on the details provided here.)

source: I work at Atlassian (though not on the Bitbucket team) and occasionally chat to current and former Bitbucket devs on this topic.


I suspect it's just that you hear about them all.


It's just noticeable when it happens during someone's work day.


No, that's the kind of uptime a typical single organization with a typical RDBMS-centric, non-resilient architecture can provide. Of course, it's very hard for organizations past a certain size to do better after the fact; resilience is something you have to design in from the start.


It’s hard.

But really, do they have that many problems?



