Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Why multiple hubs?

A few reasons for this, most of which are related to points you mentioned:

1. Having multiple hubs makes it much easier to do zero-downtime deploys.

2. Having multiple hubs makes us more resilient to transient machine failures.

3. We were worried that having a single proxy for all our notebook traffic might become a system-wide bottleneck. Notebooks with a lot of images can get pretty large, and at the time we were rolling this out JupyterHub was pretty new. We weren't sure how well it was going to scale (the target audience for the JupyterHub team at the time was small labs and research teams), so it seemed safest to aim for horizontal scalability from the start. The JupyterHub team has since done a lot of awesome performance work to support the huge data science classes being taught at UC Berkeley, so it's possible that a single hub with the kubernetes spawner could handle our traffic today, but given points (1) and (2) plus the fact that we already have a working system, I don't have much incentive to find out :).



That's great, thanks! I was also curious if you hit scale issues on just one hub. I agree, it's best practice to not have all your eggs in one basket. I'd love to see an HA hub where this would be all taken care of for me, but hopefully by the time we go live we'll have this.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: