Hacker News new | ask | show | jobs
by ssanderson11235 2971 days ago
Speaker here. If you want to follow along with the slides from the talk, you can find them at https://speakerdeck.com/ssanderson/hosting-notebooks-for-100....

Also, happy to answer any questions that people might have.

4 comments

I'm working on building an educational environment with Jupyter, and I'm interested in the multiple hubs.

A few basic questions: Why multiple hubs? (was there some point of scale where you needed this) Did multiple hubs allow you to have better migrations? (where you drain one and move it over to the other)

Totally agree that state is the enemy of scale, so having a separate service backing your storage independent of what hub you're on seems like a big win.

Thanks for the great talk!

> Why multiple hubs?

A few reasons for this, most of which are related to points you mentioned:

1. Having multiple hubs makes it much easier to do zero-downtime deploys.

2. Having multiple hubs makes us more resilient to transient machine failures.

3. We were worried that having a single proxy for all our notebook traffic might become a system-wide bottleneck. Notebooks with a lot of images can get pretty large, and at the time we were rolling this out JupyterHub was pretty new. We weren't sure how well it was going to scale (the target audience for the JupyterHub team at the time was small labs and research teams), so it seemed safest to aim for horizontal scalability from the start. The JupyterHub team has since done a lot of awesome performance work to support the huge data science classes being taught at UC Berkeley, so it's possible that a single hub with the kubernetes spawner could handle our traffic today, but given points (1) and (2) plus the fact that we already have a working system, I don't have much incentive to find out :).

That's great, thanks! I was also curious if you hit scale issues on just one hub. I agree, it's best practice to not have all your eggs in one basket. I'd love to see an HA hub where this would be all taken care of for me, but hopefully by the time we go live we'll have this.
Are the cell sharing extensions mentioned in the slides open sourced? (Sorry if it says either way in the video, I didn't get chance to watch it in full yet). Lack of sharing / collaboration extensions for Jupyter Notebooks / Lab are still a weak point I think.
The sharing machinery isn't open source, mostly because it's pretty tightly coupled to our community forums, which is a custom rails application.

I know that the jupyterhub team was working on https://github.com/jupyterhub/hubshare for a while as an open source sharing solution. I've also commented in https://github.com/jupyterhub/hubshare/issues/14 and elsewhere that I think PGContents (one of the libraries I talk about in the video) could be used as a basis for many kinds of sharing (though probably not realtime collaboration).

Do you know if there's any way that we can see what's happening during the mini demo?
I gave an earlier version of this talk at JupyterCon 2017 https://www.youtube.com/watch?v=TtsbspKHJGo, which captured my full screen output. The pgcontents demo starts around 19:30 in that video.
Hey Scott, how did you get your unlisted YouTube link for your presentation? I don't think I ever found mine from the same conference.
I found it in the YouTube playlist from the event: https://www.youtube.com/playlist?list=PL055Epbe6d5aP6Ru42r7h....
Weird. Mine isn't there, or I am blind. shrugs Though I did find it by searching the O'Reilly user :)
Come on. Quantopian doesn't have 100k users... maybe 5k?

EDIT: Here we go again... downvoted, then probably flagged and reprimanded by mod dang or something. Sigh.

I actually spent hundreds of hours on Quantopian, and from the activity in the forums you wouldn't think it has 100k users. Either that, or it's the most muted community on the internet.