by DevX101 4613 days ago
Dropbox wasn't a trivial engineering problem. While making Facebook for a couple thousand users isn't too hard, scaling it to hundreds of millions was no trivial feat either.
7 comments

I was under the impression Dropbox is a UI to Amazon S3. I use them and don't mean to deny it's a slick package. But I could duplicate the entire thing in about a day with inotify and rsync. The scaling is mostly on Amazon.

That said, I must point out that the scaling issues all of these super modern "change the world" companies, such as Twitter and Facebook, have are purely for the purpose of centralizing power. It's a self-inflicted headache. It's great business to be in control of millions of users. But it's a sad state of affairs for the internet in general. We've gone from open, decentralized, and peer-to-peer protocols to one company controlling everything.

There are interesting scaling problems. But Facebook isn't one of them.

> But I could duplicate the entire thing in about a day with inotify and rsync. The scaling is mostly on Amazon.

No, you could duplicate a narrow piece of Dropbox's functionality in a day. You could not even get anywhere near duplicating the product and the service in anything less than months.

Dropbox pretty much uses librsync and, I believe, inotify on Linux. With Amazon S3. That is Dropbox. Literally.

Polishing any product takes time. But that's universal and has nothing to do with engineering. What exactly is non-trivial about Dropbox?

Some of the technical challenges I can think of:

1. Version control.

2. LAN sync.

3. Dropbox probably also does a lot to try and minimize their S3 costs - so probably a good deal of compression/duplication-detection on the backend.

4. Any and all networking issues. Even if using OSS gets you 98% of the way, Dropbox probably gets you 100% of the way.

5. Performance. I don't know what Dropbox is doing, but about a year ago I tested out Google Drive, SkyDrive, Dropbox, and SugarSync. I didn't want to use Dropbox because it gives you the least free storage, but Dropbox was consistently 2x faster than the competition (without using LAN sync). That makes a huge difference so I switched to it, and if Dropbox engineers are beating out Google and Microsoft engineers, they're clearly doing something special.
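
To make item 3 concrete, here is a minimal sketch of one common form of duplication detection: content-addressed chunk storage. Nothing in this thread says how Dropbox actually does it. The `ChunkStore` class, the fixed-size chunking, and the tiny `CHUNK_SIZE` are all assumptions made for illustration.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the demo is easy to follow.
# A real system would use far larger blocks.
CHUNK_SIZE = 4


class ChunkStore:
    """Toy content-addressed store: identical chunks are stored only once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> chunk bytes

    def put(self, data):
        """Split data into fixed-size chunks; return the list of chunk digests."""
        manifest = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only store the chunk if this digest has not been seen before.
            self.blocks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def get(self, manifest):
        """Rebuild the original bytes from a list of chunk digests."""
        return b"".join(self.blocks[d] for d in manifest)

    def stored_bytes(self):
        """Total bytes actually held, after deduplication."""
        return sum(len(c) for c in self.blocks.values())


if __name__ == "__main__":
    store = ChunkStore()
    first = store.put(b"aaaabbbbcccc")
    second = store.put(b"aaaabbbbdddd")  # shares two chunks with the first file
    print(store.get(first), store.get(second))
    print("bytes uploaded:", 24, "bytes stored:", store.stored_bytes())
```

Two 12-byte files that share their first 8 bytes end up costing 16 bytes of storage rather than 24. Fixed-size chunking is the simplest choice, but it loses most of the benefit when bytes are inserted near the start of a file, which is part of why the details are harder than the one-day version suggests.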

It's a shame that the "Shit HN Says" Twitter account is dead. This would be a dead cert to end up on there.

In all seriousness, from a highly abstracted view, Dropbox isn't all that complicated. However, the same things are said about numerous other projects. I remember when Stack Overflow took off, and everyone around here said "I could build Stack Overflow in a weekend".

It turns out that, yes, you can build Stack Overflow in a weekend. However, all you've managed to build is a basic Q&A site. What you've missed is practically everything that makes Stack Overflow work. The rep count probably doesn't work as well, the moderator controls aren't there, the OpenID integration isn't there, the markdown controls aren't as sophisticated, it's slower, etc.

I don't have a link to it, but not long ago there was a repost of the original "Show HN" for Dropbox, where there were similar criticisms of how "trivial" Dropbox is to implement. The devil is in the details.

Also, I have to admit that I used to be wayyyy too stoked on scripting languages until I did a thought exercise realizing that the most interesting thing I could then ever do in the web-world is make an RPC to a far more interesting service.

Do you know about Facebook's architecture? I'll give you a hint which was brought to my attention a couple of weeks ago. PHP is sitting on their front end, but trust me it is NOT doing any heavy lifting...

Lately I feel much more like "everything has its place" and I'm not so invested in hack hype. Java, I'm seeing, is a great language because it's kind of nestled between C++ and the web scripting/framework world. There's lots of possibility for unique, almost C++ desktop-software-level work, but there are many strong frameworks and libraries for the web as well. And there are a lot of JVM languages that tie neatly together, so there is kind of a well-knit multi-lingual toolkit that is slightly less daunting than working with alien PHP-to-C++ RPCs (though this may be all psychological).

I'm not sure of the implication you're making there, and it confuses me.

Define 'frontend' in that context? Also, are you referring to the HipHop compiled PHP as the 'backend', or are there some fancy Go/Erlang/Lisp/whatever tricks acting as MQ layers, shuttling data around that I'm not aware of?

I was implying that PHP wasn't doing any of the heavy business logic, but then I re-read some articles (most notably this one) & I'll admit I'm a bit confused.

https://developers.facebook.com/blog/post/2010/02/02/hiphop-...

I WAS referring to PHP as frontend, but based on the article it seems like there may be a lot more HipHop PHP than I suspected, because they mention that it would be too niche to write a ton of C++ code. However, there is this line which I think clarifies somewhat -- "HipHop allows us to write the logic that does the final page assembly in PHP and iterate it quickly while relying on custom back-end services in C++, Erlang, Java, or Python to service the News Feed, search, Chat, and other core parts of the site."

It seems like even though PHP is compiled & heavily utilized, it is still used mostly for page assembly on top of a Service-Oriented Architecture, with services written in C++, Erlang, Java, & Python.

Scaling is not domain complexity. Scaling may require significant technical expertise, but it's a safe problem: you only do it when you need to, which implies traction.

In his post he acknowledged the scaling difficulty, but the point is, having to scale is a good problem to have and you'll be able to afford good engineering at that point since you have "traction".

Scaling a LOLCats database to hundreds of millions of users isn't trivial either.

Ha, yes, but his point is that once these problems reach a non-trivial point the big guns get brought in. It's not the startup team solving them...

Trivial is not necessarily the opposite of difficult though. Solving an easy but non-trivial problem can be done with average engineers, if you have a sufficient number of them to get the work done.

This is what happens when a company hires genuinely exceptional people to do easy but non-trivial work:

http://www.quora.com/Working-at-Google-1/Whats-the-worst-par...