Hacker News
Ask HN: how to push to a live site?
9 points by iphpdonthitme 6481 days ago
So I've read a fair number of books and links on the internet describing how to scale stuff. What I haven't come across are general strategies for pushing to a live CRUD site without taking it down. Can anyone point me to resources that I've missed, or perhaps just tell me?
7 comments

Deploying a long-running JavaScript application, like 280 Slides (and most Cappuccino apps), can on one hand be trivial (just copy the client resources to your webserver!), but it can also be an interesting challenge: namely, keeping clients that are running an old version of the client-side code talking to the corresponding version of the server-side app. When you add in something like Gears for offline access it gets even tougher. There was a good presentation at Google IO that covered all these issues: http://sites.google.com/site/io/taking-large-scale-applicati...

We wrote a little custom code (we call it "bake") for deploying Cappuccino applications:

"bake": http://github.com/280north/cappuccino/tree/master/Tools/bake...

sample "bakefile": http://github.com/280north/cappuccino/tree/master/Tools/bake...

It pulls your code from git (but could easily do local files, scp, rsync, svn, etc), runs an optional build command (like "ant"), copies source paths to destination paths in a deployment directory, gzips the deployment directory, scp's the code to your server(s), ungzips it, and does a little magic...

Each version is placed in its entirety in a uniquely named (unix timestamp) subdirectory. We could just redirect from "/" to "/1221268756/" (for example) but that's incredibly ugly, so we use the little-known HTML <base> tag to trick it. The index.html file in "/" is identical to the one in "/1221268756/" except it has a <base> tag which tells the app all URLs are relative to "/1221268756/" instead of the default containing directory ("/").

And it actually seems to work really well. The big advantage of this is you can set your cache expire date arbitrarily far in the future, and your entire app will be cached until you change index.html to point to a new <base>. 280 Slides, which is ~2.6MB uncompressed, loads on my computer in about 1.5 seconds if it's cached. The only problem with this approach is when you deploy, all clients will have to re-download every resource, even ones that don't change. A more granular system would be ideal, but significantly more complex.
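
A minimal sketch of the versioned-directory trick described above, in Python rather than whatever bake itself does. It assumes a deploy root holding one subdirectory per release, each with an index.html that has a <head> tag. The function name `publish_release` and the file layout are illustrative assumptions, not taken from bake.

```python
import re
import tempfile
from pathlib import Path


def publish_release(root: Path, version: str) -> str:
    """Copy root/<version>/index.html to root/index.html with a <base> tag added.

    Hypothetical helper: assumes the release's index.html contains a <head> tag.
    """
    html = (root / version / "index.html").read_text()
    base_tag = f'<base href="/{version}/">'
    # Insert <base> right after the opening <head> tag so every relative URL
    # in the page resolves inside the versioned directory.
    html, count = re.subn(
        r"(<head(?:\s[^>]*)?>)",
        lambda m: m.group(1) + base_tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("index.html has no <head> tag")
    (root / "index.html").write_text(html)
    return html
```

Rolling back under this scheme would just mean calling the same function with an older timestamp, since the old directory is still on disk.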

I looked at Capistrano briefly but decided against it for some reason I can't remember. Perhaps that would have been better, but c'est la vie...

Approach 1)

A long time ago I worked (in a very small testing capacity) at a very large video/streaming media serving company.

As I recall, the scripted procedure was to tag a release in CVS, and a script would pull that down into a new directory and swing the symlink.

This site had a very mature and infrequently changing billing system code path, which I believe was not modified in this process. If this doesn't describe your workload, you would probably have to change this to support concurrent execution of multiple versions for people who have a session established.

Approach 2)

Using any mainstream hardware load balancer (or presumably a similarly featureful free/open software LB), configure it to point at N+1 machines in a cluster. Administratively remove machines from the new session pool one at a time (virtual machines can make this flexible and easy to roll back). Once the established sessions have expired or been forced out, upgrade the software and roll them back in.

One neat aspect of this approach is if you have an "oh shit" hockey stick scaling issue, you can watch it happen on one machine before deploying it to every machine in your cluster. Also good for A/B testing, as mentioned in recent articles here.
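
A toy simulation of the drain, upgrade, and re-add loop from Approach 2. Everything here is a stand-in: the `Server` and `LoadBalancer` classes, the `healthy` callback, and the instant "session drain" are assumptions for illustration, not any real load balancer's API.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    version: str
    active_sessions: int = 0


@dataclass
class LoadBalancer:
    """Hypothetical stand-in for a load balancer's admin interface."""
    pool: list = field(default_factory=list)

    def remove(self, server):
        self.pool.remove(server)

    def add(self, server):
        self.pool.append(server)


def rolling_upgrade(lb, servers, new_version, healthy=lambda s: True):
    """Upgrade one server at a time and stop at the first unhealthy one."""
    upgraded = []
    for server in list(servers):
        lb.remove(server)            # stop routing new sessions to it
        server.active_sessions = 0   # stand-in for waiting until sessions expire
        server.version = new_version
        if not healthy(server):
            # Leave the bad machine out of the pool; the rest still run the old version.
            return upgraded
        lb.add(server)
        upgraded.append(server.name)
    return upgraded
```

The early return is the "watch it happen on one machine" part: if the first upgraded server misbehaves, the remaining servers have not been touched.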

If you happen to be using Ruby, then Capistrano (http://www.capify.org/) is awesome.

I imagine there are similar solutions for other languages.

Worst case scenario, you can roll your own. A very basic trick is to use symbolic links on your server - deploy the site to a new folder, and simply point the symbolic link to the new folder when you're done.
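
A minimal sketch of the symlink trick, assuming a POSIX filesystem where renaming over an existing link is atomic. The function name `activate_release` and the ".tmp" suffix are illustrative choices, not part of any tool mentioned in this thread.

```python
import os
import tempfile
from pathlib import Path


def activate_release(release_dir: Path, current_link: Path) -> None:
    """Point current_link at release_dir with no moment where the link is missing."""
    tmp_link = current_link.with_name(current_link.name + ".tmp")
    if tmp_link.is_symlink():
        tmp_link.unlink()  # clean up after an interrupted earlier run
    os.symlink(release_dir, tmp_link, target_is_directory=True)
    # Renaming the temporary link over the live one replaces it in a single step,
    # so requests see either the old release or the new one, never a gap.
    os.replace(tmp_link, current_link)
```

The web server's document root would point at the link (for example a "current" directory), and a rollback is the same call with the previous release directory.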

Capistrano works great for non-Ruby sites too! Our site is in PHP and it was easy to get cap working for us. We also use github.com so the site essentially pulls from there, which means we can also rollback in case we ever need to.
Concur, I looked a loooong time for something better than home-grown solutions.

Capistrano is great. I've used it with great success deploying Django (Python) apps to complex environments.

I second that. With some tweaking of the recipe, Capistrano does a neat job with PHP sites as well!
Could you define "push"? If it's what I think you mean, you may want to check out some of the deployment chapters in the documentation of Rails, Django and other frameworks. They're usually not too bad at explaining how to get from dev to live.

SVN also has some documentation on their site, and I imagine git would have something similar.

Using Lisp, if you have a network-based REPL, you can modify the running code live without taking the site down.
We push code changes with Git to the cloud, then log into the server and git pull. Then we use the REPL on the cloud server to reload the code, causing a recompile of changed code including dependencies. Strangely enough, the site continues to function during the recompile, which usually completes in less than a minute depending on how many macros are affected by the change.
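
For readers without a Lisp handy, a rough Python analogue of reloading code in a running process uses `importlib.reload`. This is a swapped-in technique, not the Lisp workflow described above, and the module name `handlers_demo` is made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so a reload always recompiles from the edited source.
sys.dont_write_bytecode = True


def demo_reload():
    """Import a module, change its source on disk, and reload it in-process."""
    with tempfile.TemporaryDirectory() as tmp:
        module_file = Path(tmp) / "handlers_demo.py"  # hypothetical module
        module_file.write_text("def greet():\n    return 'v1'\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("handlers_demo")
            before = module.greet()
            # "Deploy" a change, then reload without restarting the process.
            module_file.write_text("def greet():\n    return 'v2 (reloaded)'\n")
            importlib.reload(module)
            after = module.greet()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("handlers_demo", None)
    return before, after
```

Note that a reload only rebinds names in the module itself; objects created from the old definitions keep their old behavior, which is one reason this is less seamless than the Lisp approach.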
Deploy by svn. It's amazingly simple, and simple is usually best.

If you're smart you can orchestrate db changes non-destructively. If less smart, use Capistrano.

What's an example of a db change that "smart" people can do where Capistrano allows you to be less smart?
The Capistrano approach is to collect a set of SQL sequences that will alter a sample server configuration from one configuration to another, with the purpose of repeating this on other servers such as your production server.

This is fine if you are paid by the hour.

The smarter way is to make changes that span codebase versions. You want to normalize a table so entities can have more than one address? Build an addresses table crosswalked to entity id and let the new code use it. Once you're happy with it you can drop some columns, but if you need to roll back, you haven't thrown anything away.
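
A small sketch of the additive change described above, using an in-memory SQLite database. The table and column names (`entities`, `addresses`, `street`, `city`) are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT);
    INSERT INTO entities (name, street, city) VALUES ('Acme', '1 Main St', 'Springfield');
""")


def expand(conn):
    """Additive step: create the new table and backfill it, dropping nothing."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS addresses (
            id INTEGER PRIMARY KEY,
            entity_id INTEGER NOT NULL REFERENCES entities(id),
            street TEXT,
            city TEXT
        );
        INSERT INTO addresses (entity_id, street, city)
            SELECT id, street, city FROM entities
            WHERE id NOT IN (SELECT entity_id FROM addresses);
    """)


expand(conn)
# New code can now store several addresses per entity...
conn.execute(
    "INSERT INTO addresses (entity_id, street, city) VALUES (1, '9 Side Rd', 'Shelbyville')"
)
# ...while old code that reads entities.street keeps working, so a rollback loses nothing.
old_view = conn.execute("SELECT street FROM entities WHERE id = 1").fetchone()
new_view = conn.execute(
    "SELECT street FROM addresses WHERE entity_id = 1 ORDER BY id"
).fetchall()
```

Dropping the old `street` and `city` columns would be a separate, later step, run only once nothing reads them any more.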

To me the difference is like that between hiring a hooker and charming a cheerleader. Either way you should make backups. ymmv.

Hmm, I don't think I understand the difference. Aren't you adding a table in both the Capistrano case and the 'span codebase' example, so wouldn't both cases require repeating SQL sequences?

Although maybe what you are saying is that you can make 'dumb' sql changes which break compatibility with past code vs 'smart' sql changes and code changes which are backwards compatible?

Hi, would you be able to share some of the books and links on the internet you have been reading that describe how to scale web services?
- Building Scalable Web Sites: Building, scaling, and optimizing the next generation of web applications by Cal Henderson

- Scalable Internet Architectures (Developer's Library) by Theo Schlossnagle

http://philip.greenspun.com/seia/

... and a whole buncha links on google with "scaling", although "scaling mysql" seems especially popular