by wasd 3928 days ago
I'm not sure if this is a great idea. Using a reverse proxy has lots of benefits. You can manage load balancing, SSL termination, serving static content (much faster), caching, compression, centralized logging, and hosting different applications on the same URL space (foo.com/app1, foo.com/app2). They also have the added benefit of another layer of security (detecting various HTTP exploits and preventing them from getting to the web server). I'm not saying you can't do this in node but nginx/apache are really good at what they do.

EDIT:

I've read through your comments and you seem like a pretty experienced developer. I don't think any of the information in my comment should be news to you so I'm curious to hear more about why you feel this way.


I use nginx at Neocities to serve all our static sites, and as a proxy for our front site.

I like nginx, but it gets way too much sacred-cow treatment from the dev community. It has plenty of problems: the configuration is a pseudo-language that doesn't always make the right choices and is difficult to heavily customize, and I've gotten it to be -very- unstable under certain circumstances, including really bread-and-butter things like SSL caching. If there's a bug, you'll have a good old time debugging its massive collection of C code. It's great, but it's not perfect.

Making nginx do custom things that you'll probably need to do in a serious environment (example: dynamically programmable SSL SNI) requires crazy mods and hacks that have only recently been made available (by third parties) and heavily reduce nginx's performance. Further, they only provide purgeable proxy caching via their commercial version, which costs an exorbitant amount of money. The free purger, naturally, makes nginx lock up. I wouldn't mind chipping in a bit for nginx because I want to support their team anyway, but at their current prices ($100/node/month or something like that) we simply can't afford it.
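
To make "dynamically programmable SSL SNI" concrete, here is a minimal TypeScript sketch of the lookup a programmable proxy could run on each TLS handshake. In Node, a function like this is the sort of thing you would call from the `SNICallback` option of `tls.createServer`. The `CertStore` type, the `lookupCert` name, the hostnames, and the string labels standing in for real certificates are all assumptions made for illustration, not anything taken from the Neocities setup.

```typescript
// Hypothetical in-memory certificate store keyed by hostname.
// In a real proxy the values would be TLS contexts or PEM material
// loaded from disk or a database; here they are opaque labels.
type CertStore = Map<string, string>;

// Resolve a TLS server name to a certificate entry.
// Tries an exact match first, then a single-level wildcard
// ("*.sites.example" covers "a.sites.example" but not "a.b.sites.example").
function lookupCert(store: CertStore, serverName: string): string | undefined {
  const name = serverName.toLowerCase();
  const exact = store.get(name);
  if (exact !== undefined) return exact;

  const firstDot = name.indexOf(".");
  if (firstDot === -1) return undefined;
  return store.get("*" + name.slice(firstDot));
}

// Entries can be added or replaced at runtime, with no config reload.
const certs: CertStore = new Map([
  ["foo.example", "cert-foo"],
  ["*.sites.example", "cert-wildcard"],
]);
certs.set("bar.example", "cert-bar");
```

The point of the sketch is only that the lookup is ordinary code: swapping the `Map` for a database query or an on-demand certificate fetch is a local change rather than a server module.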

I realize this is not a popular opinion right now, but node.js is completely up to the task of running a reverse HTTP proxy. It is basically (you likely won't notice the difference unless you're running the New York Times) competitive with nginx for performance, and as a tradeoff for an unnoticeable slowdown you get a full, Turing-complete programming language to completely control the flow of your data. Nginx under the hood is just a reactor pattern with children that share a socket. Node.js has a cluster module that uses the exact same strategy. Mind you, this is from someone who has given talks critical of reactor pattern scaling.

Also, if you have blocking I/O apps, it doesn't matter what you configure nginx to do, it's still going to lock up when someone DDoSes it with Slowloris connections. Make your Ruby app thread safe and use Rainbows! instead of Unicorn, or you're going to have a bad time.

> competitive with nginx for performance, and as a tradeoff for an unnoticeable slowdown you get a full, Turing-complete programming language to completely control the flow of your data.

It's almost like Erlang and mochiweb never existed, but people sure are willing to re-create it all in JavaScript.

JavaScript: Spending the past 20 years catching up with 1990s-level technology.

> Making nginx do custom things that you'll probably need to do in a serious environment (example: dynamically programmable SSL SNI) requires crazy mods and hacks

> you get a full, turing complete programming language to completely control the flow of your data

Did you try nginx's Lua support? It doesn't seem to be that experimental and already has its fair share of documentation, on top of being much more performant than JavaScript:

https://blog.cloudflare.com/tag/lua/

http://openresty.org/

One of the things I don't understand about nginx is why an HTTP daemon still contains a mail proxy today!

So when are you going to release "node-ginx"? :)

You're on to me. ;)

There is node-http-proxy available (https://github.com/nodejitsu/node-http-proxy), which also has some plugins available to do some of the advanced features nginx supports.

I'll likely be writing a custom proxy server tailored to our needs such that it probably won't be useful as a general purpose proxy server, but if you're looking for something, that's a start. Making it more general purpose unfortunately would require more work, and I'm pretty time stretched right now.

I'm not saying it's better than nginx, of course. I'm just saying that if you need to do some crazy programming that can't be done with nginx, you're free to use something else. Don't be fearful of treading your own path, just make sure you know well how HTTP works before doing it.
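
As one example of the HTTP details a hand-rolled proxy has to get right, here is a hedged TypeScript sketch of header handling: dropping the headers commonly treated as hop-by-hop (they describe a single connection and are not meant to be forwarded) and appending the client address to `X-Forwarded-For`. The `HeaderMap` type, the `upstreamHeaders` name, and the exact header list are assumptions for illustration; it is not taken from node-http-proxy or any other project mentioned here.

```typescript
// Headers commonly treated as hop-by-hop: they apply to one connection
// and a proxy should not pass them along to the backend.
const HOP_BY_HOP = new Set([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade",
]);

type HeaderMap = Record<string, string>;

// Build the header set to send upstream for one client request.
function upstreamHeaders(incoming: HeaderMap, clientIp: string): HeaderMap {
  // Normalize names to lower case so lookups are case-insensitive.
  const normalized: HeaderMap = {};
  for (const [name, value] of Object.entries(incoming)) {
    normalized[name.toLowerCase()] = value;
  }

  // Any header named in Connection is hop-by-hop for this request too.
  const named = (normalized["connection"] ?? "")
    .split(",")
    .map((token) => token.trim().toLowerCase())
    .filter((token) => token !== "");
  const drop = new Set([...HOP_BY_HOP, ...named]);

  const out: HeaderMap = {};
  for (const [name, value] of Object.entries(normalized)) {
    if (!drop.has(name)) out[name] = value;
  }

  // Append the client address so the backend can still see who connected.
  const prior = out["x-forwarded-for"];
  out["x-forwarded-for"] = prior ? `${prior}, ${clientIp}` : clientIp;
  return out;
}
```

A real proxy would call something like this once per request before opening the upstream connection; multi-valued headers and trust decisions about an existing `X-Forwarded-For` are left out of the sketch.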

Here's a stupid example I whipped up quickly for a reverse proxy for our IPFS nodes that demonstrates how quickly you can put together a custom reverse proxy to do something weird: https://github.com/neocities/hshca-proxy/blob/master/app.js. That flaming piece of junk hasn't crashed once since I deployed it.

For that matter, GoDaddy's website builder now "publishes" to a Cassandra cluster that is served via a cluster of node servers with a local Redis as an in-memory cache... it works really well. The distribution model is working much better than the previous approach of publishing via FTP to a dedicated backend Linux host (Apache). I haven't been there for about a year now, but I'm pretty sure a lot of those aspects have proven out.
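
The "local in-memory cache in front of a slower store" arrangement described above can be sketched as a read-through cache with a time-to-live. This is a minimal TypeScript illustration only: `readThrough`, the `load` callback standing in for the backing store, and the injectable `now` clock are hypothetical names, and nothing here reflects how the GoDaddy system is actually built.

```typescript
// Wrap a slow lookup in a read-through cache with a time-to-live.
// `load` stands in for the backing store; `now` is injectable so expiry
// can be exercised without waiting for real time to pass.
function readThrough<V>(
  load: (key: string) => V,
  ttlMs: number,
  now: () => number = Date.now,
): (key: string) => V {
  const entries = new Map<string, { value: V; expiresAt: number }>();

  return (key: string): V => {
    const hit = entries.get(key);
    // Serve from memory while the entry is still fresh.
    if (hit !== undefined && hit.expiresAt > now()) return hit.value;

    // Miss or expired: go to the backing store and remember the result.
    const value = load(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

In a real server `load` would be asynchronous and the cache would need a size bound; both are left out to keep the shape of the idea visible.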

I don't know about the OP's reasons, but one of mine lately has simply been dependency management and simplicity of deployment. It's just handy to be able to package it all together, especially for packaged software where deploying another component would be more configuration. I suppose this is why Docker containers (or in the past, virtual appliances/VMs) have become so much more popular.

The good embeddable web servers are usually pretty lightweight, scalable and can be programmatically configured. Things like Jetty are popular, but look at languages like Go that have HTTP serving built in via libraries and scale nicely via coroutines.

Vert.x etc. are cool for performance reasons, being lightweight and usually much less thread hungry (using async operations, sometimes in far fewer threads).

That said, I do agree that reverse proxies are still really useful for all the reasons you mentioned. Putting a reverse proxy in front of one of these high-performing embedded HTTP engines is good practice, when you need it.

And there is no need to throw out the tried and true engines, like Apache, Nginx, etc.

Just depends on the use case and needs I suppose.

To play devil's advocate (not that OP is a devil), you want a CDN serving up static assets anyway, and maybe taking care of SSL termination and security depending on your sensitivity needs; haproxy is great for load balancing and centralized logging; vulcand can handle reverse-proxying; and at that point, all you're left with is compression, which a reasonable web server should be able to handle. Now you've got a suite of specialized tools that will do their jobs well, and you probably have most of them in your stack anyway.

Granted, it's more complexity, but nginx certainly isn't the must-have that it used to be.

HAProxy can do the SSL termination, reverse proxying, and compression jobs quite well by itself. Though vulcand's etcd-based runtime configuration looks friendlier than HAProxy's.

Firstly, I'm flattered that I don't sound like a n00b :)

I should say that I haven't ever really tried this at production scale. My background is mostly consulting wherein my responsibility is to deliver a provably working solution for someone else to manage and operate. So there's my bias in this.

Other commenters have made a lot of the points I would. You can easily handle TLS in Java or JavaScript. Or you can terminate with an ELB as I usually do. A lot of load can be pushed to a CDN.

But really, I'm not convinced it would be that much slower. I know this is dated, but a simple ApacheBench test shows Tomcat outperforming httpd for static assets [1]. I've never had a site that was remotely bottlenecked by static assets, but I've had many bugs due to obtuse mod_rewrite configs. It's cheaper to have fewer bugs than to spin up another server.

[1] http://www.devshed.com/c/a/BrainDump/Tomcat-Benchmark-Proced...

> load balancing

You'd rather use a dedicated load balancer like Route53 or haproxy. Don't think choosing Apache or Nginx is the right option for those really. Plus something like vert.x is very usable as a load balancer already.

> SSL Termination

Just about everything handles this already. Current best practice is to use SSL for all communication between your own servers anyway, so there's no gain. If you SSL terminate on your load balancer, these days you want to use a new SSL connection between your load balancer and application server anyway if possible.

> serving static content, caching, compression

New app servers like vert.x support Linux sendfile and handle serving static content very well. Currently, nearly everyone uses Cloudflare to handle all of this anyway. No real reason to duplicate it if Cloudflare is set to handle it.

> centralized logging

Centralized logging is usually done by sending all of your logs off from each service/server to be aggregated on a dedicated box running Logstash or whatever. You don't use your reverse proxy for this?

> using different applications on the same URL space

From using the web, I don't think this is done anymore. In fact it seems to be the opposite - foo1.app.com, foo2.app.com seems to be the trend. Basically the opposite of multiple apps on 1 domain because of the big move towards microservices. Extra domains are the cheapest thing there is.

> added benefit of another layer of security

Security doesn't work that way in my experience. It's more about minimizing attack surface. If you use node and nginx and apache then any exploit that hits any of those 3 will hit you. If you only use node, then you can only get hit by exploits on node. So I'd argue it's the opposite. The more layers, the less secure.

> nginx/apache are really good at what they do

Sure, but you need to find the most efficient tool to handle your needs with the least amount of complexity. Only add something if it solves an issue that you can't solve in a simpler way just as well.

> Current best practice is to use SSL for all communication between your own servers anyway, so there's no gain.

I've never heard this. Who does this? Everyone I know of either just naively relies on the privacy of NAT-style non-routability in something like an AWS VPC, or, if they're more paranoid (or their provider has no private-networking feature, or they need HIPAA compliance, or whatever), uses IPsec, which is, happily, exactly the proper use-case for IPsec.

(Unless what you actually mean is requests between separately-maintained microservices that are supposed to treat one-another as if they were produced by third parties, like AWS strives to do. But the "your machines" makes me doubt that; you wouldn't think of those other machines as "yours" in that case.)

> From using the web, I don't think this is done anymore. In fact it seems to be the opposite - foo1.app.com, foo2.app.com seems to be the trend.

Green-field applications, no matter the size, are usually deployed to separate subdomains, yes. For long-term maintenance, though, nothing beats being able to just mount a new backend (probably written in a different language, even) on top of your legacy app's /admin/ or whatever else. It's effectively about patching a resource space with new backends to handle parts of it, without having to touch the legacy code to get it proxying to the new server. Businesses that embrace the "cool URLs don't change" philosophy—for example, newspapers who want their heavily-linked-to story pages accessible forever—take this approach all the time. Their web servers are rats' nests of routing rules to different backends, to make everything seem, from outward appearance, to be the same as it always was, even when everything is now in the CMS-of-the-week.

The other place this happens is API servers—you might want /1.1/ and /2.0/, or even /feeds and /emotes, going to different clusters. (If you're doing that in the path instead of using content negotiation.) That kind of business-policy-level routing is not the rightful domain of a load balancer, even if haproxy et al can be configured to do it; you want your load balancers to be dumb stateless infrastructure components, and your web servers to be maintained and configured and updated as part of the service you're deploying.
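
The path-based mounting described in the last two paragraphs boils down to a longest-prefix match from request path to backend. A minimal TypeScript sketch follows; the `Route` shape, the `pickBackend` name, and the `*.internal` backend addresses are made up for illustration.

```typescript
// Hypothetical route table: URL path prefix -> backend origin.
interface Route {
  prefix: string;
  backend: string;
}

const routes: Route[] = [
  { prefix: "/", backend: "http://legacy.internal:8080" },
  { prefix: "/admin/", backend: "http://admin.internal:9000" },
  { prefix: "/1.1/", backend: "http://api-v1.internal:3000" },
  { prefix: "/2.0/", backend: "http://api-v2.internal:3000" },
];

// Pick the backend whose prefix is the longest match for the path,
// so "/admin/users" goes to the admin backend rather than the "/" catch-all.
function pickBackend(table: Route[], path: string): string | undefined {
  let best: Route | undefined;
  for (const route of table) {
    const matches = path.startsWith(route.prefix);
    if (matches && (best === undefined || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best?.backend;
}
```

Mounting a new backend on top of the legacy app is then one more entry in `routes`, with no change to the legacy code.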

> If you use node and nginx and apache then any exploit that hits any of those 3 will hit you.

There are a bunch of clever things that genuine, battle-tested "web servers" do that "application web servers" don't. Preventing Slowloris attacks, for example—it's something every HTTP server would do in an ideal world, but since it complicates the code and prevents streaming parsing, you really only want to handle it once (by buffering requests) at the input end. There are umpteen other such attacks that web servers just abstract away. Even with, say, Erlang's "battle-tested" reputation, I wouldn't trust it sitting on the open web without nginx or something else in front.
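
For a sense of what handling Slowloris at the input end involves, here is a hedged TypeScript sketch of just the decision logic: give each client a fixed deadline to finish its request headers, then require a minimum body transfer rate. The `PendingRequest` shape, the `shouldDrop` name, and all three thresholds are assumptions picked for illustration; this is not a description of how nginx implements its timeouts.

```typescript
// State tracked for a request that has not been fully received yet.
interface PendingRequest {
  phase: "headers" | "body"; // what the server is currently waiting for
  phaseStartedAt: number;    // ms timestamp when this phase began
  bodyBytes: number;         // body bytes received so far
}

// Illustrative thresholds, not recommendations.
const HEADER_DEADLINE_MS = 10000;   // max time to deliver all headers
const BODY_GRACE_MS = 5000;         // don't judge the body rate before this
const MIN_BODY_BYTES_PER_SEC = 500; // slower than this looks like a stall

// Decide whether a still-incomplete request should be cut off.
function shouldDrop(req: PendingRequest, now: number): boolean {
  const elapsedMs = now - req.phaseStartedAt;
  if (req.phase === "headers") {
    // A client trickling header bytes never completes this phase.
    return elapsedMs > HEADER_DEADLINE_MS;
  }
  if (elapsedMs < BODY_GRACE_MS) return false;
  return req.bodyBytes / (elapsedMs / 1000) < MIN_BODY_BYTES_PER_SEC;
}
```

A front-end server would run a check like this on a timer for every open connection, which is part of why it is simpler to do once at the edge than in every application server behind it.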

Usually, though, your load balancer is also a "web server"... if SSL has been terminated there so that it can actually parse the requests and responses. This tends to be why some people actually chain haproxy -> nginx -> their app server: they put haproxy in dumb TCP load-balancing mode, while nginx terminates SSL and thus gets to be the "web server."

Regarding encryption inside your network, it's a new trend since the NSA business has been going on. Google famously decrypted at the edge and the NSA demanded (and received) a backdoor into their network to be able to see the unencrypted internal traffic. As a result, Google now encrypts at every level.

http://techcrunch.com/2014/03/20/gmail-traffic-between-googl...

> I've never heard this. Who does this?

For various compliance reasons, customer data must never be transmitted in plain text even internally. I've seen point-to-point serial links even require encryption when they come under compliance scope.

He's just saying that he prefers to configure his reverse proxy in JavaScript, or some other mainstream language, rather than in the Nginx configuration language.

I can see where he's coming from. But I still slightly disagree with the feeling.

Even gods make mistakes