by therealmarv 3982 days ago
I would be sold on Docker if it were easy. For example, I have this stack:

- 1 webserver/proxy, let's say nginx

- 1 simple Rest API server, let's say in flask

- 1 database, let's say PostgreSQL

and I want to connect all 3 things, preserve logs for the whole time, and preserve the state of the database (of course). Also, not to forget: make it all bulletproof for the Internet.

And here all sorts of problems arise: what underlying OS to use, how to connect these containers, how to preserve the state of my database and logs (it's not trivial, as the article proves again). So overall Docker does not make life easier for this simple use case; it makes life (for the sysadmin) more complicated.

8 comments

Ultimately I'd say Cloud Foundry solves the problem but requires a lot of "support" VMs to make it work such that it might be overkill for your situation.

For example:

- What underlying OS? CF provides a minimal Ubuntu Linux "stemcell" and then has a standard "rootfs" for Linux containers

- a Python buildpack to assemble the container on top of this OS for your Flask server

- a built-in proxy/LB so you don't need one, if you want a static web server there's a static buildpack for Nginx

- an on demand MariaDB Galera cluster for your database if you want HA; PostgreSQL is there too but non-HA I think

- A standard environment variable based service marketplace & discovery system for connecting the containers to each other or to the database

- high availability (with load balancer awareness) for your containers at the container, VM or rack level

- reliable log aggregation of your containers (which you can divert to a syslog server).
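To make the environment-variable discovery point concrete, here is a minimal Python sketch of how an app might read a bound service from `VCAP_SERVICES`, the JSON variable CF injects into the container. The helper name `find_service_credentials`, the `postgres` label, the instance name and the `uri` credential key are assumptions for illustration; the actual keys depend on the service broker.

```python
import json
import os


def find_service_credentials(label_prefix, environ=os.environ):
    """Return the credentials dict of the first bound service whose label
    starts with label_prefix, or None if no such service is bound."""
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    for label, instances in services.items():
        if label.startswith(label_prefix) and instances:
            return instances[0].get("credentials")
    return None


# Hypothetical example of what the platform might inject; the label,
# instance name, and credential keys are made up for illustration.
example_env = {
    "VCAP_SERVICES": json.dumps({
        "postgres": [{
            "name": "my-db",
            "credentials": {"uri": "postgres://user:secret@10.0.0.5:5432/app"},
        }]
    })
}

creds = find_service_credentials("postgres", example_env)
print(creds["uri"])
```

The app never hard-codes where the database lives; it only knows which kind of service it expects to be bound.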

As I said, the only trouble when you want to make this "bulletproof" is that there are a dozen "support VMs", all there to make your app bulletproof and secure, e.g. an OAuth2 server, the load balancer, an etcd cluster, a Consul cluster, the log aggregator, etc. So it's overkill for one app, but good if you have several apps.

For single tenants and experimental apps, there's http://lattice.cf which runs on 3 or 4 VMs and is a subset of the above, but not what I'd call "production ready".

spoken like a pivotal employee... cf has almost zero support for data services, and suffers from nih at almost every layer of the stack from routing to mq, to one of the worst ux for installs ever (aka bosh). cf is a great example of commercial opensource primarily controlled (in spite of the foundation) by one entity (pivotal/vmware) that figured out how to switch from monetizing virtualization to single processes. you ever try the ui on the opensource cf?.. oh there isn't one.
And where do you work? What does any of the above have to do with the OP's question?

1. Data services, not true. There's MariaDB, Cassandra, Neo4J, Mongo, Postgres, among others. Yes, they're in VMs, but recoverable/reschedule-able persistent volumes in container clusters are at best experimental features anywhere you look.

2. NIH, compared to what? CF reuses etcd, consul, monit, haproxy, nginx, etc., and will use runC and appC as those get hammered out.

3. Lots of people love BOSH.

4. If you don't like all the decisions full CF makes, that is why Lattice exists: it delegates config/install to Vagrant or Terraform (which have their own problems) so anyone can take the core runtime bits with Docker images and use them in new and interesting ways.

5. What container or cloud platform project isn't based on code contributed by one or two vendors? Realistically? None. The CF foundation at least is an honest attempt to give all the IP to a neutral entity (including the trademark soon), has several successful variants (mainline OSS, Pivotal, Bluemix, Helion, Stackato), and has customers and users joining the foundation, not just vendors.

- 1 webserver/proxy, let's say nginx

- 1 simple Rest API server, let's say in flask

Dokku - https://github.com/progrium/dokku

Can't really beat `git push deploy/uat`

- 1 database, let's say PostgreSQL

I just run PostgreSQL on the host and connect to it from the containers. Sure I could containerise PostgreSQL itself but I don't really see the point.

I then run my own Dokku plugin (dokku-graduate: https://github.com/glassechidna/dokku-graduate) for graduating my apps from UAT to production.

This is exactly the problem here: just running Postgres on the host means that you have a hybrid setup, where some of your services are dockerized and the rest are not. This is not appealing to some people. There are other services, mostly in the heavy disk IO space, that are not easy to move to Docker. It might be worth calling these out in the documentation to save sysadmins the time of figuring this out the hard way. If you want to dockerize an application like a REST API or a simple Java app, that works perfectly and the advantages are obviously there, though.
I have zero problem with a hybrid setup. I'm not running Docker just for the sake of running Docker.

I'm running Docker (specifically Dokku) because it drastically simplifies deploying new builds, and graduating those builds between environments.

I know a large part of this article was that Docker complicates rather than simplifies the situation. I guess if you're trying to be a Docker purist (for no reason) then sure. The same is generally true if you try to be a purist of any kind.

I think the deployment is the weakest point of the Docker ecosystem.

The reason why I am using Docker is the forced honesty on the environment side: if your app runs on your laptop, it does not mean it will run on the production boxes. If the Docker container runs on your laptop, it gives you higher confidence that it will run on the production infra. No missing JARs, environment variables, misconfigured classpaths, etc.

If the goal is to simplify the deployment process, why not use something like Capistrano or Fabric? You can run 'Cap deploy <dev|prod>'.
Fabric is a great tool at first, but the problem is that it's always procedural. Want a deploy script? Write it. No other way around that.

Something that's more declarative is definitely superior. Why? Because it will be shorter and easier to debug. I am not a fan of `git push` as a deployment strategy (because git is a version control tool, not a deployment tool), but it does force you to create and use a system that's by definition declarative. This is why I use dokku for my new projects.

Because, as mentioned, I already deploy in one line:

    git push deploy/uat
and I didn't have to write a single deployment script to achieve it.

Plus, by using Dokku I get the benefits of containerised apps.

What are the benefits of containerised apps in your case?
It's useful to "mount" the database to the app process via injected configuration.
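A rough Python sketch of what "injected configuration" can look like from the app's side, assuming the platform sets a `DATABASE_URL` environment variable (that variable name, the `database_settings` helper and the example URL are assumptions, not something stated in this thread):

```python
import os
from urllib.parse import urlparse


def database_settings(environ=os.environ):
    """Split an injected DATABASE_URL into the pieces a DB driver needs."""
    url = urlparse(environ["DATABASE_URL"])
    return {
        "host": url.hostname,
        "port": url.port or 5432,  # fall back to the PostgreSQL default port
        "user": url.username,
        "password": url.password,
        "dbname": url.path.lstrip("/"),
    }


# Hypothetical value; in practice the platform would inject this at
# container start, and it would differ between UAT and production.
example_env = {"DATABASE_URL": "postgres://app:secret@172.17.42.1:5432/app_uat"}
settings = database_settings(example_env)
print(settings["host"], settings["dbname"])
```

The same image can then move from UAT to production unchanged, because only the injected value differs.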

Application and database servers are different animals. Not sure why a 'hybrid' approach would be surprising or unappealing.

While I do agree it's not as easy as it should be, connecting containers is actually doable (https://docs.docker.com/userguide/dockerlinks/), while logs are still a big mess because there is no unified strategy.

Databases are also tricky to run in containers, because even those with the best replication strategies can afford to lose nodes only at a high cost (re-balancing nodes, etc.), and containers still don't have the stability to provide an acceptable uptime that's worth the risk.

On a side note, since you mentioned nginx and RESTful APIs, I would check out Kong (https://github.com/Mashape/kong) which is built on top of nginx, and provides plugins to alleviate some of these problems (http://getkong.org/plugins/).

Kubernetes solves this by mounting external volumes (say NFS or iSCSI) on the host and then exposing them to one or more docker containers. This seems like a pretty ideal solution for any Docker user.
Are you suggesting that a person with a total of 3 nodes use a system like Kubernetes which requires at a minimum (correct me if I'm wrong) 5 nodes just to function? If you really, really want to use Docker with a typical Nginx-App-DB setup just whip up the necessary shell commands to start/stop/log containers and throw that in Ansible or the like.

edit: I guess you can cram all of the various Kubernetes master/etcd servers on a single node but whoops there goes reliability.

Crap like that actually happens. I had a mathematician friend who does a bit of programming ask me about Docker, as someone had told her to use it. She works as a researcher in academia, so probably only needs to run her script a few times to get results. Why the hell would you recommend Docker?
> probably only needs to run her script a few times to get results.

Agreed that she doesn't need to use Docker. But if she is writing a paper on those results, she might want a way to reproduce her findings years down the road (even after she switched Distros), or to collaborate with others who want to reproduce/build on her research (and may not be running her distro).

It's easy to think "oh, this script just requires python 2.7", but most of the time you actually have many more dependencies than that (libxml, graphviz, latex, eggs, etc.). A Dockerfile requires some work to set up, but it tracks your requirements in an automated way.

So I'm not going to say "all researchers should use Docker". But I will say "Docker could be useful to some researchers". Just like Source Control, it's a tool that solves real problems. Source Control has gotten easy enough to use that it's recommended everywhere. Docker (or some other container standard) will get there eventually.

For research apps docker would be a godsend. Research software is of the "install exactly this version of x, y, z, r, g and h, and then apply these patches..." kind.

Docker is really good for dev environments. I've had a relatively painless time dockerizing snapshots of old internal web apps so I can hack on them without installing things into my main desktop environment. It lets me have lots of server things side by side.

How did you come up with that number of nodes?
Sorry, I was wrong. I assumed, based on Kubernetes' use of etcd, that it would be 3+ nodes. It turns out the Kubernetes master is currently a single node, which means they haven't built high availability into the master at all... which is a pretty scary way to run a thing that manages your entire infrastructure. There are already a few topics on the mailing list about etcd losing its data and Kubernetes not knowing how to recover. Yuck.
The Kubernetes master does support high availability.

https://github.com/GoogleCloudPlatform/kubernetes/blob/relea...

Well, yes, because for one thing, Kubernetes (if done correctly) provides HA/failover that did not previously exist.
Is it really plausible that someone who is not otherwise in the PaaS business is going to find it easier to run 5+ nodes of other services instead of, say, two nodes running their application directly for failover/HA?

This is not to say that Kubernetes is bad but … it's a commitment which isn't appropriate for everyone. If you aren't exercising its abilities heavily, that's probably going to be a distraction from more pressing work unless you're scaling up heavily right now.

thanks I will definitely look more into Kubernetes.
At my work we use HAProxy, gunicorn/Flask (web apps and APIs) and a PostgreSQL container for development (we aren't planning to migrate the production databases to Docker anytime soon). We are using CoreOS for the host servers, dumping logs to logstash, and connecting the containers via HAProxy. The big advantages we have seen are that it's easier to ensure consistency between development/qa/production, our deployment process is cleaner (basically docker pull container:latest, docker rm old-container, and docker create container:latest), and it's easier to resolve the few issues we have had (usually just a deploy vs tinkering on a live server). Our attitude has been to only use Docker for the parts of our stack that make sense (web apps and not production databases). It's been 8 months since we migrated and we haven't had any trouble yet.
That's all really easy with Docker Compose. Preserving state is just a matter of mounting a volume.

https://docs.docker.com/compose/ https://docs.docker.com/reference/run/#logging-drivers-log-d...

docker-compose comes to save the day when it comes to how to connect containers. Your Dockerfile will specify which underlying OS is used. Preserve state of your database with data volumes.
Too bad they bundle a version of OpenSSL with known security vulnerabilities that haven't been fixed in the month since they were brought to their attention.
docker-compose is no magic, it only maps a YAML file to docker's command arguments. While I think docker-compose is useful in some cases, I strongly advise not using it at first, so that you understand how docker actually works.

Once you understand how docker works, using the YAML file can become useful to lighten your load.
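To illustrate the "YAML maps to command arguments" idea, here is a toy Python sketch that turns a compose-style service entry (as the dict a YAML parser would give you) into `docker run` arguments. This is not how docker-compose is implemented; the function name `docker_run_args`, the image `example/flask-api` and the host paths are assumptions made up for the example, and only a handful of keys are handled.

```python
def docker_run_args(name, service):
    """Translate one compose-style service entry into `docker run` argv."""
    args = ["docker", "run", "-d", "--name", name]
    for port in service.get("ports", []):
        args += ["-p", port]
    for volume in service.get("volumes", []):
        args += ["-v", volume]
    for key, value in sorted(service.get("environment", {}).items()):
        args += ["-e", f"{key}={value}"]
    for link in service.get("links", []):
        args += ["--link", link]
    args.append(service["image"])
    return args


# Hypothetical services mirroring the nginx / Flask / PostgreSQL stack
# from the top of the thread; host paths and image names are assumptions.
services = {
    "db": {
        "image": "postgres",
        # State survives container removal because it lives on the host.
        "volumes": ["/srv/pgdata:/var/lib/postgresql/data"],
    },
    "api": {
        "image": "example/flask-api",
        "links": ["db"],
        "environment": {"DATABASE_HOST": "db"},
    },
    "web": {
        "image": "nginx",
        "ports": ["80:80"],
        "links": ["api"],
        "volumes": ["/srv/nginx-logs:/var/log/nginx"],
    },
}

for name in ["db", "api", "web"]:  # start dependencies first
    print(" ".join(docker_run_args(name, services[name])))
```

Running the three printed commands by hand a few times is a decent way to learn what the YAML file later saves you from typing.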

agreed, I used a bash script based on glowmachine github repo[1], but switching to docker-compose made everything much easier - as long as you have the knowledge of the docker cli.

[1] https://github.com/glowdigitalmedia/glowmachine-docker/blob/...

It doesn't connect containers across nodes though, so its use in realistic production scenarios is limited.
From what I read that's where Docker Swarm comes in. Could be wrong though.
The article makes a lot of good points, but the stack you're speaking of would be pretty trivial to get going... particularly if you did it on a single-host setup. You could probably have it spun up in an afternoon with Ansible or some such.

Multi-Host is moderately more difficult. A full orchestration and resource scheduling stack that scales with load even more so.

But you have to ask what your needs are if you're being realistic.