by necro 4645 days ago
I wanted to throw my experience into the ring because there seems to be such a fear of colocation. We knew nothing about colocation and decided to build some supermicro servers ourselves and install them and a switch in a colo 4 years ago. I read all the stories, "i had to get up in the middle of the night to drive to the colo. it was the worst move ever to colo", and they are total bull. Even the biggest noob can set things up so it's totally remote. Servers have a dedicated IPMI port ( remote console over ethernet ) that makes it as if you're sitting at the server. You can even mount a cd/image from your laptop remotely, so the hardware thinks that cd is in that machine. Hell, I can reinstall the bios on the server remotely, OS, everything. Why on EARTH would you have to drive to your colo? You can get servers that have 4 ether ports that you can bond in pairs to different switches. You can have hardware raid so losing 2 drives in a server is no big deal, and you can take care of it at a later time. We have drives fail sometimes, but things keep on ticking. With the costs you save you can have triple redundancy if you like, and the benefit of consistent latency and better performance always. We have 250TB of storage, with double redundancy and also a remote backup. It cost us ONCE what we would have to pay for a few months on the cheapest storage service.

We run straight kvm virtualization on our own hosts for flexibility. We run dbs on bare metal. I hear all these stories of people's vms "crashing" all the time, but i can tell you we have only had 1 instance of a vm, or in this case a host, dying in 4 years. It happened to be one of the video conversion hosts that is pinned 24/7, and it turned out it just hit some unrecoverable memory hardware error. No big deal, there were others.

Flexibility? We can clone and spin up VMs at will. We can live migrate and upgrade hosts. We can automate things with libvirt to our heart's desire.

Costs? $1500/month for direct equinix colo ( includes power, full rack, and gigabit connection from a tier1 provider ). Never had a power issue, never had a network issue. We also use a CDN for static stuff and that's extra. We started with 3 servers, now are at a dozen, and adding a new one does not add a new monthly expense.

You can have an E3-1240 V2 @ 3.40GHz server built for $1500, and that as a host can run most of our front end stack. Sure, we have 6 of those for backend crap and redundancy, but we actually run most of our stack on 1 of them. Mostly we do that for shits and giggles, but also because the interaction between the www, redis, mcd, zeromq is a few ms faster when it does not go over the physical net. So if you over optimize like us, and want 30ms page gen times, you can nerd out like that.

s6             CPU: 8   MEM: 32080MB
total running  CPU: 16  MEM: 16384MB
r-fp1   running  CPU: 2  MEM: 1024MB
r-mcd2  running  CPU: 2  MEM: 1024MB
r-www2  running  CPU: 2  MEM: 4096MB
r-www3  running  CPU: 2  MEM: 4096MB
r-red1  running  CPU: 2  MEM: 2048MB
r-red2  running  CPU: 2  MEM: 2048MB
r-zmq   running  CPU: 2  MEM: 1024MB
r-zmq2  running  CPU: 2  MEM: 1024MB

That's the front end proxy, www front ends, redis, zeromq, memcached, etc., excluding the mysql db, which is on bare metal. This serves our site, which handles about 200 page views per second on a peak day, and that is at 25% host utilization. Our pages generate ( no caching ), including redis, zmq, and maybe 25 mysql calls per page, in about 30ms. You can optimize things too, like... you know that the default config on a server will kick the cpu down to 1.6GHz if it's not really loaded, and that means page gen times in our case would be 15ms slower. Hell, we don't have to try to save power, so we can kick that sucker to 3.4GHz all the time and make sure users get the benefit of that. Nice to be in control of the host.
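For anyone wondering what pinning the clock looks like in practice, here is a minimal sketch. It assumes a Linux host that exposes the standard cpufreq sysfs files; the post does not say how it was actually done, and `set_governor` and its `root` parameter are hypothetical names made up for this example.

```python
from pathlib import Path

# Standard Linux cpufreq sysfs location (assumption: the host exposes it).
CPU_SYSFS_ROOT = Path("/sys/devices/system/cpu")


def set_governor(governor="performance", root=CPU_SYSFS_ROOT):
    """Write `governor` into every CPU's scaling_governor file.

    Returns the names of the CPUs that were changed, e.g. ["cpu0", "cpu1"].
    `root` is only a parameter so the function can be tried on a fake tree.
    """
    changed = []
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        gov_file.write_text(governor + "\n")
        # gov_file is .../cpuN/cpufreq/scaling_governor, so go up two levels.
        changed.append(gov_file.parent.parent.name)
    return changed
```

On a real host this needs root, and calling `set_governor("performance")` would ask the kernel to stop clocking down under light load. The setting does not survive a reboot, so it would have to be reapplied at boot by whatever mechanism your distro prefers.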

We never needed remote hands or anything like that, but that is available a phone call away. I visit the colo in San Jose once a year and I schedule it with my motorcycle trip down there. Sometimes I just dust the servers off, pet them a little and look at the pretty lights.

Of course ec2 has its use. If your html traffic spikes higher than 1 gbps, then it's nice to have the flexibility of a fatter distributed pipe. If you want to optimize for rtt, then it's nice to be able to spin up in a different geographical area.

I think what bugs me the most is that a lot of companies use the argument that if you get high traffic, like being slashdotted or on hackernews, you can spin up 100 front ends easily and handle it. We've been on the top of hackernews and the change in traffic was in the noise floor compared to the 200 r/s we normally handle. The point I'm trying to make is that if you engineer your app better, and understand and fix the issues with generating your pages faster, you won't need the fancy scale to 100 front ends bullshit. ( tip: it's probably your database queries anyway, so optimize that. it's not the print/echo statement that is outputting html on the front end ) Of course some do require webscale, and ec2 is a good way to go for that, with all the extra costs and engineering, but it seems that every joe blow and his blog or app thinks they need to spin up 100 front ends.

Sorry for the rant. I actually think that ec2 and the likes are the future and as tech gets better and prices get better I can see it making sense for more and more. I just wanted to give a contrast with our current setup.

2 comments

Thanks for the info. Do you have a plan for hardware upgrades? I guess hardware from 4 years ago can serve plenty of websites for a long time. But eventually an upgrade will make sense. Will you upgrade entire servers, just a few disks, RAM, etc.?

How hard is the KVM virtualization to set up? That also seems like a fairly big task, or at least a specialized one.

When we started we bought these boards http://www.supermicro.com/products/motherboard/QPI/5500/X8DT... and at first we outfitted them with one low end CPU and a small amount of ram, 6G, as those were our needs and that's what we could afford at the time. 2 years ago we upgraded those machines with dual 5560 cpus and 48G of ram for not very much money, and in fact they run our production DBs right now. They are still very competitive if you stack them up against current e5 models. We added more servers in the last 2 years and they have been E3-1240 V2 based, single cpu, 32G ram. You can't beat the price/performance there. So in 4 years we still have not obsoleted much besides some older ram and base cpus.

KVM is really easy to set up. Install the package on your linux distro, start up virt-manager if you want a gui, "start" a new machine and install whatever you want from any cd image you have. Of course once you wrap your head around it you'll want to do it with cli tools and custom automate it. But basic virt-manager might take you a long way. Once you have multiple machines and you want to migrate between hosts, you'll have to set up shared storage. That can be as easy as an nfs share/mount. We started with just 1 ssd for that, but then built a dedicated box with many intel ssds on hardware raid 10. Never had an issue. But shared storage/live migration is not always needed and can add more risk. If you engineer it so that all your hosts are independent and you have redundant services for everything, then you don't need to live migrate. If you need to free up a host, just turn it off, as you have redundant services running on other hosts.
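As a rough idea of what "do it with cli tools and automate it" can look like, here is a small sketch that builds a `virt-install` command line. It is not our actual tooling: the function name, the guest name `r-www4`, the ISO path and the disk size are all made up for the example, and `--os-variant generic` is just a safe placeholder.

```python
import shlex


def virt_install_cmd(name, vcpus, memory_mb, disk_gb, iso_path):
    """Build a virt-install argv for a new guest installed from an ISO image."""
    return [
        "virt-install",
        "--name", name,
        "--vcpus", str(vcpus),
        "--memory", str(memory_mb),    # guest RAM in MiB
        "--disk", f"size={disk_gb}",   # new disk image, size in GiB
        "--cdrom", iso_path,           # install media
        "--os-variant", "generic",     # placeholder; pick the real OS if known
    ]


# Hypothetical guest, sized like the www front ends in the listing above.
cmd = virt_install_cmd(
    "r-www4", vcpus=2, memory_mb=4096, disk_gb=20,
    iso_path="/var/lib/libvirt/images/install.iso",
)
print(shlex.join(cmd))
```

The sketch only prints the command. To actually create the guest you would hand `cmd` to `subprocess.run(cmd, check=True)` on a host with libvirt and virt-install installed.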

In fact on our Dev systems we run KVM on our osx laptops nested in a vmware vm. ( vmware can nest like this passing hardware flags to the guest host ) So on osx you run vmware, which runs a linux vm, then that vm is used as a kvm host to run other vms via kvm. This way we can run exactly 100% the same image locally as is in production.

In fact if you really want to do some crazy plumbing... the VM host on my laptop has a VPN link to our DC, this puts it on the same internal network as our DC production hardware. I can then live migrate a production VM ( like a web front end ), onto my laptop, while it's fully operational doing processing for the production environment. On my laptop it will still be, via VPN, receiving and processing live web requests on our website, and properly sending back data to the proxy and user. Not very performant, but the flexible plumbing is nice if you want to test/debug a clone of the exact production system locally.
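A live migration like that boils down to one `virsh migrate` call. The sketch below only builds the command; the host name `laptop-kvm`, the `root` user and the function name are assumptions for illustration, not our real setup.

```python
def live_migrate_cmd(domain, dest_host, user="root", copy_storage=False):
    """Build a virsh argv that live migrates `domain` to `dest_host` over ssh."""
    dest_uri = f"qemu+ssh://{user}@{dest_host}/system"
    cmd = ["virsh", "migrate", "--live", "--verbose"]
    if copy_storage:
        # Only needed when the destination cannot see the same disk image,
        # i.e. there is no shared storage between the two hosts.
        cmd.append("--copy-storage-all")
    cmd += [domain, dest_uri]
    return cmd


# Hypothetical example: move a web front end to a KVM host reachable over the VPN.
print(" ".join(live_migrate_cmd("r-www2", "laptop-kvm")))
```

With shared storage (the nfs mount mentioned above) the plain form is enough, as long as the destination can reach that storage over the VPN. Without it, `copy_storage=True` asks libvirt to copy the disk across as part of the migration, which will be slow over a VPN link.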

$1500/month for Equinix + Bandwidth? We got quotes from them before and rack was low, but I wasn't finding super cheap bandwidth like that. Did you go with Cogent or similar?
We actually went via Bandcon, which was then bought by Highwinds. BW is around the going price, $2.5/Gbps, and it seemed to be Level3 at the beginning and now it seems to be more of a mix. ( I should specify that we have a gbps port but we only use about 100 mbps, as it's only the html we serve from there. We use another 2 Gbps of traffic via CDN for all the static/video content, but that is of course a different cost. ) But it's nice when the CDN ingest point is in the same physical DC as we are.

I just looked at what 250TB would cost us on s3: $20k/month, or $240k/year. ( I'm not even counting the put/get usage )

You can build it, for ease of math, 100x 3TB seagate constellation. 100x $250 = $25k, another $5k easily covers a 45 jbod and raid card and server with ssd zil and arc for zfs and you're done. so $30k. Get 2 more for redundancy and backup as you see fit.

So over 3 years, $720k vs, apples to apples, $90k ( if you got 3 of those servers ), so you save, say, $600k. You can get a decent remote dev for $200k/year for that time.
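If you want to plug in your own numbers, the back-of-envelope math above works out like this. The figures are the ones quoted in this thread; the function and key names are just made up for the sketch, and it ignores colo fees, power, drive replacements and s3 request charges.

```python
def storage_costs(
    s3_monthly_usd=20_000,   # quoted s3 price for 250TB
    drives=100,
    drive_tb=3,
    drive_usd=250,
    chassis_usd=5_000,       # jbod, raid card, server, ssds
    copies=3,                # primary + 2 more for redundancy/backup
    years=3,
):
    """Compare s3 rental against buying storage boxes outright."""
    box_usd = drives * drive_usd + chassis_usd
    s3_total = s3_monthly_usd * 12 * years
    diy_total = box_usd * copies
    return {
        "raw_tb_per_box": drives * drive_tb,  # before raid/zfs overhead
        "box_usd": box_usd,
        "s3_total_usd": s3_total,
        "diy_total_usd": diy_total,
        "savings_usd": s3_total - diy_total,
    }


for key, value in storage_costs().items():
    print(f"{key}: {value}")
```

With the defaults this gives $30k per box, $720k for s3 and $90k for three boxes over 3 years, so the difference is $630k, which is where the "say $600k" comes from.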

if you know how to negotiate, even tier 1 transit providers like Level3 can be had at around $500/month committed throughput for a burst-to-gigabit fiber link. $1k for the power and half cab is also within reason.

won't be redundant power or connectivity though - but it's not cost prohibitive - just double the price.

people - this is what amazon makes their margins on. doing this work for you, so you can click a button in your underwear. if it makes sense for you, it makes sense - but if it doesn't - it doesn't.