Hacker News
by ComputerGuru 3926 days ago
In addition to the good point raised by @jcreedon regarding their single datacenter (which I think is a bit of a bigger deal than he does, primarily because I don't think it scales linearly per-GB for the first few datacenters, though it might thereafter), I'm more concerned about the bandwidth.

There's no talk about their backbone or their network capacity. I get that they have terabytes of upload coming in, but as anyone who's used their software can tell you, it's throttled. I don't know how many users they have, so I can't tell how much bandwidth they're actually handling, but can they handle people using B2 as a distribution point for large files for customers? For example, I have a huge S3/CloudFront monthly bill from customers downloading ~400 MiB ISO images tens of thousands of times a month. Amazon CloudFront is ~$0.085/GB for the first TB, while Backblaze B2 is an incredible $0.05/GB, but at what performance? Will my technical support representatives be getting angry phone calls about halting download speeds, or do they have the capacity for something like this?
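To put the distribution question in numbers, here is a back-of-the-envelope sketch of the monthly egress bill at the two per-GB prices quoted above. It assumes flat per-GB pricing (real CloudFront pricing is tiered), treats a billing "GB" as 2^30 bytes, and uses a hypothetical 20,000 downloads to stand in for "tens of thousands".

```python
# Back-of-the-envelope egress cost for serving one large file many times.
# Assumptions (not from the thread): flat per-GB pricing with no tiers or
# request fees, and a billing "GB" of 2**30 bytes (so 1 GB = 1024 MiB).

MIB_PER_GB = 1024


def monthly_egress_gb(file_mib: float, downloads: int) -> float:
    """Total data served in a month, in GB."""
    return file_mib * downloads / MIB_PER_GB


def egress_cost(total_gb: float, price_per_gb: float) -> float:
    """Egress cost in dollars at a single flat rate."""
    return total_gb * price_per_gb


if __name__ == "__main__":
    total = monthly_egress_gb(file_mib=400, downloads=20_000)  # hypothetical volume
    for provider, price in [("CloudFront", 0.085), ("B2", 0.05)]:
        print(f"{provider}: {total:,.1f} GB -> ${egress_cost(total, price):,.2f}/month")
```

At roughly 7.8 TB a month, the B2 rate comes out a little over 40% cheaper, which is why the performance question is the deciding one.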

Hosting the world's data is no tiny task. I hope they're ready for it, and I do, truly, wish them all the luck. I've been a Backblaze customer for a few years now (at least 5 or 6, I imagine) as a tertiary or quaternary backup (haven't had to restore... yet), and B2 looks and sounds promising, but as far as technical details go, this post is nothing.

EDIT: In response to the reply below, I believe it's throttled by default in the client, though that can be turned off in the application settings. Also, you've replied to my claims of throttling but have ignored my question regarding backbone capacity and network readiness...

Sorry about skipping your network capacity question. I just got over-excited about throttling. :-)

We currently have about 100 Gbps symmetric capacity into our datacenter on a couple of redundant providers, but the key is we have open overhead and we'll purchase more as our customers need it.

But here is the best part (if you want OUTBOUND capacity) - our current product fills the INBOUND internet connection, but currently we only use a tiny, tiny fraction of the OUTBOUND connection. So if you want to serve files out of our datacenter we have a metric ton of unused bandwidth we would LOVE you to use. And if you fill it up, we promise to purchase more.

But also keep in mind, Backblaze is very experienced with STORAGE and I have a lot of confidence we won't lose any of your files. What we don't have a huge amount of experience with yet is serving up viral videos and such. So just bear with us during this beta period while we figure it all out. Personally I'm looking forward to that part (all the CDN/caching layers).

>But here is the best part (if you want OUTBOUND capacity) - our current product fills the INBOUND internet connection, but currently we only use a tiny, tiny fraction of the OUTBOUND connection. So if you want to serve files out of our datacenter we have a metric ton of unused bandwidth we would LOVE you to use. And if you fill it up, we promise to purchase more.

:) well, yeah, but that's also what B2 charges for... so the business model requires that BW to start getting consumed :-)

That's the first thing I noticed: This is awesome if most of your data is never touched. The moment you want to serve it up a lot, of course they promise to purchase more: Their bandwidth prices are as outrageously high as Amazon's.

If you serve up viral videos etc. and start eating a ton of bandwidth, even a "do it yourself" CDN built out of VPSes could quickly save you a fortune...

But if your inbound capacity is pretty full these days, how can you manage to onboard _large_ new clients at this point? Can you scale your inbound bandwidth as fast (and at the same cost) as adding a new vault a month?

Our inbound is not completely full, and we always try to have extra capacity/headroom for new customers. But if you plan to upload more than 5 petabytes at a rate of faster than 15 Gbps sustained, you probably want to contact us ahead of time to let us know it's coming and we'll increase our capacity for you. We can absorb anything less and it won't cause us any issues.
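For a sense of scale, the thresholds in the reply above work out to about a month of continuous uploading. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 Gbps = 10^9 bits/s) and no protocol overhead:

```python
# How long does it take to move a given amount of data at a sustained rate?
# Assumptions: decimal units (1 PB = 10**15 bytes, 1 Gbps = 10**9 bit/s) and
# a perfectly sustained link with no protocol overhead.

SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes: float, gbps: float) -> float:
    """Days needed to send num_bytes at a steady rate of gbps."""
    bits = num_bytes * 8
    seconds = bits / (gbps * 1e9)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    five_pb = 5 * 10**15
    print(f"5 PB at 15 Gbps:  {transfer_days(five_pb, 15):.1f} days")
    print(f"5 PB at 100 Gbps: {transfer_days(five_pb, 100):.1f} days")
```

At 15 Gbps sustained, 5 PB takes roughly 31 days; even the full 100 Gbps of capacity mentioned earlier in the thread would need over four and a half days.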

As somebody else mentioned, since we're in a commercial datacenter with a bunch of network providers already serving us, it's pretty easy to dial up our capacity as we need it.

>we're in a commercial datacenter with a bunch of network providers already serving us, it's pretty easy to dial up our capacity as we need it.

What's the lead time in your case?

In my past experience doing this, even when I was in MAE-West, cross connects and provisioning took eons in internet time...

I'd estimate a week? That's probably what you meant by "eons". :-)

It could go faster, but if we need to buy a new (expensive) network switch that can take a few days to arrive. And as you mention, the datacenter guys are happiest if you give them 3 - 4 days and a work order to do the cross connect.

Building out more vaults (the blocks of 20 storage pods we store data in) is usually about the same if we rush it, but we have a big (multi-petabyte) buffer spinning and ready to accept data at any time. We have a regularly scheduled delivery of pods once per month based on projections, but we have been known to tell our provider to go ahead and build three months' worth of pod chassis (everything except the drives) immediately and ship them to us. We supply the hard drives, so those either come from our own stashes or we quickly order more from various sources.

It's a pretty different world now, both price and speed-wise.

Not to speak for Brian, but as someone who used to do physical datacenter operations: most facilities have a bunch of fiber already provisioned (in the ground). It's just a matter of getting the networking gear and the provider provisioned. Turn-up can be done in as little as 24-72 hours, depending on the provider and the dollar amount involved.

Thanks for your reply :)

> as anyone who's used their software can tell you, it's throttled

Brian from Backblaze here: no, it is not throttled (by us). If you only have 10 Mbit/sec of upload capacity, you are throttled by your ISP. Also make sure you visit the "Performance" tab in the online backup client and tweak a few settings, like increasing the number of threads.

I have 100/100 Mbit/s up and down, and I barely push more than 3 MB/s when uploading, and while it does, the client is eating all cores alive. I appreciate that it may not be the ISP, but the client does seem to end up being a major bottleneck.

I moved to Linux a few months back, and was going to basically cancel my Backblaze subscription when I got around to it, since you have no interest in making a Linux client. Maybe B2 can act as a solution to this at a price penalty.

Or a price savings! If you do the math, I think the break-even point is at 1 TByte. If you only need to back up 500 GBytes from your Linux server, then you'll save 50%.
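The break-even arithmetic can be sketched as follows. The prices are assumptions, not figures from this thread: a flat $5/month for the unlimited personal backup plan and $0.005/GB/month for B2 storage, ignoring download and transaction charges.

```python
# Break-even between a flat-rate backup plan and per-GB storage.
# Assumed prices (not stated in the thread): $5/month flat plan,
# $0.005 per GB per month for B2 storage; download fees ignored.

FLAT_PLAN_PER_MONTH = 5.00
B2_PER_GB_MONTH = 0.005


def b2_monthly_cost(gb: float) -> float:
    return gb * B2_PER_GB_MONTH


def break_even_gb() -> float:
    """Stored size at which both options cost the same per month."""
    return FLAT_PLAN_PER_MONTH / B2_PER_GB_MONTH


def savings_fraction(gb: float) -> float:
    """Fraction saved by using B2 instead of the flat plan (negative = costs more)."""
    return 1 - b2_monthly_cost(gb) / FLAT_PLAN_PER_MONTH


if __name__ == "__main__":
    print(f"break-even: {break_even_gb():.0f} GB")
    print(f"savings at 500 GB: {savings_fraction(500):.0%}")
```

Under those assumed prices the numbers match the reply: break-even at 1,000 GB, and half the cost at 500 GB.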

Not a server, a desktop. Hence my annoyance that you don't have a client for it.

I can understand your biz reasons for not having one though.

Now with B2 we immediately support Linux and provide a client for it out of the box (written in python). Granted, it is only a command line interface so give us a little time to polish it up and add some features.

Another Linux user here who would love to use a command-line Python tool to back up my data to Backblaze: with Python I can see how my stuff is encrypted. That's the only reason I'm not using Backblaze right now: the closed-source client.

My only interest in B2 is backing up for a lower cost than the ridiculousness of S3: at $0.022/GB, I might as well buy a 3TB hard drive myself, put it at a friend's and push my data there. Every month. At the end of the year, I'd have 36TB in hard drive capacity if I bought drives instead of paying for 3TB of S3 storage.
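The comparison above is easy to check. A rough sketch, assuming the quoted $0.022/GB is a monthly storage price and 1 TB = 1000 GB; the drive price itself is left out because the comment doesn't give one:

```python
# Rough comparison: paying for cloud storage vs. buying a drive each month.
# Assumptions: $0.022 per GB per month (as quoted), 1 TB = 1000 GB,
# no request or transfer fees.

PRICE_PER_GB_MONTH = 0.022
GB_PER_TB = 1000


def monthly_storage_cost(tb: float) -> float:
    return tb * GB_PER_TB * PRICE_PER_GB_MONTH


def drive_capacity_after(months: int, tb_per_drive: float) -> float:
    """Total raw capacity if you buy one drive per month instead."""
    return months * tb_per_drive


if __name__ == "__main__":
    monthly = monthly_storage_cost(3)
    print(f"3 TB stored: ${monthly:.2f}/month, ${12 * monthly:.2f}/year")
    print(f"one 3 TB drive a month for a year: {drive_capacity_after(12, 3):.0f} TB")
```

Storing 3 TB comes to about $66 a month, so the drive-a-month argument holds whenever a 3 TB drive costs less than that.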

(All numbers are estimates and "roughly"s. Also I don't have external backups now because I'm too lazy to write the software myself, so there is something to say for paying instead of not having it.)

Get me access to the private beta and you'll have a Ruby gem very quickly. kyle@kyledrake.net

On my 100/100 fiber connection in Denmark, I have seen Backblaze speeds of up to 65 megabits a second.

Transferring big files is faster than transferring many small files.
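A plausible explanation is per-file overhead: each file costs some fixed setup time (requests, hashing, bookkeeping) on top of the time needed to push its bytes. A toy model, where the 50 ms per-file overhead is a made-up illustrative number rather than a measured Backblaze figure:

```python
# Toy model of why many small files upload slower than one big file.
# Assumption: every file costs a fixed overhead (hypothetical 50 ms here)
# in addition to the time needed to send its bytes at the link rate.

def upload_seconds(num_files: int, total_bytes: int, mbps: float,
                   per_file_overhead_s: float = 0.05) -> float:
    transfer = total_bytes * 8 / (mbps * 1e6)
    return num_files * per_file_overhead_s + transfer


if __name__ == "__main__":
    one_gb = 10**9
    link = 65  # Mbit/s
    print(f"1 x 1 GB:        {upload_seconds(1, one_gb, link):.0f} s")
    print(f"10,000 x 100 KB: {upload_seconds(10_000, one_gb, link):.0f} s")
```

With the same total bytes, the overhead term grows with the file count, which is also the kind of sequential cost that running several upload threads can hide.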

When I was a customer of BB I noticed no issue with the uploads, but when I had a flood and my hardware was destroyed, redownloading all my information was orders of magnitude slower.

I tried from multiple physical locations, but I could not get my downloads past 1-2 Mbps, and for terabytes of data, that looked like throttling by BB, considering I was easily uploading at 20 Mbps.

I contacted BB support and they ignored me, so I switched to a competing service and have had no issues since.

This really made me sad because BB's blog is amazing and their tech is really cool, but when you see people saying "it's throttled", it's because of real experiences out there, and not just ones limited to an ISP issue.

Brian from Backblaze here. I wonder if that was during the incredibly annoying "Comcast goes to war with Netflix" era that Backblaze got caught up in. That was Nov 2013 through Feb 2014, you can read a little about it here: https://www.backblaze.com/blog/obama-backs-net-neutrality/ (scroll down for our graphs showing our customers getting throttled). That seriously sucked for Backblaze.

But either way, we added threading to the bzdownloader (our custom application to download large restores) and if you tried it today crank it up to 10 threads and I swear you'll be happy with the download performance.
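As a generic illustration of why more threads help on a slow or congested path, here is a sketch of a parallel ranged download: split the file into byte ranges, fetch them concurrently, and reassemble them in order. `fetch_range` is a hypothetical callback (for example, something that issues an HTTP `Range` request); the thread doesn't describe how bzdownloader is actually implemented, so this is not its code.

```python
# Parallel ranged download: fetch byte ranges concurrently, then reassemble.
# `fetch_range(start, end)` is a hypothetical callback returning the bytes in
# the half-open range [start, end), e.g. an HTTP request with
# "Range: bytes=start-(end-1)" (HTTP ranges are inclusive at the end).
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(size: int, chunk: int) -> List[Tuple[int, int]]:
    """Cover [0, size) with half-open ranges of at most `chunk` bytes."""
    return [(start, min(start + chunk, size)) for start in range(0, size, chunk)]


def parallel_download(size: int,
                      fetch_range: Callable[[int, int], bytes],
                      threads: int = 10,
                      chunk: int = 8 * 1024 * 1024) -> bytes:
    ranges = split_ranges(size, chunk)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order, so the parts line up.
        parts = list(pool.map(lambda r: fetch_range(r[0], r[1]), ranges))
    return b"".join(parts)


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of stand-in "remote" data
    result = parallel_download(len(data), lambda s, e: data[s:e],
                               threads=10, chunk=64 * 1024)
    print(result == data)
```

Each thread keeps its own request in flight, so a per-connection bottleneck is multiplied by the thread count rather than being the cap on the whole restore.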

Brian, thank you for responding, I appreciate your clarity and honesty.

My issue did occur during that period, and I am impressed you can call that out from memory, it must have been a frustrating time for BB.

If that alone had been the problem, you would have just won back a customer 100%, but the thing that irked me most was the customer support response.

I know you don't work on your helpdesk, but their response was the bigger reason I left: their apparent lack of concern is what turned an ordinary service-provider malfunction into a dissatisfied customer looking for a competitor.

I can laugh about that period now, but yeah, it was a bad few months. We bled out good customers like yourself and we felt helpless. My basic faith in the internet was shaken up - I always thought I would send packets and they would be delivered quickly, and here are these HUGE players in the space messing with each other throttling each other and changing routing to get around throttling (and hurting Backblaze as collateral damage).

> do not work for your helpdesk

It's unfortunate when a customer gets a bad experience. The helpdesk guys are faced with this monumental task of responding to tons and tons of basic questions by Mom & Pop customers that are not computer professionals. Then mixed in are competent programmers and IT guys that know what the heck they are talking about. The helpdesk guys sometimes get it wrong who they are dealing with and it infuriates the competent computer users.

I think we should issue "professional computer user" cards where you can get a different level of support from all these companies. If you were helpful on forums you could earn points for your card, but if you ask helpdesk too many dumb questions your card could be revoked and you would go back to the first tier support. :-)

Is there anywhere I can traceroute to, or test upload and download speeds against? I'm on a 10/1 Mbit/s connection, and I can only use about 0.5 of that 1 Mbit/s upload before my connection is completely tanked (thanks, Australian governments). If I could do a trickle upload and write some good scheduling, I'm definitely moving to B2 from Glacier.
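A trickle upload mostly needs a rate limiter. One common way to build one is a token bucket; the sketch below is set up for 62,500 bytes/s (0.5 Mbit/s, the usable share mentioned above). `chunks` and `send` in the usage comment are hypothetical, and the scheduling part (e.g. only uploading overnight) is left out.

```python
# Token-bucket rate limiter for a "trickle" upload.
# Tokens are bytes; they refill at `rate` bytes/s up to `capacity`.
# Overdrawing is allowed and repaid by sleeping, so large chunks still work.
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, nbytes: int) -> float:
        """Reserve nbytes; return how many seconds to sleep before sending them."""
        self._refill()
        self.tokens -= nbytes
        return 0.0 if self.tokens >= 0 else -self.tokens / self.rate


# Usage sketch (`chunks` and `send` are hypothetical):
#   bucket = TokenBucket(rate=62_500, capacity=62_500)   # 0.5 Mbit/s
#   for chunk in chunks:
#       time.sleep(bucket.delay_for(len(chunk)))
#       send(chunk)
```

The injectable `clock` is there so the limiter can be tested without actually sleeping.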

I think it's also important to point out that this is cheaper than AWS Glacier. You could think of B2 as pure backup for now, and after tracking metrics they could expand it to more products. I doubt even Backblaze would suggest you make this your primary, mission-critical storage, hence the beta label. But even so, there are plenty of non-primary use cases, especially at this price.

I think that's well put.

I'm bothered by the whole idea of putting all my data with any one vendor (with Backblaze or Amazon) and thinking you don't need a backup. I claim "RAID / Reed-Solomon / real time mirrored copies" is NOT "Backup". If your programmer makes a mistake and a line of code deletes some mission critical data from Amazon S3, then all the Reed-Solomon encoding in the world doesn't help you, the data is still gone.

What you need is a copy of all your data from Amazon S3 in another vendor lagging behind for 24 hours that is NOT real time mirrored. Maybe you lose all the customer data generated that day, but your business survives by restoring from backup. (I chose 24 hours arbitrarily, each business needs to choose their upper limit of loss where they can survive.)
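A toy illustration of the difference, with plain dictionaries standing in for storage (all names are made up): a real-time mirror replicates a bad delete immediately, while a copy taken at the last scheduled backup still has the data.

```python
# Toy model: real-time mirror vs. a lagging (e.g. 24-hour) backup copy.
# Dictionaries stand in for storage; all names are illustrative.

def simulate_bad_delete():
    primary = {"customers.db": "all customer records"}
    lagged_backup = dict(primary)   # copy taken at the last scheduled backup
    mirror = dict(primary)          # real-time replica

    # A buggy line of code deletes mission-critical data...
    del primary["customers.db"]
    # ...and the mirror faithfully replicates that deletion right away.
    mirror.clear()
    mirror.update(primary)

    # The lagged copy is untouched until its next scheduled run.
    return primary, mirror, lagged_backup


if __name__ == "__main__":
    primary, mirror, backup = simulate_bad_delete()
    print("mirror has data:", "customers.db" in mirror)   # False
    print("backup has data:", "customers.db" in backup)   # True
```

The lag is the point: it gives someone a window to notice the mistake before the backup is refreshed with the damaged state.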

A good rule of thumb for a CONSUMER is three copies of your data: 1) primary, 2) onsite backup, and 3) offsite backup. If you are a business that will lose millions of dollars if a programmer makes a mistake or an IT guy is disgruntled, add 4) another offsite backup with a totally different vendor that doesn't share a single line of code with 1-3 and has separate passwords.

> If your programmer makes a mistake and a line of code deletes some mission critical data from Amazon S3, then all the Reed-Solomon encoding in the world doesn't help you, the data is still gone.

I'm surprised at the implication here, that you'd use Glacier on a non-versioned bucket. Making destructive updates impossible doesn't cost much extra in archive fees.
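For reference, turning on S3 versioning and archiving old versions to Glacier looks roughly like this with boto3's `put_bucket_versioning` and `put_bucket_lifecycle_configuration`. The bucket name and the 30-day threshold are placeholders. Versioning alone does not make deletes impossible: anyone allowed to delete object versions can still remove them, which is what MFA Delete or S3 Object Lock are for.

```python
# Enable versioning and move noncurrent (overwritten or deleted) versions to
# Glacier. `s3` is a boto3 S3 client; bucket name and day count are placeholders.

def protect_bucket(s3, bucket: str, archive_after_days: int = 30) -> None:
    # Keep prior versions on overwrite/delete instead of destroying them.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # Transition noncurrent versions to the GLACIER storage class.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-noncurrent-versions",
                    "Filter": {"Prefix": ""},
                    "Status": "Enabled",
                    "NoncurrentVersionTransitions": [
                        {"NoncurrentDays": archive_after_days,
                         "StorageClass": "GLACIER"},
                    ],
                }
            ]
        },
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   protect_bucket(boto3.client("s3"), "example-backup-bucket")
```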

Ok, so let's say you forget to pay your Glacier bill because the IT guy left and the credit card changed and the alert emails go nowhere. Bye-Bye-Glacier! No payment, no customer data, Amazon might delete your data due to a tiny administrative screwup.

My point stands: if you don't mind losing your data, store it in one vendor. But if you would REALLY lose your business and put 10 people out of work if the data is lost, storing it in Amazon (or Backblaze) without a second copy backed up somewhere else and a third copy backed up in yet a third location (with a totally different vendor with a totally different payment system) is irresponsible.

But... let's say you have three accounts with three storage companies. Now let's say you outsource the management of those accounts through one company... or even, equivalently, delegate it to a subsidiary or partner of your company. And then you accidentally stop paying them, or they can't requisition the budget necessary to pay the providers, or whatever. Now you're still stuck, even though you're nominally doing things "in-house."

What you actually need is a provider that will guarantee the durability of your data even if they (temporarily) cut off your access to it for lack of payment[1]. Anything else is just a level of indirection that suffers the same problems.

---

[1] I don't actually know if anyone does this, let alone AWS. Here's a quote from Tarsnap's FAQ, where you'd think cperciva would be someone who had considered the "I had no idea my infrastructure was relying on this service until it shut off" use case:

> You will be sent an email when your account balance falls below 7 days worth of storage costs warning you that you should probably add more money to your account soon. If your account balance falls below zero, you will lose access to Tarsnap, an email will be sent to inform you of this, and a 7 day countdown will start; if your account balance is still below zero after 7 days, it may be deleted (along with any data you have stored) at our discretion. (If you can't add money yet but will be able to later, contact us and explain the situation. We're reasonable people and simply knowing that you're alive and haven't forgotten that you were using Tarsnap is very helpful.)
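The policy in that FAQ can be restated as a small state function, purely as a reading aid. This is not Tarsnap's code, and the state names and the `days_below_zero` parameter are made up for the sketch.

```python
# Reading aid for the account-balance policy quoted above (not Tarsnap's code).
# balance and daily_storage_cost use the same currency unit;
# days_below_zero counts days since the balance first went negative.

def account_state(balance: float, daily_storage_cost: float,
                  days_below_zero: int) -> str:
    if balance < 0:
        # Access is cut off immediately; after 7 days the data *may* be deleted.
        return "may_delete" if days_below_zero >= 7 else "no_access"
    if balance < 7 * daily_storage_cost:
        return "warning_email"  # less than 7 days of storage costs left
    return "ok"
```

Every step past "ok" relies on someone reading an email, which is exactly the gap raised next.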

7 days is probably reasonable in the case where there's an active IT staff who will notice when, say, servers stop backing up. But if nobody's watching for that...

> 7 days is probably reasonable in the case where there's an active IT staff who will notice when, say, servers stop backing up. But if nobody's watching for that...

Do you have a solution in mind for the case of a company where email is going to /dev/null and nobody is reading the output of their cron jobs?

I mean, if I can't contact someone, it doesn't really matter if I wait a week or a month...