Hacker News
by anamax 6344 days ago
[I can't find an e-mail address in your profile or the blog that it mentions.]

I was going with 4x file size and assumed that bandwidth cost for images is linear in file size, i.e. no bulk discounts.

One of my points is that IOs have a cost too, a cost that is largely independent of file size.

To a first approximation, disk IO capacity is proportional to the number of disks. (Yes, some disks, especially the flash ones, support a lot more IOs/sec than others.) If you're IO-bound, you either have spare disk space or can probably increase the amount of disk space at a sublinear price. (1.5TB drives are <3x as expensive as 500GB drives.)
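A rough sketch of that sizing argument, in Python. The per-disk IOPS figure and the workload numbers are made-up placeholders, not measurements; the point is only that when the IO requirement sets the disk count, capacity is left over.

```python
import math

def disks_needed(required_iops, required_tb, iops_per_disk, tb_per_disk):
    """Return (total, for_iops, for_space): disk counts under each constraint."""
    for_iops = math.ceil(required_iops / iops_per_disk)
    for_space = math.ceil(required_tb / tb_per_disk)
    # Whichever constraint needs more disks decides the purchase.
    return max(for_iops, for_space), for_iops, for_space

# Hypothetical workload: 2000 IOs/sec and 4 TB of data,
# on hypothetical 500GB drives that each sustain 100 IOs/sec.
TB_PER_DISK = 0.5
total, for_iops, for_space = disks_needed(2000, 4, 100, TB_PER_DISK)

# IO-bound case: more disks are bought for IOs than for space,
# so the difference shows up as unused capacity.
spare_tb = total * TB_PER_DISK - 4
print(total, for_iops, for_space, spare_tb)
```

With these placeholder numbers the IO requirement dominates, so growing the files costs nothing extra in disks until the spare space is used up.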

For some data transfers, AWS has a "per operation" charge in addition to a bandwidth charge. The latter is proportional to file size but the former is not.
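As a minimal sketch of that two-part charge: the function below adds a fixed per-operation fee to a size-proportional bandwidth fee. The prices are placeholders picked for illustration, not actual AWS rates, and the function name is mine.

```python
def transfer_cost(num_files, bytes_per_file, price_per_op, price_per_gb):
    """Total cost = per-operation part (size-independent) + bandwidth part."""
    op_cost = num_files * price_per_op
    bandwidth_cost = num_files * bytes_per_file / 1e9 * price_per_gb
    return op_cost + bandwidth_cost

# Placeholder prices, NOT real AWS numbers.
PRICE_PER_OP = 0.00001   # dollars per request
PRICE_PER_GB = 0.10      # dollars per GB transferred

# Same number of files, 4x the file size.
small = transfer_cost(1_000_000, 500_000, PRICE_PER_OP, PRICE_PER_GB)
large = transfer_cost(1_000_000, 2_000_000, PRICE_PER_OP, PRICE_PER_GB)

# The ratio comes out below 4 because the per-operation part did not grow.
ratio = large / small
print(small, large, ratio)
```

The larger the per-operation share of the bill, the further the ratio falls below 4x.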

My application has a lot of processing costs. Most of them are on metadata, not the images, so they don't grow with image size. I also do a lot of stuff with "thumbed" images - producing them is a function of file size but storing them and moving them around isn't.

My model is different from yours in at least two ways.

(1) I'm estimating some things.

(2) I'm using AWS and GAE prices. (I'm assuming the highest prices because if my app is getting enough use that I'm getting bulk discounts, I've got other problems.)

If I had a model that tracked my actual experience, I wouldn't listen to some bozo on a website....

FWIW, data stored in a db takes up significantly more space than the raw data itself.

1 comment

OK, thanks. A couple of comments:

Our bulk storage disk sees very little in the way of requested IOPS. (An entire Gbps pipe couldn't fill the IO capacity of 4U of the 1TB SATA drives we use for bulk upload storage, and we have way more than that. ;) ) DB and host disks are another story entirely, and I have to admit that we don't really account for all those costs as "upload related" (by and large, they are not upload-driven), so we have some model inaccuracy there as well. But for every 2MB file we have on SATA, we probably have 20-40K in thumbnails on faster disk and well under 1K in fairly narrow, not heavily indexed DB rows on 2 tables.
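Back-of-envelope arithmetic for the numbers above, as a Python sketch. It assumes decimal units (1 Gbps = 10^9 bits/sec, 2MB = 2,000,000 bytes) and a fully saturated link, which are my simplifications rather than anything measured.

```python
LINK_BITS_PER_SEC = 1e9          # assumed: a fully saturated 1 Gbps pipe
FILE_BYTES = 2_000_000           # assumed: 2MB in decimal units

# Upper bound on whole-file uploads arriving per second over that pipe.
files_per_sec = LINK_BITS_PER_SEC / 8 / FILE_BYTES

# Per-file overhead on the faster tiers, relative to the bulk copy.
thumb_fraction_low = 20_000 / FILE_BYTES
thumb_fraction_high = 40_000 / FILE_BYTES
db_fraction_max = 1_000 / FILE_BYTES

print(files_per_sec, thumb_fraction_low, thumb_fraction_high, db_fraction_max)
```

So even a full pipe delivers only a few dozen 2MB files per second, and the thumbnail and DB footprint is on the order of 1-2% and under 0.05% of the bulk bytes respectively.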

As for listening to a bozo on a website... well, HN is by and large not bozo-filled, it was pretty clear you weren't one and had given some thought to this problem, and I'm more than ready to admit when I can potentially learn something from someone else that might save me/my company money.

It does sound like your app has substantial sub-linear cost components, and I admit ours has some smaller ones as well, but we just don't model them tightly enough to see them.

Thanks for the info, and I wish you the best in your endeavor(s).