by sokoloff
6344 days ago
I realize that a 4x increase in pixels doesn't mean a 4x increase in file size, but I was discussing this in terms of file size. Unless you store insanely more metadata per image than we do, your metadata probably amounts to under 2.5% of the image data for the hypothesized 2MB files, so maybe 2.05MB goes to 8.05MB, which is basically still a 4x increase. Part of the metadata is on fast (DB) disk, so your costs don't scale absolutely linearly, but bulk cold disk is still scaling up 4x.
To a large extent, I have worked through this for our application (I run IT for a top 100 e-commerce site that does a very substantial amount of uploads in the holiday season; we choose to self-host several dozen TB and have an emergency overflow option out to S3 if we fill up our in-house storage). I can't see how storage costs are meaningfully sub-linear. Bandwidth costs can be sub-linear due to bulk pricing, but are still linear in upload size to a first approximation. (You might argue that you can use 95/5 pricing to work around that by forcing users to schedule their uploads for an off-peak time, but then you could do that in the base case as well.)
I would love to hear more about your surprises in the model, either on HN or privately, as this represents a substantial portion of my budget, and if I'm missing something, I'm not too proud to change course. :)
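A rough sketch of the storage arithmetic in the comment above. It assumes that "4x" means the image data itself grows from 2MB to 8MB while the per-image metadata stays fixed at 2.5% of the original 2MB; both the function name and that reading are assumptions for illustration, not figures from either poster's system.

```python
# Assumed numbers taken from the comment: 2 MB images, metadata ~2.5% of that.
IMAGE_MB = 2.0
METADATA_MB = IMAGE_MB * 0.025  # 0.05 MB, assumed NOT to grow with resolution


def stored_mb(image_scale):
    """Total MB stored per image when only the image data grows by image_scale."""
    return IMAGE_MB * image_scale + METADATA_MB


before = stored_mb(1)     # original image plus metadata
after = stored_mb(4)      # 4x image data, same metadata
growth = after / before   # slightly under 4x because metadata is fixed

print(f"{before:.2f} MB -> {after:.2f} MB, {growth:.2f}x")
```

With metadata this small, the fixed part barely dents the multiplier, which is the point being made about bulk cold storage.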
I was going with 4x file size and assumed that bandwidth for images is linear in file size, i.e. no bulk discounts.
One of my points is that IOs have a cost too, a cost that is largely independent of file size.
To a first approximation, disk IO capacity is proportional to the number of disks. (Yes, some disks, especially the flash ones, support a lot more IOs/sec than others.) If you're IO-bound, you either have spare disk space or can probably increase the amount of disk space at a sublinear price. (1.5TB drives are <3x as expensive as 500GB drives.)
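One way to picture the IO-bound case is a minimal sketch like the one below. Every number in it (workload IOPS, IOPS per disk, data size) is made up for illustration, and `disks_needed` is a hypothetical helper that ignores redundancy and RAID overhead.

```python
import math


def disks_needed(total_iops, total_tb, iops_per_disk, tb_per_disk):
    """Disks required to satisfy both the IO load and the capacity."""
    for_io = math.ceil(total_iops / iops_per_disk)   # driven by request rate
    for_space = math.ceil(total_tb / tb_per_disk)    # driven by bytes stored
    return max(for_io, for_space)


# Hypothetical workload: 6000 IOs/sec on disks that each handle ~100 IOs/sec
# and hold 0.5 TB. The IO load alone demands 60 disks.
small_files = disks_needed(6000, 7, 100, 0.5)    # 7 TB needs only 14 disks
large_files = disks_needed(6000, 28, 100, 0.5)   # 4x the data needs 56 disks

print(small_files, large_files)
```

In this made-up scenario the disk count is set by IO both before and after the 4x growth, so the extra bytes fit in space that was already paid for. Once capacity overtakes IO as the constraint, the count does start to grow with file size.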
For some data transfers, AWS has a "per operation" charge in addition to a bandwidth charge. The latter is proportional to file size but the former is not.
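A sketch of that two-part charge, with a flat per-operation fee plus a per-GB fee. The prices below are placeholders chosen for illustration and are not actual AWS rates; `transfer_cost` is a hypothetical helper.

```python
import math

# Hypothetical prices for illustration only; NOT actual AWS rates.
PRICE_PER_REQUEST = 0.00001   # dollars per operation, independent of size
PRICE_PER_GB = 0.10           # dollars per GB transferred


def transfer_cost(num_files, mb_per_file):
    """Return (request_cost, bandwidth_cost) in dollars for moving num_files."""
    request_cost = num_files * PRICE_PER_REQUEST
    bandwidth_cost = num_files * mb_per_file / 1024 * PRICE_PER_GB
    return request_cost, bandwidth_cost


req_small, bw_small = transfer_cost(1_000_000, 2)   # 2 MB files
req_large, bw_large = transfer_cost(1_000_000, 8)   # 4x file size

ratio = (req_large + bw_large) / (req_small + bw_small)
print(f"requests: {req_small:.2f} -> {req_large:.2f}")
print(f"bandwidth: {bw_small:.2f} -> {bw_large:.2f}")
print(f"total cost ratio: {ratio:.2f}x")
```

The bandwidth part scales 4x and the request part does not move, so the total lands somewhere under 4x. How far under depends entirely on the real price ratio and file sizes, which this sketch does not claim to know.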
My application has a lot of processing costs. Most of them are on metadata, not the images, so they don't grow with image size. I also do a lot of stuff with "thumbed" images - producing them is a function of file size but storing them and moving them around isn't.
My model is different from yours in at least two ways.
(1) I'm estimating some things.
(2) I'm using AWS and GAE prices. (I'm assuming the highest prices because if my app is getting enough use that I'm getting bulk discounts, I've got other problems.)
If I had a model that tracked my actual experience, I wouldn't listen to some bozo on a website....
FWIW, data stored in a DB takes up significantly more space than the raw data itself.