by boyter
2509 days ago
Image resizer scaling is one of the more interesting problems I have worked on in the last 10 years or so. I was part of a small team that designed and built the resizer that powers the nine.com.au network of sites. Modest by USA standards, it gets close to hundreds of millions of views a day across the whole network.

We ended up using a shared-nothing architecture. The whole thing ran on 6 t2.large AWS instances using a slightly modified version of Thumbor, changed so that when you rotated the key the disk cache would still be shared, which avoided a large-scale cache invalidation. It worked quite well and we rotated the key every few weeks.

Things I learnt:

Pretty much all image resizers have the same performance, as all the good ones call out to C libraries in the end.

Akamai (the CDN we used), despite having Site Shield on, would still hit the back-end ~100 times for the same image on occasion. I suspect all of the whitelisted machines could request the same image if their internal sharing didn't kick in fast enough.

Long tail images were the ones that brought the resizer to its knees. The hot images would quickly enter the local disk cache and were not an issue. Purge the whole cache, though, and the long tail images would quickly overwhelm the instances.

The last thing I learnt was to have a backup CloudFront distribution ready to flip over to. At one point Akamai had issues and the resizer was facing origin load. It capped out at about 300 RPS, which couldn't keep up with what was expected. It got even worse when the T2 instances ran out of CPU credits. Spinning up CloudFront solved that issue once the DNS flip kicked in.

One good thing to come out of it was that I helped write the C# Thumbor library, as we had one site that was using C# and nobody could move over to the new resizer without it.
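To make the key-rotation point concrete, here is a minimal sketch of one way a disk cache can survive a signing-key rotation: derive the cache key from the unsigned part of the URL only. This is a guess at the general idea, not the actual modification we made to Thumbor. The function names (`sign`, `signed_url`, `is_valid`, `cache_key`), the example path and the key strings are all hypothetical; the only thing borrowed from Thumbor is the signing style (URL-safe base64 of an HMAC-SHA1 over the path).

```python
import base64
import hashlib
import hmac


def sign(path: str, security_key: str) -> str:
    """URL-safe base64 HMAC-SHA1 of the unsigned path (Thumbor-style signing)."""
    digest = hmac.new(security_key.encode(), path.encode(), hashlib.sha1).digest()
    return base64.urlsafe_b64encode(digest).decode()


def signed_url(path: str, security_key: str) -> str:
    """Build '/<signature>/<path>' for a resize request."""
    return f"/{sign(path, security_key)}/{path}"


def is_valid(url: str, accepted_keys: list[str]) -> bool:
    """Accept the request if its signature matches any currently accepted key."""
    signature, _, path = url.lstrip("/").partition("/")
    return any(
        hmac.compare_digest(signature, sign(path, key)) for key in accepted_keys
    )


def cache_key(url: str) -> str:
    """Hypothetical cache key: hash only the unsigned path, so the signature
    (and therefore the signing key) never influences where the result is stored."""
    _, _, path = url.lstrip("/").partition("/")
    return hashlib.sha256(path.encode()).hexdigest()


if __name__ == "__main__":
    path = "300x200/smart/example.com/photo.jpg"  # hypothetical resize path
    old = signed_url(path, "old-key")
    new = signed_url(path, "new-key")
    # Different signatures, same cache entry.
    print(old != new, cache_key(old) == cache_key(new))
```

With a scheme like this, rotating the key changes every public URL (so the CDN sees new objects) while the resizer's local disk cache keeps serving the already-resized files.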
|
A very dumb proposal: no caching, resize on the fly. The GPU has many gigabits of resizing performance, as long as JPEG is involved. One GPU does the decoding with VDPAU, one does the encoding with CUDA.
That beat any Google App Engine based "elastic" service on an economic basis, but the catch is that you have to ship that GPU resizer to every colo. That did not work out, because with Google App Engine you were getting access to Google's POP network for almost free, and they were already paying for a gigantic amount of CDN traffic.
----
When I worked as a sub-subcontractor on Alibaba's RDMA-wired DC project, there was a demo by another team where they got DSP devs involved and built a 10 GB/s JPEG transcoder that ran under 100 W. I think most of the power budget was going to the FPGA that was linking it all with the NIC :/
An expensive toy, but it again demonstrated to me just how powerful the "lockdown" power of all those "cloud" companies is. You cannot buy anything like this on the open market.
Imagine what it could've been if they offered something more cash worthy over the RDMA there.
I said long ago that the killer product during the Bitcoin boom was not the mining itself, but leasing and renting the rigs. Your capital costs get covered near instantly, and you can cash out the next week. I believe that the whole "cloud" thing will eventually follow this path.