
by boyter 2684 days ago

Love reading this. It has always been one of those interesting things I kept in the back of my mind in my day to day.

I was very excited when I actually got to implement it on a real-world project.

I was writing a scale-out job which used ffmpeg to make clips of video files. To speed it up I kept the downloaded files (which could be 150 GB in size) as a local cache, since quite often a clip is made from the same file. When the disk was full (there was a separate disk for download and clip output), the job selected two of the downloaded files at random and deleted the older one. It looped until there was enough disk space, or no files were left.
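For anyone curious, the eviction loop described above can be sketched roughly like this. It is a minimal sketch, not the actual job's code: the function name `evict_two_random`, the `free_bytes` hook, and the choice of modification time as the meaning of "older" are all assumptions made for illustration.

```python
import os
import random
import shutil


def evict_two_random(cache_dir, bytes_needed, free_bytes=None, rng=random):
    """Delete cached files until bytes_needed is free or the cache is empty.

    Hypothetical helper: returns the list of paths it deleted.
    """
    if free_bytes is None:
        # Default: ask the OS how much space is free on the cache disk.
        free_bytes = lambda: shutil.disk_usage(cache_dir).free

    deleted = []
    while free_bytes() < bytes_needed:
        files = [os.path.join(cache_dir, name) for name in os.listdir(cache_dir)]
        files = [path for path in files if os.path.isfile(path)]
        if not files:
            break  # nothing left to evict

        # Pick two distinct files at random (or the only one left).
        candidates = rng.sample(files, min(2, len(files)))

        # Assumption: "older" means the oldest modification time.
        victim = min(candidates, key=os.path.getmtime)
        os.remove(victim)
        deleted.append(victim)
    return deleted
```

If the cache touches a file's modification time on every hit (for example with `os.utime`), the same loop approximates least-recently-used eviction without keeping any bookkeeping beyond the file system itself.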

It's something I thought I would never actually get to implement in the real world, and thus far it is working very well: the caching speeds things up and the eviction seems to avoid too many cache misses.

1 comment

zodvik 2684 days ago

2-random was also used in this HDFS change - https://issues.apache.org/jira/browse/HDFS-8131.

The default block placement policy chose datanodes randomly for new blocks. This change selects 2 datanodes randomly and picks the one with the lower used disk space.
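The same idea applied to placement, as a rough sketch rather than the actual HDFS code: the function name `pick_datanode` and the dict of used bytes per node are hypothetical, chosen only to illustrate "pick two at random, keep the emptier one".

```python
import random


def pick_datanode(used_bytes, rng=random):
    """Pick a node for a new block using two random choices.

    used_bytes: dict mapping node name -> used disk space (hypothetical input).
    """
    nodes = list(used_bytes)
    if not nodes:
        raise ValueError("no datanodes available")

    # Sample two distinct nodes (or the only one available).
    candidates = rng.sample(nodes, min(2, len(nodes)))

    # Keep whichever candidate has less disk space in use.
    return min(candidates, key=used_bytes.get)
```

With more than one node, the fullest node is never picked unless it ties with another, while the choice among the rest stays random, which is what nudges usage toward balance over time.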
