Hacker News
by donavanm 2690 days ago
This is well-explored space in more traditional storage land. Look into cumulative distribution functions (CDFs). A CDF of your accesses will show you your workload distribution and the effective “working set” to optimize for in a given time period. That gets you to a place where you're optimizing total cost vs. performance as a function.
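A minimal sketch of the CDF idea, assuming the trace has already been reduced to a plain list of block (or object) IDs. The function names are hypothetical. It ranks blocks hottest-first, builds the cumulative fraction of accesses they serve, and reports how many distinct blocks you would need to keep on the fast tier to cover a target fraction of accesses. This is a static, frequency-only view of the "working set", not a full cache model.

```python
from collections import Counter


def access_cdf(trace):
    """Return (blocks ranked hottest-first, cumulative fraction of accesses served)."""
    counts = Counter(trace)
    total = len(trace)
    ranked = counts.most_common()  # [(block, count), ...] sorted by count, descending
    cdf = []
    running = 0
    for _, count in ranked:
        running += count
        cdf.append(running / total)
    return [block for block, _ in ranked], cdf


def working_set_size(trace, coverage):
    """Smallest number of distinct blocks that together serve >= `coverage` of accesses."""
    _, cdf = access_cdf(trace)
    for size, fraction in enumerate(cdf, start=1):
        if fraction >= coverage:
            return size
    return len(cdf)


if __name__ == "__main__":
    trace = ["a"] * 6 + ["b"] * 2 + ["c", "d"]  # hypothetical toy trace
    for target in (0.5, 0.8, 1.0):
        print(target, working_set_size(trace, target))
```

Multiplying each working-set size by a per-GB price for the fast tier gives one rough way to plot cost against the fraction of accesses served.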

You'll also want to be cognizant of convex “shoulders” in the distribution that will trip up naive optimization algorithms. I don't have the link offhand, but search for hill climbing in relation to CDFs. Some related work might be in the relatively unexplored “cache insertion” problem area. Check out TinyLFU as an example where knowing what to cache turns out to be more beneficial than knowing what to evict.
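To make the "what to cache vs. what to evict" point concrete, here is a simplified sketch in the spirit of TinyLFU's admission filter. It keeps an ordinary LRU for eviction, but a new key is only admitted if an approximate frequency counter (a count-min sketch) says it is more popular than the LRU victim. The class names and parameters (`width`, `depth`, `capacity`) are hypothetical. The published TinyLFU design also ages its counters periodically and uses a "doorkeeper" filter, and both are omitted here.

```python
import hashlib
from collections import OrderedDict


class CountMinSketch:
    """Approximate frequency counter: may over-estimate, never under-estimates."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One independent-ish hash per row, derived by salting the key with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key):
        for row, col in self._cells(key):
            self.rows[row][col] += 1

    def estimate(self, key):
        return min(self.rows[row][col] for row, col in self._cells(key))


class AdmissionLRU:
    """LRU cache that admits a missed key only if it looks hotter than the eviction victim."""

    def __init__(self, capacity, sketch=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.sketch = sketch if sketch is not None else CountMinSketch()
        self.items = OrderedDict()  # keys in recency order, least recent first

    def access(self, key):
        """Record one access and return True on a cache hit."""
        self.sketch.add(key)
        if key in self.items:
            self.items.move_to_end(key)
            return True
        if len(self.items) < self.capacity:
            self.items[key] = None
            return False
        victim = next(iter(self.items))  # least recently used key
        if self.sketch.estimate(key) > self.sketch.estimate(victim):
            del self.items[victim]
            self.items[key] = None
        return False  # a miss either way; the key may or may not have been admitted
```

The effect is scan resistance: a long run of one-off keys gets rejected at admission time instead of flushing the frequently used blocks out of a plain LRU.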

For more advanced techniques, look into some of the published work from places like Coho Data. They had a great paper at USENIX around 2015 on optimizing placement in dynamic workloads across different storage media.

And lastly, experimentation is great for proving a hypothesis, but it's not the most effective means of discovery. You'll want to get representative workload traces and use those to replay/simulate against different constraints. Check out Fio and its I/O trace capabilities for an example.
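A minimal sketch of trace-driven simulation, assuming the trace has already been extracted into a list of block IDs (parsing Fio's trace/log format is not shown). The function names are hypothetical. It replays the same trace through an LRU cache at several capacities and reports the hit ratio for each, so you can compare configurations without rerunning the real workload.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, capacity):
    """Replay `trace` through an LRU cache holding `capacity` blocks; return the hit ratio."""
    cache = OrderedDict()  # least recently used key first
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(trace) if trace else 0.0


def sweep(trace, capacities):
    """Hit ratio for each candidate cache size, using the same recorded trace."""
    return {capacity: lru_hit_ratio(trace, capacity) for capacity in capacities}


if __name__ == "__main__":
    trace = [1, 2, 3] * 3  # hypothetical looping access pattern
    print(sweep(trace, [1, 2, 3, 4]))
```

On that toy looping trace the hit ratio stays at 0 for capacities 1 and 2 and then jumps to 2/3 at capacity 3. That is the kind of cliff a naive search that only tries small increments can miss.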