by HenryR 2814 days ago
I am surprised that anyone is trying to differentiate on storage at this time, precisely when that's the part of the stack that's being cannibalized by the cloud vendors (look at the rate of innovation in HDFS over time; the effort is going elsewhere). Are you just targeting on-premise clusters, or is there some differentiation planned for the cloud as well?

We think there is a niche for a distributed file system with higher performance than S3. We have integrated NVMe hardware with our HDFS implementation (HopsFS) and made its metadata layer distributed. With NVMe you can, for example, work directly with datasets made up of millions of files for deep learning, instead of having to munge them into Parquet files because your file system is slowing down your machine learning pipeline.

Reference: https://www.logicalclocks.com/millions-and-millions-of-files...

We have also redesigned the stack around our distributed metadata layer.

We are primarily targeting on-prem right now, but HopsFS would be the fastest DFS in the cloud if you ran it there today.

How does HopsFS compare to Lustre?