HN Mirror


by Zvez 477 days ago

calling everything 'for AI' is the new standard

>if you're reading from, like, big Parquet files, that probably means lots of random reads

and it also usually means that you shouldn't use S3 in the first place for workloads like this, because random reads against it are usually very inefficient compared to a distributed FS. Unless you have some prefetch/cache layer, you will get both bad timings and higher costs: every small read becomes its own billed request with a full round trip of latency.
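
To make the prefetch/cache point concrete, here is a rough sketch of one way such a layer could work: round every read out to fixed-size aligned blocks and keep the blocks in memory, so nearby random reads reuse one request. Everything here is an assumption for illustration. `BlockCache` and `fetch_range` are hypothetical names, and `fetch_range(start, end)` stands in for whatever issues a ranged GET against the object store.

```python
from typing import Callable, Dict


class BlockCache:
    """Read-through cache that fetches fixed-size, aligned blocks."""

    def __init__(self, fetch_range: Callable[[int, int], bytes],
                 size: int, block_size: int = 1 << 20) -> None:
        # fetch_range is a hypothetical callable returning bytes [start, end);
        # in practice it would wrap a ranged GET against the blob store.
        self.fetch_range = fetch_range
        self.size = size              # total object size in bytes
        self.block_size = block_size  # granularity of each remote request
        self.blocks: Dict[int, bytes] = {}
        self.requests = 0             # number of remote requests issued

    def _block(self, index: int) -> bytes:
        # Fetch the block only on a cache miss.
        if index not in self.blocks:
            start = index * self.block_size
            end = min(start + self.block_size, self.size)
            self.blocks[index] = self.fetch_range(start, end)
            self.requests += 1
        return self.blocks[index]

    def read(self, offset: int, length: int) -> bytes:
        # Clamp the read to the end of the object.
        end = min(offset + length, self.size)
        if offset >= end:
            return b""
        first = offset // self.block_size
        last = (end - 1) // self.block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        skip = offset - first * self.block_size
        return data[skip:skip + (end - offset)]


if __name__ == "__main__":
    blob = bytes(range(256)) * 64  # in-memory stand-in for a remote object
    cache = BlockCache(lambda s, e: blob[s:e], len(blob), block_size=4096)
    cache.read(100, 50)
    cache.read(200, 10)  # served from the block fetched by the first read
    print(cache.requests)
```

The tradeoff is read amplification: a bigger block size means fewer requests but more bytes transferred per miss, and this sketch never evicts anything, which a real layer would have to do.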

1 comment

CobrastanJorji 477 days ago

But a distributed FS is far more expensive than cloud blob storage would be, and I can't imagine most workloads would need the features of a POSIX filesystem.
