by edutechnion 3901 days ago
riofs (https://github.com/skoobe/riofs) seems much faster than s3fs and deserves a spot in any benchmarks.
3 comments

Yes I looked into it (I only found out about it after I started working on goofys). They have a stub flush() which does nothing (https://github.com/skoobe/riofs/blob/master/src/rfuse.c#L104...), so of course they are faster and the benchmarks won't be meaningful.
AWS S3 doesn't expose a flush()-like call in their API, hence what is the point of exposing something that the underlying service doesn't support?
Data on S3 is durable after a successful PUT (or a completed multipart upload), so their flush() is implied.
Fun fact: the S3 API actually makes no such claim.

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT...

Nope. Let's try the developer guide.

http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class...

Nope again. Let's try the FAQ.

https://aws.amazon.com/s3/faqs/

They claim S3 is "designed for" 99.999999999% durability, but there's neither a guarantee nor a clear definition of when an object starts to be covered by one. While it's both intuitively obvious and conventional wisdom that an object is durable at the end of a PUT, as far as I've ever been able to tell Amazon doesn't come out and say so.

That's one of the problems with company-defined "standards" BTW. This kind of issue would surely have been noticed and discussed in any kind of open standards process. It's what makes those processes so tedious. De facto standards can be turned around a lot faster, but there's a necessary sacrifice in precision to go with that.

Exactly, so why do you need a flush then?
I looked at the code some more and they do handle release(), so much of my point above was invalid. I expect riofs's streaming write performance to be comparable to goofys because we both use the same implementation strategy.
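To make the flush() vs. release() distinction concrete, here is a toy sketch in Python. It is not code from riofs or goofys; FakeObjectStore and BufferedFile are hypothetical names, and the store is just an in-memory dict standing in for S3. It assumes the usual FUSE behaviour: flush() runs on every close() of a descriptor, release() runs once when the last reference to the open file goes away.

```python
class FakeObjectStore:
    """Hypothetical stand-in for S3: each put() stores one whole object."""

    def __init__(self):
        self.objects = {}
        self.put_count = 0

    def put(self, key, data):
        self.objects[key] = bytes(data)
        self.put_count += 1


class BufferedFile:
    """Toy file handle that buffers writes and uploads them later."""

    def __init__(self, store, key, upload_on_flush):
        self.store = store
        self.key = key
        self.upload_on_flush = upload_on_flush
        self.buffer = bytearray()
        self.dirty = False

    def write(self, data):
        self.buffer.extend(data)
        self.dirty = True

    def _upload(self):
        # Only issue a PUT if there is unsent data.
        if self.dirty:
            self.store.put(self.key, self.buffer)
            self.dirty = False

    def flush(self):
        # With upload_on_flush=False this is a stub, like the one discussed above.
        if self.upload_on_flush:
            self._upload()

    def release(self):
        # Last chance to send buffered data before the handle disappears.
        self._upload()


store = FakeObjectStore()
f = BufferedFile(store, "example.txt", upload_on_flush=False)
f.write(b"hello")
f.flush()    # stub: nothing reaches the store yet
f.release()  # the single PUT happens here
```

Either way the data ends up in the store with one PUT. The practical difference in a real FUSE filesystem is that the return value of release() is ignored, so an upload failure there cannot be reported back to the caller's close(), whereas an error from flush() can.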
Not surprising that s3fs is slower; its implementation quality is not very high. Goofys looks much better at first sight. It would be nice if somebody could do a benchmark between goofys and riofs (without cache). But honestly, once you have proper request handling there is not that much left to tune. The biggest performance gains come from a good cache implementation, which is also what makes such a system useful in a production environment. That's what riofs was written for.

Disclaimer: I initiated and supervised riofs.

First off, apologies for the criticism about flush(). I only skimmed the code, and flush() is usually the first place I look.

Anyway, goofys at this point is just a toy project. There are more optimizations that I could potentially do (proper read prefetching), but it's mostly good enough.

I've updated the benchmark to include riofs.