
> It’s ok for the narrow set of use cases that it is good at

My comment was partly in jest, and mostly meant to spur conversation, not a true disagreement with or criticism of your comment.

Still, saying "It's OK" for those narrow use cases may be too dismissive, even if "great" is an exaggeration. There are plenty of examples where DBMSes (especially relational ones) have fared poorly in comparison.

> Outside of that it needs something that has some intelligence

I fear I'm missing your point here. Certainly "the filesystem", as in the Unix syscall interface to a hierarchical arrangement of files, lacks intelligence, but that doesn't mean the specific underlying implementation must.

We've even come a long way from everything being the Berkeley Fast File System: there are now many choices of underlying filesystems (including CoW ones like ZFS and Btrfs).

> and ability to read optimise it.

I assume by read optimization you don't just mean something like the buffer cache, but a user-specified index?
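
To make concrete what I mean by a user-specified index, here is a rough Python sketch. Everything in it is made up for illustration (the one-JSON-file-per-record layout, the `status` field, the helper names); it is not how Jenkins stores anything. It only contrasts parsing every small file on each query with building a lookup table once.

```python
import json
import os
import tempfile


def write_records(root, records):
    """Store each record as its own small JSON file (hypothetical layout)."""
    for i, rec in enumerate(records):
        with open(os.path.join(root, f"rec-{i}.json"), "w") as f:
            json.dump(rec, f)


def scan_for(root, field, value):
    """No index: open and parse every file on every query."""
    hits = []
    for name in sorted(os.listdir(root)):
        with open(os.path.join(root, name)) as f:
            if json.load(f).get(field) == value:
                hits.append(name)
    return hits


def build_index(root, field):
    """User-specified index: one pass over the files, then dict lookups."""
    index = {}
    for name in sorted(os.listdir(root)):
        with open(os.path.join(root, name)) as f:
            key = json.load(f).get(field)
        index.setdefault(key, []).append(name)
    return index


def demo():
    with tempfile.TemporaryDirectory() as root:
        write_records(root, [
            {"job": "build", "status": "ok"},
            {"job": "test", "status": "failed"},
            {"job": "deploy", "status": "ok"},
        ])
        index = build_index(root, "status")
        # Both approaches return the same files; only the cost per query differs.
        return scan_for(root, "status", "ok"), index["ok"]
```

The buffer cache can make the repeated scan cheaper, but it still touches every file. The index, which is the part a DBMS maintains for you, also has to be kept in sync with writes, and this sketch ignores that.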

> Heavily locked stuff, lots of small things, huge number of locatable data entries, not so much. Which is Jenkins.

Does Jenkins use external locking? It would be odd in light of some of the comments elsewhere in the thread touting its advantage of being a single process. Of course, even if it's using only locking internal to itself, there's a good argument that its authors needlessly re-invented a DBMS (which we've seen happen when other niche-use databases get used for broader purposes).

I'm not sure a large number of tiny files is inherently problematic for a filesystem, merely problematic for existing implementations, and some are better at it than others. What about something like libferris (assuming perfectly spherical cows and ignoring the performance implications of FUSE for a moment), which can back a filesystem with an arbitrary database?

IOW, is Jenkins-using-the-filesystem an Ops optimization/tuning problem, or is it a more fundamental problem that can only be addressed by modifying its code?

> The only way it performed well was keeping it to 2Gb spindles.

I worked with the aforementioned hardware extensively, early in my career, but at a company that wrote a data warehousing (aka OLAP) RDBMS.

I'm reasonably confident in saying that your performance observations have almost nothing to do with the filesystem itself and everything to do with I/O performance in general. Large numbers of smaller spindles were absolutely required for good database performance and scalability.

> I’m well aware of scale issues on file systems

I didn't mean to suggest you weren't, rather the opposite, as "the collective we" was a euphemism meant to imply "everyone but us".

Anyway, my overall point is that you and I may be acutely aware of the real, practical problems with scaling filesystems, but we're rare. Since there's nothing fundamental/theoretical that makes the filesystem an obviously poor choice at modest scale (the definition of which increases as computing power increases), the lesson does not get learned by everyone.

Instead, because truly large scale becomes rarer and rarer as computers become more powerful (CPU more than I/O, of course, but then... SSDs), every time the lesson is re-learned by an individual/company, they think it's a new, or at least unique, problem, and we end up with a re-invention of the Portable Batch System (a fairer characterization than re-invention of cron, IMO).