by pcowans 6756 days ago
I'd caution against using the filesystem unless you think things through very carefully. Typically an RDBMS keeps a B-tree index, which gives you O(log N) scaling on lookups for joins etc. If you use the filesystem and have to scan for what you want, your lookup time will probably be linear in the number of files. You also pay a penalty in space because each file occupies a whole number of disk blocks (plus an inode of its own), so having many small files is a really inefficient use of disk space.
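To put a rough number on the space penalty, here is a back-of-the-envelope sketch. The 4096-byte allocation unit and the 200-byte file size are assumptions picked for illustration; real filesystems differ, and some pack very small files more tightly, which this ignores.

```python
BLOCK_SIZE = 4096  # assumed allocation unit in bytes; real filesystems vary


def allocated_bytes(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes reserved on disk if space is handed out in whole blocks."""
    # Integer ceiling division: round the size up to the next full block.
    return -(-file_size // block_size) * block_size


def wasted_fraction(file_sizes, block_size: int = BLOCK_SIZE) -> float:
    """Share of the allocated space that holds no data."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(s, block_size) for s in file_sizes)
    return 0.0 if allocated == 0 else 1 - used / allocated


# One million hypothetical 200-byte profile files.
sizes = [200] * 1_000_000
print(allocated_bytes(200))               # 4096
print(f"{wasted_fraction(sizes):.1%}")    # 95.1%
```

Under those assumptions roughly 95% of the allocated space is padding, and that is before counting the per-file inode.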

In essence, RDBMSs are highly optimised for the sort of things people do with databases, but you need to know what you're doing with them.

1 comment

Definitely, you don't want to scan the filesystem to do searches. But finding user 1234's profile in profiles/1234.data is easy. In my experience this works well (even with the wasted space) though operations like fsck start to get painful when you have 1 million+ files.
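The "profiles/1234.data" approach might look something like the sketch below. The helper names (profile_path, save_profile, load_profile) and the root directory are made up for illustration; the point is only that the path is computed from the ID, so nothing ever scans the directory.

```python
import tempfile
from pathlib import Path


def profile_path(root: Path, user_id: int) -> Path:
    """Map a user ID straight to its file; no directory listing involved."""
    return root / "profiles" / f"{user_id}.data"


def save_profile(root: Path, user_id: int, data: bytes) -> None:
    path = profile_path(root, user_id)
    # Create profiles/ on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)


def load_profile(root: Path, user_id: int) -> bytes:
    # Opening by full path lets the filesystem resolve the name itself.
    return profile_path(root, user_id).read_bytes()


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save_profile(root, 1234, b"name=example")
    print(load_profile(root, 1234))
```

How fast the open-by-name step is depends on how the filesystem stores directory entries, so very large flat directories can still hurt on some setups.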

An anecdote on bad linear search: in the 90s I worked on a C++ system that made heavy use of inheritance and other C-plus-plus-ery. There were a bunch of file-opener classes, each of which found files in a different way (regexp, straight path, glob). They differed only in a method named match. The base class called match on every file name in the directory until it returned true. That was fine for regexps, but even if you opened a file _by name_ it took about N/2 readdir calls on average to find it! This worked OK until someone created a directory with 10k entries on a production box. Ouch.
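A rough reconstruction of that design, in Python rather than the original C++, with invented class names (only match comes from the story). The match_calls counter stands in for the readdir traffic; Python's os.listdir reads the whole directory up front, so this models the comparison count, not the exact syscalls.

```python
import os
import tempfile


class FileFinder:
    """Base finder: tests every directory entry with match() until one hits."""

    def __init__(self):
        self.match_calls = 0

    def match(self, name: str) -> bool:
        raise NotImplementedError

    def find(self, directory: str):
        for name in os.listdir(directory):
            self.match_calls += 1
            if self.match(name):
                return os.path.join(directory, name)
        return None


class ScanningNameFinder(FileFinder):
    """Exact-name lookup done the slow way, through the inherited scan."""

    def __init__(self, wanted: str):
        super().__init__()
        self.wanted = wanted

    def match(self, name: str) -> bool:
        return name == self.wanted


class DirectNameFinder(ScanningNameFinder):
    """Exact-name lookup that skips the scan and asks the filesystem directly."""

    def find(self, directory: str):
        path = os.path.join(directory, self.wanted)
        return path if os.path.exists(path) else None


with tempfile.TemporaryDirectory() as d:
    for i in range(1000):
        open(os.path.join(d, f"file_{i:04d}"), "w").close()

    slow = ScanningNameFinder("file_0999")
    fast = DirectNameFinder("file_0999")
    assert slow.find(d) == fast.find(d)
    # Directory order is arbitrary, so the slow count varies from run to run.
    print("scan match calls:", slow.match_calls)
    print("direct match calls:", fast.match_calls)
```

The scanning version does work proportional to the directory size; the direct version does none of that, which is presumably what the straight-path opener should have overridden.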