Hacker News
by simtr 3673 days ago
SELECT * is the problem, not the blobs. Storing blobs should be more efficient than putting them in a filesystem somewhere else (which is effectively just another database) and dealing with the overhead of a bunch of other filesystem operations and losing referential integrity, etc.
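To make the SELECT * point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `documents` table and its columns are hypothetical. The only difference between the two queries is whether the BLOB column appears in the select list.

```python
import sqlite3

# Hypothetical schema: small metadata columns next to one large BLOB column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, body BLOB)")
conn.execute(
    "INSERT INTO documents (name, body) VALUES (?, ?)",
    ("contract.pdf", b"\x00" * 1_000_000),
)

# SELECT * returns the full 1 MB body with every row.
wide = conn.execute("SELECT * FROM documents").fetchall()

# Naming only the columns a listing needs keeps the BLOB bytes out of the result set.
narrow = conn.execute("SELECT id, name, length(body) FROM documents").fetchall()
print(narrow)  # [(1, 'contract.pdf', 1000000)]
```

A listing page or index scan would use the narrow form and fetch `body` only when a specific document is actually opened.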

How do you figure that? In addition to making database performance less predictable and introducing all of the problems that BLOBs bring, you lose most of the benefits of the database in the process.

File systems are about storing files. Databases are about intelligently organizing data for retrieval and reliably delivering atomic transactions.

Every system I've seen scale up well separated its blob data out into a traditional file system or an object store. Besides letting the database scale more effectively, this allowed the infrastructure teams to optimize delivery of blob data from a platform POV.
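A minimal sketch of that separation, assuming a local temporary directory stands in for the file or object store and SQLite stands in for the database. The SHA-256 content-addressed naming, the two-character sharding, and the names `BLOB_ROOT`, `put_document`, and `get_document` are illustrative choices, not something the comment specifies.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical layout: blob bytes live under BLOB_ROOT; the database keeps only metadata.
BLOB_ROOT = Path(tempfile.mkdtemp())
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT NOT NULL, sha256 TEXT NOT NULL)"
)

def blob_path(digest: str) -> Path:
    # Shard by the first two hex characters so no single directory grows huge.
    return BLOB_ROOT / digest[:2] / digest

def put_document(name: str, body: bytes) -> int:
    digest = hashlib.sha256(body).hexdigest()
    path = blob_path(digest)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # identical content is stored only once
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(body)
        tmp.replace(path)  # rename into place; atomic on POSIX filesystems
    with conn:  # commit the metadata row
        cur = conn.execute(
            "INSERT INTO documents (name, sha256) VALUES (?, ?)", (name, digest)
        )
    return cur.lastrowid

def get_document(doc_id: int) -> bytes:
    # Assumes doc_id exists; a missing row makes fetchone() return None.
    (digest,) = conn.execute(
        "SELECT sha256 FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return blob_path(digest).read_bytes()
```

Because the files sit in a plain directory tree, a web server or CDN could serve them directly without going through a database connection, which is the kind of platform-level optimization described above.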

There are a few cases where it makes sense to store files in the database, but the constraints are fairly specific. The one time I did it to good effect was when all the files were fairly small (<40 KB) and resilience was one of the defining requirements of the system. We were able to fold the file storage into the normal master/slave replication setup we were already running, which was a big reduction in complexity compared to a separate replicating file store.
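One way to keep the "all files are small" condition from quietly eroding is to enforce it in the schema itself. This is a sketch only: the `files` table, the `store_file` helper, and the exact 40 × 1024 byte ceiling are assumptions for illustration, using SQLite's `length()` on a BLOB (which returns its size in bytes) inside a CHECK constraint.

```python
import sqlite3

MAX_FILE_BYTES = 40 * 1024  # assumed ceiling, matching the "<40 KB" mentioned above

conn = sqlite3.connect(":memory:")
conn.execute(
    f"""CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        body BLOB NOT NULL CHECK (length(body) <= {MAX_FILE_BYTES})
    )"""
)

def store_file(body: bytes) -> bool:
    """Insert a file; return False if the CHECK constraint rejects it as too large."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO files (body) VALUES (?)", (body,))
        return True
    except sqlite3.IntegrityError:
        return False

print(store_file(b"x" * 10_000))   # True
print(store_file(b"x" * 100_000))  # False
```

Since the rows are ordinary table data, they travel through whatever replication the database already has, with no separate file-sync process to keep in step.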
Sounds like you scoped it well. I see that as a similar use case to putting crypto keys or user pictures in LDAP.

Often folks doing this try to re-invent a content management system like FileNet in the DB.

In our case, it was for storage of electronically signed documents. Really it was an HTML template (the same one displayed to the signer) with the inputs replaced by the values they submitted, converted to PDF, and attached to the account. A few pages of PDF like that don't take much room, and ensuring there's no mix-up between files and accounts when it's for regulatory compliance makes it well worth any downsides.
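A minimal sketch of how the database rules out that kind of mix-up, assuming SQLite via Python's sqlite3 module. The `accounts` and `signed_documents` tables are hypothetical, and the PDF bytes are a dummy placeholder. A foreign key ties each document to an existing account, and the account and its document are committed in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, holder TEXT NOT NULL);
CREATE TABLE signed_documents (
    id INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES accounts(id),
    pdf BLOB NOT NULL
);
""")

DUMMY_PDF = b"%PDF-1.4 placeholder"  # stand-in for the rendered, signed PDF

with conn:  # one transaction: the account and its document commit together or not at all
    cur = conn.execute("INSERT INTO accounts (holder) VALUES (?)", ("Alice",))
    conn.execute(
        "INSERT INTO signed_documents (account_id, pdf) VALUES (?, ?)",
        (cur.lastrowid, DUMMY_PDF),
    )

rejected = False
try:
    with conn:
        # Account 999 does not exist, so the foreign key check fails.
        conn.execute(
            "INSERT INTO signed_documents (account_id, pdf) VALUES (?, ?)",
            (999, DUMMY_PDF),
        )
except sqlite3.IntegrityError:
    rejected = True
```

With the file stored outside the database, the same guarantee would have to come from application code rather than from a constraint the database checks on every write.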
It's easier to manage storage for file system objects than in-database objects. Things like performance (potentially on a per-file basis using symlinks), cost (likewise), out of band access (e.g. serving statically directly from web server and not bottlenecking on a DB connection), fragmentation, free space recovery on deletion, etc.
> and dealing with the overhead of a bunch of other filesystem operations and losing referential integrity, etc.

Yeah, better to instead deal with the overhead of the database combined with the overhead of the filesystem! Referential integrity is also very easy to deal with.
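When files live outside the database, one common way to keep the two in step is a periodic reconciliation pass. The sketch below assumes a hypothetical `documents` table that stores each file's path relative to a storage root; `find_mismatches` is an illustrative helper, not part of any library.

```python
import sqlite3
import tempfile
from pathlib import Path

def find_mismatches(conn: sqlite3.Connection, root: Path) -> tuple[set[str], set[str]]:
    """Return (rows whose file is missing, files that no row references)."""
    referenced = {path for (path,) in conn.execute("SELECT path FROM documents")}
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    return referenced - on_disk, on_disk - referenced

# Demo with a throwaway directory and an in-memory database (hypothetical schema).
root = Path(tempfile.mkdtemp())
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE)")
(root / "kept.pdf").write_bytes(b"ok")
(root / "orphan.pdf").write_bytes(b"nobody points here")
conn.executemany(
    "INSERT INTO documents (path) VALUES (?)", [("kept.pdf",), ("missing.pdf",)]
)

missing, orphans = find_mismatches(conn, root)
print(missing, orphans)  # {'missing.pdf'} {'orphan.pdf'}
```

This scan is not atomic with ongoing writes: a file written just before its row commits can briefly look orphaned. A production job would therefore typically skip files newer than some grace period before deleting anything.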