Hacker News new | ask | show | jobs
by masklinn 4226 days ago
> PostgreSQL: the docs actively encourage you to simply use the TEXT type. This is a high-performance, UTF-8 validated text storage type which has no length limit.

The emphasised part is incorrect, Postgres's text storage has a 1GB limit (that stands for CHAR, VARCHAR and TEXT which all use the same underlying mechanism): http://www.postgresql.org/docs/9.4/static/datatype-character...

> In any case, the longest possible character string that can be stored is about 1 GB.

Although note that this is 1GB, you have to take in account both multibyte characters and compression (which by default is enabled on text values of more than 2k, using LZ)

2 comments

That being said, if your relational database has 1GB records, you really need to reevaluate whether you are using the right tool for the job.
While I kind of agree, what exactly is the "right tool" for storing large blobs?

You can use the filesystem but then you have to manage data migration/sync by "hand" which is kind of a bitch for distributed systems.

Honestly storing large blobs with meta-data telling you about the blobs is extremely common. I'm yet to hear of how you're meant to do it correctly.

I had to solve this problem at my last job and we had vendors let us know how companies like Apple, Sony, Disney, EA etc have solved it. Basically there are two ways to do it. (1) Store it on some "filesystem". In quotes because how you do it can vary wildly e.g. S3, GlusterFS, Standard directories using DRBD for HA. (2) Take the blob, slice it into pieces, hash it and spread it across a sharded database.

Generally it seems to be that if you have lots of unique large files then use filesystem. But if you have files which are likely to have duplicates then use a database. So a file storage locker may use (1) but a service like iTunes Match would use (2). And IIRC Apple in fact does store at least uploaded music files in Cassandra.

We found that a GlusterFS filesystem is a good way to go for our case of 100s of terabytes of unique files with some failover. Amazon S3 would simply cost far more. The file metadata is in the RDBMS, obviously. But storing the binary data in the RDBMS wouldn't be a good idea, and would choke long before it got to our scale.
Usually storing in filesystem with a reference in database seems to work best. Much easier for database management and backup, and file management as well.

The OP doesn't appear to have hear of SQL Server FileStream or FileTables, which handle this abstraction for you: http://msdn.microsoft.com/en-us/library/ff929144.aspx

On pgsql, the blob type. It stores the object out of the row store, but still in the database.
commonly, s3 or zfs nas for storage and pg for metadata
Commonly ? No not even close. The most common solution is that people just dump it as a BLOB in whatever database they are currently using. Which based on numbers is likely to be MySQL.
That's basically what a client of mine does.
+1
Let's agree he should have said "no practical length limit".

If you are using gigabyte-long text strings in a relational database, you are doing it very wrong.

> Let's agree he should have said "no practical length limit".

I would have agreed if he hadn't explicitly mentioned the 2GB limit of MSSQL's TEXT and NTEXT in the very next paragraph.