by Annatar 3715 days ago
One should never store binary data inside a database, because it is difficult or impossible to index properly, consumes inordinate amounts of space, and can bring an RDBMS to a grinding halt. Store the files in a filesystem and store the paths to the files in the database. Refrain from using database engines which cannot provide instantaneous atomicity, consistency, isolation, and durability. If you use one anyway, you will experience data corruption and thus availability issues, possibly both transient and silent. Refrain from using databases which do not provide ANSI SQL, as you will eventually need the SQL data manipulation mechanism, which is consistent.
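
A minimal sketch of the "files on disk, paths in the database" idea, using Python's standard sqlite3 module. The table layout, the function names (open_db, store_file, load_file), and the choice to name files by their SHA-256 digest are all assumptions made for illustration, not something prescribed above.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_db(db_path=":memory:"):
    """Open the database and create the (hypothetical) files table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " name TEXT PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " sha256 TEXT NOT NULL)"
    )
    return conn


def store_file(conn, storage_dir, name, data):
    """Write the bytes to disk; record only the path and metadata in the database."""
    digest = hashlib.sha256(data).hexdigest()
    # Assumption: the file is named after its content digest.
    path = Path(storage_dir) / digest
    path.write_bytes(data)
    with conn:  # commits on success, rolls back if the INSERT fails
        conn.execute(
            "INSERT INTO files (name, path, size, sha256) VALUES (?, ?, ?, ?)",
            (name, str(path), len(data), digest),
        )
    return path


def load_file(conn, name):
    """Look the path up in the database, then read the bytes from the filesystem."""
    row = conn.execute(
        "SELECT path FROM files WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return Path(row[0]).read_bytes()
```

The database row stays small and indexable (name, size, digest), while the payload itself never passes through the RDBMS.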

To avoid the managed hosting penalty, you can start with SQLite, and do the replication and clustering within your application. To capitalize on RDBMS capabilities, use PostgreSQL with Citus or Oracle RAC. Stay away from MySQL, as it silently corrupts data and has no OS authentication, making it difficult to deploy automatically.

1 comments

Using the file system for storage, the "index" could be code, e.g. a Python dictionary, a Clojure hash map, or a JavaScript object. This might simplify building a first-iteration product and avoid the technical and cognitive overhead of learning and properly implementing an unfamiliar DBMS.
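
A rough sketch of what a dictionary "index" over a directory might look like in Python. The function names (build_index, find_by_suffix) and the metadata kept per file are assumptions for illustration; the index lives only in memory and has to be rebuilt whenever the files change.

```python
import os
from pathlib import Path


def build_index(root):
    """Walk `root` and map each file's relative path to its size and mtime."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = Path(dirpath) / filename
            stat = full.stat()
            key = full.relative_to(root).as_posix()
            index[key] = {"size": stat.st_size, "mtime": stat.st_mtime}
    return index


def find_by_suffix(index, suffix):
    """Return the sorted keys ending with `suffix`, e.g. every '.json' file."""
    return sorted(key for key in index if key.endswith(suffix))
```

Lookups by exact path are plain dictionary access, e.g. `build_index("data")["users/alice.json"]`, where the directory and file names are made up.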
Thanks for the reply and tips, but my question was more about choosing between self-hosting and managed hosting.
Always choose self-hosting. Not only will you learn a lot about operations and proper hardware design, servicing, and redundancy, but you will also have maximum control.