Hacker News new | ask | show | jobs
by mike_hearn 67 days ago
Disclaimer: I work part time on the DB team.

You could also consider renting an Oracle DB. Yep! Consider some unintuitive facts:

• It can be cheaper to use Oracle than MongoDB. There are companies that have migrated away from Mongo to Oracle to save money. This idea violates some of HN's most sacred memes, but there you go. Cloud databases are things you always pay for, even if they're based on open source code.

• Oracle supports NoSQL features including the MongoDB protocol. You can use the Mongo GUI tools to view and edit your data. Starting with NoSQL is very easy as a consequence.

• But... it also has "JSON duality views". You start with a collection of JSON documents and the database not only works out your JSON schemas through data entropy analysis, but can also refactor your documents into relational tables behind the scenes whilst preserving the JSON/REST oriented view e.g. with optimistic locking using etags. Queries on JSON DVs become SQL queries that join tables behind the scenes so you get the benefits of both NoSQL and SQL worlds (i.e. updating a sub-object in one place updates it in all places cheaply).

• If your startup has viral growth you won't have db scaling issues because Oracle DBs scale horizontally, and have a bunch of other neat performance tricks like automatically adding indexes you forgot you needed, you can materialize views, there are high performance transactional message queues etc.

So you get a nice smooth scale-up and transition from ad hoc "stuff some json into the db and hope for the best" to well typed data with schemas and properly normalized forms that benefit from all the features of SQL.

5 comments

Good points, but Postgres has all those, along with much better local testing story, easier and more reliable CDC, better UDFs (in Python, Go etc.), a huge ecosystem of extensions for eg. GIS data, no licencing issues ever, API compatability with DuckDB, Doris and other DBs, and (this is the big one) is not Oracle.
Unless I’ve missed something, Postgres doesn’t have automatic index creation, nor does it have JSON introspection to automatically convert it to a normalized schema (which is insane; I love it). It also doesn’t do any kind of sharding on its own, though of course forks like Citus exist. It definitely doesn’t do RAC / Exadata (not sure which part this falls under), where multiple nodes are connected and use RDMA to treat a bunch of SSDs as local storage.

I love Postgres, and am not a huge fan of Oracle as a corporation, but I can’t deny that their RDBMS has some truly astounding capabilities.

I think that’s the beauty of PG here, you can find solutions to most of this:

Index creation https://stackoverflow.com/questions/23876479/will-postgresql...

JSON->DB schema https://jsonschema2db.readthedocs.io/en/latest/index.html

Pg shared disk failover is similar but RAC is quite unique, you’re not going to use though with a rented cluster?

https://www.postgresql.org/docs/current/different-replicatio...

Personally for me any technical advantages don’t outweigh the business side, YMMV :)

RAC is a default part of any cloud Oracle DB, I think! I must admit I'm not an expert in all the different SKUs so there might be some that aren't, but if you rent an autonomous DB in the cloud you're probably running on ExaData/RAC. That's why the uptime is advertised as 99.95% even without a separate region.

> Index creation https://stackoverflow.com/questions/23876479/will-postgresql...

I was ambiguous. That's an answer telling how to create indexes manually, and saying that you get an index for primary keys and unique constraints automatically. Sure, all databases do that. Oracle can create arbitrary indexes for any relation in the background without it being requested, if it notices that common queries would benefit from them.

Forgetting to create indexes is one of the most common issues people face when writing database apps because the performance will be fine on your laptop, or when the feature is new, and then it slows down when you scale up. Or worse you deploy to prod and the site tanks because a new query that "works fine on my machine" is dog slow when there's real world amounts of data involved. Oracle will just fix it for you, Postgres will require a manual diagnosis and intervention. So this isn't the same capability.

> JSON->DB schema https://jsonschema2db.readthedocs.io/en/latest/index.html

Again I didn't provide enough detail, sorry.

What that utility is doing is quite different. For one, it assumes you start with a schema already. Oracle can infer a schema from a collection of documents even if you don't have one by figuring out which fields are often repeated, which values are unique, etc.

For another, what you get after running that utility is relational tables that you have to then access relationally via normal SQL. What JSON duality views give you is something that still has the original document layout and access mode - you GET/PUT whole documents - and behind the scenes that's mapped to a schema and then through to the underlying SQL that would be required to update the tables that the DB generated for you. So you get the performance of normalized relations but you don't have to change your code.

The nice thing about this is it lets developers focus on application features and semantics in the early stages of a startup by just reshaping their JSON documents at will, whilst someone else focuses on improving performance and data rigor fully asynchronously. The app doesn't know how the data is stored, it just sees documents, and the database allows a smooth transition from one data model to another.

I don't think Postgres has anything like this. If it does it'll be in the form of an obscure extension that cloud vendors won't let you use, because they don't want to/can't support every possible Postgres extension out there.

You had me at "is not Oracle"
I generally limit Oracle to where you are in a position to have a dedicated team to the design, deployment and management of just database operations. I'm not really a fan of Oracle in general, but if you're in a position to spend upwards of $1m/yr or more for dedicated db staff, then it's probably worth considering.

Even then, PostgreSQL and even MS-SQL are often decent alternatives for most use cases.

That was true years ago but these days there's the autonomous database offering, where DB operations are almost all automated. You can rent them in the cloud and you just get the connection strings/wallet and go. Examples of stuff it automates: backups, scaling up/down, (as mentioned) adding indexes automatically, query plan A/B testing to catch bad replans, you can pin plans if you need to, rolling upgrades without downtime, automated application of security patches (if you want that), etc.

So yeah running a relational DB used to be quite high effort but it got a lot better over time.

At that point, you can say the same for PostgreSQL, which is more broadly supported across all major and minor cloud platforms with similar features and I'm assuming a lower cost and barrier of entry. This is without signing with Oracle, Inc... which tends to bring a lot of lock-in behaviors that come with those feature sets.

TBF, I haven't had to use Oracle in about a decade at this point... so I'm not sure how well it competes... My experiences with the corporate entity itself leave a lot to be desired, let alone just getting setup/started with local connectivity has always been what I considered extremely painful vs common alternatives. MS-SQL was always really nice to get setup, but more recently has had a lot of difficulties, in particular with docker/dev instances and more under arm (mac) than alternatives.

I'm a pretty big fan of PG, which is, again, very widely available and supported.

Autonomous DB can run on-premises or in any cloud, not just Oracle's cloud. So it's not quite the same.

I think PG doesn't have most of the features I named, I'm pretty sure it doesn't have integrated queues for example (SELECT FOR UPDATE SKIP LOCKED isn't an MQ system), but also, bear in mind the "postgres" cloud vendors sell is often not actually Postgres. They've forked it and are exploiting the weak trademark protection, so people can end up more locked in than they think. In the past one cloud even shipped a transaction isolation bug in something they were calling managed Postgres, that didn't exist upstream! So then you're stuck with both a single DB and a single cloud.

Local dev is the same as other DBs:

    docker run -d --name <oracle-db> container-registry.oracle.com/database/free:latest
See https://container-registry.oracle.com

Works on Intel and ARM. I develop on an ARM Mac without issue. It starts up in a few seconds.

Cost isn't necessarily much lower. At one point I specced out a DB equivalent to what a managed Postgres would cost for OpenAI's reported workload:

> I knocked up an estimate using Azure's pricing calculator and the numbers they provide, assuming 5TB of data (under-estimate) and HA option. Even with a 1 year reservation @40% discount they'd be paying (list price) around $350k/month. For that amount you can rent a dedicated Oracle/ExaData cluster with 192 cores! That's got all kinds of fancy hardware optimizations like a dedicated intra-cluster replication network, RDMA between nodes, predicate pushdown etc. It's going to perform better, and have way more features that would relieve their operational headache.

In the spirit of helpfulness (not pedantry) FYI "knocked up" means "impregnated". Maybe "put together"?
Ah, this must be a British vs American English thing, thanks for the info.

Yes I meant it in this sense: "If you knock something up, you make it or build it very quickly, using whatever materials are available."

https://www.collinsdictionary.com/dictionary/english/knock-u...

And, again... most of my issues are with Oracle, Inc. So technical advantages are less of a consideration.
But then you’d have to interact with Oracle.

So.

Yeah no sane person would be that stupid

If you have an option, never ever use Oracle!

Never!

I wanted to hate you for suggesting Oracle, but you defend it well! I had no idea