by manigandham 2108 days ago
Schemaless data (or at least schema-on-read rather than on-write) is the primary feature. Store JSON documents and index on any field.
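Here is a minimal sketch of what schema-on-read means in practice. It assumes documents are kept as raw JSON strings and that the reading code decides which fields it needs. The documents and the `read_user` helper are hypothetical, and MongoDB itself stores BSON rather than JSON text.

```python
import json

# Hypothetical raw documents: stored as-is, with no schema enforced on write.
RAW_DOCS = [
    '{"_id": 1, "name": "Ada", "email": "ada@example.com"}',
    '{"_id": 2, "name": "Linus", "tags": ["kernel", "git"]}',
    '{"_id": 3, "title": "untitled"}',
]


def read_user(raw: str) -> dict:
    """Apply a schema at read time: pick the fields we care about and
    fill in defaults for anything a given document happens to lack."""
    doc = json.loads(raw)
    return {
        "id": doc["_id"],
        "name": doc.get("name", "<unknown>"),
        "email": doc.get("email"),
        "tags": doc.get("tags", []),
    }


users = [read_user(r) for r in RAW_DOCS]
```

The write path accepts any shape. The cost moves to the reader, which has to cope with missing or unexpected fields.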

Also it was great at sharding and scaling horizontally when first released, and one of the few options available at that time. It's since been eclipsed by much better systems that don't have such a convoluted and fragile setup.
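The sketch below makes the horizontal-scaling point concrete with simplified hash-based shard routing. `NUM_SHARDS` and `shard_for` are hypothetical names, and plain modulo routing is itself a simplification. MongoDB's hashed sharding does not take a modulo. It splits the hashed key space into chunks, and a balancer moves those chunks between shards.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard-key value to a shard by hashing it.
    MD5 is used only for its even spread, not for security."""
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route some hypothetical user IDs to shards.
placement = {uid: shard_for(uid) for uid in ["user:1", "user:2", "user:3", "user:42"]}
```

With plain modulo, changing the shard count relocates most keys. This is one reason real systems route through chunk or range metadata instead.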

These days there's not much benefit over a JSON field in a relational database, unless you're really invested in JSON/JavaScript through your entire stack and want that to reach into the database as well.


Most databases that offer a JSON field treat it as text and index it as free text. With MongoDB, you can index at any level of the document. Since 4.2, you can also use wildcard indexes to automatically index every field in a document, including fields added later.

https://docs.mongodb.com/manual/core/index-wildcard/
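In MongoDB itself, a wildcard index is created with `db.collection.createIndex({ "$**": 1 })`. The toy sketch below is not how MongoDB implements it. It only illustrates the idea under simple assumptions: every leaf value in a document is recorded under its dotted path, so fields that first appear in later documents become queryable without declaring a new index. `leaf_paths` and `WildcardIndex` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Iterator


def leaf_paths(value: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted_path, scalar) pairs for every leaf in a JSON-like value.
    Array elements are indexed under the array's own path."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from leaf_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            yield from leaf_paths(item, prefix)
    else:
        yield prefix, value


class WildcardIndex:
    """Toy index over every field path, including paths first seen later."""

    def __init__(self) -> None:
        # path -> scalar value -> set of document ids
        self._index = defaultdict(lambda: defaultdict(set))

    def insert(self, doc_id: int, doc: dict) -> None:
        for path, scalar in leaf_paths(doc):
            self._index[path][scalar].add(doc_id)

    def find(self, path: str, value: Any) -> set:
        # .get() avoids creating empty entries for unknown paths or values.
        return set(self._index.get(path, {}).get(value, set()))


idx = WildcardIndex()
idx.insert(1, {"name": "Ada", "address": {"city": "London"}})
idx.insert(2, {"name": "Grace", "tags": ["navy", "cobol"]})
# "tags" and "address.city" need no up-front declaration.
idx.insert(3, {"name": "Linus", "address": {"city": "Helsinki"}, "tags": ["git"]})
```

A lookup such as `idx.find("address.city", "London")` returns the ids of matching documents. A path that was never seen returns an empty set.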

The whole point of having a JSON field is that it has more structure than plain text; otherwise you could just use a text field (like SQL Server does). They also all support JSON querying and indexing functions, including subfield access, with optional computed properties.
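The relational side of this argument can be illustrated with Python's built-in `sqlite3` module. This assumes the bundled SQLite has its JSON functions available. They are built in from SQLite 3.38, and many earlier builds include them too. The table, index, and field names are hypothetical. An expression index on `json_extract` lets lookups on a JSON sub-field use an index without a schema change. PostgreSQL offers comparable indexing on `jsonb` columns.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
# Index a sub-field inside the JSON body via an expression index.
conn.execute("CREATE INDEX docs_city ON docs (json_extract(body, '$.address.city'))")

rows = [
    {"name": "Ada", "address": {"city": "London"}},
    {"name": "Grace", "address": {"city": "Arlington"}},
    {"name": "Linus"},  # no address at all: json_extract yields NULL
]
conn.executemany("INSERT INTO docs (body) VALUES (?)", [(json.dumps(r),) for r in rows])

# The WHERE clause repeats the indexed expression, so the planner can use docs_city.
names = [
    name
    for (name,) in conn.execute(
        "SELECT json_extract(body, '$.name') FROM docs "
        "WHERE json_extract(body, '$.address.city') = ?",
        ("London",),
    )
]
```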

Sure, MongoDB has some extra ergonomics for dealing with JSON/BSON data, but how much benefit this really adds is still up for debate. As other databases gain native horizontal scalability, MongoDB will lose even more of its advantages.

Making JSON the coin of the realm and putting it at the core of your database design and query language is a little bit more than extra ergonomics :-)

> Also it was great at sharding and scaling horizontally when first released, and one of the few options available at that time. It's since been eclipsed by much better systems that don't have such a convoluted and fragile setup.

What applications are much better in your opinion?

I'd recommend sticking with relational databases since they all support JSON columns now. If you need horizontal scalability, there are many choices like CockroachDB, Yugabyte, TiDB, Vitess, MemSQL, and others.

If you still want a document store, RavenDB is a great choice with proper clustering, full-text search, SQL-like querying, graph queries, etc. ArangoDB is also a good choice.

I apologize, because I literally don't know, but have you used any of these solutions at the scale where there might be billions or trillions of rows in a table/collection? I'm currently using Mongo at that scale and would love to evaluate some alternatives.

If it helps for context, we have accepted that ad-hoc queries are not possible, and we have our own solution for searching.

TiDB has a similar case study: Queries over 1.3 Trillion Rows of Data Within Milliseconds of Response Time at Zhihu.com https://pingcap.com/success-stories/lesson-learned-from-quer...

The latest stats for Zhihu in the same scenario (already-read posts) are:

- 2.6 Trillion Rows

- 560TB data

- 200 TiKV instances
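Here is a quick back-of-the-envelope check on those figures. It assumes "TB" means 10^12 bytes and that the data is spread evenly across the 200 instances. The post doesn't say whether the 560TB counts replicas (TiKV keeps three by default) or is measured before or after compression, so the per-row figure is only a rough average of stored bytes.

```python
# Figures as quoted above for Zhihu's already-read-posts workload.
ROWS = 2.6e12
DATA_BYTES = 560e12  # assumption: "TB" means 10**12 bytes
INSTANCES = 200

bytes_per_row = DATA_BYTES / ROWS                 # roughly 215 bytes per row
rows_per_instance = ROWS / INSTANCES              # roughly 13 billion rows per instance
tb_per_instance = DATA_BYTES / INSTANCES / 1e12   # roughly 2.8 TB per instance
```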

Very cool! Thanks, I will look into it.

We used MemSQL with 100s of billions of rows for years in production.

If you're working at that scale, it sounds like that's more of an OLAP use-case where MemSQL and other column-oriented databases would suit better than an OLTP document-store. Maybe you can share more details for better recommendations.
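The OLAP point can be illustrated with a minimal sketch contrasting row-oriented and column-oriented layouts, using a hypothetical `Event` record for read tracking. Python lists can't show the real benefit, which comes from reading fewer bytes from disk and compressing similar values together. The shape of the access is the same, though: an aggregate over one field only needs that field's column.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user_id: int
    post_id: int
    read_ms: int


# Row-oriented: each record stored together (typical OLTP layout).
rows = [Event(1, 10, 300), Event(2, 10, 450), Event(1, 11, 120)]

# Column-oriented: each field stored as its own contiguous array (typical OLAP layout).
columns = {
    "user_id": [e.user_id for e in rows],
    "post_id": [e.post_id for e in rows],
    "read_ms": [e.read_ms for e in rows],
}

# An aggregate over one field walks every whole record in the row layout...
total_row_layout = sum(e.read_ms for e in rows)
# ...but only a single array in the column layout.
total_column_layout = sum(columns["read_ms"])
```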