by jsr 5231 days ago
Definitely an impressive benchmark by any standard. However, there are some things to be aware of:

1) They used InfiniBand interconnects. Running on Ethernet is likely to yield less impressive results.

2) Their benchmark does simple primary key lookups. If you start doing joins or transactions that need to hit multiple data nodes, things will slow down. Depending on your workload, this may or may not be an issue.

3) NDB is an in-memory storage engine, so you're limited to the aggregate RAM in your cluster for max storage size.

4) AFAIK, MySQL Cluster doesn't re-balance: you need to pre-determine how data is partitioned, and changing it at runtime is hard. I don't know if this has changed in later releases.
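
To make points 2 and 4 concrete, here is a minimal sketch of hash partitioning in general. It is not NDB's actual partitioning scheme: the MD5-modulo mapping, the function names, and the key format are all assumptions made for illustration. It shows that a single primary key lookup touches one node, that a multi-row operation can fan out to many, and that changing the node count under a naive modulo scheme remaps most keys.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a primary key to a data node (illustrative only, not NDB's real scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def nodes_touched(keys, num_nodes: int) -> set:
    """Return the set of data nodes an operation over these keys must contact."""
    return {node_for_key(k, num_nodes) for k in keys}


def fraction_moved(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owning node changes when the node count changes."""
    moved = sum(
        1 for k in keys
        if node_for_key(k, old_nodes) != node_for_key(k, new_nodes)
    )
    return moved / len(keys)


if __name__ == "__main__":
    # Hypothetical keys; the "user:N" format is made up for this example.
    keys = [f"user:{i}" for i in range(10_000)]

    # A single primary key lookup always lands on exactly one node.
    print("nodes for 1 key:", len(nodes_touched(keys[:1], 4)))

    # An operation spanning many rows usually has to contact several nodes.
    print("nodes for 50 keys:", len(nodes_touched(keys[:50], 4)))

    # Going from 4 to 5 nodes with plain modulo hashing moves most keys.
    print("fraction moved 4 -> 5:", fraction_moved(keys, 4, 5))
```

Real systems usually avoid the last problem with a fixed number of partitions or consistent hashing, which is why the modulo scheme here should be read as a worst-case illustration rather than a description of MySQL Cluster.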

3 comments

For point #3, NDB has supported on-disk storage for non-indexed attributes for a while now (2-3 years?). So you only need to fit the indexes in memory, which is a much smaller dataset, but it is still a limit.

I'm sure for the benchmark it was all in memory though.
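
As a rough back-of-the-envelope sketch of the difference between the two modes: every number below is hypothetical, the function names are made up, and the model ignores per-row overhead, buffers, and anything else a real cluster needs. The `replicas` parameter is an assumption for cases where each row is stored on more than one node; leave it at 1 to get plain aggregate RAM.

```python
def usable_capacity_gb(data_nodes: int, ram_per_node_gb: float, replicas: int = 1) -> float:
    """Aggregate RAM available for distinct data, divided by the number of copies kept."""
    if data_nodes <= 0 or replicas <= 0:
        raise ValueError("data_nodes and replicas must be positive")
    return data_nodes * ram_per_node_gb / replicas


def fits_in_memory(non_indexed_gb: float, indexed_gb: float,
                   capacity_gb: float, disk_data: bool = False) -> bool:
    """Check whether the in-memory part of the dataset fits.

    With disk_data=True, only the indexed portion must stay in RAM;
    otherwise everything must.
    """
    required = indexed_gb if disk_data else non_indexed_gb + indexed_gb
    return required <= capacity_gb


if __name__ == "__main__":
    # Hypothetical cluster: 8 data nodes with 64 GB each.
    capacity = usable_capacity_gb(data_nodes=8, ram_per_node_gb=64)

    # Hypothetical dataset: 700 GB of non-indexed attributes, 60 GB indexed.
    print("all in memory:", fits_in_memory(700, 60, capacity))
    print("non-indexed on disk:", fits_in_memory(700, 60, capacity, disk_data=True))
```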

Also, to add to #3: the NDB C++ API doesn't use raw SQL queries either (it uses a lower level of abstraction for accessing the database), so it avoids the overhead of having to parse SQL queries. Most production systems and third-party libraries use SQL queries.
#4 is fixed in version 7.0 and later.