Hacker News new | ask | show | jobs
by TekMol 999 days ago
Not true.

I have been sitting together with folks whose life is configuring MySql and they couldn't tune InnoDB to match the performance of MyIsam.

This was for an application which did a long running numbercrunching job on a large database with a mix of inserts/updates/selects.

Might be different for other types of workloads. But for this workload, InnoDB did not stand a chance even with a lot of tuning.

2 comments

That depends on what you're doing.

For initial bulk file loads using mysqlsh [0], MyISAM doesn't stand a chance because it takes a table lock, so the tool can't parallelize the import like it can with InnoDB.

I wrote a small script to test this [1]. tl;dr 12,500,000 rows totaling 2.75 GB took 6 min 57.2374 sec at 6.59 MB/s for InnoDB, and 11 min 18.4550 sec at 4.05 MB/s for MyISAM. During the test, whereas the InnoDB load utilized multiple cores, the MyISAM load only saturated a single core.

Now, normal INSERTs are likely another story. Working on that test now.

[0]: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysqlsh.html

[1]: https://gist.github.com/stephanGarland/06ff4820f99e5966ba097...

Python script [0] to do bulk INSERTs. I TRUNCATEd the tables, chunked 10,000 INSERTs at a time, and recorded the results. This was split into two passes of 12,500,000 rows each, committing after each chunk of 10K rows.

tl;dr On an empty table, InnoDB is 1.7% slower than MyISAM. On a full table (I raised bulk_insert_buffer_size to 128 MiB, which is a MyISAM-only optimization, and is more than enough to handle 10,000 rows of my data [21 MiB]), they are effectively identical, with InnoDB being 0.1% slower than MyISAM.

It's worth noting that this is with InnoDB's defaults for doublewrite enabled. If (and only if) you're using ZFS, you can disable that, and get an enormous speed boost. These data dirs are on XFS, so I didn't disable the doublewrite buffer.

Finally, I tried some simple `SELECT COUNT(*) WHERE user_id <= ...` - `user_id` is the PK and a UUIDv7, inserted in order so as to avoid page splits for InnoDB. Not as familiar with MyISAM's page layout, but I assume it at least isn't _hurt_ by ordering on insertion. The value selected was near the end (~1000 away), so it had to do a huge covering index scan.

For InnoDB, I saw [22.01, 22.39, 22.69] seconds. For MyISAM, I saw [27.12, 27.53, 26.81] seconds.

If you're able to give me some specific workloads, or settings to tweak for MyISAM, I'm very willing to redo the test, but as of now I'm unconvinced that MyISAM has an edge.

[0]: https://gist.github.com/stephanGarland/cc505a9d9c0d044a340fa...

I can't remember what exactly was slow, but some characteristics of the workload:

Some of the tables were so big that neither the table itself nor the indexes fit into memory.

The important tables where rather narrow. Like 5 integer fields or so. But long (tens of millions of rows? hundreds of millions? Can't remember).

The queries were a continous mix of selects, updates and inserts.

Some of the selects were based on multiple criteria (SELECT ... FROM ... WHERE x=... AND y=...).

Some of the updates were too (UPDATE ... WHERE x=... AND y=... SET ...).

None of the queries took very long. The machine was doing hundreds of queries per second if I remember correctly.

The whole process took weeks (billions of queries, if I remember correctly).

It'll probably take me longer than you'll maintain interest in this thread, but I'll reply when done anyway.

I've created 3 tables with an auto-incrementing unsigned bigint as the PK, and then 4 unsigned int fields with random values. Each table has 25,000,000 rows (and has unique values, modulo the PK). Once they're loaded, and I come up with some queries, I'll create some indices, and then reduce the buffer pool to < index size to mimic those conditions.

I won't be doing billions of queries, though :p

That could be of the transactions, innodb will keep your data reads intact (of that is not of importance, if the update is more important than the result based on the current state of the data, for example with UPDATE SET field = field + 1)

You could maybe have matched with a change in the transaction isolation level for innodb

Tbh it sounds like a whalescript to me. Often it's not the database's fault the performance is bad ;)