Hacker News new | ask | show | jobs
by TekMol 999 days ago

    they work like auto commit
Which makes the code way simpler and performance way higher.
1 comments

No it does not. If you tune your innodb for the hardware you have your inserts will be significantly faster.

This whole thread reads like a few folks tried some setup without bothering to turn a single nerd knob and then complained about the results.

Innodb > myisam in any case where you’re actually inserting.

The opposite of true. InnoDB updates the memory view and writes a journal; then it writes the disk view; it maintains a highwater mark for the journal somehow, not sure I care. MyISAM just writes the disk view; no journal is written.

So what happens with InnoDB as writes increase is that eventually you get freezes while the disk view catches up, which frees up journal space so it can go on another writing spree. If you try to shut down the database during one of these freezes it hangs. If you force it to restart anyway, it immediately freezes again on restart... until it frees up some journal space. Paradoxically in this case you have to decide how long of a freeze you can tolerate and reduce the size of the journal(s) accordingly.

MyISAM doesn't write the journal, so it doesn't have journal writes. "Half as many writes" is the best case, in practice there are additional writes pertaining to indexes. (You can be even more clever and disable updating indexes or even drop all indexes and re-create them after all table writes are completed. You can even write to some temp table, build the indexes on it, then rename it to the actual table; or fail over to a new node with the refreshed tables.)

Everything I know about tuning InnoDB I learned tuning DEC RDB (based on the KODA engine, Oracle bought RDB to get the KODA engine for its distributed locking implementation), not much has changed in 30+ years.

Ok but row level locking. I love this conversation btw. We could be having this same discussion in 2013 with seemingly very few differences. Shows what shit oracle’s stewardship of MySQL has been.
Ok but mtbl. There's a whole culture devoted to immutable (write once) databases at very large scale. There's another whole culture which thinks ACID shouldn't apply to distributed systems and that global context should be avoided.

In the context of cloud, (MariaDB versus MySQL) what both brand options offer is multiple engine support. The cloud tends to offer its own engines, under its own brand(s).

Not true.

I have been sitting together with folks whose life is configuring MySql and they couldn't tune InnoDB to match the performance of MyIsam.

This was for an application which did a long running numbercrunching job on a large database with a mix of inserts/updates/selects.

Might be different for other types of workloads. But for this workload, InnoDB did not stand a chance even with a lot of tuning.

That depends on what you're doing.

For initial bulk file loads using mysqlsh [0], MyISAM doesn't stand a chance because it takes a table lock, so the tool can't parallelize the import like it can with InnoDB.

I wrote a small script to test this [1]. tl;dr 12,500,000 rows totaling 2.75 GB took 6 min 57.2374 sec at 6.59 MB/s for InnoDB, and 11 min 18.4550 sec at 4.05 MB/s for MyISAM. During the test, whereas the InnoDB load utilized multiple cores, the MyISAM load only saturated a single core.

Now, normal INSERTs are likely another story. Working on that test now.

[0]: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysqlsh.html

[1]: https://gist.github.com/stephanGarland/06ff4820f99e5966ba097...

Python script [0] to do bulk INSERTs. I TRUNCATEd the tables, chunked 10,000 INSERTs at a time, and recorded the results. This was split into two passes of 12,500,000 rows each, committing after each chunk of 10K rows.

tl;dr On an empty table, InnoDB is 1.7% slower than MyISAM. On a full table (I raised bulk_insert_buffer_size to 128 MiB, which is a MyISAM-only optimization, and is more than enough to handle 10,000 rows of my data [21 MiB]), they are effectively identical, with InnoDB being 0.1% slower than MyISAM.

It's worth noting that this is with InnoDB's defaults for doublewrite enabled. If (and only if) you're using ZFS, you can disable that, and get an enormous speed boost. These data dirs are on XFS, so I didn't disable the doublewrite buffer.

Finally, I tried some simple `SELECT COUNT(*) WHERE user_id <= ...` - `user_id` is the PK and a UUIDv7, inserted in order so as to avoid page splits for InnoDB. Not as familiar with MyISAM's page layout, but I assume it at least isn't _hurt_ by ordering on insertion. The value selected was near the end (~1000 away), so it had to do a huge covering index scan.

For InnoDB, I saw [22.01, 22.39, 22.69] seconds. For MyISAM, I saw [27.12, 27.53, 26.81] seconds.

If you're able to give me some specific workloads, or settings to tweak for MyISAM, I'm very willing to redo the test, but as of now I'm unconvinced that MyISAM has an edge.

[0]: https://gist.github.com/stephanGarland/cc505a9d9c0d044a340fa...

I can't remember what exactly was slow, but some characteristics of the workload:

Some of the tables were so big that neither the table itself nor the indexes fit into memory.

The important tables where rather narrow. Like 5 integer fields or so. But long (tens of millions of rows? hundreds of millions? Can't remember).

The queries were a continous mix of selects, updates and inserts.

Some of the selects were based on multiple criteria (SELECT ... FROM ... WHERE x=... AND y=...).

Some of the updates were too (UPDATE ... WHERE x=... AND y=... SET ...).

None of the queries took very long. The machine was doing hundreds of queries per second if I remember correctly.

The whole process took weeks (billions of queries, if I remember correctly).

It'll probably take me longer than you'll maintain interest in this thread, but I'll reply when done anyway.

I've created 3 tables with an auto-incrementing unsigned bigint as the PK, and then 4 unsigned int fields with random values. Each table has 25,000,000 rows (and has unique values, modulo the PK). Once they're loaded, and I come up with some queries, I'll create some indices, and then reduce the buffer pool to < index size to mimic those conditions.

I won't be doing billions of queries, though :p

That could be of the transactions, innodb will keep your data reads intact (of that is not of importance, if the update is more important than the result based on the current state of the data, for example with UPDATE SET field = field + 1)

You could maybe have matched with a change in the transaction isolation level for innodb

Tbh it sounds like a whalescript to me. Often it's not the database's fault the performance is bad ;)