Hacker News new | ask | show | jobs
by crazygringo 1961 days ago
As long as you've got primary keys on the huge table, there's a hacky solution -- create a second table with columns for just the first table's primary key and the columns you're indexing and your desired index, and ensure you always write/update/delete both tables simultaneously using transactions. Then when needed, use the index on the second table and join it to your first with the primary key.

Annoying, but it should work for most queries I'd expect without too much SQL.

I've definitely "rolled my own indexing" like this in the past, though it's more often been duplicating strings into a custom "collation" or other transformations.

Another solution is simply to split your table in two, with the same columns in both, and the index only on one of the tables. But of course that really depends on your business logic -- queries that need to retrieve data from both tables together can get pretty hairy/slow, and if you've got auto-incrementing PKEY's then avoiding collisions between the two tables can be tricky on its own. So this is definitely the less general solution.

Of coure it certainly would be nicer if MySQL supported partial indexes. It seems so useful, I'm surprised it didn't happen long ago.

1 comments

The first approach is one of the steps towards normalising a database.
Actually it's the opposite of database normalization.

Normalizing removes data redundancy. This adds data redundancy.

When I design a database structure, it's common to start with the most normalized representation possible. And then to denormalize the minimum necessary for performance reasons -- duplicating rows and/or columns just like here so certain data can be retrieved more quickly, whenever indexes aren't powerful or featured enough.

I think what lucian1900 may be thinking is that instead of

    create table purchase_order (
        id int primary key,
        ordered_on timestamptz not null,
        customer_id int not null references customer,
        canceled_on timestamptz
    );
you could have

    create table purchase_order (
        id int primary key,
        ordered_on timestamptz not null,
        customer_id int not null references customer
    );

    create table order_cancelation (
        order_id int primary key references purchase_order,
        canceled_on timestamptz not null
    );
This is indeed a better normalised schema and it allows you to index order_cancelation.canceled_on without worrying about nulls.
Oh then, absolutely. I was assuming a constraint that columns couldn't be removed from the original table. But if you can, then your example is an even better solution.
Exactly, that’s what I thought was being described.