Hacker News new | ask | show | jobs
by shamsulbuddy 4301 days ago
Recently i have got the task to work on a 10 GB and 12 Million rows table to make it work with complex aggregation queries like SUM or COUNT.

Server - Ubuntu 14.0.4 Mysql - Percona XtraDB Cluster 5.6 RAM - 16 GB CPU - 8 Cores - 2.95 Ghz SSD - 100 GB

I tried many options --

1) Increased the innodb_buffer_pool_size to 8GB on a 16GB RAM machine -- It helped but nothing magical here 2) Add few more Keys on Date based columns and force the users in front end to select at least one date range column -- Seen some performance gain here 3) Tried MyISAM engine -- I would say in present days MyISAM is history since Innodb itself is pretty much comparable to MyISAM -- so didn't see amy much performance gain .. one disadvantage was that loading of this 12 Million rows took ages .. and also "SELECT table_rows FROM information_schema.tables" had hanged , so i was not able to to figure out that how many rows have been loaded in my Table. 4) Finally I tried Partitioning the Table based on a DATE column -- Massive Performance gain .. If the user has selected a date range which falls under one or two partitions , then you can get the results very fast , even if it falls in many partitions still the performance is acceptable .. Useful ref - http://www.slideshare.net/datacharmer/mysql-partitions-tutor... -- remember that its Mandatory that you add that date column as part of Primary Key if you are Partitioning on that date column .

--- For Ref -- my Innodb settings

# INNODB # innodb-flush-method = O_DIRECT innodb-log-files-in-group = 2 innodb-log-file-size = 512M innodb-flush-log-at-trx-commit = 2 innodb-file-per-table = 1 innodb-buffer-pool-size = 6G bulk-insert-buffer-size = 256M innodb_thread_concurrency = 16

2 comments

If it truly was only 10GB of data, something was seriously wrong with your table, schema, or query.
Do you think that 10GB data is easily to manage .. what all common practices do you suggest . any examples with use case
Just an observation that the word "index" isn't found in your post - you do have those, right?
My suggestion is to try TokuDB storage engine on that table. Our (admittedly text heavy) workload got order of magnitude scalability improvements and a dramatic fall in query times.
I tried Tokudb as well and below are the stats comparison for complex SUM and COUNT query. Please note that the number of rows and size is exactly the same and there is a partition on a Date column

Table size 10GB and rows 12 Million

Also note that I am using the default settings of Tokudb

With Tokudb - 1 min 41.38 sec With Innodb - 58.17 sec

So almost double.