Wednesday, October 15, 2014

Updates with secondary index maintenance: 5.7 vs previous releases

My previous results for an update-only workload to a cached and IO-bound database used a table that did not require secondary index maintenance from the update. Prior to that I ran tests using mysqlslap and might have found a regression in InnoDB when the update requires secondary index maintenance (bug 74235). The mysqlslap test did all updates to the same row. The test I describe here chooses the row to update at random and this workload does not reproduce the worst case from bug 74235.

The results are that newer releases tend to do worse at 1 thread, 5.7.5 does best at 32 threads and 5.7.5 does slightly worse than 5.6.21 at less than 32 threads.  We need to fix the regressions in 5.7.5 at less-than-high concurrency workloads.

setup

The test here was the same as described in my previous posts with one exception -- the test tables have a secondary index on (k,c) and note that the column c is set to 0 at test start. The updates are of the form: UPDATE $table SET c = c + 1 WHERE id = $X. The test tables look like:
CREATE TABLE sbtest1 (
  id int(10) unsigned NOT NULL auto_increment,
  k int(10) unsigned NOT NULL default '0',
  c char(120) NOT NULL default '',
  pad char(60) NOT NULL default '',
  PRIMARY KEY  (id),
  KEY k (k,c)
) ENGINE=InnoDB AUTO_INCREMENT=8000001 DEFAULT CHARSET=latin1

There were 8 tables in the database with 8M rows per table. Tests were run for 1 thread with 1 table and 8, 16, 32 threads with 8 tables. The test at each concurrency level was run for 10 minutes. Tests were run for a 1G (IO-bound) and 32G (cached) InnoDB buffer pool. For the cached test the tables were read into the buffer pool at test start. The updates/second rate from these tests had more variance than previous tests. I ran tests many times and report rates here that were typical. For the 1-thread tests that use 1 table, the database (1 table) is 2G. For the 8, 16 and 32 thread tests that use 8 tables the database (8 tables) is 16G.

results at 1 thread

The rates for a cached database and 1 thread match most previous results -- newer releases are much slower than older releases and MySQL 5.7.5 is about 1.5X worse than MySQL 5.0.85. The results for the IO-bound setup don't follow that trend. Newer releases do better. I regret that I did not have the time to explain the difference between 5.6.21 and 5.7.5.


results at many threads

MySQL 5.7.5 and 5.6.21 are much better than older releases. 5.6.21 does better than 5.7.5 at 8 and 16 threads while 5.7.5 is the best at 32 threads.



configuration

Tuesday, October 14, 2014

Sysbench IO-bound updates: MySQL 5.7 vs previous releases

I repeated the sysbench update-only test after reducing the InnoDB buffer pool to 1G. The test database is 16G so the test should be IO-bound. MySQL 5.7.5 is 10% worse than 5.0.85 at 1 thread and much better at 8+ threads.

The previous blog post has more details on the setup. The configuration is not crash safe as the doublewrite buffer and binlog are disabled and innodb_flush_log_at_trx_commit=2. This test continued to use direct IO for InnoDB but this test requires a high rate of reads from storage. The previous test did many writes but no reads. A fast flash device is used for storage.

results at 1-thread

Only 1 table is used so there is a 2G database and 1G buffer pool. This test is less IO-bound than the many threads tests reported in the next section. Two graphs are displayed below. The first has the updates/second rates and the second has the rates normalized to the rate from 5.0.85. MySQL 5.7.5 is about 10% worse than 5.0.85 and all servers do between 3000 and 4000 updates/second.
results at many threads

These tests uniformly distribute clients across the 8 tables. MySQL 5.7.5 is the best at 16 and 32 threads. MySQL 5.0.85 and 5.1.63 saturate at 8 threads.

Monday, October 13, 2014

Sysbench cached updates: MySQL 5.7 vs previous releases

I used MySQL 5.7.5 to repeat tests I did for MySQL 5.7.2 and 5.6.10 with the sysbench client and a cached database to compare performance for low and high concurrency. My configuration was optimized for throughput rather than crash safety. The performance summary:
  • MySQL 5.7.5 is 1.47X slower than 5.0.85 at 1 thread
  • MySQL 5.7.5 is a bit slower than 5.6.21 at 8 and 16 threads
  • MySQL 5.7.5 is faster than others at 32 threads
setup

I used my sysbench 0.4 fork and set the value of the c column to 0 in the sbtest tables as that column is incremented during the test. The test used 8 tables with 8M rows per table (about 16G total) and the InnoDB buffer pool was 32G. One table was used for the 1 thread case. For the other tests the clients were uniformly distributed across the 8 tables. All data was in the buffer pool at test start. The test was run for 10 minutes at each concurrency level (1, 8, 16, 32 threads). Throughput appeared to be stable throughout the test.

Tests were run for MySQL versions 5.0.85, 5.1.63, 5.6.21 and 5.7.5. Tests were repeated for 4k and 8k innodb_page_size with MySQL 5.7.5 (see -16k, -8k and -4k below). Tests were also repeated with changes to innodb_flush_neighbors (=0), innodb_write_io_threads (=8), innodb_page_cleaners (=4) and innodb_purge_threads (=4). Results for those changes are not provided here as there wasn't an impact on QPS. The change to flush_neighbors reduced the InnoDB page write rate, which is a good thing.

results

The regression from 5.0.85 to recent versions is visible in the single-thread results below.
MySQL 5.7.5 is the champ at 32 threads but is slightly worse than 5.6.21 at 8 and 16 threads. Recent versions are much better than old versions.

configuration

I used this my.cnf for MySQL 5.7.5 and then something similar for other versions:
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 
table-definition-cache=1000
table-open-cache=2000
table-open-cache-instances=8
max_connections=20000
key_buffer_size=200M
metadata_locks_hash_instances=256 
query_cache_size=0
query_cache_type=0
server_id=9
performance_schema=0
binlog_format=row
skip_log_bin
innodb_buffer_pool_instances=8
innodb_io_capacity=32000
innodb_lru_scan_depth=4000
innodb_checksum_algorithm=CRC32
innodb_thread_concurrency=0
innodb_buffer_pool_size=32G
innodb_log_file_size=1900M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=80
innodb_file_format=barracuda
innodb_file_per_table

This was the command line for one of the tests:
./sysbench --batch --batch-delay=10 --test=oltp --mysql-host=127.0.0.1 --mysql-db=test --oltp-table-size=8000000 --max-time=600 --max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes --oltp-table-name=sbtest1 --oltp-skip-trx --oltp-test-mode=incupdate --oltp-dist-type=uniform --oltp-range-size=10 --oltp-connect-delay=0 --percentile=99 --num-threads=1 --seed-rng=1413171854 run

Sunday, October 12, 2014

Page read performance: MySQL 5.7 vs previous releases

For this test I used sysbench to determine the peak rate for disk reads per second from InnoDB with fast storage. Fast storage in this case is the OS filesystem cache. I want to know the peak for MySQL 5.7.5 and whether there is a regression from 5.6. My summary is:
  • 5.7.5 is 12% to 15% worse than 5.6.21 at 1 thread. I hope we can fix this via bug 74342.
  • 5.7.5 is 2% to 4% worse than 5.6.21 at 8, 16 and 32 threads
  • 5.6/5.7 are much better than 5.0 especially at high concurrency
  • Page size (16k, 8k, 4k) has a big impact on the peak only when compression is used. 
setup

The test server has 40 hyperthread cores and is shared by the sysbench client and mysqld server.

I used my sysbench 0.4 fork configured to fetch 1 row by primary key via SELECT and HANDLER statements. The InnoDB buffer cache was 1G with innodb_flush_method="" (buffered IO). There were 8 tables in the test database with 16M rows per table.  The test database was either 32G uncompressed or 16G compressed so tests with compression had a better InnoDB buffer pool hit rate. Tests were run with 1, 8, 16 and 32 threads. The 1 thread test was limited to 1 table (2G or 4G of data) so this benefits from a better InnoDB buffer pool hit rate. The 8, 16 and 32 thread tests uniformly distribute clients across 8 tables.

Tests were done for MySQL 5.0.85, 5.6.21 and 5.0.85. All versions were tested for 16k pages without compression (see -16k-c0 below). Some MySQL 5.7.5 tests were repeated for 8k and 4k pages (see -4k- and -8k- below). Some MySQL 5.7.5 tests were repeated for 2X compression via zlib (see -c1- below).

results at 1 thread

The results at 1 thread show that peak QPS for 5.7.5 is 12% to 15% worse than for 5.6.21. I hope this can be fixed. I assume the problem is a longer code path from new features in 5.7.

results at 8 threads

The difference between 5.6.21 and 5.7.5 is smaller than at 1 thread. I assume this is the benefit from less mutex contention in 5.7.
results at 16 threads

The difference between 5.6.21 and 5.7.5 is smaller than at 1 thread. I assume this is the benefit from less mutex contention in 5.7.

results at 32 threads

The difference between 5.6.21 and 5.7.5 is smaller than at 1 thread. I assume this is the benefit from less mutex contention in 5.7.


configuration

I used this my.cnf for 5.7.5 and modified it for 5.6:
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 
table-definition-cache=1000
table-open-cache=2000
table-open-cache-instances=8
max_connections=20000
key_buffer_size=200M
metadata_locks_hash_instances=256 
query_cache_size=0
query_cache_type=0
server_id=9
performance_schema=0
binlog_format=row
skip_log_bin
innodb_buffer_pool_instances=8
innodb_io_capacity=32000
innodb_lru_scan_depth=4000
innodb_checksum_algorithm=CRC32
innodb_thread_concurrency=0
innodb_buffer_pool_size=1G
innodb_log_file_size=1900M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
# innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=80
innodb_file_format=barracuda
innodb_file_per_table
datadir=...

This is an example command line for a test that uses SELECT and 1 thread per table. My testing script starts 8 sysbench clients, one per table. The value for --seed-rng different for every sysbench client (function of current time and table number 1 to 8) to access different keys each run:
./sysbench --batch --batch-delay=10 --test=oltp --mysql-host=127.0.0.1 --mysql-db=test --oltp-table-size=16000000 --max-time=120 --max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes --oltp-table-name=sbtest1 --oltp-read-only --oltp-skip-trx --oltp-test-mode=simple --oltp-point-select-all-cols --oltp-dist-type=uniform --oltp-range-size=10 --oltp-connect-delay=0 --percentile=99 --num-threads=1 --seed-rng=1413134256 run

Friday, October 10, 2014

Low-concurrency performance for updates with InnoDB: MySQL 5.7 vs previous releases

First I tested updates with the Heap engine as a base case and now I report on InnoDB to find regressions that are specific to it. The workload is described in the previous blog post. Response time is measured at 1, 4 and 32 threads for two types of updates. The unindexed update increments an unindexed column. The indexed update updates a column that is in a secondary index. Tests are run with the binlog enabled and disabled. Tests are done for MySQL versions 5.0.85, 5.1.63, 5.5.40, 5.6.21 and 5.7.5. In some cases tests are also done with the adaptive hash index disabled. The test database has 64,000 rows but all updates are to one row (lots of contention). The client is mysqlslap running on the same host as mysqld. The test server has 40 hyperthread cores.

For the single-thread update unindexed workload MySQL 5.7.5 is about 1.6X slower than 5.0.85. This is similar to what I reported in previous posts. For the single-thread updated indexed workload MySQL 5.7.5 is about 4X slower than 5.0.85 and much slower than 5.6.21. This is a regression for which I created bug 74325. This and a few other bugs might be motivation to get similar tests into the upstream release process.

The binaries below are tagged with -noahi, -nobl and -bl. The -noahi binary has the binlog and adaptive hash index disabled. The -nobl binary has the binlog disabled. The -bl binary has the binlog enabled.

update indexed, binlog off

Result summary:
  • at 1 thread: 5.7.5 is 3.8X slower, 5.6.21 is 1.43X slower
  • at 4 threads: 5.7.5 is 3.88X slower, 5.6.21 is slightly faster
  • at 32 threads: 5.7.5 is 1.96X slower, 5.6.21 is faster

update indexed, binlog on

Result summary:
  • at 1 thread: 5.7.5 is 4X slower, 5.6.21 is 1.37X slower
  • at 4 threads: 5.7.5 is 2.1X slower, 5.6.21 is 1.09X slower
  • at 32 threads: 5.7.5 is 2.26X slower, 5.6.21 is faster