This has results for an IO-bound sysbench benchmark on a 48-core server for Postgres versions 12 through 18. Results from a CPU-bound sysbench benchmark on the 48-core server are here.
tl;dr - for Postgres 18.1 relative to 12.22
- QPS for IO-bound point-query tests is similar, while there is a large improvement for the one CPU-bound test (hot-points)
- QPS for range queries without aggregation is similar
- QPS for range queries with aggregation is between 1.05X and 1.25X larger in 18.1
- QPS for writes shows there might be a few large regressions in 18.1
- for tests that do long range queries without aggregation
- the best QPS is from io_method=io_uring
- the second best QPS is from io_method=worker with a large value for io_workers
- for tests that do long range queries with aggregation
- when using io_method=worker, a larger value for io_workers hurts QPS, in contrast to the result for range queries without aggregation
- for most tests the best QPS is from io_method=io_uring
- an ax162s server with an AMD EPYC 9454P 48-core processor with SMT disabled
- 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
- 128G RAM
- Ubuntu 22.04 running the non-HWE kernel (5.15.0-118-generic)
- the config file is named conf.diff.cx10a_c32r128 (x10a_c32r128) and is here for versions 12, 13, 14, 15, 16 and 17.
- for Postgres 18 I used
- conf.diff.cx10b_c32r128 (x10b_c32r128)
- uses io_method=sync and is similar to the config used for versions 12 through 17.
- conf.diff.cx10c_c32r128 (x10c_c32r128)
- uses io_method=worker and io_workers is not set
- conf.diff.cx10cw8_c32r128 (x10cw8_c32r128)
- uses io_method=worker and io_workers=8
- conf.diff.cx10cw16_c32r128 (x10cw16_c32r128)
- uses io_method=worker and io_workers=16
- conf.diff.cx10cw32_c32r128 (x10cw32_c32r128)
- uses io_method=worker and io_workers=32
- conf.diff.cx10d_c32r128 (x10d_c32r128)
- uses io_method=io_uring
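As a sketch of how the configs above differ, the relevant postgresql.conf lines would look like the following. Only the io_method and io_workers settings are from the list above; everything else in the real config files is omitted here:

```
# x10b_c32r128: io_method=sync, similar to pre-18 behavior
io_method = 'sync'

# x10c_c32r128: worker pool with the default io_workers (3)
io_method = 'worker'

# x10cw16_c32r128: worker pool with more I/O workers
io_method = 'worker'
io_workers = 16

# x10d_c32r128: io_uring (Linux only)
io_method = 'io_uring'
```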
The read-heavy microbenchmarks are run for 600 seconds and the write-heavy microbenchmarks for 900 seconds. The benchmark is run with 40 clients and 8 tables with 250M rows per table, which makes it IO-bound. I normally use 10M rows per table for CPU-bound workloads.
I provide charts below with relative QPS. The relative QPS is the following:
(QPS for some version) / (QPS for base version)
- base version is Postgres 12.22
- compare 12.22, 13.23, 14.20, 15.15, 16.11, 17.7 and 18.1
- the goal for this is to see how performance changes over time
- per-test results from vmstat and iostat are here
- base version is Postgres 18.1
- compare 18.1 using the x10b_c32r128, x10c_c32r128, x10cw8_c32r128, x10cw16_c32r128, x10cw32_c32r128 and x10d_c32r128 configs
- the goal for this is to understand the impact of the io_method option
- per-test results from vmstat and iostat are here
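The relative QPS computation above is just a ratio per version. A minimal Python sketch, with illustrative numbers that are not from this benchmark:

```python
def relative_qps(qps_by_version, base="12.22"):
    """Return (QPS for some version) / (QPS for base version) per version."""
    base_qps = qps_by_version[base]
    return {version: qps / base_qps for version, qps in qps_by_version.items()}

# hypothetical QPS values for one microbenchmark
qps = {"12.22": 1000.0, "17.7": 1020.0, "18.1": 1250.0}
print(relative_qps(qps))  # {'12.22': 1.0, '17.7': 1.02, '18.1': 1.25}
```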
- a large improvement for the hot-points test arrives in 17.x. While most tests are IO-bound, this test is CPU-bound because all queries fetch the same N rows.
- for other tests there are small changes, both improvements and regressions, and the regressions are too small to investigate
- QPS for Postgres 18.1 is within 5% of 12.22, sometimes better and sometimes worse
- for Postgres 17.7 there might be a large regression on the scan test, and that also occurs with 17.6 (not shown). But the scan test can be prone to variance, especially with Postgres, and I don't expect to spend time debugging this. Note that the config I use for 18.1 here uses io_method=sync, which is similar to what Postgres does in releases prior to 18.x. From the vmstat and iostat metrics what I see is:
- a small reduction in CPU overhead (cpu/o) in 18.1
- a large reduction in the context switch rate (cs/o) in 18.1
- small reductions in read IO (r/o and rKB/o) in 18.1
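The per-operation metrics above (cpu/o, cs/o, r/o, rKB/o) are a per-second rate from vmstat or iostat divided by QPS. A minimal sketch, with made-up numbers chosen only to illustrate the arithmetic:

```python
def per_operation(rate_per_sec, qps):
    """Normalize a vmstat/iostat per-second counter to a per-query value."""
    return rate_per_sec / qps

# illustrative: context switch rates and QPS for two versions
cs_old = per_operation(250_000, 50_000)  # cs/o for the base version -> 5.0
cs_new = per_operation(100_000, 50_000)  # cs/o for the new version -> 2.0
print(cs_new / cs_old)  # 0.4, i.e. a large reduction in cs/o
```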
- QPS for 18.1 is between 1.05X and 1.25X better than for 12.22
- there might be large regressions for several tests: read-write, update-zipf and write-only. The read-write tests do all of the writes done by write-only and then add read-only statements.
- from the vmstat and iostat results for the read-write tests I see
- CPU (cpu/o) is up by ~1.2X in PG 16.x through 18.x
- storage reads per query (r/o) have been increasing from PG 16.x through 18.x and are up by ~1.1X in PG 18.1
- storage KB read per query (rKB/o) began increasing in PG 16.x and is between 1.16X and 1.44X larger in PG 18.x
- from the vmstat and iostat results for the update-zipf test
- results are similar to the read-write tests above
- from the vmstat and iostat results for the write-only test
- results are similar to the read-write tests above
- results are similar for all configurations and this is expected
- there are two charts, the y-axis is truncated in the second to improve readability
- all configs get similar QPS for all tests except scan
- for the scan test
- the x10c_c32r128 config has the worst result. This is expected given there are 40 concurrent connections and it uses the default for io_workers (=3)
- QPS improves for io_method=worker with larger values for io_workers
- io_method=io_uring has the best QPS (the x10d_c32r128 config)
- when using io_method=worker, a larger value for io_workers hurts QPS, in contrast to the result for range queries without aggregation
- io_method=io_uring gets the best QPS on all tests except for the read-only tests with range=10 and 10,000. There isn't an obvious problem based on the vmstat and iostat results. From the r_await column in iostat output (not shown) the differences are mostly explained by a change in IO latency. Perhaps variance in storage latency is the issue.
- the best QPS occurs with the x10b_c32r128 config (io_method=sync). I am not sure if that option matters here and perhaps there is too much noise in the results.
