| I both agree and disagree with your comments.
Benchmarks should be a comparison, and one can very well compare exactly the same deployment on exactly the same infrastructure with 2 different storage types without going so deep into the weeds. It is crucial to understand the environment of the actual benchmark, but many of the things you mention are less important unless you want to investigate what is actually going on under the hood (hoping to improve something). Also note that for many people looking to run database workloads on K8s / CEPH, knowing that someone was able to run with 18k TPS without pulling rabbits out of their sleeves is very helpful, and asking for all of these details basically makes people less willing to share, which is not helpful at all. Be that as it may, as mentioned in another thread, we ran benchmarks on on-premises OpenShift / CEPH, and I will try to answer as many of your questions as possible for these benchmarks. If you want more details, LMK...
* Stack is: OpenShift - RBD - Network - CEPH node - VMware VMDK - SAN storage
* Network (AFAIK) is 10g. I haven't tested network latency or storage latency, but the roundtrip for a commit (which pgbench and pg_tps_optimizer call latency) took about 30ms running 233 clients / 17k TPS.
* No fancy stuff like reliable Ethernet as used for iSCSI / NVMe over IP or anything like that.
* I mostly ran with pg_tps_optimizer, which is designed to test storage performance (not performance from an app perspective). Because of the way it works, things like shared buffer size are less important. But FYI, I ran with 2GB for cluster.spec.resources.limits.memory.
* What is the layout of memory buffers in PostgreSQL? I don't understand what you are trying to get at. Running on K8s, you should trust the operator to deploy as smartly as possible and not worry about things like this unless you are actually trying to investigate and fix problems. I ran with standard settings.
* I tested with many options, including single instance, async (with synchronous_commit set to remote_write, on, remote_apply) and sync (remote_write, on, remote_apply). These tests were run on an Azure VM, but I am fairly sure running on OpenShift/CEPH does not change that much. The biggest difference was with 13 clients: 12/13k TPS with sync and 17/18k TPS with async. The difference is smaller with a higher number of clients. As the effect is larger with a smaller number of clients, the effect is probably less severe with OpenShift/CEPH.
* AFAIK we have CEPH set to keep 3 replicas. TBH, I don't see how this is of much importance. The CEPH RBD kernel driver writes to the replicas in parallel. Doing more in parallel has little impact on latency, and bandwidth is not the issue.
* I don't know the block sizes and frame sizes for sure. I expect they are the default settings (4096).
* Type of workload. Yeah, this is important stuff.
First of all, about pg_tps_optimizer: this is where I have the most interesting information. It basically runs update statements on a record in a table, one table per client, so with 233 clients this is 233 tables. This really tests storage performance (we rule out things like semaphore locks). This might be compared to importing data with a separate client (which could run in parallel) for every table (or partition if you like).
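To make the access pattern above concrete, here is a minimal sketch that only builds the SQL each client would run. It does not connect to a database. The table name `tps_client_N`, the `counter` column and the function `build_workload` are hypothetical names for illustration and are not taken from pg_tps_optimizer's actual schema or code.

```python
def build_workload(n_clients: int) -> list[dict[str, str]]:
    """Return, per client, the setup SQL and the UPDATE it would repeat.

    Every client gets its own table, so clients never touch the same row
    or the same table. Names are hypothetical, for illustration only.
    """
    workload = []
    for client_id in range(n_clients):
        table = f"tps_client_{client_id}"  # hypothetical per-client table
        workload.append({
            "table": table,
            # One-time setup: a single-row table owned by this client.
            "setup": (
                f"CREATE TABLE IF NOT EXISTS {table} "
                "(id int PRIMARY KEY, counter bigint NOT NULL)"
            ),
            "seed": (
                f"INSERT INTO {table} (id, counter) VALUES (1, 0) "
                "ON CONFLICT DO NOTHING"
            ),
            # The statement the client repeats; each commit forces a WAL flush.
            "update": f"UPDATE {table} SET counter = counter + 1 WHERE id = 1",
        })
    return workload


workload = build_workload(233)
print(len(workload), len({w["table"] for w in workload}))
```

Because no two clients share a table, any throughput ceiling in a run like this should come from commit (storage) latency and not from row or table contention, which is the point of the design.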
With pgbench (default workload), we see similar graphs, but we see limitations with a higher number of clients. As all data is in the same table(s), a higher number of clients runs into contention issues (probably semaphore softlocks). As this is not a limitation of storage, I personally find this less interesting. |