Ceph performance metrics
This has come up because the SentinelOne (S1) endpoint detection and response (EDR) agent is being installed across all our servers. To minimize potential customer impact, we will:
- Divide servers into three groups based on client IO sensitivity. Purely development environments are low sensitivity, and certain database workloads are highly sensitive. S3 workloads will probably fall in the middle.
- In each group, before the S1 agent is installed and running, gather baseline metrics from 3 randomly chosen cluster member servers (OSD and other services) using /usr/bin/sar, looking specifically at CPU (%system and %idle especially) and memory usage (%memused and active memory).
- In each cluster, before the S1 agent is deployed, measure the cluster's overall performance:
  - Run a write benchmark:
    rados bench -p rados_bench 300 write -t 8 --object_size=4MB --no-cleanup
    rados bench is the Ceph tool used for this. It exercises the RADOS layer, not client access. It will reduce the IO available to cluster clients while it is running, so it is important to be mindful of customer impact. This command starts 8 threads, each writing 4 MiB RADOS objects into the rados_bench pool for five minutes (300 seconds). When the run is complete, record the bandwidth, IOPS, and latency numbers.
  - Do a read benchmark with the same settings as above. The read modes of rados bench are seq (sequential) and rand (random); seq reads back the objects that the --no-cleanup write run left in the pool:
    rados bench -p rados_bench 300 seq -t 8 --object_size=4MB
- Client benchmarks to be run now:
  - fio can be run to measure the performance perceived by iSCSI client systems. Again, this will have an impact on other customers' use of the clusters.
  - S3 performance can be established by uploading and downloading largish objects to a cluster's S3 endpoints. Use any of the AWS SDK or CLI tools, s3cmd, or mc (the MinIO client) for this.
- Once the S1 agent has been installed, repeat the benchmarks. Any benchmark value that worsens by more than 5% against its baseline (for example, a drop in bandwidth or IOPS) should be investigated and a cause determined.
- When installation and benchmarking are complete on the low-sensitivity servers, proceed to the next group and repeat.
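
The 5% check in the steps above can be scripted once the before and after numbers have been recorded. The following is a minimal sketch, not an established tool: the function names pct_change and regressed are hypothetical, the sample numbers are made up for illustration, and treating a rise in latency as the counterpart of a drop in bandwidth or IOPS is our assumption. The baseline value must be non-zero.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a baseline benchmark value against
# the value measured after the S1 agent is installed.

# pct_change BASELINE CURRENT
# Print the signed percent change from BASELINE to CURRENT, two decimals.
pct_change() {
    awk -v b="$1" -v c="$2" 'BEGIN { printf "%.2f\n", (c - b) / b * 100 }'
}

# regressed DIRECTION BASELINE CURRENT [THRESHOLD_PCT]
# DIRECTION is "higher" when a larger value is better (bandwidth, IOPS)
# or "lower" when a smaller value is better (latency; our assumption).
# Exits 0 if the metric worsened by more than THRESHOLD_PCT (default 5).
regressed() {
    awk -v dir="$1" -v b="$2" -v c="$3" -v t="${4:-5}" 'BEGIN {
        change = (c - b) / b * 100
        if (dir == "lower") change = -change   # an increase is the bad direction
        exit !(change < -t)
    }'
}

# Illustrative values only, not measurements from any cluster.
if regressed higher 100 94; then
    echo "bandwidth changed by $(pct_change 100 94)%, investigate"
fi
```

Recording the direction explicitly keeps a latency increase from being read as an improvement when all metrics are pushed through the same check.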