Adj at 14:28, 24 July 2025

2025-07-24T14:28:02Z

← Older revision		Revision as of 14:28, 24 July 2025
Line 11:		Line 11:
	* After the S1 agent is installed, repeat the benchmarks under the same conditions. Any decrease in a benchmark value of more than 5% relative to the baseline should be investigated and its cause determined.
	* When installation and benchmarking are completed on the low sensitivity servers, proceed to the next group and repeat.

			References:
			* [https://www.thomas-krenn.com/en/wiki/Ceph_Perfomance_Guide_-_Sizing_%26_Testing Ceph Performance Guide]
			* [https://www.cloudseedrive.com/benchmarking-amazon-s3-performance/ Benchmarking Amazon S3]
			* [https://documentation.alluxio.io/ee-ai-en/benchmark/cosbench COSBench (S3) Benchmark]

Adj: Created page with "This has come up because the Sentinel 1 endpoint detection and response (EDR) agent is being installed across all our servers. In order to minimize potential customer impact we will: * Divide servers into three groups based on client IO sensitivity. Purely development environments being low sensitivity, and certain database workload being highly sensitive. S3 workloads will probably fall in the middle. * In each group, before S1 agent is installed and running, gather..."

2025-07-24T14:25:07Z

New page

This has come up because the SentinelOne (S1) endpoint detection and response (EDR) agent is being installed across all our servers. To minimize potential customer impact we will:
* Divide servers into three groups based on client IO sensitivity. Purely development environments are low sensitivity, and certain database workloads are highly sensitive. S3 workloads will probably fall in the middle.
* In each group, before the S1 agent is installed and running, gather baseline metrics for 3 randomly chosen cluster member servers (OSD and other services), including the following:
** <code>/usr/bin/sar</code>, specifically looking at CPU (<code>sar -u</code>: %system and %idle especially) and memory usage (<code>sar -r</code>: %memused and active memory)
* In each cluster, before S1 agent is deployed, measure the cluster's overall performance:
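** <code>rados bench -p rados_bench 300 write -t 8 --object_size=4MB --no-cleanup</code> is the Ceph tool used for this. It exercises the RADOS layer directly rather than a client access path. It reduces the IO capacity available to clients while it is running, so it is important to be mindful of customer impact. As explanation, this command keeps 8 write operations in flight (<code>-t 8</code>), each writing a 4 MB RADOS object into the <code>rados_bench</code> pool, for five minutes (300 seconds). <code>--no-cleanup</code> leaves the objects in place so the read benchmark has data to read. When the run is complete, record the bandwidth, IOPS, and latency numbers.
** Do a read benchmark against the objects just written, with the same concurrency: <code>rados bench -p rados_bench 300 seq -t 8</code>. <code>rados bench</code> offers <code>seq</code> (sequential) and <code>rand</code> (random) read modes; the object size is taken from the preceding write run. Afterwards, remove the benchmark objects with <code>rados -p rados_bench cleanup</code>.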
* Client benchmarks to be run now:
** Run <code>fio</code> on iSCSI client systems to measure the performance they perceive. Again, this will have an impact on other customers' use of the clusters.
** S3 performance can be established by uploading large objects to, and downloading them from, a cluster's S3 endpoints and timing the transfers. Use the AWS CLI, <code>s3cmd</code>, or <code>mc</code> (the MinIO client) for this.
* After the S1 agent is installed, repeat the benchmarks under the same conditions. Any decrease in a benchmark value of more than 5% relative to the baseline should be investigated and its cause determined.
* When installation and benchmarking are completed on the low sensitivity servers, proceed to the next group and repeat.
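
The 5% check in the procedure above can be applied mechanically once the baseline and post-install numbers are recorded. The following is a minimal sketch, not an agreed tool: the metric names, the sample values, and the decision to treat a latency ''increase'' as a degradation are all assumptions made for illustration.

```python
# Sketch of the "more than 5% worse than baseline" check.
# Metric names and sample numbers below are hypothetical, not measured values.

THRESHOLD = 0.05  # the 5% limit stated in the procedure

# Assumption: for latency a higher value is worse; for bandwidth and IOPS
# a lower value is worse.
LOWER_IS_BETTER = {"avg_latency_s"}


def degradation(metric, baseline, after):
    """Return how much worse `after` is than `baseline`, as a fraction.

    Positive means worse, negative means better.
    """
    if baseline <= 0:
        raise ValueError(f"baseline for {metric} must be positive")
    change = (after - baseline) / baseline
    return change if metric in LOWER_IS_BETTER else -change


def flag_regressions(baseline, after, threshold=THRESHOLD):
    """Return {metric: fractional degradation} for metrics beyond the threshold."""
    flagged = {}
    for metric, base_value in baseline.items():
        worse_by = degradation(metric, base_value, after[metric])
        if worse_by > threshold:
            flagged[metric] = worse_by
    return flagged


if __name__ == "__main__":
    # Made-up numbers standing in for recorded rados bench results.
    baseline = {"bandwidth_mb_s": 400.0, "iops": 100.0, "avg_latency_s": 0.080}
    after = {"bandwidth_mb_s": 372.0, "iops": 97.0, "avg_latency_s": 0.083}
    for metric, worse_by in flag_regressions(baseline, after).items():
        print(f"{metric}: {worse_by:.1%} worse than baseline, investigate")
```

The same function could be fed <code>sar</code>, <code>fio</code>, or S3 transfer figures, provided each metric is listed under the correct "lower is better" or "higher is better" direction.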

Ceph performance metrics - Revision history
