Work in progress: sofree.us 2025Q3 Ceph hands-on

Timeframe

Per the title, Q3 of 2025. Some Saturday. Probably August or September.

Goals

Learners will install/configure a working Ceph cluster and gain some knowledge of its components. The hard way. No automatic container deployment or similar.

  • Generate a cluster UUID
  • Select server IP addresses
    • A quick diversion into public and cluster (private) networks as used by Ceph
  • Install Ceph application packages
  • Create a very basic cluster config file
  • Use monmaptool to create a single-entry cluster monitor map (monmaptool --create --fsid UUID-FROM-ABOVE ...)
  • Deploy a single monitor for the cluster (ceph-mon --mkfs ...)
  • Start the monitor up and check on cluster state (see the bootstrap sketch after this list)
  • Add additional monitors on other nodes
  • Install manager components
  • Add OSD components (At last, some actual storage!)
  • Create some storage pools.
  • On another (client) system, make use of some of that storage. RBD is the easiest to demonstrate; RGW needs additional components. (See the RBD sketch after this list.)
  • Choose your own adventure:
    • RGW, the Ceph S3 object gateway, which also provides an OpenStack Swift API (see the RGW sketch after this list)
      • Install radosgw
      • Create a user (radosgw-admin)
      • Make sure it is available over the internet (this is going to be pre-work on Aaron's part)
      • Install a client program
      • Create a bucket
      • Put some objects
      • List objects
      • Read an object back from the S3 storage and compare with the original
    • CephFS (see the CephFS sketch after this list)
      • Install MDS components
      • Create a filesystem
      • Create some client identities
      • Put client keyring on a client server
      • Mount the filesystem and start writing files
  • Stretch goals
    • Replicate RBD data from one cluster to another (see the RBD mirroring sketch after this list)
    • Replicate RGW data from one cluster to another
    • Replicate CephFS data from one cluster to another
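
The command sketches below are rough, untested notes; hostnames (node1), networks (192.0.2.0/24 public, 198.51.100.0/24 cluster), and device paths are placeholders for whatever we actually pick on the day. First, the bootstrap steps above, loosely following the upstream manual-deployment procedure. A very basic /etc/ceph/ceph.conf might look like:

  # fsid is the output of uuidgen
  [global]
  fsid = UUID-FROM-ABOVE
  mon initial members = node1
  mon host = 192.0.2.11
  public network = 192.0.2.0/24
  cluster network = 198.51.100.0/24

and the first monitor gets built and started roughly like this:

  # keyrings: one for the monitors, one for client.admin, merged together
  sudo ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
  sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin \
      --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
  sudo ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring

  # single-entry monitor map, then mkfs and start the first monitor
  monmaptool --create --add node1 192.0.2.11 --fsid UUID-FROM-ABOVE /tmp/monmap
  sudo mkdir -p /var/lib/ceph/mon/ceph-node1
  sudo ceph-mon --mkfs -i node1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
  sudo chown -R ceph:ceph /var/lib/ceph/mon/ceph-node1
  sudo systemctl start ceph-mon@node1
  ceph -s    # check cluster state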
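
Once a monitor quorum exists, a manager, some OSDs, a pool, and an RBD client can follow. Again just a sketch: the device names (/dev/sdX) and pool/image names are made up, and on the HDDs the --block.db device would point at a partition or LV on one of the SSDs.

  # manager: create a key, then start the daemon (node1 again assumed)
  sudo mkdir -p /var/lib/ceph/mgr/ceph-node1
  sudo ceph auth get-or-create mgr.node1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
      | sudo tee /var/lib/ceph/mgr/ceph-node1/keyring
  sudo chown -R ceph:ceph /var/lib/ceph/mgr/ceph-node1
  sudo systemctl start ceph-mgr@node1

  # OSDs: one per data disk; ceph-volume handles the LVM setup, keys, and systemd units
  sudo ceph-volume lvm create --data /dev/sdc                        # SSD OSD
  sudo ceph-volume lvm create --data /dev/sdd --block.db /dev/sdb3   # HDD OSD, WAL/DB on SSD

  # a pool plus an RBD image
  ceph osd pool create rbdpool 64
  rbd pool init rbdpool
  rbd create rbdpool/test0 --size 10G

  # on the client (needs ceph.conf and a keyring; client.admin is fine for the demo)
  sudo rbd map rbdpool/test0
  sudo mkfs.xfs /dev/rbd0
  sudo mount /dev/rbd0 /mnt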
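
For the RGW path: the radosgw service setup itself varies by distro and release (part of Aaron's pre-work), but user creation and the S3 round trip look roughly like this, using s3cmd as the assumed client program and made-up names throughout:

  # on a cluster node: create an S3 user; note the access_key/secret_key in the output
  radosgw-admin user create --uid=demo --display-name="Demo User"

  # on the client: point s3cmd at the RGW endpoint with those keys, then exercise it
  s3cmd --configure
  s3cmd mb s3://demo-bucket
  s3cmd put ./somefile s3://demo-bucket/somefile
  s3cmd ls s3://demo-bucket
  s3cmd get s3://demo-bucket/somefile ./somefile.copy
  cmp ./somefile ./somefile.copy    # compare with the original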
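
For the CephFS path, again a sketch with made-up names (demofs, client.demo); the MDS keyring and caps follow the upstream add-an-MDS procedure:

  # MDS daemon on node1: directory, keyring, start
  sudo mkdir -p /var/lib/ceph/mds/ceph-node1
  sudo ceph auth get-or-create mds.node1 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' \
      | sudo tee /var/lib/ceph/mds/ceph-node1/keyring
  sudo chown -R ceph:ceph /var/lib/ceph/mds/ceph-node1
  sudo systemctl start ceph-mds@node1

  # pools and the filesystem itself (metadata pool comes first)
  ceph osd pool create cephfs_metadata 32
  ceph osd pool create cephfs_data 64
  ceph fs new demofs cephfs_metadata cephfs_data

  # a client identity restricted to this filesystem; copy the resulting keyring to the client
  ceph fs authorize demofs client.demo / rw

  # on the client: kernel mount (older mon-address syntax shown; THE-CLIENT-KEY is a placeholder)
  sudo mount -t ceph 192.0.2.11:6789:/ /mnt/cephfs -o name=demo,secret=THE-CLIENT-KEY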
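
For the RBD replication stretch goal, snapshot-based rbd-mirror between two clusters is probably the most tractable. Very rough sketch, assuming the pool and image names from above, that the clusters can reach each other, and that an rbd-mirror daemon is running on the secondary site:

  # on both clusters: enable per-image mirroring on the pool
  rbd mirror pool enable rbdpool image

  # on the primary site: create a peer bootstrap token
  rbd mirror pool peer bootstrap create --site-name site-a rbdpool > /tmp/token

  # on the secondary site: import the token
  rbd mirror pool peer bootstrap import --site-name site-b rbdpool /tmp/token

  # mirror a specific image using snapshots
  rbd mirror image enable rbdpool/test0 snapshot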

Hardware available

The following is at Aaron's home data center, currently unused:

  • 6x Dell R630 servers, each with:
    • 2x Intel Xeon E5-2630v4 processors
    • 64 Gibytes of RAM
    • 2x 800Gbyte SAS SSDs (for OS, applications, and Ceph Bluestore WAL/DB devices for spinning disks)
    • 2x 1Tbyte SATA SSDs (Ceph cluster data disks)
    • 2x 1.8Tbyte SAS HDDs (Ceph cluster data disks)
    • 1x 2-port Mellanox 56Gbits/sec Ethernet card (and sufficient switch ports and cabling for these)
    • 2x Intel 10GbaseT twisted pair Ethernet ports (I do not have switch ports for 10GbaseT, but have plenty of 1000baseT)
    • 2x Intel 1000baseT twisted pair Ethernet ports (and sufficient switch ports and cabling)
    • iDRAC remote console, power control, etc
  • 3x Dell R720XD servers, each with:
    • 2x (1x?) Intel Xeon E5-26?? processors
    • 32 Gibytes of RAM
    • 2x 120Gbyte SATA SSDs (rear-mounted) for the OS and applications
    • 6 or 7x 3Tbyte SAS HDDs (Ceph cluster data disks)
    • 2x 400Gbyte SAS SSDs (Ceph Bluestore WAL/DB devices)
    • 2x 10Gbits/sec SFP+ (fiber optic) Ethernet adapters (and sufficient switch ports and cabling for these, but might need Intel branded SFP+ transceivers)
    • 2x 1000baseT twisted pair Ethernet ports (and sufficient switch ports and cabling)
    • iDRAC remote console, power control, etc

Aaron has a friend who is at least a bit interested in participating, too. The friend has some servers that could also be used:

  • 6x Dell PowerEdge R720
    • 2x Intel Xeon E5-26?? processors
    • 256 Gibytes of RAM
    • unknown storage
    • unknown networking
    • iDRAC remote console, power control, etc

Potential cluster end states

  • Option 1: A single cluster with all 9 servers as members. This would probably make most sense with only 2 or 3 learners.
  • Option 2: Two clusters: one with mixed hardware (3x R630 servers and 3x R720 servers), and a second cluster of 3x R630 servers. Makes sense with 4-6 learners in groups of 2 or 3, and demonstrates Ceph's flexibility in building from non-identical servers.
  • Option 3: Two clusters, but this time all nodes in each cluster are similar: 6x R630s in one cluster and 3x R720s in the second. Makes sense with 4-6 learners in groups of 2 or 3.
  • Option 4: Three clusters, each with consistent hardware. Makes most sense if there are enough learners for a third group.