Work in progress: sofree.us 2025Q3 Ceph hands-on

Timeframe

Per the title, Q3 of 2025. Some Saturday. Probably August or September.

Goals

Learners will install and configure a working Ceph cluster and gain some knowledge of its components. The hard way: no cephadm or other automated container deployment. Rough command sketches for the main steps are collected after the goal list below.

  • Generate a cluster UUID
  • Select server IP addresses
    • A quick diversion into public and cluster (private) networks as used by Ceph
  • Install Ceph application packages
  • Create a very basic cluster config file
  • Use monmaptool to create a single-entry cluster monitor map (monmaptool --create --fsid UUID-FROM-ABOVE ...); see the sketches after this list
  • Deploy a single monitor for the cluster (ceph-mon --mkfs ...)
  • Start the monitor up, check on cluster state
  • Add additional monitors on other nodes
  • Install manager components
  • Add OSD components (At last, some actual storage!)
  • Create some storage pools.
  • On another (client) system, make use of some of that storage. RBD is the easiest to demonstrate. RGW needs additional components.
  • Choose your own adventure:
    • RGW, the Ceph S3 object gateway (which also speaks the OpenStack Swift API)
      • Install radosgw
      • Create a user (radosgw-admin)
      • Make sure it is available over the internet (this is going to be pre-work on Aaron's part)
      • Install a client program
    • CephFS
      • Install MDS components
      • Create a filesystem
      • Create some client identities
      • Put client keyring on a client server
      • Mount the filesystem and start writing files
  • Stretch goals
    • Replicate RBD data from one cluster to another
    • Replicate RGW data from one cluster to another
    • Replicate CephFS data from one cluster to another
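
Example command sketches for the steps above. These are hedged sketches rather than the final runbook: hostnames (node1), IP ranges (192.0.2.0/24, 198.51.100.0/24), pool/user names, and device names are placeholders, and exact commands can vary by Ceph release and distro.

Generating the cluster fsid and writing a very basic config file, assuming placeholder public and cluster networks:

    # Generate the cluster fsid once and reuse it everywhere
    uuidgen

    # /etc/ceph/ceph.conf (minimal; placeholder hostname and networks)
    [global]
    fsid = UUID-FROM-ABOVE
    mon_initial_members = node1
    mon_host = 192.0.2.11
    # public_network carries client traffic; cluster_network carries OSD replication/recovery
    public_network = 192.0.2.0/24
    cluster_network = 198.51.100.0/24
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx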
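
Bootstrapping the first monitor by hand, following the usual manual-deployment sequence (run as root on the first monitor node):

    # Monitor and admin keyrings
    ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
    ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin \
        --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
    ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
    chown ceph:ceph /tmp/ceph.mon.keyring

    # Single-entry monitor map, then create and start the first mon
    monmaptool --create --add node1 192.0.2.11 --fsid UUID-FROM-ABOVE /tmp/monmap
    sudo -u ceph mkdir -p /var/lib/ceph/mon/ceph-node1
    sudo -u ceph ceph-mon --mkfs -i node1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
    systemctl enable --now ceph-mon@node1

    # Check on cluster state (expect health warnings until mgr and OSDs exist)
    ceph -s

Additional monitors on the other nodes follow the same pattern, except they fetch the current map with ceph mon getmap -o /tmp/monmap instead of creating a new one.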
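
A manager daemon sketch (the daemon name is arbitrary; "node1" here is a placeholder):

    sudo -u ceph mkdir -p /var/lib/ceph/mgr/ceph-node1
    ceph auth get-or-create mgr.node1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
        | sudo -u ceph tee /var/lib/ceph/mgr/ceph-node1/keyring
    systemctl enable --now ceph-mgr@node1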
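
OSDs via ceph-volume, one per data HDD, with the BlueStore WAL/DB on an SSD partition or LV (device names below are placeholders for whatever the HDDs and SSDs show up as):

    # Repeat per HDD; --block.db is optional but puts the SAS SSDs to work
    ceph-volume lvm create --data /dev/sdc --block.db /dev/sdb1
    ceph-volume lvm create --data /dev/sdd --block.db /dev/sdb2

    # Verify the OSDs came up and joined the CRUSH map
    ceph osd tree
    ceph -s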
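
Pools and an RBD client sketch ("rbdpool", "labuser", and the image name are placeholders; the PG count is a workshop-sized guess):

    # On a cluster node
    ceph osd pool create rbdpool 64
    rbd pool init rbdpool
    ceph auth get-or-create client.labuser mon 'profile rbd' osd 'profile rbd pool=rbdpool' \
        | tee /etc/ceph/ceph.client.labuser.keyring
    rbd create --size 10G rbdpool/test0

    # On the client (needs ceph.conf plus the client.labuser keyring copied into /etc/ceph)
    rbd map rbdpool/test0 --id labuser
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /mnt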
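
RGW sketch (data directory and service unit naming vary somewhat by release and distro; "node1" and "demo" are placeholders):

    # On the gateway node, after installing the radosgw package
    mkdir -p /var/lib/ceph/radosgw/ceph-rgw.node1
    ceph auth get-or-create client.rgw.node1 mon 'allow rw' osd 'allow rwx' \
        | tee /var/lib/ceph/radosgw/ceph-rgw.node1/keyring
    systemctl enable --now ceph-radosgw@rgw.node1

    # Create an S3 user; the output includes the access and secret keys for s3cmd/awscli
    radosgw-admin user create --uid=demo --display-name="Demo User"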
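
CephFS sketch ("labfs", "fsuser", the pool names, and the monitor address are placeholders):

    # MDS daemon
    sudo -u ceph mkdir -p /var/lib/ceph/mds/ceph-node1
    ceph auth get-or-create mds.node1 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' \
        | sudo -u ceph tee /var/lib/ceph/mds/ceph-node1/keyring
    systemctl enable --now ceph-mds@node1

    # Pools and the filesystem
    ceph osd pool create cephfs_data
    ceph osd pool create cephfs_metadata
    ceph fs new labfs cephfs_metadata cephfs_data

    # Client identity, then mount from the client
    ceph fs authorize labfs client.fsuser / rw | tee /etc/ceph/ceph.client.fsuser.keyring
    mount -t ceph 192.0.2.11:6789:/ /mnt/labfs -o name=fsuser
    # (recent mount.ceph finds the key in /etc/ceph; older clients need -o secret=... instead)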

Hardware available

The following is at Aaron's home data center, currently unused:

  • 6x Dell R630 servers, each with:
    • 2x Intel Xeon E5-2630 v4 processors
    • 64 GiB of RAM
    • 2x 800 GB SAS SSDs (for OS, applications, and Ceph BlueStore WAL/DB devices)
    • 4x 1.8 TB SAS HDDs (Ceph cluster data disks)
    • 1x 2-port Mellanox 56 Gbit/s Ethernet card (and sufficient switch ports and cabling for these)
    • 2x Intel 10GBASE-T twisted pair Ethernet ports (I do not have switch ports for 10GBASE-T, but have plenty of 1000BASE-T)
    • 2x Intel 1000BASE-T twisted pair Ethernet ports (and sufficient switch ports and cabling)
    • iDRAC remote console, power control, etc.
  • 3x Dell R720XD servers, each with:
    • 2x (1x?) Intel Xeon E5-26?? processors
    • 32 GiB of RAM
    • 2x 120 GB SATA SSDs (rear-mounted) for the OS and applications
    • 6 or 7x 3 TB SAS HDDs (Ceph cluster data disks)
    • 2x 400 GB SAS SSDs (Ceph BlueStore WAL/DB devices)
    • 2x 10 Gbit/s SFP+ (fiber optic) Ethernet adapters (and sufficient switch ports and cabling for these)
    • 2x 1000BASE-T twisted pair Ethernet ports (and sufficient switch ports and cabling)
    • iDRAC remote console, power control, etc.

Potential cluster end states

  • Option 1: A single cluster with all 9 servers as members. This would probably make most sense with only 2 or 3 learners.
  • Option 2: Two clusters: one with mixed hardware (3x R630 servers plus 3x R720XD servers), and a second with 3x R630 servers. Makes sense with 4-6 learners in groups of 2 or 3, and demonstrates Ceph's flexibility in building from non-identical servers.
  • Option 3: Two clusters, each built from similar nodes: 6x R630s in one and 3x R720XDs in the other. Makes sense with 4-6 learners in groups of 2 or 3.
  • Option 4: Three clusters, each with consistent hardware. Makes most sense if there is a third group of learners.