In transit data security
Latest revision as of 19:07, 7 March 2026
All of the Fnordly web properties (this wiki, webmail, OpenStack API endpoints, etc.) are hosted behind HAproxy daemon(s) running on the Internet facing firewall machines. While it would be nice to say that things are all covered by HAproxy doing TLS termination for us, defense in depth principles demand that all traffic that can be encrypted be encrypted. As such, traffic between the HAproxy endpoints and the internal web services is encrypted too, and web service identities are established with Let's Encrypt X.509 certificates.
I should put a pretty picture here to make understanding a bit easier, but my graphical skills are extremely limited. As such, text will have to suffice for now.
Vision Statement
All conceivably TLSed traffic should be TLSed in transit and authenticated by valid (not expired) Let's Encrypt certificates. No ongoing manual certificate management should be needed. And private key + certificate reloading should be handled automatically as well. Tin foil hat on!
Inside network TLS encrypted services list
- All the HTTP things
  - Static web pages
  - Mediawiki content
  - Ceph rados gateway S3 and Swift services
  - Webmail
  - OpenStack API endpoints
  - Probably a few others
- IMAP
- SMTP
- probably missing an item or two here
Tools to be used
Lots of people like to hate on certbot, and probably for really good reasons. I have been successfully using it for a number of years, though. And intend to, for now, continue doing so. Apache HTTPD for the static web pages, MediaWiki, Roundcube (webmail), and other PHP applications. Postfix for SMTP. Dovecot for IMAP. uWSGI for the OpenStack API endpoints. Ceph radosgw for S3.
Let's Encrypt domain name ownership validation will be done by DNS-01 challenges. This allows a certificate requestor that is not directly accessible by Let's Encrypt's ACME validation servers to still request a certificate. Additionally, it allows for the issuance of wildcard certificates.
DNS contortions
As mentioned above, HAproxy on the internet facing firewalls will be doing the TLS termination for web (and other) clients. And it then, in turn, contacts the network-internal servers (be they VMs or bare metal) to actually get data for clients. The DNS-01 ACME challenge method requires that Let's Encrypt's servers be able to check a publicly visible DNS record to prove ownership. So for that to happen, the internal machine contacts an ACME server, gets a value to publish in the DNS, sends an RFC 2136 DNS update to the zone primary server, waits some time for it to be distributed to the secondaries, then asks the ACME server to check. If the ACME server can see the expected value in the external DNS, then it should issue a certificate.
But, I don't really want to publish the internal DNS zones to the world at large. At the same time, HAproxy will be validating certificates based on domain names. In order to minimize internal DNS information leakage, I have settled on creating a svc.fnord.greeley.co.us zone which exists in both the internal and external DNS views. The ACME requestor systems have credentials allowing updates to this zone. The external version of this zone is configured as an in-view zone in the BIND configuration so there is only one copy in the named process's memory. Updates to the internal view are seen immediately in the external view. And DNS NOTIFYs are sent to the secondary name servers as zone data updates are made.
Remember that "value to publish" two paragraphs up? It is put into a new DNS record that prefixes the requesting server's DNS name with the _acme-challenge label. And, important to know... DNS zones are arranged in a hierarchy, but the records inside a zone are not. So inside a zone like svc.fnord.greeley.co.us, server-0.svc.fnord.greeley.co.us and _acme-challenge.server-0.svc.fnord.greeley.co.us exist independently. The latter does not depend on the former. This allows us to make server-0.svc.fnord.greeley.co.us a CNAME for server-0.internal.fnord.greeley.co.us and let the _acme-challenge.server-0.svc.fnord.greeley.co.us be visible to the outside world. (If a DNS label has a CNAME value attached, that label is not allowed to have any other values attached.) As such, HAproxy (which can see the internal.fnord.greeley.co.us zone) can find the IP address (and other) information for server-0.svc.fnord.greeley.co.us by following the CNAME. But DNS clients outside the network cannot. Neat, right?
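To make the naming concrete, here is a minimal Python sketch of what happens during a DNS-01 challenge. It is illustrative only: certbot and its DNS plugin do this work themselves, and the function names and the idea of feeding the text to nsupdate are my assumptions. The TXT value computation (unpadded base64url of the SHA-256 digest of the key authorization) follows RFC 8555.

```python
import base64
import hashlib


def challenge_record_name(fqdn):
    """Return the DNS-01 validation record name for a certificate name."""
    # A wildcard name is validated at its base domain.
    base = fqdn[2:] if fqdn.startswith("*.") else fqdn
    return "_acme-challenge." + base.rstrip(".")


def challenge_txt_value(key_authorization):
    """Unpadded base64url of SHA-256(key authorization), per RFC 8555."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def nsupdate_script(server, zone, fqdn, value, ttl=60):
    """Build nsupdate input that publishes the challenge TXT record.

    The server address and TTL are hypothetical example parameters.
    """
    name = challenge_record_name(fqdn)
    return "\n".join([
        f"server {server}",
        f"zone {zone}",
        f'update add {name}. {ttl} TXT "{value}"',
        "send",
        "",
    ])
```

For server-0.svc.fnord.greeley.co.us this yields an update for _acme-challenge.server-0.svc.fnord.greeley.co.us, which lives in the svc zone alongside (and independently of) the CNAME record.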
certbot installation and configuration
Packages are installed and configured by Salt states here. Salt minions that will be requesting certificates have the letsencrypt-certificate-requestor role pillar. And they also have a letsencrypt pillar listing domains (and parameters for them) for which they are requesting certificates.
As part of the package installation Salt state, a Let's Encrypt account is also registered by the ACME client machine, unless an account is already found.
Certificate enrollment
This is also performed by a Salt state. If the Salt minion has the letsencrypt-certificate-requestor role assigned, the following happens:
- The /etc/letsencrypt/private directory is created if missing and its permissions set to root, root, 0700.
- For each domain listed in the minion's letsencrypt:domains pillar value:
  - A credentials file is stored under /etc/letsencrypt/private. This file has the DNS TSIG secret necessary to make updates to the zone.
  - Unless certbot already knows there is a certificate for the domain, a certificate is requested using the DNS-01 ACME challenge method.
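The enrollment steps above can be sketched roughly as follows. This is not the actual Salt state. It assumes the certbot-dns-rfc2136 plugin (the page only says RFC 2136 updates with a TSIG secret are used), and the function names, the HMAC-SHA512 default, and the "does a live fullchain.pem exist" check are all my own guesses at how this could be done.

```python
import os


def write_credentials(path, server, key_name, secret, algorithm="HMAC-SHA512"):
    """Write a certbot-dns-rfc2136 style credentials file, owner-readable only."""
    body = (
        f"dns_rfc2136_server = {server}\n"
        f"dns_rfc2136_name = {key_name}\n"
        f"dns_rfc2136_secret = {secret}\n"
        f"dns_rfc2136_algorithm = {algorithm}\n"
    )
    # Create the file with tight permissions before the secret is written.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    os.chmod(path, 0o600)


def needs_certificate(domain, live_root="/etc/letsencrypt/live"):
    """Assumption: a missing live fullchain.pem means no certificate exists yet."""
    return not os.path.exists(os.path.join(live_root, domain, "fullchain.pem"))


def certonly_command(domain, credentials_path):
    """Build the certbot invocation for a DNS-01 request via RFC 2136."""
    return [
        "certbot", "certonly", "--non-interactive",
        "--dns-rfc2136",
        "--dns-rfc2136-credentials", credentials_path,
        "-d", domain,
    ]
```

The command list can be handed to subprocess.run(..., check=True) once needs_certificate() says a request is warranted.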
certbot hook scripts
certbot requests certificates from Let's Encrypt. And it generates private keys, too. For TLS secured communications to happen, an endpoint needs its private key, its own ("leaf") certificate, and any intermediate certificates for certificate authorities (CAs) between the leaf and a trusted root CA. Different applications handle these in their own ways. So we may need to frobnicate the certbot PEM files or the applications' configuration to suit.
hook script for Ceph radosgw
In theory, nothing is needed here. The Ceph radosgw configuration points at the Let's Encrypt live key (privkey.pem) and leaf cert + intermediate cert (fullchain.pem) files. The radosgw configuration is set to reload the key and certificates every 15 minutes. As such, cert renewals are handled without any additional steps needed. Unfortunately, reality is a bit more complicated. The radosgw process runs as a non-privileged ceph user, which is not able to read the private key and certificate chain files in /etc/letsencrypt. Even worse, it starts as root, reads the private key and certificate, then drops privileges.
So, it seems a hook script will be required for this after all. And it should do this:
- Exit if there were no certificates renewed
- For each domain that was renewed, check to see if it is one that we care about, and if so:
  - Make sure there is a directory for the RGW process to pick up its PKI files under /var/lib/ceph/radosgw/clustername-radosgw.shorthostname/
  - Make sure said directory is sufficiently private
  - Copy the current versions of the fullchain.pem and privkey.pem files into the PKI directory
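A minimal sketch of that hook logic in Python, under some assumptions: it runs as a certbot deploy hook, so certbot supplies the RENEWED_LINEAGE and RENEWED_DOMAINS environment variables; the function name, the wanted_domains list, and the optional owner argument are hypothetical; and the real hook script may well look different.

```python
import os
import shutil

PKI_FILES = ("fullchain.pem", "privkey.pem")


def deploy_for_radosgw(renewed_lineage, renewed_domains, wanted_domains,
                       pki_dir, owner=None):
    """Copy renewed PEM files into a private directory for radosgw.

    Returns True if files were copied, False if there was nothing to do.
    """
    renewed = set(renewed_domains.split())
    # Exit early if nothing was renewed or none of it concerns us.
    if not renewed or not (renewed & set(wanted_domains)):
        return False
    os.makedirs(pki_dir, mode=0o700, exist_ok=True)
    os.chmod(pki_dir, 0o700)  # enforce privacy even if the directory existed
    for name in PKI_FILES:
        source = os.path.join(renewed_lineage, name)
        target = os.path.join(pki_dir, name)
        temporary = target + ".new"
        # Create the copy with tight permissions before any data lands in it.
        fd = os.open(temporary, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "wb") as out, open(source, "rb") as src:
            shutil.copyfileobj(src, out)
        os.chmod(temporary, 0o600)
        if owner is not None:
            shutil.chown(temporary, user=owner, group=owner)
        # Swap into place atomically so radosgw never sees a partial file.
        os.replace(temporary, target)
    if owner is not None:
        shutil.chown(pki_dir, user=owner, group=owner)
    return True
```

In a real hook this would be called with os.environ.get("RENEWED_LINEAGE", ""), os.environ.get("RENEWED_DOMAINS", ""), and owner="ceph". Changing ownership requires root, which is how certbot hooks normally run.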
And just assume that the 15 minute key and certificate reload in radosgw works as documented. Well, for now. If testing shows that it does not work as documented, take some other action in the future as needed.
And more sadness. Maybe even bitterness? The documentation at https://docs.ceph.com/en/squid/radosgw/frontends/ (the Squid version at least) regarding the ssl_reload configuration option does not actually match reality. Here's what happens when searching for the text ssl_reload in an unpacked Ceph 19.2.3 source tree:
$ grep -Ril ssl_reload . 2> /dev/null
$ du -sm . # make sure we're actually looking at an unpacked source tarball
1270 .
$
Yep. It is not there. At all. It seems that perhaps the ssl_reload option was implemented back in the civetweb days. But it is not there now. So there's a fallback plan. Send the running radosgw process a SIGHUP. That will reload the config, reopen logs, etc., right? Well, that does not work either. This does seem to have been recognized as a bug by the upstream Ceph team. And a fix seems to be in the backports queue for both Ceph Squid (19.2.y) at https://tracker.ceph.com/issues/73704 and Tentacle (20.2.y) at https://tracker.ceph.com/issues/73703. So it might be fixed eventually. In the meantime, though, the hook script will need to restart the ceph-radosgw@radosgw.shorthostname.service unit. Sigh.
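The restart step could look something like the following sketch. The function names are hypothetical, and deriving shorthostname from the first label of the machine's hostname is an assumption about how the unit instance is named. The run parameter exists only so the command can be exercised without touching systemd.

```python
import socket
import subprocess


def radosgw_unit(hostname=None):
    """Return the systemd unit name for this host's radosgw instance."""
    # Assumption: shorthostname is the first label of the machine's hostname.
    short = (hostname or socket.gethostname()).split(".")[0]
    return f"ceph-radosgw@radosgw.{short}.service"


def restart_radosgw(hostname=None, run=subprocess.run):
    """Restart radosgw so it picks up the freshly copied key and certificate."""
    command = ["systemctl", "restart", radosgw_unit(hostname)]
    run(command, check=True)
    return command
```

Once the upstream SIGHUP fix lands, swapping "restart" for "reload" (or sending the signal directly) may be enough, but that has not been tested here.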
hook script for Dovecot imapd
Very little is needed here. Dovecot child processes are started as root, so there is no need to make copies of the private key files that are readable by the less-privileged dovecot user. The hook just needs to run systemctl reload dovecot.service when a new cert and key are symlinked into the /etc/letsencrypt/live/dns_svc_domain/ directory, assuming that the Dovecot config file (probably /etc/dovecot/conf.d/10-ssl.conf) points to the correct place.
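As a rough sketch, and assuming the hook runs as a certbot deploy hook with RENEWED_LINEAGE pointing at /etc/letsencrypt/live/dns_svc_domain, the Dovecot side could be as small as this. The function name and the lineage-name comparison are my assumptions, not something the current hook is known to do.

```python
import os
import subprocess


def dovecot_deploy_hook(renewed_lineage, dns_svc_domain, run=subprocess.run):
    """Reload Dovecot only when the renewed lineage is the one it serves."""
    lineage_name = os.path.basename(renewed_lineage.rstrip("/"))
    if lineage_name != dns_svc_domain:
        return False
    run(["systemctl", "reload", "dovecot.service"], check=True)
    return True
```

Checking the lineage name first avoids reloading Dovecot when some unrelated certificate on the same machine is renewed.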
certbot and/or Let's Encrypt annoyances
- Running certbot --help security suggests that any elliptic curve supported in TLS 1.3 (defined in RFC 8446) should work for certificate creation. However, the Let's Encrypt ACME servers only support secp256r1 and secp384r1. (Tested empirically 2026-01-24.)
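One way to avoid tripping over this is to check the curve before certbot is ever invoked. A small sketch follows. The helper name is hypothetical, and the allowed set simply encodes the empirical result above, so it may go stale if Let's Encrypt changes what it accepts.

```python
# Curves the Let's Encrypt ACME servers accepted when tested on 2026-01-24.
LETS_ENCRYPT_CURVES = ("secp256r1", "secp384r1")


def ecdsa_key_args(curve="secp384r1"):
    """Return certbot key options, refusing curves Let's Encrypt rejects."""
    if curve not in LETS_ENCRYPT_CURVES:
        raise ValueError(
            f"{curve} is not accepted by Let's Encrypt; "
            f"use one of {', '.join(LETS_ENCRYPT_CURVES)}"
        )
    return ["--key-type", "ecdsa", "--elliptic-curve", curve]
```

The returned list can be appended to a certbot certonly command line.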