Automate certificate renewal via Let’s Encrypt on Avi/NSX ALB
This time I want to introduce the ability to request the famous Let’s Encrypt certificates on your NSX Advanced Load Balancer (also known as Avi Vantage) using the comprehensive on-board functionality it offers.
I have been improving, extending and testing the script for quite some time; it was reviewed and merged by Engineering on GitHub on 31 August 2021. One major change was support for the "Virtual Hosting" feature with Let’s Encrypt, along with the ability to specify multiple domains for the certificate (SANs).
Support for the script: If you run into trouble or have questions about the script, I’d recommend raising a GitHub issue. Issues are easier to keep track of, and more people might be able to assist.
Why?
In the Controller you can manage all your SSL/TLS certificates at one central point. All those certificates can be used across your virtual services, running on different Service Engines. Obviously, before a certificate expires it should be renewed and replaced.
Certificates from the free, well-known public certificate authority Let’s Encrypt are only valid for 90 days, and the best practice is to renew them about 30 days before they expire. Performing this step manually every couple of months is probably not the most exciting work.
This is where Certificate Management and the "ControlScript" in NSX ALB join the party. By default, this feature initiates a renewal 7 days before the certificate expires. Or, in other words: right before the penultimate certificate expiry notification as configured. (For more information, see Avi’s documentation on "Customizing Notification of Certificate Expiration".)
With this feature, issuing Let’s Encrypt certificates for Virtual Services can be completely automated.
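To make the timing concrete, here is a minimal Python sketch of the arithmetic: a certificate valid for 90 days and a renewal that starts 7 days before expiry. The function and constant names are made up for illustration; they are not part of the Avi Controller or the script.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the text; the names are illustrative only.
CERT_LIFETIME = timedelta(days=90)  # validity of a Let's Encrypt certificate
RENEW_BEFORE = timedelta(days=7)    # Controller default: renew 7 days before expiry


def renewal_due(issued_at: datetime, now: datetime) -> bool:
    """Return True once 'now' lies inside the renewal window."""
    expires_at = issued_at + CERT_LIFETIME
    return now >= expires_at - RENEW_BEFORE


issued = datetime(2021, 9, 1, tzinfo=timezone.utc)
print(renewal_due(issued, issued + timedelta(days=82)))  # one day before the window opens
print(renewal_due(issued, issued + timedelta(days=83)))  # first day of the window
```

With these defaults the renewal window opens on day 83 of the certificate's lifetime.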
Takeaway: To debug the certificate renewal it’s really handy to manually trigger the renewal via CLI. For more details, you can give this QuickTip a read.
Setup
To set up everything you need, follow the steps below in your Avi Controller’s web interface.
You need to do this once per tenant.
Note: Starting with 21.1 (Release Notes), which I use, the web interface is being rebuilt using VMware’s Clarity framework, so previous versions look different. Besides the visual difference, the steps should be pretty much the same.
Step 1: Check the requirements/disclaimer
- You need a public domain which is also publicly reachable.
- You need a Virtual Service with an Application Profile of type HTTP (so Layer 7). Issuing certificates to virtual services of type L4 does not work, as the validation requires HTTP Policy Sets.
Some more notes:
- You should be aware that Let’s Encrypt applies rate limiting based on IP address. If issuing certificates fails, you might want to enable dryrun, investigate the cause and keep retries to a minimum so you do not reach the limit.
- Please read carefully through this guide. Once you have done it a few times, it’s far less complicated than it looks.
- This is a community-supported script, so no official support is provided. If there are issues, please consider raising a GitHub issue.
Step 2: Create ControlScript
- Go to Templates - Scripts - ControlScripts and hit Create. A dialog will open.
- Open the Let’s Encrypt renewal script in a separate tab: raw.githubusercontent.com/avinetworks/devops/master/cert_mgmt/letsencrypt_mgmt_profile.py. Copy its content to your clipboard.
- Pick a name like request_letsencrypt_certificate.
- Paste the script you just copied into the large text area below.
- It will now look like this:
- Click Save to return to the previous dialog.
Step 3: Create "Certificate Management"
- Go to Templates - Security - Certificate Management and hit Create. A dialog will open.
- As Name, pick something like use_letsencrypt (only alphanumeric, underscore, period or hyphen characters are allowed).
- At Control Script, select the script named request_letsencrypt_certificate that we created in the previous step.
- Enable "Enable Custom Script Parameters". At a minimum, you need to add the following two values (check the additional parameters in the next point!):
  - user: the username used for the API calls. Can be a custom user (recommended) or an admin account.
  - password: the password, marked as Sensitive.
  - This account needs permissions to manage and change SSL/TLS Certificates via API/WebUI.
- In addition to the parameters above, there are more options you can, and might need to, define. I recommend setting most of these as Dynamic, so they can be changed individually per certificate.
  - tenant: the name of the tenant to be used. If not specified, admin will be used.
  - dryrun: defaults to False, in which case the production servers are used. If set to True, the staging/test servers of Let’s Encrypt are used, which have different rate-limiting settings.
  - disable_check: determines whether the token is validated from the Avi Controller. See the section on the error "All 5 internal token verifications failed." below for more details. Usually no change is needed.
  - debug: defaults to False. If set to True, more debug messages are printed.
  - contact: an e-mail address provided to Let’s Encrypt. Certificate expiry warnings will be sent to this address.
- In the end, it will look like this:
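To summarise how the parameters from this step and their defaults fit together, here is a minimal Python sketch. It is not the script's actual code: the helper names (as_bool, resolve_params) are hypothetical, it assumes the parameters arrive as strings, and it assumes disable_check defaults to False. The two directory URLs are the public Let’s Encrypt production and staging ACME endpoints.

```python
from typing import Dict

# Let's Encrypt ACME directory endpoints (production and staging).
PRODUCTION_URL = "https://acme-v02.api.letsencrypt.org/directory"
STAGING_URL = "https://acme-staging-v02.api.letsencrypt.org/directory"


def as_bool(value: str) -> bool:
    """Interpret a parameter string such as 'True' or 'false' as a boolean."""
    return value.strip().lower() == "true"


def resolve_params(raw: Dict[str, str]) -> dict:
    """Hypothetical helper: apply the defaults described in this step."""
    if "user" not in raw or "password" not in raw:
        raise ValueError("user and password are required")
    dryrun = as_bool(raw.get("dryrun", "False"))
    return {
        "user": raw["user"],
        "password": raw["password"],
        "tenant": raw.get("tenant", "admin"),  # falls back to the admin tenant
        "dryrun": dryrun,
        "disable_check": as_bool(raw.get("disable_check", "False")),  # assumed default
        "debug": as_bool(raw.get("debug", "False")),
        "contact": raw.get("contact"),  # optional e-mail address
        "directory_url": STAGING_URL if dryrun else PRODUCTION_URL,
    }


# Example values only; "letsencrypt-api" is a made-up user name.
params = resolve_params({"user": "letsencrypt-api", "password": "secret", "dryrun": "True"})
print(params["tenant"], params["directory_url"])
```

Because dryrun is set to True in the example, the staging directory is selected, which is the safer choice while you are still testing the setup.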
Step 4: Request a certificate
- Go to Templates - Security - SSL/TLS Certificates and click Create - Application Certificate.
- As Name, you can pick something like "patrik.kernstock.net RSA".
- As Type, pick CSR.
- As Certificate Management Profile, pick use_letsencrypt, which we created in the previous step. (Make changes to Dynamic Parameters, if defined and required.)
- As Common Name, enter the FQDN the certificate should be issued to. For example: patrik.kernstock.net.
- Choose Algorithm and Key Size as required. We can go for RSA and 3072 Bits.
- Add any Subject Alternate Name (SAN), if required. These are additional domain names which should be included in the certificate.
- Click Save.
- When saved, the script runs in the background and issues your certificate. This might take a few seconds. If it fails, you will see the output of the script with more details. (Note: Script output is only displayed starting with 20.1.6.)
If the script succeeds, you will see the recently issued certificate in the list:
Now your certificate is ready to use. Happy certificiating… or so…
Final notes
Notes
I don’t see an error!
If you don’t see any additional errors (e.g. when using versions older than 20.1.6), you can find more logs in the log file /var/lib/avi/log/portal_exception.log on your Avi Controller. To check this log file, log in to your controller via SSH and view it using less -i /var/lib/avi/log/portal_exception.log or tail -f /var/lib/avi/log/portal_exception.log.
Additionally, you might want to set the custom parameter debug to True.
How to use RSA and ECDSA?
To issue both an RSA and an ECDSA certificate, simply create two SSL/TLS Certificates entries and choose the Algorithm for each accordingly. You can then define both certificates on your Virtual Service:
Errors
Error: "All 5 internal token verifications failed."
(This also provides more details about the parameter disable_check.)
As described earlier in Step 3, point 5, the token verification can be disabled by setting the parameter disable_check to True.
To understand this parameter further, I’ll need to briefly explain the token validation of Let’s Encrypt:
- The ACME standard, which Let’s Encrypt invented, is used to automatically issue certificates and prove domain ownership. Of course, no one wants other (evil) people to issue valid certificates for their domains.
- At first, we tell Let’s Encrypt (or the ACME server in general) for which domain we want a certificate issued. It gives us back a token that Let’s Encrypt expects to see at a certain URL to prove ownership.
- The script then sets an HTTP Policy on the corresponding Virtual Service to return a specific token string at the URL http://our-domain-on-avi.tld/.well-known/acme-challenge/<TOKEN>.
- Before the script tells Let’s Encrypt to verify the token, the Avi Controller (the script, to be precise) makes an HTTP call to the above URL and validates the token locally. This keeps us from getting rate-limited too quickly in case validation fails.
- If the validation succeeds, we inform Let’s Encrypt that the token can be verified. On success, the certificate is issued and handed over to the Avi Controller to process. Issuing complete.
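The local pre-check in this flow can be sketched roughly as follows. This is a simplified illustration, not the script's actual implementation: the function names are hypothetical, the number of attempts is taken from the error message (5), and the delay between attempts is an assumption. The fetch function is injectable so the logic can be tried without a real HTTP request.

```python
import time
import urllib.request
from typing import Callable


def challenge_url(domain: str, token: str) -> str:
    """Build the HTTP-01 challenge URL for a domain and token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def verify_locally(
    domain: str,
    token: str,
    expected_body: str,
    fetch: Callable[[str], str] = http_get,
    attempts: int = 5,   # matches "All 5 internal token verifications failed."
    delay: float = 2.0,  # assumed pause between attempts, in seconds
) -> bool:
    """Return True if the challenge URL serves the expected body."""
    url = challenge_url(domain, token)
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url).strip() == expected_body:
                return True
        except OSError:
            pass  # connection and HTTP errors count as a failed attempt
        if attempt < attempts:
            time.sleep(delay)
    return False


# Offline demonstration: a fake site stands in for the Virtual Service.
fake_site = {challenge_url("our-domain-on-avi.tld", "abc123"): "abc123.thumbprint"}
ok = verify_locally(
    "our-domain-on-avi.tld", "abc123", "abc123.thumbprint",
    fetch=lambda url: fake_site[url], delay=0,
)
print(ok)
```

Only when a check like this passes does it make sense to ask Let’s Encrypt to validate, since every failed validation on their side counts towards the rate limit.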
In some setups you might use split-horizon DNS:
- our-domain-on-avi.tld inside your network points to an internal server, directly to the web server.
- our-domain-on-avi.tld from outside your network points to the NSX ALB/Avi Load Balancer.
As the Avi Controller validates the token within the local network, the request never goes through the load balancer and therefore never hits the HTTP Policy set by the script, which causes the local validation to fail. By setting disable_check to True, we simply bypass this check.
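If you are unsure whether your setup is affected, a quick comparison of what the name resolves to internally with the address of the Virtual Service can help. The following Python sketch is only an illustration: the function names are hypothetical and the IP addresses are example values from documentation ranges.

```python
import socket
from typing import Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def local_check_reaches_vs(resolved: Set[str], vs_addresses: Set[str]) -> bool:
    """True if at least one locally resolved address belongs to the Virtual Service."""
    return bool(resolved & vs_addresses)


# Example values: the internal DNS view points to the web server directly,
# while the Virtual Service listens on a different (public) address.
internal_view = {"10.0.0.20"}
vs_addresses = {"203.0.113.10"}
print(local_check_reaches_vs(internal_view, vs_addresses))

# On the Controller you would use the real lookup instead of fixed values:
# internal_view = resolve_ipv4("our-domain-on-avi.tld")
```

If the result is False, the local validation cannot reach the HTTP Policy, and setting disable_check to True is the intended workaround.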
End
I hope this was useful for some Avi/NSX ALB fans out there!
Changelog
- 2021-12-27: Added note regarding support on GitHub repo.
Great Article Patrik. I am gonna try this out in my lab soon.
Wondering what config changes you made with ALB to get a dark theme?
I’m using “Dark Reader” (browser extension) for this. Unfortunately there’s no native dark mode in the Avi Controller WebUI.
hi buddy, great script. I am with a problem and I cannot find the solution. I have no programming skills and I am just learning about avi. I share the error I have to see if you can help me: Error from certificate management service: Could not find a VS with fqdn = abc.labs.com.ar. STDOUT - 'Running version 0.9.0 Debug enabled. dry_run is: False disable_check is: False directory_url is https://acme-v02.api.letsencrypt.org/directory Reusing account key. Parsing account key ... Parsing CSR ... Found domains: abc.labs.com.ar Getting directory ... Directory found! Registering account ... Already registered! Creating new order ... Order created! Authorization…
I have not looked in the logs to get the verbose error rapt0r has posted, however, I am getting the message “Error from certificate management service: Could not find a VS with fqdn = domainnamehere” so I think I have the similar issue. We have a glsb and two SE’s creating two vs objects for the site. dns of domainnamehere does resolve to the ip of the vs. I have also tried the disable_check True with same results.
Hi you both, apologies for the delayed response.
Would you mind please trying the suggested fix manually: https://github.com/avinetworks/devops/pull/246. Does this help?
If not, following new PR might help as this allows manually overwriting the VS to be used: https://github.com/avinetworks/devops/pull/249
For additional questions, I’d recommend raising a GitHub issue in the repo there as it’s better to keep track of and more people might see it.
Thanks!