A new feature added to TKGS as of 7.0 Update 2 is support for adding private SSL certificates to the “trust” on TKG cluster nodes.
This is very important as it finally provides a supported mechanism to use on-premises Harbor and other image registries.
It’s done by adding the encoded CAs to the “TkgServiceConfiguration”. The template for the TkgServiceConfiguration looks like this:
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TkgServiceConfiguration
metadata:
  name: tkg-service-configuration
spec:
  defaultCNI: antrea
  proxy:
    httpProxy: http://<user>:<pwd>@<ip>:<port>
  trust:
    additionalTrustedCAs:
    - name: first-cert-name
      data: base64-encoded string of a PEM encoded public cert 1
    - name: second-cert-name
      data: base64-encoded string of a PEM encoded public cert 2
Notice that there are two new sections under spec; one for proxy and one for trust. This article is going to focus on trust for additional CAs.
If your registry uses a self-signed cert, you’ll just encode that cert itself. If you take advantage of an Enterprise CA or similar to sign your certs, you’d encode and import the “signing”, “intermediate” and/or “root” CA.
Example
Let’s add the certificate for a standalone Harbor instance (not the built-in Harbor in TKGS; its certificate is already trusted)
Download the certificate by clicking the “Registry Certificate” link
Run base64 -i <ca file> to return the base64 encoded content:
Provide a simple name and copy and paste the encoded cert into the data value:
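For illustration only (the file name harbor-ca.crt and the truncated encoded string are hypothetical), the encoding step and the resulting trust entry might look like this:

base64 -i harbor-ca.crt

trust:
  additionalTrustedCAs:
  - name: harbor-ca
    data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...(truncated)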
Apply the TkgServiceConfiguration
After setting up your file, apply it to the Supervisor cluster:
kubectl apply -f ./TanzuServiceConfiguration.yaml
Notes
Existing TKG clusters will not automatically inherit the trust for the certificates
Clusters created after the TKGServiceConfiguration is applied will get the certificates
You can scale an existing TKG cluster to trigger a rollout with the certificates
You can verify the certificates exist by connecting through SSH to the nodes and locating the certs under /etc/ssl/certs:
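As a sketch, assuming a cluster named my-cluster in namespace my-namespace that currently has three workers, a scale-up to trigger the rollout and a quick check on a node might look like this:

# bump the worker count from 3 to 4 to force new nodes that include the added CAs
kubectl patch tanzukubernetescluster my-cluster -n my-namespace --type merge -p '{"spec":{"topology":{"workers":{"count":4}}}}'
# ssh to a node (vmware-system-user is the node user for TKG clusters) and look for the certs
ssh vmware-system-user@<node-ip>
ls /etc/ssl/certs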
In TKGS on vSphere 7.0 through (at least) 7.0.1d, a Harbor Image Registry may be enabled for the vSphere Cluster (under Configure | Namespaces | Image Registry). This feature currently (as of 7.0.1d) requires the Pod Service, which in turn requires NSX-T integration.
As of 7.0.1d, the self-signed certificate created for this instance of Harbor is added to the trust for nodes in TKG clusters, making it easier (possible?) to use images from Harbor.
When you log in to Harbor as a regular user, you’ll notice that the menu is very sparse. Only the ‘admin’ account can access the “Administration” menu.
To get logged in as the ‘admin’ account, we’ll need to retrieve the password from a secret for the harbor controller in the Supervisor cluster.
Steps:
SSH into the vCenter Server as root, then type ‘shell’ to get to a bash shell
Type ‘/usr/lib/vmware-wcp/decryptK8Pwd.py‘ to return information about the Supervisor Cluster. The results include the IP for the cluster as well as the node root password
While still in the SSH session on the vCenter Server, ssh into the Supervisor Cluster node by entering ‘ssh root@<IP address from above>’. For the password, enter the PWD value from above.
Now, we have a session as root on a supervisor cluster control plane node.
Enter ‘kubectl get ns‘ to see a list of namespaces in the supervisor cluster. You’ll see a number of hidden, system namespaces in addition to those corresponding to the vSphere namespaces. Notice there is a namespace named “vmware-system-registry” in addition to one named “vmware-system-registry-#######”. The namespace with the number is where Harbor is installed.
Run ‘kubectl get secret -n vmware-system-registry-######‘ to get a list of secrets in the namespace. Locate the secret named “harbor-######-controller-registry”.
Run this to return the decoded admin password: kubectl get secret -n vmware-system-registry-###### harbor-######-controller-registry -o jsonpath='{.data.harborAdminPassword}' | base64 -d | base64 -d
In the cases I’ve seen so far, the password is about 16 characters long; if it’s longer than that, you may not have decoded it entirely. Note that the value must be decoded twice.
Once you’ve saved the password, enter “exit” three times to get out of the ssh sessions.
Notes
Don’t manipulate the authentication settings
The process above is not supported; VMware GSS will not help you complete these steps
Some features may remain disabled (vulnerability scanning for example)
As admin, you may configure registries and replication (although it’s probably unsupported with this built-in version of Harbor for now)
So, let’s say you want to deploy an instance of Harbor to your “services” Kubernetes cluster, and that cluster is protected by a scheduled Velero backup. Velero picks up all resources in all namespaces by default, but we need to add an annotation to indicate a persistent volume that should be included in the backup. Without this annotation, Velero will not include the PV in the backup.
First, let’s create a namespace to install Harbor into:
kubectl create ns harbor
Then, we’ll make sure helm has the chart for Harbor:
helm repo add harbor https://helm.goharbor.io
helm repo update
Finally, we’ll install Harbor:
helm install harbor harbor/harbor --namespace harbor \
--set expose.type=loadBalancer,expose.tls.enabled=true,expose.tls.commonName=harbor.ragazzilab.com,\
externalURL=harbor.ragazzilab.com,harborAdminPassword=harbor,\
redis.podAnnotations."backup\.velero\.io/backup-volumes"=data,\
registry.podAnnotations."backup\.velero\.io/backup-volumes"=registry-data,\
trivy.podAnnotations."backup\.velero\.io/backup-volumes"=data,\
database.podAnnotations."backup\.velero\.io/backup-volumes"=database-data,\
chartmuseum.podAnnotations."backup\.velero\.io/backup-volumes"=chartmuseum-data,\
jobservice.podAnnotations."backup\.velero\.io/backup-volumes"=job-logs
Notice a few of the configurations we’re passing here:
expose.tls.commonName is the value that will be used by the generated TLS certificate
externalURL is the FQDN that we’ll use to reach Harbor (post deploy, you’ll get the loadBalancer IP and add the DNS record for it)
harborAdminPassword is the password assigned by default to the admin account – clearly this should be changed immediately
The next items are the podAnnotations; the syntax was unexpectedly different. Notice that the annotation name is joined to podAnnotations with a dot rather than an equals-sign, and that the dots within the annotation name itself must be escaped.
Once Harbor is deployed, you can get the loadBalancer’s IP and point your browser at it.
Now, we can wait for the Velero backup job to run or kick off a one-off backup.
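For a one-off backup, something along these lines works (the backup name is arbitrary):

velero backup create harbor-manual --include-namespaces harbor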
I noticed that Harbor did not start properly after a restore. This was because postgres in the database pod expects a specific set of permissions on its data directory, and those permissions were apparently changed by the restore. The log on the database pod only read: FATAL: data directory “/var/lib/postgresql/data” has group or world access
To return Harbor to functionality post-restore, I had to take the following steps:
Edit the database statefulSet: kubectl edit StatefulSet harbor-harbor-database -n harbor
Replace the command in the “change-permission-of-directory” initContainer from chown -R 999:999 /var/lib/postgresql/data to chmod -R 0700 /var/lib/postgresql/data (see the snippet after these steps)
Save changes and bounce the database pod by running kubectl delete po -n harbor harbor-harbor-database-0
Bounce the remaining pods that are in CrashLoopBackup (because they’re trying to connect to the database)
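For reference, after the edit the initContainer ends up looking something like the snippet below. The exact layout (command vs. args) can vary by chart version, so check the StatefulSet with kubectl get statefulset harbor-harbor-database -n harbor -o yaml before changing it:

initContainers:
- name: change-permission-of-directory
  command: ["/bin/sh", "-c", "chmod -R 0700 /var/lib/postgresql/data"]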
Thanks to my friend and colleague Hemanth AVS for help with the podAnnotations syntax!
Log in to the Harbor web GUI as an administrator. Navigate to Administration/Registries
Add Endpoint for local Harbor by clicking ‘New Endpoint’ and entering the following:
Provider: harbor
Name: local (or FQDN or whatever)
Description: optional
Endpoint URL: the actual URL for your harbor instance beginning with https and ending with :443
Access ID: username for an admin or user that at least has Project Admin permission to the target Projects/namespaces
Access Secret: Password for the account above
Verify Remote Cert: typically checked
Add Endpoint for Docker Hub by clicking ‘New Endpoint’ and entering the following:
Provider: docker-hub
Name: dockerhub (or something equally profound)
Description: optional
Endpoint URL: pre-populated
Access ID: username for your account at dockerhub
Access Secret: Password for the account above
Verify Remote Cert: typically checked
Notice that this is for general dockerhub, not targeting a particular repo.
Configure Replications for the Yelb Images
You may create replications for several images at once using a variety of filters, but I’m going to create a replication rule for each image we need. I think this makes it easier to identify a problem, removes the risk of replicating too much and makes administration easier. Click ‘New Replication Rule‘ and enter the following to create our first rule:
Name: yelb-db-0.5
Description: optional
Replication Mode: Pull-based (because we’re pulling the image from DockerHub)
Source registry: dockerhub
Source Registry Filter – Name: mreferre/yelb-db
Source Registry Filter – Tag: 0.5
Source Registry Filter – Resource: pre-populated
Destination Namespace: yelb (or whatever Project you want the images saved to)
Trigger Mode: Select ‘Manual’ for a one-time sync or select ‘Scheduled’ if you want to ensure the image is replicated periodically. Note that the schedule format is cron with seconds, so 0 0 23 * * 5 would trigger the replication to run every Friday at 23:00:00. Scheduled replication makes sense when the tag filter is ‘latest’ for example
Override: leave checked to overwrite the image if it already exists
Enable rule: leave checked to keep the rule enabled
Add the remaining Replication Rules:
Name                  | Name Filter             | Tag Filter | Dest Namespace
yelb-ui-latest        | mreferre/yelb-ui        | latest     | yelb
yelb-appserver-latest | mreferre/yelb-appserver | latest     | yelb
redis-4.0.2           | library/redis           | 4.0.2      | yelb
Note that redis is an official image, so we have to include library/
Pivotal Container Service (PKS) 1.5 and Kubernetes 1.14 bring *beta* support for Workers running Windows. This means that we can provide the advantages of Kubernetes to a huge array of applications running on Windows. I see this especially useful for Windows applications that you don’t have the source code for and/or do not want to invest in reworking it for .NET core or languages that run on Linux.
In nearly all cases, you’ll need an image with your applications’ dependencies or configuration and in the real world, we don’t want those in the public space like dockerhub. Enter Private Docker Repositories.
PKS Enterprise includes VMware Harbor as a private registry; it’s very easy to deploy alongside PKS and provides a lot of important functionality. The Harbor interface uses TLS/SSL; you may use a self-signed, enterprise PKI-signed or public CA-signed certificate. If you choose not to use a public CA-signed certificate ($!), the self-signed or PKI-signed certificate must be trusted by the docker engine on each Kubernetes worker node.
Clusters based on Ubuntu Xenial Stemcells:
The operator/administrator simply puts the CA certificate into the “Trusted Certificates” box of the Security section in Ops Manager.
When BOSH creates the VMs for kubernetes clusters, the trusted certificates are added to the certificate store automatically.
If using an enterprise PKI where all of the internal certificates are signed by the Enterprise CA, this method makes it very easy to trust and “un-trust” CAs.
Clusters based on Windows 2019 Stemcells:
This is one of those tasks that is easier to perform on Linux than it is on Windows. Unfortunately, Windows does not automatically add the Trusted Certificates from Ops Manager to the certificate store, so extra steps are required.
Obtain the Registry CA Certificate. In Harbor, you may click the “REGISTRY CERTIFICATE” link while in a Project. Save the certificate to where the BOSH cli is installed (Ops Manager typically).
List the BOSH-managed VMs to identify the service-instance deployment corresponding to the targeted K8s cluster by matching the VM IP address to the IP address of the master node as reported by the pks cluster command.
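For example (the cluster name and environment alias are illustrative):

pks cluster my-cluster     # note the Kubernetes Master IP(s) in the output
bosh -e ENV vms            # find the service-instance_* deployment whose master VM has that IP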
Run this command to copy the certificate to the Windows worker
bosh -e ENV -d DEPLOYMENT scp root.cer WINDOWS-WORKER:/
ENV – your environment alias in the BOSH cli
DEPLOYMENT – the BOSH deployment that corresponds to the k8s cluster; ex: service-instance_921bd35d-c46d-4e7a-a289-b577ff743e15
WINDOWS-WORKER – the instance name of the specific Windows worker VM; ex: windows-worker/277536dd-a7e6-446b-acf7-97770be18144
This command copies the local file named root.cer to the root folder on the Windows VM
Use BOSH to SSH into the Windows Worker.
bosh -e ENV -d DEPLOYMENT ssh WINDOWS-WORKER
ENV – your environment alias in the BOSH cli
DEPLOYMENT – the BOSH deployment that corresponds to the k8s cluster; ex: service-instance_921bd35d-c46d-4e7a-a289-b577ff743e15
WINDOWS-WORKER – the instance name of the specific Windows worker VM; ex: windows-worker/277536dd-a7e6-446b-acf7-97770be18144
Once in the SSH session on the Windows node, you’ll see root.cer on the filesystem
In the Windows SSH session, run “powershell.exe” to enter PowerShell
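A command along these lines performs the import; it assumes root.cer landed at the root of the filesystem, per the scp step above:

Import-Certificate -FilePath C:\root.cer -CertStoreLocation Cert:\LocalMachine\Root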
The example above imports the local file “root.cer” into the Trusted Root Certificate Store
Type “exit” twice to exit PS and SSH
Repeat steps 5-8 for each worker node.
Add docker-registry secret to k8s cluster
Whether the k8s cluster is running Windows workers or not, you’ll want to add credentials for authenticating to harbor. These credentials are stored in a secret. To add the secret, use this command:
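A minimal sketch (the secret name harbor-creds is an arbitrary choice):

kubectl create secret docker-registry harbor-creds \
  --docker-server=HARBOR_FQDN \
  --docker-username=HARBOR_USER \
  --docker-password=USER_PASS \
  --docker-email=USER_EMAIL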
HARBOR_FQDN – FQDN for local/private Harbor registry
HARBOR_USER – name of user in Harbor with access to project and repos containing the desired images
USER_PASS – password for the above account
USER_EMAIL – email address for the above account
Note that this secret is namespaced; it needs to be added to the namespace of the deployments that will reference it
More info
Here’s an example deployment yaml for a Windows K8s cluster that uses a local private docker registry. Note that Windows clusters cannot leverage NSX-T yet, so this example uses a NodePort to expose the service.
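A minimal sketch of what that can look like; the names, image path, secret, and ports are illustrative assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: win-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: win-app
  template:
    metadata:
      labels:
        app: win-app
    spec:
      nodeSelector:
        kubernetes.io/os: windows       # schedule onto the Windows workers (beta.kubernetes.io/os also works on 1.14)
      imagePullSecrets:
      - name: harbor-creds              # the docker-registry secret created above
      containers:
      - name: win-app
        image: harbor.example.com/library/win-app:latest   # image pulled from the private registry
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: win-app
spec:
  type: NodePort                        # Windows clusters can't use NSX-T yet, so expose via NodePort
  selector:
    app: win-app
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080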
Last night, Pivotal announced new versions of PKS and Harbor, so I thought it’s time to simplify the upgrade process. Here is a concourse pipeline that essentially aggregates the upgrade-tile pipeline so that PKS and Harbor are upgraded in one go.
What it does:
Runs on a schedule – you set the time and days it may run
Downloads the latest versions of PKS and Harbor from PivNet – you set the major.minor version range
Uploads the PKS and Harbor releases to your BOSH director
Determines whether the new release is missing a stemcell, downloads it from PivNet and uploads it to BOSH director
Stages the tiles/releases
Applies changes
What you need:
A working Concourse instance that is able to reach the Internet to pull down the binaries and repo
Edit the params.yml by replacing the values in double-parentheses with the actual value. Each line has a bit explaining what it’s expecting. For example, ((ops_mgr_host)) becomes opsmgr.pcf1.domain.local
Remove the parens
If you have a GitHub Token, pop that value in, otherwise remove ((github_token))
The current pks_major_minor_version regex will get the latest 1.0.x. If you want to pin it to a specific version, or when PKS 1.1.x is available, you can make those changes here.
The ops_mgr_usr and ops_mgr_pwd credentials are those you use to logon to Ops Manager itself. Typically set when the Ops Manager OVA is deployed.
The schedule params should be adjusted to a convenient time to apply the upgrade. Remember that in addition to the PKS Service being offline (it’s a singleton) during the upgrade, your Kubernetes clusters may be affected if you have the “Upgrade all Clusters” errand set to run in the PKS configuration, so schedule wisely!
Set the new pipeline. Here, I’m naming the pipeline “PKS_Upgrade”. You’ll pass the pipeline.yml with the “-c” param and your edited params.yml with the “-l” param
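Assuming a fly target named concourse (matching the unpause command below):

fly -t concourse set-pipeline -p PKS_Upgrade -c pipeline.yml -l params.yml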
Unpause the pipeline so it can run when in the scheduled window
fly -t concourse up -p PKS_Upgrade
Log in to the Concourse web UI to see our shiny new pipeline!
If you don’t want to deal with the schedule and simply want it to upgrade on-demand, use the pipeline-nosched.yml instead of pipeline.yml, just be aware that when you unpause the pipeline, it’ll start doing its thing. YMMV, but for me, it took about 8 minutes to complete the upgrade.
Behind the scenes
It’s not immediately obvious how the pipeline does what it does. When I first started out, I found it frustrating that there just isn’t much to the pipeline itself. To that end, I tried making pipelines that were entirely self-contained. This was good in that you can read the pipeline and see everything it’s doing; plus it can be made to run in an air-gapped environment. The downside is that there is no separation; one error in any task and you’ll have to edit the whole pipeline file.
As I learned a little more and poked around in what others were doing, it made sense to split the “tasks” out, keep them in a public GitHub repo and pull them down to run on-demand.
Pipelines generally have two main sections: resources and jobs. Resources are objects that are used by jobs. In this case, the binary installation files, a zip of the GitHub repo and the schedule are resources. Jobs are (essentially) made up of plans, and plans have tasks.
Each task in most pipelines uses another source yml. This task.yml will indicate which image Concourse should build a container from and what it should do in that container (typically, run a script). All of these task components are in the GitHub repo, so when the pipeline job runs, it clones the repo and runs the appropriate task script in a container built on an image pulled from dockerhub.
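As an illustration of that structure (the repo URL, resource names, and task path below are made up for the sketch):

resources:
- name: pipeline-repo              # GitHub repo holding the task scripts
  type: git
  source:
    uri: https://github.com/example/pks-upgrade-pipeline.git
    branch: master
- name: run-window                 # time resource that gates the scheduled job
  type: time
  source:
    start: 11:00 PM
    stop: 11:30 PM
    days: [Saturday]

jobs:
- name: upgrade-pks-and-harbor
  plan:
  - get: run-window
    trigger: true                  # kick off only when the time window opens
  - get: pipeline-repo
  - task: upload-and-stage
    file: pipeline-repo/tasks/upload-and-stage.yml   # the task yml names the image and the script to run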
More info
I’ve got several pipelines in the repo. Some of them do what they’re supposed to. 🙂 Most of them are derived from others’ work, so many thanks to Pivotal Services and Sabha Parameswaran