PAS with NSX-T Tip: use a fresh IP Block

I’ve fought with this for an embarrassingly long time.  I had a failed PAS (Pivotal Application Services) deployment (I’d missed several of the NSX configuration requirements), removed the cruft, and tried again and again and again.  In each case, PAS and NCP would deploy but fail on the PAS smoke_test errand. The error message only said that more detail was in the log.

Which Log?!

I ssh’d into the clock_global VM and found the smoke_test logs. They stated that the container for instance {whatever} could not be created and reported error NCP04004. That pointed me to the Diego Cells (where the containers would be created), so I poked around in the /var/vcap/sys/log/garden logs there. Those stated that the interface for the instance could not be found. Ok, this is sounding more like an NSX problem.

I ended up parsing through the NSX Manager event log and found this gem:

IP Block Error

Ah-ha! Yup, I’d apparently allocated a couple of /28 subnets from the IP Block. So when the smoke test tried to allocate a /24, the “fixed” subnet size had already been set to /28, causing the error.

The resolution was to simply remove all of the allocated subnets from the IP Block. This could have been avoided by either not reusing an existing IP Block or by using the settings in the NCP configuration to create a new IP Block with a given CIDR.
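
If you want to see what has already been carved out of an IP Block before reusing it, the NSX-T Manager API can show you. This is just a sketch against the NSX-T 2.x Manager API with a hypothetical manager address (nsxmgr.example.com); double-check the endpoints against your version’s API guide and be certain a subnet is unused before deleting it:

    # List IP Blocks to find the block ID
    curl -k -u admin 'https://nsxmgr.example.com/api/v1/pools/ip-blocks'

    # List the subnets already allocated from a given block
    curl -k -u admin 'https://nsxmgr.example.com/api/v1/pools/ip-subnets?block_id=<block-id>'

    # Remove an allocated subnet (only if nothing is using it)
    curl -k -u admin -X DELETE 'https://nsxmgr.example.com/api/v1/pools/ip-subnets/<subnet-id>'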

Resolutions for 2018

If I put it here, I’m much more likely to follow through.  Like many, I work best under some pressure.  Here is a list of what I want to do differently (with regard to technology) next year.

  1. Do more blogging.  I can make a ton of excuses for not blogging as much this year.  I love sharing what I’ve learned; the more new stuff I learn, the more I share.  So….
  2. Do more for NSX for vSphere and NSX-T.  I feel strongly that SDN is critical to the future of how datacenters operate.  NSX is the logical leader in this space and will only grow in interest.  There is still a tendency to replicate what was done with pre-SDN technology and I’d like to see modern ways to solve problems while finding and pushing the limits of what can be done in SDN.
  3. Do more with containers and PKS.  The technologies that Pivotal provides are cutting-edge.  Already and continuing, containers and applications-as-code methods are growing and will define the datacenter of the future.  Just as, a few years ago, we stopped thinking of hardware servers as single-purpose, we’ll embrace multiple workloads within a VM.
  4. Do more coding.  I love concourse and pipelines, but have a lot to learn.  Let’s find the limits of BOSH and pipelines.  Can we not only deploy, but automate the operation and maintenance of a PaaS solution?
  5. Do more coding.  I feel that as we move to “applications-as-code”, it’s important to understand what that means to developers and operators.  What sort of problems become irrelevant in this approach?  What molehills become mountains?

Hope to see you next year!

 

Getting started with BOSH Backup and Restore – Pt.1 Backup

Starting with a working PCF 1.11 deployment, a random linux VM and the BOSH Backup and Restore bits, let’s try it out!

Background

  • We’ll perform two types of backup jobs using BBR: one against the BOSH director and one against the Elastic Runtime deployment. The command and parameters are different between the jobs.
  • BBR stores the backup data in subfolders of the directory where the executable is run.
  • Tiles other than Elastic Runtime (CF) may be backed up with BBR later, but as of late June 2017, they do not have the BBR scripts in place.
  • If you don’t turn on MySQL backups and the Backup Prepare Node in Elastic Runtime, the CF deployment backup job will fail because it cannot find the backup scripts for the MySQL database.
  • I’m using a CentOS VM in the environment as the jumpbox to run BBR.  You’ll want to make sure that the jumpbox is able to reach the BOSH director on TCP 22 and TCP 25555 (see the quick check after this list).
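
A quick way to confirm that connectivity before running BBR (a minimal sketch; substitute your own BOSH director IP for the 172.16.9.16 used later in this post):

    # From the jumpbox: verify the SSH and BOSH director API ports are reachable
    nc -vz 172.16.9.16 22
    nc -vz 172.16.9.16 25555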

Steps

  1. Prepare PCF
    • Logon to Ops Manager
    • Click the “Pivotal Elastic Runtime” tile
    • Assuming you’re using the internal MySQL, click “Internal MySQL” on the Settings tab
    • Under Automated Backups Configuration, select “Enable automated backups from MySQL to an S3 bucket or other S3-compatible file store”.  Right here, you’re thinking, “but I don’t have an S3 server or account or whatever”.  That’s ok, just fake it.  Put bogus values in the fields and an unreachable date (like February 31st).  Click Save.

      Bogus S3 info
    • Under Resource Config, make sure the Backup Prepare Node instance count is 1 (or more?).  Click Save
    • Return to the Installation Dashboard and Apply Changes
  2. Get the BBR credentials.
    • Logon to Ops Manager
    • Click the “Ops Manager Director” tile
    • Click the “Credentials” tab
    • Click the “Link to Credential” link beside “Bbr Ssh Credentials”

      BBR Director Backup Credential
    • The page that loads will display a yml-type file with the PEM-encoded private and public keys.  Select and copy from “-----BEGIN RSA PRIVATE KEY-----” through “-----END RSA PRIVATE KEY-----“.
    • Paste this into a text editor.  In my case, on Windows, the content used a literal “\n” to indicate a new line rather than an actual newline.  So, to convert it, I used Notepad++ to replace “\\n” with “\n” in Extended search mode.

      Using Notepad++
    • The username that BBR will use for the director job is “bbr”
    • Back on the “Credentials” tab of Ops Manager Director, click “Link to Credential” beside “Uaa Bbr Client Credentials”
    • On the page that loads, note that the identity is “bbr_client” and record the password value. This will be used for the BBR deployment job(s)
    • Back on the “Credentials” tab of Ops Manager Director, click “Link to Credential” beside “Director Credentials”
    • On the page that loads, note that the identity is “director” and record the password value.  You’ll need this to log in to BOSH in order to get the deployment name next
  3. Get the deployment name
    • Open an SSH session to the Ops Manager, logging on as ubuntu
    • Run this:

      uaac target --ca-cert /var/tempest/workspaces/default/root_ca_certificate https://DIRECTOR-IP-ADDRESS:8443

      bosh --ca-cert /var/tempest/workspaces/default/root_ca_certificate target DIRECTOR-IP-ADDRESS

      Logon as “director” with the password saved earlier

    • Run this:

      bosh deployments

    • In the results, copy the deployment name that begins with “cf-“. (eg: cf-67afe56410858743331)
  4. Prepare the jumpbox
    • Logon with a privileged account
    • Using SCP or similar, copy “/var/tempest/workspaces/default/root_ca_certificate” from Ops Manager to the jump box
    • Copy the bbr-0.1.2.tar file to the jumpbox
    • Extract it – tar -xvf bbr-0.1.2.tar
    • Make sure you have plenty of space on the jumpbox.  In my case, I mounted a NFS share and ran BBR from the mount point.
    • Copy <extracted files>/release/bbr to the root folder where you want the backups to reside.
    • Save the PEM-encoded RSA private key from above to the jumpbox, making a note of its path and filename.  I just stuck it in the same folder as the bbr executable.
    • Make sure you can connect to the BOSH director via SSH (using the key file and director IP from the backup commands below)
      ssh -i ./private.key bbr@172.16.9.16
  5. Director Backup
    • On the jumpbox, navigate to where you placed the bbr executable.  Remember that it will create a time-stamped subfolder here and dump all the backups into it.
    • Run this, replacing the private-key path and BOSH Director IP address with the correct values for your environment:
      Director Pre-check
      ./bbr director --private-key-path ./private.key --username bbr --host 172.16.9.16 pre-backup-check
    • Check that the pre-check results indicate that the director can be backed up
    • Run this to perform the backup (same as before, just passing the “backup” subcommand instead of the “pre-backup-check” subcommand):
      Director Backup
      ./bbr director --private-key-path ./private.key --username bbr --host 172.16.9.16 backup
    • Wait a while for the backup to complete
  6. What’d it do?
    • Backed up BOSH director database to bosh-0-director.tar
    • Dumped credhub database to bosh-0-credhub.tar
    • Dumped uaa database to bosh-0-uaa.tar
    • Backed up the BOSH director blobstore to bosh-0-blobstore.tar
    • Saved the blobstore metadata to a file named metadata
  7. Elastic Runtime Backup
    • On the jumpbox, navigate to where you placed the bbr executable.  Remember that it will create a time-stamped subfolder here and dump all the backups into it.
    • Run this, replacing the values with the IP/FQDN of your BOSH director, the password for the bbr_client account retrieved from Ops Manager, the Elastic Runtime deployment name and the path to the root_ca_certificate copied from the Ops Manager (a wrapper-script sketch combining both backup jobs follows after these steps):
      Deployment Pre-check

      ./bbr deployment --target 172.16.9.16 --username bbr_client --password abc123 --deployment cf-abcdef123456 --ca-cert ./root_ca_certificate pre-backup-check

    • Check that the pre-check results indicate that the deployment can be backed up
    • Run this to perform the backup (same as before, just passing the “backup” subcommand instead of the “pre-backup-check” subcommand):
      Deployment Backup

      ./bbr deployment --target 172.16.9.16 --username bbr_client --password abc123 --deployment cf-abcdef123456 --ca-cert ./root_ca_certificate backup

    • Wait a while for the backup to complete
  8. What’d it do this time?
    • Backed up the MySQL Cloud Controller Database to mysql-artifact.tar
    • Backed up uaa to uaa-0-uaa.tar (this is different from the UAA backup performed against the director)
    • Backed up the blobstore (in my case, from the internal NFS server) to nfs_server-0-blobstore-backup.tar
    • Saved the blobstore metadata to a file named metadata
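
For repeat runs, it can be convenient to wrap both jobs in a single script. This is just a sketch of that idea (not an official BBR workflow), reusing the example key path, director IP, bbr_client password and deployment name from the steps above, and assuming the bbr executable lives in a /backups folder of your choosing:

    #!/bin/bash
    # backup-pcf.sh - run the director and Elastic Runtime backups back to back
    set -e
    cd /backups

    # Director backup (SSH as bbr using the key exported from Ops Manager)
    ./bbr director --private-key-path ./private.key --username bbr --host 172.16.9.16 pre-backup-check
    ./bbr director --private-key-path ./private.key --username bbr --host 172.16.9.16 backup

    # Elastic Runtime backup (bbr_client UAA credentials and the cf- deployment name)
    ./bbr deployment --target 172.16.9.16 --username bbr_client --password abc123 \
      --deployment cf-abcdef123456 --ca-cert ./root_ca_certificate pre-backup-check
    ./bbr deployment --target 172.16.9.16 --username bbr_client --password abc123 \
      --deployment cf-abcdef123456 --ca-cert ./root_ca_certificate backup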

 


Building a Concourse CI VM on Ubuntu

Recently, I found myself needing a Concourse CI system. I struggled with the documentation on concourse.ci and couldn’t find any comprehensive build guides. I knew for certain I wasn’t going to use VirtualBox. So, having worked it out, I thought I’d share what I went through to get to a working system.

Starting Position
I discovered that the CentOS version I was using previously did not have a compatible Linux kernel.  CentOS 7.2 uses kernel 3.10, while Concourse requires 3.19 or later.  So, I’m starting with a freshly-deployed Ubuntu Server 16.04 LTS this time.

Prep Ubuntu
Not a lot we have to do, but still pretty important:

  1. Make sure the port for Concourse is open

    sudo ufw allow 8080
    sudo ufw status

    sudo ufw disable

    I ended up disabling the firewall on Ubuntu because it was preventing the Concourse worker and the Concourse web node from communicating.

  2. Update and make sure wget is installed

    apt-get update
    apt-get install wget

Postgresql
Concourse expects to use a PostgreSQL database. I don’t have one standing by, so let’s install it.

  1. Pretty straightforward on Ubuntu too:

    apt-get install postgresql postgresql-contrib

    Enter y to install the bits.  On Ubuntu, we don’t have to take extra steps to configure the service.

  2. Ok, now we have to create an account and a database for Concourse. First, let’s create the Linux account. I’m calling mine “concourse” because I’m creative like that.

    adduser concourse
    passwd concourse

  3. Next, we create the account (aka “role” or “user”) in Postgres via the createuser command. In order to do this, we have to switch to the postgres account; do that with sudo:

    sudo -i -u postgres

    Now, while in as postgres we can use the createuser command

    createuser --interactive

    You’ll enter the name of the account, and answer a couple of special permissions questions.

  4. While still logged in as postgres, run this command to create a new database for concourse. I’m naming my database “concourse” – my creativity is legendary. Actually, I think it makes life easier if the role and database are named the same

    createdb concourse

  5. Test by switching users to the concourse account and making sure it can run psql against the concourse database.  While in psql, use this command to set the password for the account in Postgres (a non-interactive alternative is sketched after this list):

    ALTER ROLE concourse WITH PASSWORD 'changeme';

  6. Type \q to exit psql
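
If you prefer to script these steps rather than use the interactive prompts, the same result can be achieved from a regular shell. A minimal sketch, assuming the role and database are both named “concourse” and the password matches the connection string used later:

    # Create the role and database non-interactively, then set the password
    sudo -u postgres createuser concourse
    sudo -u postgres createdb -O concourse concourse
    sudo -u postgres psql -c "ALTER ROLE concourse WITH PASSWORD 'changeme';"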

Concourse
Ok, we have a running PostgreSQL service and an account to be used for Concourse. Let’s go.

  1. Create a folder for concourse. I used /concourse, but you can use /var/lib/whatever/concourse if you feel like it.
  2. Download the binary from concourse.ci/downloads.html into your /concourse folder using wget or transfer via scp.
  3. Create a symbolic link named “concourse” to the file you downloaded and make it executable

    ln -s ./concourse_linux_amd64 ./concourse
    chmod +x ./concourse_linux_amd64

  4. Create keys for concourse

    cd /concourse

    mkdir -p keys/web keys/worker

    ssh-keygen -t rsa -f ./keys/web/tsa_host_key -N ''
    ssh-keygen -t rsa -f ./keys/web/session_signing_key -N ''
    ssh-keygen -t rsa -f ./keys/worker/worker_key -N ''
    cp ./keys/worker/worker_key.pub ./keys/web/authorized_worker_keys
    cp ./keys/web/tsa_host_key.pub ./keys/worker

  5. Create start-up script for Concourse. Save this as /concourse/start.sh:

    /concourse/concourse web \
    --basic-auth-username myuser \
    --basic-auth-password mypass \
    --session-signing-key /concourse/keys/web/session_signing_key \
    --tsa-host-key /concourse/keys/web/tsa_host_key \
    --tsa-authorized-keys /concourse/keys/web/authorized_worker_keys \
    --external-url http://192.168.103.81:8080 \
    --postgres-data-source postgres://concourse:changeme@127.0.0.1/concourse?sslmode=disable

    /concourse/concourse worker \
    --work-dir /opt/concourse/worker \
    --tsa-host 127.0.0.1 \
    --tsa-public-key /concourse/keys/worker/tsa_host_key.pub \
    --tsa-worker-private-key /concourse/keys/worker/worker_key

    The basic-auth credentials, external URL and Postgres connection string should definitely be changed for your environment: “--external-url” uses the IP address of the VM it’s running on, and the username and password values in the postgres-data-source should reflect what you set up earlier. Save the file and be sure to set it as executable (chmod +x ./start.sh).

  6. Run the script “./start.sh”. You should see several lines go by concerning worker-collectors and builder-reapers.
    • If you instead see a message about authentication, you’ll want to make sure that 1) the credentials in the script are correct and 2) the account’s password has been set in both Linux and Postgres
    • If you instead see a message about the connection not accepting SSL, be sure that the connection string in the script includes “?sslmode=disable” after the database name
  7. Test by pointing a browser at the value you assigned to the external_url. You should see “no pipelines configured”.  You can log in using the basic-auth username and password you specified in the startup script (a fly CLI check is sketched after this list).

    Success!
  8. Back in your SSH session, you can kill it with <CTRL>+C
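
As an extra check from the command line, the fly CLI (you can download it from the running Concourse web UI) should be able to log in against the same external URL. A small sketch, assuming the URL and basic-auth values from the startup script above; Concourse versions of this era prompt for the username and password at login:

    # Download fly from the Concourse web UI first, then:
    chmod +x ./fly
    ./fly -t main login -c http://192.168.103.81:8080
    ./fly -t main pipelines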

Finishing Up
Now we just have to make sure that Concourse starts when the system reboots. I am certain that there are better/safer/more reliable ways to do this (a systemd alternative is sketched below), but here’s what I did:
Use nano or your favorite text editor to add “/concourse/start.sh” to /etc/rc.local ABOVE the line that reads “exit 0”
Now, reboot your VM and retest the connectivity to the concourse page.
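
Alternatively, if you’d prefer something a bit more robust than rc.local, a systemd unit works nicely on Ubuntu 16.04. A minimal sketch (the unit name, description and dependencies are my own, wrapping the start.sh script above):

    # /etc/systemd/system/concourse.service
    [Unit]
    Description=Concourse CI (web and worker)
    After=network.target postgresql.service

    [Service]
    ExecStart=/bin/bash /concourse/start.sh
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Then enable it so it starts at boot:

    sudo systemctl daemon-reload
    sudo systemctl enable concourse
    sudo systemctl start concourse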

Thanks

EMC ECS Community Edition project for how to start the script on boot.

Mitchell Anicas’ very helpful post on setting up postgres on Ubuntu.

Concourse.ci for some wholly inadequate documentation

Alfredo Sánchez for bringing the issue with Concourse and CentOS to my attention

Building a Concourse CI VM on CentOS

Recently, I found myself needing a Concourse CI system. I struggled with the documentation on concourse.ci and couldn’t find any comprehensive build guides. I knew for certain I wasn’t going to use VirtualBox. So, having worked it out, I thought I’d share what I went through to get to a working system.

WARNING

It has been brought to my attention that CentOS does not have a compatible Linux kernel, so I’ve redone this post using Ubuntu instead.

Starting Position
I’m starting with a freshly-deployed CentOS 7 VM. I use Simon’s template build, so it comes up quickly and reliably.  Logged on as root.

Prep CentOS
Not a lot we have to do, but still pretty important:

  1. Open the firewall port for Concourse

    firewall-cmd --add-port=8080/tcp --permanent
    firewall-cmd --reload

    optionally, you can open 5432 for postgres if you feel like it

  2. Update and make sure wget is installed

    yum update
    yum install wget

Postgresql
Concourse expects to use a PostgreSQL database. I don’t have one standing by, so let’s install it.

  1. Pretty straightforward on CentOS:

    yum install postgresql-server postgresql-contrib

    Enter y to install the bits.

  2. When that step is done, we’ll set it up with this command:

    sudo postgresql-setup initdb

  3. Next, we’ll update the PostgreSQL config to allow passwords. Use your favorite editor to open /var/lib/pgsql/data/pg_hba.conf.  We need to update the value in the method column for the IPv4 and IPv6 connections from “ident” to “md5”, then save the file (a scripted alternative is sketched after this list).
    Before

    After
  4. Now, let’s start postgresql and set it to run automatically

    sudo systemctl start postgresql
    sudo systemctl enable postgresql

  5. Ok, now we have to create an account and a database for Concourse. First, let’s create the Linux account. I’m calling mine “concourse” because I’m creative like that.

    adduser concourse
    passwd concourse

  6. Next, we create the account (aka “role” or “user”) in Postgres via the createuser command. In order to do this, we have to switch to the postgres account; do that with sudo:

    sudo -i -u postgres

    Now, while in as postgres we can use the createuser command

    createuser --interactive

    You’ll enter the name of the account, and answer a couple of special permissions questions.

  7. While still logged in as postgres, run this command to create a new database for concourse. I’m naming my database “concourse” – my creativity is legendary. Actually, I think it makes life easier if the role and database are named the same

    createdb concourse

  8. Test by switching users to the concourse account and making sure it can run psql against the concourse database.  While in psql, use this command to set the password for the account in Postgres:

    ALTER ROLE concourse WITH PASSWORD 'changeme';

  9. Type \q to exit psql
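
If you’d rather script the pg_hba.conf change from step 3 than edit it by hand, something like this works against the default CentOS 7 file (a sketch; review the file afterwards to confirm only the intended host lines changed):

    # Switch the IPv4/IPv6 host entries from ident to md5, keeping a .bak copy
    sudo sed -i.bak '/^host/s/ident$/md5/' /var/lib/pgsql/data/pg_hba.conf
    sudo systemctl restart postgresql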

Concourse
Ok, we have a running PostgreSQL service and an account to be used for Concourse. Let’s go.

  1. Create a folder for concourse. I used /concourse, but you can use /var/lib/whatever/concourse if you feel like it.
  2. Download the binary from concourse.ci/downloads.html into your /concourse folder using wget or transfer via scp.
  3. Create a symbolic link named “concourse” to the file you downloaded and make it executable

    ln -s ./concourse_linux_amd64 ./concourse
    chmod +x ./concourse_linux_amd64

  4. Create keys for concourse

    cd /concourse

    mkdir -p keys/web keys/worker

    ssh-keygen -t rsa -f ./keys/web/tsa_host_key -N ''
    ssh-keygen -t rsa -f ./keys/web/session_signing_key -N ''
    ssh-keygen -t rsa -f ./keys/worker/worker_key -N ''
    cp ./keys/worker/worker_key.pub ./keys/web/authorized_worker_keys
    cp ./keys/web/tsa_host_key.pub ./keys/worker

  5. Create start-up script for Concourse. Save this as /concourse/start.sh:

    /concourse/concourse web \
    --basic-auth-username myuser \
    --basic-auth-password mypass \
    --session-signing-key /concourse/keys/web/session_signing_key \
    --tsa-host-key /concourse/keys/web/tsa_host_key \
    --tsa-authorized-keys /concourse/keys/web/authorized_worker_keys \
    --external-url http://192.168.103.81:8080 \
    --postgres-data-source postgres://concourse:changeme@127.0.0.1/concourse?sslmode=disable

    /concourse/concourse worker \
    --work-dir /opt/concourse/worker \
    --tsa-host 127.0.0.1 \
    --tsa-public-key /concourse/keys/worker/tsa_host_key.pub \
    --tsa-worker-private-key /concourse/keys/worker/worker_key

    The basic-auth credentials, external URL and Postgres connection string should definitely be changed for your environment: “--external-url” uses the IP address of the VM it’s running on, and the username and password values in the postgres-data-source should reflect what you set up earlier. Save the file and be sure to set it as executable (chmod +x ./start.sh).

  6. Run the script “./start.sh”. You should see several lines go by concerning worker-collectors and builder-reapers.
    • If you instead see a message about authentication, you’ll want to make sure that 1) the credentials in the script are correct, 2) the account’s password has been set in both Linux and Postgres and 3) the pg_hba.conf file has been updated to use md5 instead of ident
    • If you instead see a message about the connection not accepting SSL, be sure that the connection string in the script includes “?sslmode=disable” after the database name
  7. Test by pointing a browser at the value you assigned to the external_url. You should see “no pipelines configured”

    Success!
  8. Back in your SSH session, you can kill it with <CTRL>+C

Finishing Up
Now we just have to make sure that Concourse starts when the system reboots. I am certain that there are better/safer/more reliable ways to do this, but here’s what I did:

echo "/concourse/start.sh" >> /etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local

Now, reboot your VM and retest the connectivity to the concourse page.

Thanks

EMC ECS Community Edition project for how to start the script on boot.

Mitchell Anicas’ very helpful post on setting up postgres on CentOS.

Concourse.ci for some wholly inadequate documentation

Configuring NSX Load-Balancer for PCF

There’s not a lot of specific information out there for this configuration.  There’s some guidance from Pivotal and some how-tos from VMware, so with a little additional detail, we should be able to figure this out.

Edit – 2/1/17 – Updated with OpenSSL configuration detail
Edit – 3/20/17 – Updated SubjectAltNames in config

Preparation

  1. SSL Certificate. You’ll need the signed public cert for your URL (certnew.cer), the associated private key (pcf.key) and the public cert of the signing CA (root64.cer).
    1. Download and install OpenSSL
    2. Create a config file for your request – paste this into a text file:

      [ req ]
      default_bits = 2048
      default_keyfile = rui.key
      distinguished_name = req_distinguished_name
      encrypt_key = no
      prompt = no
      string_mask = nombstr
      req_extensions = v3_req

      [ v3_req ]
      basicConstraints = CA:FALSE
      keyUsage = digitalSignature, keyEncipherment
      extendedKeyUsage = serverAuth, clientAuth
      subjectAltName = DNS:*.pcf.domain.com, DNS:ServerShortName, IP:ServerIPAddress, DNS:*.system.pcf.domain.com, DNS:*.apps.pcf.domain.com, DNS:*.login.system.pcf.domain.com, DNS:*.uaa.system.pcf.domain.com

      [ req_distinguished_name ]
      countryName = US
      stateOrProvinceName = State
      localityName = City
      0.organizationName = Company Name
      organizationalUnitName = PCF
      commonName = *.pcf.domain.com

    3. Replace the placeholder values (domain names, organization and location) with those appropriate for your environment. Be sure to specify the server name and IP address as the Virtual IP and its associated DNS record. Save the file as pcf.cfg.  You’ll want to use the wildcard “base” name as the common name and the server name, as well as the *.system, *.apps, *.login.system and *.uaa.system Subject Alt Names.
    4. Use OpenSSL to create the Certificate Signing Request (CSR) for the wildcard PCF domain.

      openssl req -new -newkey rsa:2048 -nodes -keyout pcf.key -out pcf.csr -config pcf.cfg

    5. Use OpenSSL to convert the key to RSA (required for NSX to accept it)

      openssl rsa -in pcf.key -out pcfrsa.key

    6. Submit the CSR (pcf.csr) to your CA (Microsoft Certificate Services in my case), retrieve the certificate (certnew.cer) and certificate chain (certnew.p7b) base-64 encoded.
    7. Double-click certnew.p7b to open certmgr. Export the CA certificate as Base-64 encoded X.509 to a file (root64.cer is the file name I use). A quick OpenSSL check of the CSR and certificate is sketched after this list.
  2. Networks.  You’ll need to know what layer 3 networks the PCF components will use.  In my case, I set up a logical switch in NSX and assigned the gateway address to the DLR. Probably should make this a 24-bit network, so there’s room to grow, but not reserving a ridiculous number of addresses. We’re going to carve up the address space a little, so make a note of the following:
    • Gateway and other addresses you typically reserve for network devices.  (eg:  first 9 addresses 1-9)
    • Address that will be assigned to the NSX load balancer.  Just need one (eg: 10)
    • Addresses that will be used by the PCF Routers.  At least two. These will be configured as members in the NSX Load Balancer Pool.
  3. DNS, IP addresses.  PCF will use “system” and “apps” subdomains, plus whatever names you give any apps deployed.  This takes some getting used to – not your typical application.  Based on the certificate we created earlier, I recommend just creating a “pcf” subdomain.  In my case, the network domain (using AD-DNS) is ragazzilab.com and I’ve created the following:
    • pcf.ragazzilab.com subdomain
    • *.pcf.ragazzilab.com A record for the IP address I’m going to assign to the NSX Load-Balancer
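
Before handing the CSR to the CA (and again once the certificate comes back), it’s worth confirming the names are what you expect. A quick sketch using standard OpenSSL commands and the file names above:

    # Confirm the Subject Alternative Names made it into the CSR
    openssl req -in pcf.csr -noout -text | grep -A1 "Subject Alternative Name"

    # Inspect the issued certificate and verify it chains to the CA
    openssl x509 -in certnew.cer -noout -subject -dates
    openssl verify -CAfile root64.cer certnew.cer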

NSX

Assuming NSX is already installed and configured.  Create or identify an existing NSX Edge that has an interface on the network where PCF will be / is deployed.

  1. Assign the address we noted above to the interface under Settings|Interfaces
  2. Under Settings|Certificates, add our SSL certificates
    • Click the Green Plus and select “CA Certificate”.  Paste the content of the signing CA public certificate (root64.cer) into the Certificate Contents box.  Click OK.
    • Click the Green Plus and select “Certificate”.  Paste the content of the signed public cert (certnew.cer) into the Certificate Contents box and paste the content of the RSA private key (pcfrsa.key) into the Private Key box. Click OK.
  3. Under Load Balancer, create an Application Profile. We need to ensure that NSX inserts the x-forwarded-for HTTP headers.  To do that, we need to be able to decrypt the request and therefore must provide the certificate information.  I found that Pool Side SSL had to be enabled and had to use the same Service and CA Certificates.
    Router Application Profile

     

  4. Create the Service Monitor.  What worked for me is a little different from what is described on the GoRouter project page. The key points are that we want to specify the user agent and look for a response of “ok” with a header of “200 OK”.

    Service Monitor for PCF Router
  5. Create the Pool.  Set it to ROUND-ROBIN using the Service Monitor you just created.  When adding the routers as members, be sure to set the port to 443, but the Monitor Port to 80.

    Router Pool
  6. Create the Virtual Server.  Specify the Application Profile and default Pool we just created.  Obviously, specify the correct IP Address.
    Virtual Server Configuration


PCF – Ops Manager

Assuming you’ve already deployed the Ops Manager OVF, use the installation dashboard to edit the configuration for Ops Manager Director.  I’m just going to highlight the relevant areas of the configuration here:

Networks.  Under “Create Networks”, be sure that the Subnet specified has the correct values.  Pay special attention to the reserved IP ranges.  These should be the addresses of the network devices and the IP address assigned to the load-balancer.  Do not include the addresses we intend to use for the routers though.  Based on the example values above, we’ll reserve the first 10 addresses.

Ops Manager Network Config

Ops Manager Director will probably use the first/lowest address in the range that is not reserved.

PCF – Elastic Runtime

Next, we’ll install Elastic Runtime.  Again, I’ll highlight the relevant sections of the configuration.

  1. Domains.  In my case it’s System Domain = system.pcf.ragazzilab.com and Apps Domain = apps.pcf.ragazzilab.com
  2. Networking.
    • Set the Router IPs to the addresses (comma-separated) you noted and added to as members to the NSX load-balancer earlier.
    • Leave HAProxy IPs empty
    • Select the point-of-entry option for “external load balancer, and it can forward encrypted traffic”
    • Paste the content of the signed certificate (certnew.cer) into the Certificate PEM field.  Paste the content of the CA public certificate (root64.cer) into the same field, directly under the certificate content.
    • Paste the content of the private key (pcf.key) into the Private Key PEM field.
    • Check “Disable SSL Certificate verification for this environment”.
  3. Resource Config.  Be sure that the number of Routers is at least 2 and equal to the number of IP addresses you reserved for them. (A quick end-to-end check is sketched after this list.)
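
Once Elastic Runtime finishes deploying, a quick way to confirm that traffic is flowing through the NSX load balancer to the routers is to hit the Cloud Foundry API on the system domain. A sketch using the example domains above, assuming the usual api.SYSTEM-DOMAIN convention (-k skips certificate verification if the signing CA isn’t trusted on your workstation):

    curl -k https://api.system.pcf.ragazzilab.com/v2/info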

 

Troubleshooting

Help! The Pool Status is down when the Service Monitor is enabled.

This could occur if your routers are behaving differently from mine.  Test the response by sending a request to one of the routers through curl, specifying the user agent as HTTP-Monitor/1.1:

curl -v -A "HTTP-Monitor/1.1" "http://{IP of router}"

 

Testing router with curl

The value in the yellow box should go into the “Expected” field of the Service Monitor and the value in the red box should go into the “Receive” field. Note that you should not get a 404 response; if you do, check that the user agent is set correctly.

 

Notes

This works for me and I hope it works for you.  If you have trouble or disagree, please let me know.

Pivotal Cloud Foundry vApp startup order workflow

After installing Pivotal Cloud Foundry (PCF) on vSphere, you’ll have a collection of at least 21 (probably closer to 60!) VMs with names that probably don’t match anyone’s convention.  Although, as noted in the PCF documentation, there is a correct order to starting up and shutting down the VMs in PCF, the installer does not configure a vApp so that we can control that order.  So, I dragged all the PCF VMs into a vApp and started trying to determine which ones are in which role, and quickly realized that it’s a pain.

Creating an AZ in Ops Manager on vSphere

As an aside, when you create your Availability Zone, you point it at a vSphere cluster and, optionally, a Resource Pool.  Unfortunately, if you specify a vApp name instead of a Resource Pool name, BOSH will fail to deploy the VMs.  So, I typically leave the Resource Pool field blank and then drag the VMs into a vApp post-deployment.

I put together a workflow that will help place the PCF VMs into correct startup/shutdown groups for you.

Example PCF VMNames

Instructions for Use

  1. Download the package from here
  2. Import the package into vRealize Orchestrator
  3. If you haven’t already, create a new vApp in your cluster and drag the Ops Manager, Ops Manager Director and all of the Elastic Runtime VMs into the vApp
  4. Run the “PCFvAppStartupOrder” workflow, select your new vApp as the input, click Submit
  5. If the PCF installation is scaled out to more VMs, just drag them to the vApp and rerun the workflow

How it works/What it does

  • The correct order is stored in a string array
  • The deployment, job and director custom fields are read for each VM in the vApp to get the VM’s assigned role
  • For the Ops Manager, the Notes field is read and, if found, it is placed at the top of the startup sequence
  • Unknown VMs are assigned a startup order higher than the last in the array.  This way, they start last and power-off first
  • Unknown VMs are those where the “deployment” field does not start with “cf”; with exceptions for Ops Manager (Notes field) and Ops Manager Director (“director” field value is “bosh-init”)

Additional suggestions and notes

  • Adjust the resources for the vApp based on VMware best practices and what makes sense for your environment
  • Use this at your own risk, there is no implied warranty