This is the second in my series for building a fully distributed vCAC deployment. In this part, we’re building the vPostgres database server with replication for use with vCAC 6.x.
I’m using v9.2.6.0. The vCAC 6.0 Support Matrix says 9.2.4 is supported but the PDF version of the Installation and Configuration guide says 9.2.4 or higher is supported. I originally wanted to use 9.3.2.0 because the documentation includes replication, but I’m unsure whether it’s officially supported with vCAC 6.x yet. We’ll still configure replication though 🙂 I’m going to front-end the vPostgres nodes with a vCNS Edge Gateway load balancer so that in the case of a failure, we don’t have to reconfigure the vCAC appliance database connection. Updated documentation shows that for vCAC 6.0, vPostgres v9.2.4 is supported, v9.2.6 and v9.3.4 were untested. For vCAC 6.1, versions 9.2.4, 9.2.6 and 9.3.4 are supported.
Prerequisites:
- Reserve IP addresses for the appliances and the virtual IP.
- Add DNS entries for the IP addresses. I used vpostgres1 for vpostgres2 for the appliances and vpostgres as the virtual/load-balanced name/address.
vProgres setup Steps
- Download the VMware vFabric Postgres Appliance from my.vmware.com.
- Deploy the vFabric Postgres Appliance from OVF twice. I named them vPostgres1 and vPostgres2. vPostgres1 will be the master and vPostgres2 will be the slave.
- Power on vPostgres1, browse to https://vpostgres1:5480, logon as root using the password you entered during deployment.
- Configure the hostname (eg: vpostgres1.ragazzi.lab) and timezone
- Browse to https://vpostgres1:8443, leave the default values, enter your password and click “Connect” to enter the Enterprise Manager (vpgdbem)
- Click on localhost:5432/DB Login Users to list the existing users (just “postgres” so far)
- Click the green plus to add a new DB Login user. In the properties, enter “vcac” (or whatever you want) as the name, check “Enable login”, do not check “Can create DB login users” and set the password. Click OK to save.
- Click on localhost:5432 to display the overview and list of databases (just “postgres” so far)
- Click the green plus to Create a new database. Just enter “vcacdb” (or similar) for the name, set the Owner to “vcac”, add a comment if you wish and click “OK” to save. Click the refresh button (blue ccw arrow) to refresh the list.
- Expand the Databases item under the treeview and select your new “vcacdb” database. The database overview should load, displaying the uptime, size and more.
- Toward the right side of the window is a button labelled “Enter SQL”, click it.
- In the SQL Script area, type the following:
CREATE EXTENSION "hstore";
CREATE EXTENSION "uuid-ossp" - Click “Execute” and check the Output|Messages area for SQL query succeeded
- Click the X to close the SQL window
- Complete steps 2-4 for vpostgres2. Do not configure any users or databases on vPostgres2.
- SSH into vpostgres1, logon as “postgres”, not root.
- Run this command to create a replication user named “replicate”:
v9.2
/opt/vmware/vpostgres/current/share/create_replication_user replicate
v9.3
/opt/vmware/vpostgres/current/scripts/create_replication_user replicate
You’ll be prompted for a password and confirmation.
- SSH into on vpostgres2 as “postgres”
- Run this command to configure vpostgres2 as a replica:
/opt/vmware/vpostgres/current/share/run_as_replica -h 192.168.101.31-b -W -U replicate
Obviously, replace the red text with the IP address of the master vPostgres server. First you’ll be prompted for the password for the “replicate” user, then you’ll confirm the authenticity of the connection, then you’ll be prompted to enter the password for the postgres user on the master. Next, you’ll confirm that you want to enable WAL archiving on the primary/master by typing “yes” and lastly, you’ll confirm your intention to overwrite the data directory with the databases from the master. It’ll copy the tablespace over.
- Run this command on vpostgres1 to verify the replication:
/opt/vmware/vpostgres/current/share/show_replication_status
Load-Balancer setup steps
I’m going to use the load-balancer feature in vCloud Networking and Security Edge Gateway. It’s not the most intelligent Load-Balancer ever, but it’s what I have.
- Using vCNS Manager, locate the appropriate edge gateway, click Actions|Manage to open it for editing
- On the Configure Tab, edit the interface that will listen on the virtual IP
- Edit the Subnet and add the Virtual IP. It’s probably not the primary IP. Save and publish those changes
- On the Load Balancer tab, on the Pools page, click “Enable”, then “Publish Changes”
- Click the green plus to add a load-balancing pool
- Enter a recognizable Name and Description, click “Next”
- On the Services step, check only TCP, set Balancing Method to “ROUND_ROBIN” and the Port to 5432. Click “Next”
- On the Health Check step, set it as shown. Click “Next” when done.
- On the members step, click the green plus to add the IP address of you SSO servers to the pool. Add the primary/master vPostgress server with a weight of 1 or higher. Add the slave/replica with a weight of 0 (zero). This will ensure all of the traffic goes to the primary until it is changed in the event of a primary failure. Keep the TCP port and Monitor Port at 5432 for each. Click “Next” once all you members are added.
- Review the Ready to complete step and click “Finish” if it all correct
- Click the Publish Changes Button before proceeding
- Click the “Virtual Servers” link, then the green plus to add a Virtual Server
- Enter a meaningful name and description, provide the Virtual IP adddress that you added to the edge earlier, select the Pool created in the steps above and Enable TCP on port 5432. Make sure the “Enabled” box is checked. Click “Add” then “Publish Changes”
Now, when you configure your vCAC Appliance, provide the host name that resolves to the virtual IP address.
Dealing with a failure
By default, the replica acts like a read-only copy of the database. It has a very short replication delay, so do not count on it to save you if you delete things from the primary.
When to promote a replica:
- You’ve screwed up the network settings on the primary vPostgres node beyond repair; preventing vCAC from using it and replication from occurring
- You’ve applied an update to the primary vPostgres node that broke it; preventing vCAC from using it and replication from occurring
When to NOT promote a replica
- You deleted a bunch of stuff from vCAC. Too late! Those changes have already replicated
- The physical host where the primary vPostgres virtual appliance was running has failed. Just wait for vSphere HA to being it back online
- You want to see it run active/active. It does’t do that. relax
Recovery Procedure
- See if the primary/master node is up. If it is, stop here.
- Using the vCNS Manager web interface, edit the load-balancing pool, setting the weight for vpostgres1 (which has failed) to 0 (zero) and the weight for vpostgres2 (which we’re going to promote) to 1. Save and publish changes.
- SSH into the slave (vpostgres2) as postgres
- Run this command to promote the slave to master:
/opt/vmware/vpostgres/current/share/promote_replica_to_primary
The response will be “server promoting“
- If/When vpostgres1 comes back to life, you’ll need to configure it as a replica to vpostgres2. Do this by running the command from step 19 above.
- Now if you want to make vpostgres1 primary again, I strongly suggest you stop the vcac_service on your vCAC appliances. Then, you’ll just promote it like you did before and make vpostgres2 a replica again.