vRealize Automation DEM worker cannot connect to Orchestrator

In vRA 6.2, using vRO 6.0, you may find that the data collection and other vRO workflows fail with the error “You must have at least one properly configured vCenter Orchestrator endpoint that is reachable”.  The IaaS/Monitoring/Log will show which DEM worker threw the error.  When you check the DEM worker logs for that instance, if you find the message “Could not create SSL/TLS secure channel. ---> System.Net.WebException: The request was aborted: Could not create SSL/TLS secure channel”, you have probably been affected by VMKB 2123455 and MS KB 3061588.

Although both articles seem to suggest that removing the offending patch will solve the problem, figuring out exactly which patch is responsible is rather awkward.  The easier fix is to apply a quick registry hack to your DEM workers (and wherever the vRA Designer runs).

  1. Log on with an account that has admin rights (I suggest the account your IaaS services run under)
  2. Locate or add the key

    HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\
    KeyExchangeAlgorithms\Diffie-Hellman

  3. Add/update the DWORD value ClientMinKeyBitLength and set it to 512 decimal (200 hex); see the sketch after these steps
  4. Restart the DEM worker service
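
If you would rather script the change (for example, across several DEM workers), here is a minimal sketch using Python’s standard winreg module; it makes the same registry change as steps 2 and 3 and is purely illustrative. Run it elevated, then restart the DEM worker service.

# Set SCHANNEL's minimum client DHE key length to 512 bits (0x200),
# the same change described in steps 2 and 3 above. Run elevated.
import winreg

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\KeyExchangeAlgorithms\Diffie-Hellman"

# Create the key if it does not already exist, then add/update the DWORD value.
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_WRITE) as key:
    winreg.SetValueEx(key, "ClientMinKeyBitLength", 0, winreg.REG_DWORD, 0x200)

print("ClientMinKeyBitLength set to 512; restart the DEM worker service.")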

 

Notes:
The Microsoft patch sets the default minimum group size to 1024 bits.  It appears that the vRO 6.0.x appliances use something smaller than that.  This registry hack tells SCHANNEL to accept keys as small as 512 bits.  I suggest applying it only to the affected machines, since it does lower the bar for the DHE security requirements.

Thanks to Zach Milleson for reminding me that this workaround may not resolve everyone’s issue, depending on which MS patches are installed.  If this workaround doesn’t work for you, you may have to locate and remove the offending patch.  YMMV.

Pivotal Cloud Foundry vApp startup order workflow

After installing Pivotal Cloud Foundry (PCF) on vSphere, you’ll have a collection of at least 21 (probably closer to 60!) VMs with names that probably don’t match anyone’s convention.  Although, as noted in the PCF documentation, there is a correct order for starting up and shutting down the VMs in PCF, the installer does not configure a vApp so that we can control that order.  So, I dragged all the PCF VMs into a vApp, started trying to determine which ones are in which role, and quickly realized that it’s a pain.

Creating an AZ in Ops Manager on vSphere

As an aside, when you create your Availability Zone, you point it at a vSphere cluster and, optionally, a Resource Pool.  Unfortunately, if you specify a vApp name instead of a Resource Pool name, BOSH will fail to deploy the VMs.  So, I typically leave the Resource Pool field blank and then drag the VMs into a vApp post-deployment.

I put together a workflow that will help place the PCF VMs into the correct startup/shutdown groups for you.

Example PCF VMNames

Instructions for Use

  1. Download the package from here
  2. Import the package into vRealize Orchestrator
  3. If you haven’t already, create a new vApp in your cluster and drag the Ops Manager, Ops Manager Director and all of the Elastic Runtime VMs into the vApp
  4. Run the “PCFvAppStartupOrder” workflow, select your new vApp as the input, click Submit
  5. If the PCF installation is scaled out to more VMs, just drag them to the vApp and rerun the workflow

How it works/What it does

  • The correct order is stored in a string array
  • The deployment, job and director custom fields are read for each VM in the vApp to get the VM’s assigned role
  • For the Ops Manager, the Notes field is read; if the Ops Manager is identified there, it is placed at the top of the startup sequence
  • Unknown VMs are assigned a startup order higher than the last in the array.  This way, they start last and power off first
  • Unknown VMs are those where the “deployment” field does not start with “cf”, with exceptions for the Ops Manager (Notes field) and the Ops Manager Director (“director” field value is “bosh-init”).  The sketch below illustrates this grouping logic
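
To make the grouping logic concrete, here is a rough Python illustration; the actual workflow is vRO JavaScript inside the package, and the role names in ROLE_ORDER below are placeholders rather than the workflow’s real string array.

# Rough illustration of the grouping logic described above. The real workflow
# is vRO JavaScript; the role names here are placeholders only.
ROLE_ORDER = ["ops-manager", "director", "nats", "etcd", "router", "diego-brain"]

def startup_group(vm):
    """vm is a dict of the fields read from vCenter for each VM in the vApp."""
    if "Ops Manager" in vm.get("notes", ""):           # Ops Manager is identified by its Notes field
        return 0
    if vm.get("director") == "bosh-init":              # Ops Manager Director
        return ROLE_ORDER.index("director")
    if vm.get("deployment", "").startswith("cf"):      # Elastic Runtime jobs
        for i, role in enumerate(ROLE_ORDER):
            if vm.get("job", "").startswith(role):
                return i
    return len(ROLE_ORDER)                             # unknown VMs start last, power off first

vms = [
    {"notes": "Ops Manager"},
    {"deployment": "cf-1234", "job": "router-partition-abc123"},
    {"deployment": "something-else", "job": "mystery"},
]
for vm in sorted(vms, key=startup_group):
    print(startup_group(vm), vm)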

Additional suggestions and notes

  • Adjust the resources for the vApp based on VMware best practices and what makes sense for your environment
  • Use this at your own risk, there is no implied warranty

Configuring vRealize Log Insight Agent on MS SQL Server 2012

There’s a lot of information available about the SQL Server Content Pack for Log Insight, but actually trying to use it reveals a lot of gaps.  So, here’s what I’ve done to get my SQL Server 2012 instances to correctly report their information to vRealize Log Insight and show up in the Content Pack filters.

To ensure the correct log directories are used and allow reuse of filelog sections, I suggest creating an agent group for each SQL Server.  I’m also a fan of the server-side agent configuration, so if the agent gets totally removed and reinstalled, it’ll acquire the correct configuration again.

Steps:

  1. Install the Log Insight Agent for Windows on the SQL Server machine; be sure to use the FQDN of the Log Insight VIP
  2. While logged on to the SQL Server, locate the Logs folder for each SQL Server instance. Note that this is not the database transaction logs folder – if it contains *.ldf files, it’s the wrong folder.  Specifically, you want the folder that contains the ERRORLOG file.  To be certain, follow these steps:
    • Launch SQL Server Configuration Manager
    • Select “SQL Server Services” from the left pane
    • Right-click “SQL Server (INSTANCENAME)” in the right pane and choose “Properties”.
    • Select the “Startup Parameters” tab
    • In the Existing Parameters box, make a note of the parameter that starts with “-e”
    • The filepath in that parameter (minus “ERRORLOG”) is the folder we need.
    • DO NOT CHANGE ANY VALUES; click Cancel to close Properties, then close Configuration Manager

Startup Params in Configuration Manager

  3. Log on to Log Insight as an administrator
  4. Check the Content Packs area to be sure the “Microsoft – SQL Server” Content Pack is installed
  5. In the next steps, we’ll configure the agent from the server side. (It is possible to do this from the client side as well.)
  6. In the Administration|Agents section of Log Insight, select “Microsoft – SQL Server” from the Available Templates drop-down list
  7. Click the “Copy Template” button. In the Copy Agent Group box that appears, change the name to “Microsoft – SQL Server – SERVERNAME” (replace SERVERNAME with the actual server’s name).  Click “Copy”.
  8. The drop-down list should now indicate that you are editing the new Agent Group
  9. In the Filter box, edit the parameters to create a filter that identifies only the desired server. For example, “Hostname | starts with | SERVERNAME”
  10. In the Agent Configuration section:
    • Update the section header from “[filelog|MSSQL]” to “[filelog|MSSQL-SERVERNAME-INSTANCENAME]”, replacing SERVERNAME and INSTANCENAME with the correct, actual values. Be sure to keep “MSSQL” as the prefix, since the dashboard elements key on it.
    • Update the directory value to the folder where the ERRORLOG file was found
    • Update the exclude value to read *.trc;*.xel;*.mdmp
    • Example:

[filelog|MSSQL-COREDB1-MSSQLSERVER]
; IMPORTANT: Change the directory as per the environment
directory=C:\MSSQL\DATA\MSSQL12.MSSQLSERVER\MSSQL\Log
tags={"ms_product":"mssql"}
charset=UTF-16LE
exclude=*.trc;*.xel;*.mdmp

  11. If there are multiple instances of SQL Server on the host, copy the entire section and edit the section header with the other instance’s name. Example configuration for multiple instances:

[filelog|MSSQL-COREDB1-MSSQLSERVER]
; IMPORTANT: Change the directory as per the environment
directory=C:\MSSQL\DATA\MSSQL12.MSSQLSERVER\MSSQL\Log
tags={"ms_product":"mssql"}
charset=UTF-16LE
exclude=*.trc;*.xel;*.mdmp

[filelog|MSSQL-COREDB1-WORKLOAD]
; IMPORTANT: Change the directory as per the environment
directory=C:\MSSQL\DATA\MSSQL12.WORKLOAD\MSSQL\Log
tags={"ms_product":"mssql"}
charset=UTF-16LE
exclude=*.trc;*.xel;*.mdmp

  12. Click Save New Group.
  13. On the SQL Server, restart the “VMware vRealize Log Insight Agent” service
  14. Navigate to %ALLUSERSPROFILE%\VMware\Log Insight Agent and open liagent-effective.ini with Notepad. Check that the sections added above appear in this file (a quick scripted check is sketched after these steps)
    • If they do not, you may have to adjust the filters on the Agent Group
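
If you want to script that last check, here is a small illustrative Python snippet (not part of the original procedure); it simply reads liagent-effective.ini and looks for the new section header. The section name shown is an example, so substitute your own SERVERNAME and INSTANCENAME.

# Quick check for step 14: confirm the new [filelog|MSSQL-...] section shows up
# in the effective agent configuration pushed down from Log Insight.
import os

section = "[filelog|MSSQL-COREDB1-MSSQLSERVER]"   # adjust to your SERVERNAME-INSTANCENAME
ini_path = os.path.join(os.environ["ALLUSERSPROFILE"], "VMware", "Log Insight Agent", "liagent-effective.ini")

with open(ini_path, encoding="utf-8") as f:
    contents = f.read()

if section in contents:
    print("Found", section, "- the server-side Agent Group is being applied.")
else:
    print(section, "not found - re-check the Agent Group filter in Log Insight.")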

Automating syslog configuration on NSX


If you’re here, you’ve probably already found VMKB 2092228 and been frustrated by the lack of an easy and consistent way to configure all the NSX components to send log data.  Me too.

I put together a vRO workflow to help configure (and reconfigure) syslog on NSX.

  1. It prompts for the NSX manager connection info and Syslog server info
  2. Creates a REST host for the NSX manager using the admin super-user account
  3. Adds several REST operations to the NSX Manager REST host
  4. Deletes and re-adds the syslog configuration on the NSX Manager (see the sketch after this list)
  5. Optionally configures Activity Monitoring on the NSX Manager
  6. Identifies each running Controller; deletes and re-adds the syslog config on each
  7. Identifies each deployed and “Green” Edge (including DLRs); deletes and re-adds the syslog config on each
  8. Removes the NSX Manager REST host created earlier
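
To give a sense of what those REST operations do, here is a hedged Python sketch of just the NSX Manager piece (step 4), assuming the NSX 6.2 appliance-management endpoint /api/1.0/appliance-management/system/syslogserver and a JSON payload; verify the endpoint and field names against the API guide for your build. The hostname and credentials are placeholders, and the workflow itself does all of this (plus Controllers and Edges) through vRO REST operations.

# Sketch of step 4 only: delete, then re-add the NSX Manager syslog config.
# Endpoint and payload field names are from memory of the NSX 6.2 API guide -
# verify before use. Hostname and credentials below are placeholders.
import requests

NSX_MGR = "https://nsxmanager.example.local"
AUTH = ("admin", "VMware1!")                       # must be the admin super-user account
SYSLOG = {"syslogServer": "loginsight.example.local", "port": "514", "protocol": "UDP"}

session = requests.Session()
session.auth = AUTH
session.verify = False                             # lab only; use trusted certificates in production

url = NSX_MGR + "/api/1.0/appliance-management/system/syslogserver"
session.delete(url)                                # remove any existing syslog configuration
resp = session.put(url, json=SYSLOG)               # re-add it
resp.raise_for_status()
print("NSX Manager syslog configuration updated:", resp.status_code)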

Notes

  • It must use the NSX Manager admin account – it’s the only super-user account that can update the manager config.
  • If a Controller is not running when the workflow runs, its syslog config will not be updated
  • If an Edge’s state is not “green” or it is not fully deployed, its syslog config will not be updated
  • REST-HTTP plugin for vRO is required
  • Tested with vRO 6.0.2 and NSX 6.2.1
  1. Get the package (Download Here)

    By downloading any code, package or file, you acknowledge that: There is no explicit or implied warranty or support for the code.  Neither Brian Ragazzi, his employer nor anyone else is responsible for any problems, errors, omissions, unexpected behavior, breakage, trauma, outage, fatigue, lost time, lost work or incontinence that may occur as a result of using the code or package.

  2. Import package into vRO
  3. Run the ConfigureSyslogNSX workflow
    Workflow Step 1

    Workflow Step 2

About Cross-vCenter NSX


I’ve spent the past several weeks testing Cross-vCenter NSX, trying to understand it and make it work in a useful way.  Here’s what I’ve learned so far.

Environment and Set up:

I knew I needed two “sites” with discrete storage, but had a few physical limitations; i.e., I only own one managed router (Cisco C2821) and one managed switch (Force10 S50).  I trunked the management and VTEP networks to all the hosts, but configured a discrete transit network for each “site”.  I hope to work with a much larger lab environment and do a thorough review of the setup and configuration.  In the meantime, here are the basics.

  • Two-host cluster, NSX Manager and vCenter Server for the primary site
  • One-host cluster, NSX Manager and vCenter Server for the recovery site
  • vSphere 6.0 U1b
  • NSX for vSphere 6.2.1
  • Universal Transport Zone includes both clusters, configured for Local Egress.
  • VTEPs in same L3 network (for simplicity’s sake)
  • Edge per site, discrete transit networks
  • Universal DLR for tenant networks, uplinked to both site ESGs
  • OSPF area for each ESG->UDLR
Partial Lab Network

Objectives

  1. Eliminate need for manual synchronization between sites/NSX instances
    • Success! The distributed firewall and universal objects are respected by the NSX manager at both sites.  Universal Security Groups and Logical Switches are usable at either end.
  2. Span VXLANs between sites
    • Success! This is not really a surprise.  As long as the clusters share a Transport Zone and the VTEPs can route to each other, this works great.
  3. Minimize network alterations when used with Site Recovery Manager to failover protected VMs between vCenter Servers.
    • Partial success.  A placeholder/recovered VM will be in the same Universal Security Groups as the protected VM, and the distributed firewall rules will apply as expected without any changes needed.  However, even with Local Egress enabled, you’ll have to apply some sort of route updates so that traffic destined for the recovered VMs can reach them.  It looks like the route redistribution for egress is handled automatically, though.  This configuration pushed the limits of what I can do with my lab.
  4. Make use of site-to-site microsegmentation (my definition:  application of security policy regardless of L3 network scope)
    • Partial Success.  I was really disappointed here. When the documentation states that you can add IP Sets, MAC Sets and other Universal Security Groups to a Universal Security Group, it means that’s ALL you can add.  I’ve blogged a few times about adding VMs to a Security Group – even doing it as a Resource Action in vRA.  That won’t work with Universal Security Groups!  The only workaround I see is to add the desired VM’s IP or MAC to a Set and include the Set in the Security Group (see the sketch after this list).  Blech!
  5. Retain ability to assign security policy via security group membership through vRA
    • FAILURE.  As noted above, we cannot assign Universal security group membership to a VM using any of the existing custom properties.
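
For the IP Set workaround mentioned under objective 4, something like the following could script the “add this VM’s IP to a Set” step. It is a rough sketch that assumes the NSX 6.2 ipset API at /api/2.0/services/ipset/{ipset-id} and its comma-separated value element; double-check both against the API guide, and note that the IP Set ID, hostname and credentials are placeholders.

# Rough sketch of the IP Set workaround: read the IP Set, append the VM's IP to
# its comma-separated value element, and write it back. API shape assumed from
# the NSX 6.2 API guide - verify before use.
import requests
import xml.etree.ElementTree as ET

NSX_MGR = "https://nsxmanager.example.local"
AUTH = ("admin", "VMware1!")
IPSET_ID = "ipset-2"                               # placeholder universal IP Set ID
NEW_IP = "10.10.20.30"                             # the VM's IP to add

url = NSX_MGR + "/api/2.0/services/ipset/" + IPSET_ID
current = requests.get(url, auth=AUTH, verify=False)
current.raise_for_status()

ipset = ET.fromstring(current.text)
value = ipset.find("value")
ips = [ip for ip in (value.text or "").split(",") if ip]
if NEW_IP not in ips:
    ips.append(NEW_IP)
    value.text = ",".join(ips)
    resp = requests.put(url, data=ET.tostring(ipset), auth=AUTH,
                        headers={"Content-Type": "application/xml"}, verify=False)
    resp.raise_for_status()
    print("Added", NEW_IP, "to", IPSET_ID)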

Notes and Observations

I guess I get it.  The VM IDs differ between vCenter Servers, so saying that “vm-123 is a member of securitygroup-456” is not valid on a different vCenter Server.  But a table of IPs and MACs would be universally valid.  I’m hoping that in a near-future version, the Universal Security Group membership capabilities can be extended; perhaps with a shared or replicated vCenter Inventory Service.

 

Resolving Site Recovery Manager Error “Unable to create protection group. No VRM registered with vCenter Server”

Just a quick note here about this error.  I felt kinda silly once I realized what I’d done.

Scenario:

  1. Installed a vCenter Server at both sites
  2. Deployed a vSphere Replication 6.1 appliance at both sites.  Connected each to its corresponding PSC and vCenter Server
  3. Installed Site Recovery Manager 6.1 on each site, registered to corresponding vCenter Server
  4. Completed mappings and basic configuration in SRM, confirmed VR is available at each site

    SRM configured with VR

  5. Attempt to create a protection group
  6. Receive Error “Unable to create protection group. No VRM registered with vCenter Server …”
  7. Google frantically

It’s not clear from the documentation, but you have to configure at least one VM for replication using the vSphere Replication section of the Web Client first.  Totally counter-intuitive.  Just right-click the target VM, select “All vSphere Replication Actions | Configure Replication” and walk through the wizard.  Once complete, you can return to the SRM configuration and successfully set up a Protection Group and subsequent Recovery Plan.

Weak Diffie-Hellman key in vRealize Orchestrator

{Edited Oct 19 2015 to reflect updated information in VMKB 2131619}

Recent versions of Google Chrome and Mozilla Firefox have begun rejecting connections using SSLv3 ciphers. Chrome complains of a weak ephemeral Diffie-Hellman public key, calling it a “disastrous misconfiguration”.  Firefox’s message also complains of a weak ephemeral Diffie-Hellman key in Server Key Exchange, but doesn’t foreshadow impending doom.

Interestingly (I guess), Internet Explorer 11 still happily connects…

Firefox message on vRO configuration page

Chrome message on vRO configuration page

Let’s fix Orchestrator so that we can use FF and Chrome…

Procedure

Confirmed this works on the vCO Appliance v5.5.2.1 through v6.0.2.1 and on the vRealize Automation Appliance v6.2.x

  1. SSH into the appliance
  2. Enter this to navigate to the configuration directory for the configuration page

    cd /etc/vco/configuration

  3. Enter this to back up the server.xml file

    cp ./server.xml ./serverxml.backup

  4. Use vi, or whatever you’re familiar with, to edit server.xml and replace the line that reads (as one line)

    ciphers="TLS_DHE_RSA_WITH_AES_256_CBC_SHA,
    TLS_DHE_DSS_WITH_AES_256_CBC_SHA,
    TLS_RSA_WITH_AES_256_CBC_SHA,
    TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
    TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
    TLS_RSA_WITH_AES_128_CBC_SHA"

    with (again, as one line)

    ciphers="TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
    TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
    TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
    TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,
    TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,
    TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,
    TLS_RSA_WITH_3DES_EDE_CBC_SHA"

  5. Save the file
  6. Repeat the steps above for /etc/vco/app-server/server.xml
  7. Restart the vco-server and vco-configurator services

    service vco-server restart
    service vco-configurator restart

That’s it, you should be good to go. There are probably other VMware applications that will need the same treatment though.
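
If you want to confirm the change from a client machine, here is a small illustrative Python check that prints the cipher each port negotiates; after the edit you should see an ECDHE_* or plain RSA suite rather than a DHE suite. It assumes the usual vRO ports (8281 for the server, 8283 for the configurator) and a placeholder hostname, so adjust for your environment.

# Print the cipher suite negotiated by the vRO server and configurator ports.
# Ports and hostname are assumptions - adjust for your environment. Certificate
# checks are disabled because we only care about the negotiated cipher.
import socket
import ssl

HOST = "vro.example.local"

context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

for port in (8281, 8283):
    with socket.create_connection((HOST, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=HOST) as tls:
            name, version, bits = tls.cipher()
            print(port, version, name, bits, "bits")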

Not Diffie-Hellman

Dang, wrong Hellman