Archive

Posts Tagged ‘UCS’

How UCS QOS System Class can affect VMware NSX VXLANs

In a recent, ongoing installation, we encountered a wide variety of sporadic network traffic issues affecting the VMs connected to NSX Logical Switches (VXLANs).

Some of the symptoms were:

  • Could ping a device, but could not complete a tracert to it
  • Could connect to one web server over HTTPS, but not to its neighbor (also a web server)
  • The vCAC Gugent could not connect to the vCAC server
  • Could not complete a vmkping using the VXLAN TCP/IP stack with more than 1470 bytes

The last bullet made it pretty clear that the issue was related to the MTU. We had no visibility into the configuration of the north-south layer 3 devices, but had been assured that they were configured for 1600-byte frames.

In the NSX for vSphere implementation of VXLAN, each packet a VM sends is wrapped in additional outer headers (Ethernet, IP, UDP and the VXLAN header, roughly 50 bytes in total), increasing the overall frame size beyond 1500 bytes. This is why the transport network is expected to carry at least 1600-byte frames.
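
You can test the transport path yourself from the ESXi shell with a don’t-fragment vmkping on the VXLAN netstack (the target VTEP address below is just a placeholder). A 1572-byte payload plus the ICMP and IP headers produces a 1600-byte packet:

vmkping ++netstack=vxlan -d -s 1572 10.20.0.51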

I checked the UCS service profiles and the vNIC templates; the relevant settings looked something like this:

UCS vNIC template

It certainly looks like it’s set for jumbo frames, but also notice the second red ellipse there: QoS Policy. If you pay attention to things like that, you might also notice the warning about the MTU size.

The QoS policy assigned to the vNIC (template) uses an egress priority based on a QoS System Class.

UCS QoS Policy

The QoS System Classes specify not only a class of service and a weight, but also an MTU! In my case, I changed the vNIC template’s QoS Policy to one whose System Class has an MTU of 9216. Once this change was made, the VMs behaved as expected.

UCS System Class
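
The UCS side has to be verified in UCSM, but the vSphere side of the path is easy to confirm from PowerCLI. As a quick sanity check, this lists the MTU configured on the distributed switches and on each host’s VMkernel adapters:

Get-VDSwitch | Select-Object Name, Mtu
Get-VMHost | Get-VMHostNetworkAdapter -VMKernel | Select-Object VMHost, Name, IP, Mtu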

A couple of notes:

  • If your vNICs (or vNIC templates) do not specify a QoS Policy, UCS appears to use the MTU given on the vNIC itself
  • If you do not have an enabled QoS System Class with an MTU of 9216, you’ll have to type the value in; the dropdown list only contains “normal” and “fc”

This is another of those posts where I just stumbled upon the fix and needed to write it up before I forgot. Hopefully this will save someone a lot of time later.

VMware vSphere 5 AutoDeploy on Cisco UCS – Part 2: Image Profiles

After completing Part 1, we have DHCP configured to assign a reserved IP address to the Cisco B200 M2 blades when they PXE boot from the vNIC. Now the goal is to create the image that the Auto Deploy hosts will use.

The image building procedure sounds complicated, but once you break it down, it’s not too bad. First, we need to inventory the components (VIBs) that will be needed on the hosts above and beyond the base install. In our case, we needed the HA agent, the Cisco Nexus 1000V VEM and the EMC NAS plugin for VAAI. The HA agent will be downloaded from the vCenter Server, but you’ll have to download the licensed ZIP files from Cisco and EMC for the others.

In addition to the enhancements, we’ll need the VMware ESXi 5.0 offline bundle, “VMware-ESXi-5.0.0-469512-depot.zip”, from the licensed product downloads area of VMware.com. This is essentially a “starter kit” for Image Builder; it contains the default packages for ESXi 5.0.

Preparation:

  1. Copy these files into C:\depot
    • VMware-ESXi-5.0.0-469512-depot.zip
    • VEM500-201108271.zip
    • EMCNasPlugin-1.0-10.zip
  2. Launch PowerCLI

On to the PowerCLI code:

Register the offline bundle as a Software Depot (aka source)

Add-EsxSoftwareDepot "C:\depot\VMware-ESXi-5.0.0-469512-depot.zip"

Connect PowerCLI to your vCenter server (replace x.x.x.x with your vCenter server’s name or IP)

Connect-VIServer -Server x.x.x.x

List the image profiles contained in the offline bundle, ESXi-5.0.0-469512-no-tools and ESXi-5.0.0-469512-standard. We’re going to work with “standard”.

Get-EsxImageProfile

Register vCenter Server depot for HA agent

Add-EsxSoftwareDepot -DepotUrl http://X.X.X.X:80/vSphere-HA-depot

Register depot for updates to ESXi

Add-EsxSoftwareDepot -DepotUrl https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

Register depot for Nexus 1000V VEM and VAAI plugin for VNX NAS

Add-EsxSoftwareDepot C:\depot\VEM500-201108271.zip
Add-EsxSoftwareDepot C:\depot\EMCNasPlugin-1.0-10.zip

List the image profiles again; now the output includes several more versions. For each, there is a “no-tools” and a “standard” variant. Make a note of the newest “standard” image (or the one you want to use).

Get-EsxImageProfile
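
If the list gets long, you can narrow it down. As a convenience (the wildcard pattern is only an example), this sorts the “standard” profiles by creation time so the newest appears last:

Get-EsxImageProfile -Name "*standard" | Sort-Object CreationTime | Select-Object Name, CreationTime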

Clone the “ESXi-5.0.0-20111204001-standard” image profile to a new image profile named “ESXi-HA-VEM-VAAI-20111204001”

New-EsxImageProfile -CloneProfile ESXi-5.0.0-20111204001-standard -Name "ESXi-HA-VEM-VAAI-20111204001"

Add the HA agent (vmware-fdm) to our custom image profile

Add-EsxSoftwarePackage -ImageProfile "ESXi-HA-VEM-VAAI-20111204001" -SoftwarePackage vmware-fdm

Check for the VEM package “cisco-vem-v131-esx”

Get-EsxSoftwarePackage -Name cisco*

Add the Nexus 1000V VEM to our custom image profile

Add-EsxSoftwarePackage -ImageProfile "ESXi-HA-VEM-VAAI-20111204001" -SoftwarePackage cisco-vem-v131-esx

Check for EMC VAAI Plugin for NAS “EMCNasPlugin”

Get-EsxSoftwarePackage -Name emc*

Add the EMC VAAI plugin for NAS to our custom image profile

Add-EsxSoftwarePackage -ImageProfile "ESXi-HA-VEM-VAAI-20111204001" -SoftwarePackage EMCNasPlugin

Export our custom image to a big zip file – we’ll use this to apply future updates

Export-EsxImageProfile -ImageProfile "ESXi-HA-VEM-VAAI-20111204001" -FilePath "C:\depot\ESXi-HA-VEM-VAAI-20111204001.zip" -ExportToBundle
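
If you also want bootable installation media built from the same custom profile (handy for a lab or recovery host), the same cmdlet can write an ISO instead of an offline bundle:

Export-EsxImageProfile -ImageProfile "ESXi-HA-VEM-VAAI-20111204001" -FilePath "C:\depot\ESXi-HA-VEM-VAAI-20111204001.iso" -ExportToIso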

Deploy Rules
OK, now that we have a nice image profile, let’s assign it to a deployment rule. To get Auto Deploy working, we’ll need a good host profile and details from a reference host. So, we’ll apply our initial image profile to the reference host, then use that host to create a host profile and to repair the rule set compliance.

Create a new temporary rule with our image profile and an IP address pattern, then add it to the active rule set.

New-DeployRule -Name "TempRule" -Item "ESXi-HA-VEM-VAAI-20111204001" -Pattern "ipv4=10.10.0.23"
Add-DeployRule -DeployRule "TempRule"
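
To confirm the rule landed in the rule set (an optional check), list it:

Get-DeployRuleSet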

At this point, we booted up the blade that would become the reference host. I knew that DHCP would give it the IP that we identified in the temporary deployment rule. BTW – Auto Deploy is not really fast; it takes 10 minutes or so from power-on until the host is visible in vCenter.

Repair Ruleset
You may have noticed a warning about a component that is not auto-deploy ready; we have to fix that.

In the following code, “referencehost.mydomain.com” is the FQDN of my reference host. This procedure will modify the ruleset to ignore the warning on the affected VIB.

Test-DeployRuleSetCompliance referencehost.mydomain.com
$tr = Test-DeployRuleSetCompliance referencehost.mydomain.com
Repair-DeployRuleSetCompliance $tr
Test-DeployRuleSetCompliance referencehost.mydomain.com

After this completes, reboot the reference host and add it to your Nexus 1000V DVS.

Part 3 (coming soon!) will cover the host profile and final updates to the deployment rules.

References:
https://communities.cisco.com/docs/DOC-26572

VMware vSphere 5 AutoDeploy on Cisco UCS – Part 1: DHCP

First, many thanks to Gabe and Duncan for their great Auto Deploy guides that got me started. Their information answered a lot of questions, but left me with even more questions about how to implement it in my environment.

My goal is to demonstrate how to implement and configure vSphere Auto-deploy in a near-production environment that uses vSphere 5, Cisco UCS, EMC storage, Nexus 1000V and vShield Edge.

The first hurdle I ran into was trying to make DHCP cooperate. I’m using vShield Edge for DHCP in some of the protected networks, but the Cisco 2900-series router is doing DHCP for the network where the vSphere management addresses live. In an IOS DHCP pool, you can assign a manual address via either the “hardware-address” or the “client-identifier” parameter. It looks like “client-identifier” is used by DHCP, whereas “hardware-address” is used by BOOTP. (The client-identifier is simply the vNIC’s MAC address prefixed with 01, the Ethernet hardware type, so the MAC 0025.b500.002d becomes the client-identifier 0100.25b5.0000.2d.) When booting, the blade first draws information via BOOTP, but after acquiring the details from TFTP, it changes its personality and sends another DHCP DISCOVER request.

Here’s how we got this working in our environment:

  • Identify permanent addresses for your hosts  (10.10.0.23 in this case)
  • Identify a temporary address for each host (10.10.0.123 in this case)
  • Make sure those addresses are not excluded

    ip dhcp excluded-address 10.10.0.0 10.10.0.20
    ip dhcp excluded-address 10.10.0.25 10.10.0.120
    ip dhcp excluded-address 10.10.0.125 10.10.0.210
    ip dhcp excluded-address 10.10.0.251 10.10.0.255

  • Create your “main” pool if it doesn’t already exist

    ip dhcp pool mgmt
    network 10.10.0.0 255.255.255.0
    default-router 10.10.0.253
    dns-server 10.10.0.61 10.10.0.62
    lease 0 8
    update arp

  • Create a pool for the permanent host address; make sure to use the “client-identifier” parameter

    ip dhcp pool AutoDeploy23
    host 10.10.0.23 255.255.255.0
    client-identifier 0100.25b5.0000.2d
    bootfile undionly.kpxe.vmw-hardwired
    next-server 10.10.0.50
    client-name AutoDeploy23
    dns-server 10.10.0.61 10.10.0.62
    option 66 ip 10.10.0.50
    option 67 ascii undionly.kpxe.vmw-hardwired
    default-router 10.10.0.253
    lease 0 8
    update arp

  • Create a pool for the temporary host address, which is assigned first by BOOTP and dropped after the PXE boot

    ip dhcp pool AutoDeploy123
    host 10.10.0.123 255.255.255.0
    hardware-address 0025.b500.002d
    bootfile undionly.kpxe.vmw-hardwired
    next-server 10.10.0.50
    client-name AutoDeploy23
    dns-server 10.10.0.61 10.10.0.62
    option 66 ip 10.10.0.50
    option 67 ascii undionly.kpxe.vmw-hardwired
    default-router 10.10.0.253
    lease 0 8
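
Once a blade has pulled an address, you can sanity-check the leases and manual bindings on the router (this is just an optional verification step):

show ip dhcp binding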

Continue on to Part 2, covering the creation and assignment of the image profile.

Cisco VN-Link is awesome

01/18/2011

First, many thanks to Jeremy Waldrop and his walkthrough video.  This provided me with a lot of help and answers to the questions I had.

I’m so impressed with VN-Link that I’m kicking myself for not deploying it sooner.  In my view, it is easily a better choice than the Nexus 1000V.  Sure, it essentially uses the Nexus 1000V’s Virtual Ethernet Module, but since it doesn’t require the Virtual Supervisor Module (VSM) to run as a VM, you can use those processor cycles for other VMs.

In a VN-Link DVS, the relationship between vSphere and UCSM is much more apparent.  Because the switch “brains” are in the Fabric Interconnect and each VM gets assigned a dynamic vNIC, UCSM is aware of which VMs reside on which host and consume which vNIC.

I especially like that I can add port groups to the VN-Link DVS without using the CLI.  All of the virtual network configuration is performed via UCSM.  This makes for quick and easy additions of VLANs, port profiles and port groups.

This Cisco white paper advocates hypervisor bypass (which breaks vMotion, FT and snapshots, by the way), but also describes a 9 percent performance improvement from using VN-Link over a hypervisor-based switch.  A 9 percent improvement that doesn’t break things is a big deal, if you ask me.

There are cases where the VN-Link just won’t do:

  • There is no Fabric Interconnect
  • You must use Access Control Lists between VLANs
  • You must have SNMP monitoring of the VSM.

Beyond these cases, if you have the requisite components (Fabric Interconnects, M81KR VICs, vSphere Enterprise Plus), I’d suggest taking a strong look at VN-Link.


Experience in upgrading UCS to 1.4(1j)

01/15/2011

The UCS deployment in the Mobile VCE is different from many deployments because it does not employ many of the redundant and fault-tolerant options and doesn’t run a production workload.  So, I have the flexibility to bring it down at almost any time for as long a duration as needed.

All this aside, it IS a complete Cisco UCS deployment with all the same behavior as if it were in production.  This means I can perform an upgrade or configuration change in this environment first and work through all the ramifications before performing the same action on a production environment.

There’s a lot of excitement around the web about the new features in this upgrade, and I’ve been looking forward to installing it.  For me, the draw is the lengthy list of fixes, the ability to integrate management of the C-Series server with UCS Manager, FC port channels and user labels.

To start, I used the vSphere Client to shut down the VMs I could and moved those I couldn’t to the C-Series.  Please note that I have not yet connected the C-Series to the Fabric Interconnect for integration (that’s another post).  Then I shut down the blades themselves.

For the actual upgrade, I simply followed the upgrade guide – there’s no reason to go through the details of that here.

However, the experience was not exactly stress-free.

Although it makes sense in hindsight, when the IO Module is rebooted the Fabric Interconnect loses its connection to the chassis.  It cycled through a sequence of heart-stopping error messages before finally rediscovering the chassis and servers and stabilizing.  During this phase, it’s best not to look – the error messages led me to believe the IOM had become incompatible with the Fabric Interconnect.  Like I said, after a few minutes the error messages all resolved and every component was successfully updated to 1.4(1j).

GUI changes after upgrade

Nodes on the Equipment tree for Rack-Mounts/FEX and Rack-Mounts/Servers


User labels (Yes!)

Summary
I’ll be connecting the C-series to the Fabric Interconnect soon and am looking forward to setting up the FC port channel.

Resolve Hardware Status Alert SEL_FULLNESS

I noticed an alert on two UCS B250 M2 hosts in the vSphere Client.  The alert name was “Status of other host hardware objects”, which isn’t helpful.  To get more information, you have to navigate to the Hardware Status tab of the host properties.  There I saw more detail about the alert; it’s cryptically named “System Board 0 SEL_FULLNESS”.

SEL_FULLNESS alert in vSphere Client

This points to the System Event Log (SEL) of the UCS blade itself.  Luckily, this is easily cleared by using UCS Manager to navigate to the Management Logs tab of the server properties under Equipment.

Clear management Log for UCS Blade

Once there, you can back up and clear the SEL.  Within a few minutes, the vSphere sensors will update and the alert will be gone.
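
If you have a lot of hosts, clicking through the Hardware Status tab on each one gets old. As a rough sketch (the host name is a placeholder), the same sensor data is reachable from PowerCLI via the vSphere API’s health system info; this lists any sensors that are not reporting green:

$hostView = Get-VMHost "esx01.mydomain.com" | Get-View
$hostView.Runtime.HealthSystemRuntime.SystemHealthInfo.NumericSensorInfo | Where-Object { $_.HealthState.Key -ne "green" } | Select-Object Name, @{N="Health";E={$_.HealthState.Key}}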

UPDATE:  Once UCSM has been updated to 1.4.1, the “Management Logs” tab is named “SEL Logs”

B200-M2 boot from FC-SAN

I picked up a few key pieces of information in the UCS Design class recently.  The most important was that the boot LUN ID (H-LUN) must be 0 (zero).  One item I found confusing initially was identifying the correct WWPN for the SAN to use.  Once I found the right one, it worked like a champ.  Sort of.

As expected, there are multiple paths to the LUN for the vHBA.  If the boot LUN gets trespassed (EMC’s term for a LUN being controlled by a storage processor other than its default owner), the configured path is no longer valid and the blade won’t boot.  I expect to have to configure the secondary/alternate targets, but for now, I just trespass the LUN back where it belongs.

I’ll update this post, once I have the resiliency worked out correctly.
