Twitter has been abuzz all week with talk about the Tolly Group comparison report on the ‘Network Bandwidth Scalability’ between the HP C7000 and the Cisco UCS 5108. There has already been a lot said on Kevin Houston’s blog, but I thought I might wade in with my thoughts.
First things first, I have no real world experience of UCS, but I have deployed a few C7000s. This is my get out of jail clause in case some of my UCS information is incorrect.
The report pits an HP C7000 with 6 BL460c G6 blades and one Flex-10 Ethernet Module against a Cisco UCS 5108 chassis with 6 B200 blades, two UCS 2100 Fabric Extenders in an active/passive configuration and a UCS 6100 Fabric Interconnect. The purpose of the report is to compare the available network bandwidth for bi-directional server to server transmissions in both a physical and virtual environment. It is worth noting that the report is sponsored by HP.
The C7000 blade chassis has 16 half height slots and eight backend IO slots. Each half height blade has two onboard NICs and capacity for one dual port mezzanine card and one quad port mezzanine card. The onboard NICs on G6 blades are 10GbE Flex-10 capable.

As can be seen in the diagram above, the onboard NICs map to bays 1 and 2, Mez 1 to Bays 3 and 4 and Mez 2 to Bays 5, 6, 7 and 8.
Having one Flex-10 Network Module provides 10Gb I/O to each NIC with up to 4 virtual ‘FlexNICs’. Each FlexNIC is carved from the physical NIC and can be allocated a variable share of that NIC’s 10Gb. This module also provides a maximum of 8x10Gb uplinks.
The UCS 5108 blade chassis has 8 half height slots and options for up to two UCS 2100 Fabric Extenders. Each Fabric Extender has a maximum of 4x10Gb uplinks but don’t forget, for this report, they are configured as active/passive. Each half height blade is pinned to one of these uplinks statically. Tolly had populated slots 3 to 8. In addition, it is worth remembering that all traffic from a blade has to be passed through the Fabric Extender (UCS 2100), up to the UCS 6100 and back down again.

To sum up the test conditions, the C7000 has a dedicated 10Gb port for each of the 6 blades, direct to the Flex-10 Network Module – that’ll be a 1:1 ratio, right? The UCS 5108 has four 10Gb links up to the UCS 6100 to support 6 blades – or 1.5:1. This seems slightly skewed already.
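To make those two ratios explicit, here is a minimal sketch of the sums. The function name `contention_ratio` is my own invention for illustration, and it assumes every link involved is 10Gb, as described above.

```python
from fractions import Fraction

LINK_GB = 10  # assumption: every blade port and uplink in the test is 10Gb


def contention_ratio(blades: int, links: int) -> Fraction:
    """Blade-facing bandwidth divided by the bandwidth of the links it funnels into."""
    return Fraction(blades * LINK_GB, links * LINK_GB)


# C7000 as tested: 6 blades, each with its own 10Gb port on the Flex-10 module.
c7000 = contention_ratio(blades=6, links=6)

# UCS 5108 as tested: 6 blades pinned across 4 active 10Gb uplinks.
ucs = contention_ratio(blades=6, links=4)

print(f"C7000:    {float(c7000)}:1")
print(f"UCS 5108: {float(ucs)}:1")
```

Nothing clever here, it just shows that the two setups do not start from the same ratio before a single packet is sent.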
Delving further into the report (as much as you can delve into 6 pages), two tests have been conducted: a physical to physical test and a virtual to virtual test. Baselines were conducted by sending traffic from an appointed blade to another in the chassis. Tolly kept the Cisco baseline fair by using blades 3 (Uplink C) and 5 (Uplink A) as well as 4 (Uplink D) and 6 (Uplink B). As these run on different uplinks, they receive dedicated bandwidth. Incidentally, Cisco won this baseline test by a “Doesn’t make a difference in the Real World” margin.
Following the baseline, traffic from blades 7 (Uplink C) and 8 (Uplink D) was introduced. These share uplinks with blades 3 and 4. Contention you say? Surprise, that’s what Tolly’s results captured too!
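The pinning described above can be sketched in a few lines. This is a rough model, not Tolly's method: the `fair_share` helper is hypothetical, and it assumes each 10Gb uplink is split evenly between the active blades pinned to it, which real traffic will not do exactly.

```python
from collections import defaultdict

UPLINK_GB = 10  # each Fabric Extender uplink is 10Gb

# Blade slot -> uplink, as listed for the six populated slots.
PINNING = {3: "C", 4: "D", 5: "A", 6: "B", 7: "C", 8: "D"}


def fair_share(active_blades):
    """Assumed model: split each uplink's 10Gb evenly between its active blades."""
    by_uplink = defaultdict(list)
    for blade in active_blades:
        by_uplink[PINNING[blade]].append(blade)
    return {
        blade: UPLINK_GB / len(blades)
        for blades in by_uplink.values()
        for blade in blades
    }


baseline = fair_share([3, 4, 5, 6])      # one blade per uplink
loaded = fair_share([3, 4, 5, 6, 7, 8])  # blades 7 and 8 join uplinks C and D

for blade in sorted(loaded):
    print(f"blade {blade}: {baseline.get(blade, 0.0)}Gb -> {loaded[blade]}Gb")
```

Under that assumption, blades 3 and 4 drop from a full uplink each to half of one as soon as 7 and 8 start sending, while 5 and 6 are untouched. That is the shape of the result you would expect before running the test at all.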
For the virtual to virtual tests, virtual machines were VMotioned from blade 3 (Uplink C) to blade 5 (Uplink A). Again, Cisco won this baseline. They then conducted a migration between blade 3 (Uplink C) and blade 7 (Uplink C). Guess what they found. More bandwidth contention.
If you haven’t guessed it yet, as an HP customer, I find this report disappointing and bordering on insulting. It is based on the premise that all traffic occurs inside a single chassis and that active/active is the devil’s work. Can anyone point me to an Architect who has designed a production solution to run inside a single chassis please? No? Oh.
The fact that the uplink ratios aren’t equal off the bat is also dubious. “But we used 6 blades for both environments, that’s equal!” Yes, but one chassis supports twice the number of blades the other does. Try it on a C3000 (NICs 1 and 2 on a C3000 go into the same bay….). Why not use two Fabric Interconnects like any normal prod environment would?
In a proper production environment, communication occurs between blade chassis. This means uplinks to the aggregation layer must exist and with it comes contention. It isn’t a bad thing, it’s life. However this report makes it seem the work of the devil.
Let’s do some basic maths. A C7000, fully populated with half height blades, will have 16x10Gb ports to the Flex-10 Network Module in Bay 1. I see that as 160Gb. Each Flex-10 Network Module has 8x10Gb uplinks, which I see as 80Gb. That is 2:1, so I see potential contention there.
But wait! You could be really smart and add a second Flex-10 Network Module to Bay 2. Each BL460c already has dual onboard 10Gb Flex-10 capable NICs. That way you could run 8 blades active/passive to bay 1 and 8 blades active/passive to bay 2. With 8x10Gb uplinks from each, you could truly achieve a 1:1 ratio to the aggregation layer. High 5! I’m trying to forget the fact I need to raise a PO to cover the 16x10Gb aggregation ports per blade chassis. And don’t tell me I could just utilise the internal 10Gb horizontal stacking between the bays as this might lead to contention!
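For anyone who wants to check my maths on the fully populated C7000, here is a small sketch. The `chassis_ratio` function is made up for this post, and it assumes one active 10Gb NIC per blade, blades split evenly across the modules, and every module uplink cabled to the aggregation layer.

```python
PORT_GB = 10             # each blade NIC and each module uplink is 10Gb
BLADES = 16              # fully populated C7000, half height blades
UPLINKS_PER_MODULE = 8   # maximum 10Gb uplinks on a Flex-10 module


def chassis_ratio(modules: int) -> float:
    """Active blade bandwidth per module divided by that module's uplink bandwidth."""
    active_blades_per_module = BLADES // modules
    downlink_gb = active_blades_per_module * PORT_GB
    uplink_gb = UPLINKS_PER_MODULE * PORT_GB
    return downlink_gb / uplink_gb


for modules in (1, 2):
    ratio = chassis_ratio(modules)
    agg_ports = modules * UPLINKS_PER_MODULE
    print(f"{modules} module(s): {ratio}:1, {agg_ports} aggregation ports needed")
```

One module gives 2:1 with 8 aggregation ports, two modules give 1:1 with 16. The ratio only improves because the bill for aggregation ports doubles, which is rather the point.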
The point is, bandwidth contention under certain conditions is always a risk depending on your cost, performance and availability requirements. All this test is showing me is that under certain configurations and loads, UCS can run into contention. That’s fantastic, but I can prove this on a C7000 too.
What this test isn’t showing me is the apparent operational benefits of HP’s BladeSystem. That’s what I want as a customer. My problems are not technical, they are operational, and this is seemingly what Cisco are aiming to help me with. Oh well HP, it was a nice headline and any PR is good PR apparently. However, this customer has learnt far more about UCS by investigating these claims and I likes it. I likes it a lot [sic].