Use Case: Link Aggregation to Maximize iSCSI

Our vSphere environment relies on Storage vMotion and other functions that demand as much bandwidth as possible, especially since we leverage local blade storage for particular VMs. (What we lose in resiliency we gain back in the most efficient use of space possible.) Unfortunately we are limited to a 1GbE backbone, as upgrading to 10GbE was simply too expensive; this led to the need to maximize throughput.

Our setup is as follows:

  1. Blades: 192GB RAM, 1.6TB RAID5 local storage (10k RPM), six 1GbE NICs
  2. Blade Chassis: HDS CB2000 with 4 onboard Cisco-compatible switches
  3. NetGear GS748TS switches
  4. NetApp FAS2240 with 4 1GbE uplinks (also some FC ports, but we do not have the switch fabric to support them)

The approach for maximizing storage bandwidth is to aggregate the physical ports. Aggregation combines multiple dedicated switch ports into a single logical port, with traffic distributed across all of the member ports. This section covers the requirements for this capability from three viewpoints:

  1. HDS CB2000 switch programming (the HDS term is Link Aggregation; the Cisco equivalent is an EtherChannel, configured as a port-channel).
  2. NetGear GS748TS switch (Link Aggregated Group or LAG)
  3. NetApp FAS2240 (Virtual Interface, or VIF).

Because the aggregated switch ports appear as a single entity to consumers, the ESXi hosts could be left unchanged.

Steps are as follows:

On Infrastructure CB2000 Switch 1:

  1. Determine the uplink ports that can be dedicated to storage. Each CB2000 backplane switch provides a total of 4 gigabit uplink ports, so an entire backplane switch was dedicated just to the dvPgStorage portgroup. Per the HDS switch documentation, prior to doing anything else all 4 proposed uplink ports were shut down (e.g. int range gig 0/1-4; shutdown).
  2. Connect physical cables to the CB2000 switch uplink ports (but not to the other side).
  3. The CB2000 backplane switch closely resembles a Cisco switch; as with Cisco IOS, one first creates a port-channel with settings to match the gigabit uplink port range. In our environment the configuration is as follows:
    int port-channel 24
    shutdown
    description "24 - Storage port aggregation"
    switchport mode trunk
    switchport trunk allowed vlan 240

    Note that the port-channel number is 24 rather than 240 (which would have matched our Storage VLAN); this is a switch limitation (port-channel numbers have a maximum value of 128).
  4. Attach the gigabit uplink port range to the port-channel interface. An example of setting the mode to static:
    int range gig 0/1-4
    channel-group 24 mode on

    Once done, because a static configuration is being used, it also makes sense to configure the speed and negotiation settings explicitly:
    speed 1000
    duplex full
    flowcontrol send on

    Configure other settings for the gigabit uplink port range as necessary. For our environment, each uplink was configured as follows:
    speed 1000
    duplex full
    flowcontrol send on
    mtu 9000
    switchport mode trunk
    switchport trunk allowed vlan 240
    channel-group 24 mode on

    Note that jumbo frames are enabled above; jumbo frames are a separate topic.
  5. Do not enable the gigabit uplink ports yet; perform all of the other configuration first. (A consolidated example of steps 1 through 5 follows this list.)
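
Pulling steps 1 through 5 together, the sequence on the CB2000 side looks roughly like the following. This is just a consolidated sketch of the commands already shown above; the CB2000 backplane switch syntax closely follows Cisco IOS, but verify the exact keywords against the HDS switch module documentation.

    ! Step 1: shut down the proposed uplink ports before anything else
    int range gig 0/1-4
    shutdown
    ! Step 3: create the port-channel (left shut down for now)
    int port-channel 24
    shutdown
    description "24 - Storage port aggregation"
    switchport mode trunk
    switchport trunk allowed vlan 240
    ! Step 4: configure each uplink port and bind it to the port-channel
    int range gig 0/1-4
    speed 1000
    duplex full
    flowcontrol send on
    mtu 9000
    switchport mode trunk
    switchport trunk allowed vlan 240
    channel-group 24 mode on
    ! Step 5: everything stays shut down until the NetGear side is configured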

On NetGear Switch for CB2000 Switch Integration:

  1. Determine the ports for connection. As an example, the uplinks from CB2000 Switch 1 will connect to ports 37-40 on the NetGear switch. For each of these ports, name them appropriately via the NetGear user interface (such as “Uplink – Infra CB2000 Switch 1/1” for the first uplink from the CB2000 switch). Also, per the NetGear documentation, check that the ports are configured correctly:
    • no VLAN memberships (also set PVID, or the “port-based VLAN ID”, to 1 which is the default VLAN)
    • explicit port speed (1000M)
    • no auto-negotiation
    • flow control enabled
    • MDIX to indicate a switch is attached
  2. Shut down the ports and connect the physical cables from the other switch (for this use case: from CB2000 Switch 1 to the ports on NetGear switch N3).
  3. Create the Link Aggregated Group (LAG). For this example between CB2000 Switch 1 and the NetGear switch, use LAG1 (only 8 LAGs are permitted). Set the description to something meaningful like “Infra CB2000 Switch 1 Uplinks”. Set values as follows:
    • Type – “static”
    • Speed – “1000M”
    • Duplex Mode – “Full”
    • Auto Negotiation – “Disable” (this must match the switch on the other side)
  4. Add ports to the LAG. Ports must not be tagged on any VLANs or they cannot be added.
  5. Within the VLAN Membership screen on the NetGear switch, assign the LAG to the appropriate VLAN (such as 240 for our storage VLAN).
  6. Do not enable any ports yet.

Milestone 1 – Check CB2000 and NetGear Switches

  1. Enable all NetGear ports.
  2. Enable the NetGear LAG (this auto-enables managed ports).
  3. Enable the CB2000 ports (no shutdown).
  4. Enable the CB2000 port-channel (no shutdown).
  5. On the NetGear, update the LAG Advanced Configuration and verify that the LAG is “Up”.
  6. On the CB2000 switch, run show channel-group [number]; for our environment this is show channel-group 24. Verify that status is displayed and that the aggregate is up. (A sketch of the CB2000 side of this milestone follows this list.)
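
On the CB2000 side, the commands for this milestone amount to the following short sequence (a sketch; show channel-group is the command from the HDS documentation, and other Cisco-compatible switches may use a different verification command such as show etherchannel summary):

    ! Bring the member ports and the port-channel back up
    int range gig 0/1-4
    no shutdown
    int port-channel 24
    no shutdown
    end
    ! Confirm the aggregate is up
    show channel-group 24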

Continue with other configuration; for this use case the next step is to get port aggregation connected to the storage processor (SP).

On NetGear Switch for NetApp Filer Integration:

  1. Determine the ports to use for the NetApp filer. For example, one could use the following ports: 24 (Filer 1 – e0b); 32 (Filer 1 – e0c); 33 (Filer 2 – e0c); and so on. Be sure to document all port usage!
  2. Disable all selected ports and configure port settings exactly as for LAG1 described above. For this use case, only 3 uplink ports are being used from the NetApp filer as e0a on the filer is reserved for Management access; thus, 3 matching ports on the NetGear switch are used: 24, 32, and 34. Document these as you did the physical port usage on each switch.
  3. Connect physical cables to the NetGear switch ports (but not to the other side).
  4. Configure the LAG on the NetGear switch (LAG2 for this use case). The settings must match the device attached on the other side. For example, the NetApp filer requires auto-negotiation, so on the NetGear switch LAG the “Auto Negotiation” flag must be set to “Enable” within the NetGear UI.
  5. Add the ports to the LAG.
  6. Add the LAG to the VLAN membership. For this use case, the reader would find that both LAGs 1 and 2 on the NetGear switch are members of the 240 (Storage) VLAN.
  7. Do not enable any ports yet.

On NetApp FAS2240:

  1. The NetApp term for port aggregation is an “Interface Group” (ifgrp). Also, for this use case only one filer head can be used: true redundancy would call for “Active / Active” failover, but space limitations preclude it, so only filer head 0 (zero) is configured.
  2. Disable all of the selected NICs on the NetApp filer. For this use case, three NICs are used for uplink: e0b, e0c, and e0d. (The NIC e0a is used for management and NFS.) Bring down each NIC; for example: ifconfig e0b down.
  3. Connect the physical cables from the switch (the NetGear switch N3 in this use case) to the NICs on the NetApp filer.
  4. Set the individual NIC settings to match the settings on the switch. As an example, ifconfig e0b mtusize 9000 flowcontrol full mediatype auto sets the MTU to 9000 (jumbo frames) and enables flow control and auto-negotiation for NIC e0b.
  5. Verify that traffic can flow. For example, from a connected ESXi host one could perform a vmkping (see the sketch following this list). There are a number of useful tools to verify connectivity; refer to the switch vendor documentation to see how port aggregations can be monitored.
  6. Once traffic flow is verified, update the /etc/rc on the NetApp filer so that required settings are in place upon a device reboot. For our environment’s storage, the following was chosen (note that interfaces e0b, e0c, and e0d are in the aggregation):
    # Take the aggregation members down before (re)configuring them
    ifconfig e0b down
    ifconfig e0c down
    ifconfig e0d down
    # Per-NIC settings: jumbo frames, full flow control, auto-negotiation
    ifconfig e0b mtusize 9000 flowcontrol full mediatype auto
    ifconfig e0c mtusize 9000 flowcontrol full mediatype auto
    ifconfig e0d mtusize 9000 flowcontrol full mediatype auto
    # Create the multimode interface group over the three NICs
    ifgrp create multi Vif1 e0b e0c e0d
    # Tag the group for the Storage VLAN (creates Vif1-240), assign its address, and bring it up
    vlan create Vif1 240
    ifconfig Vif1-240 172.28.4.160 netmask 255.255.255.0 partner Vif1-240
    ifconfig Vif1-240 up
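
As a concrete example of the traffic check in step 5, a vmkping from an ESXi host to the filer's storage address works well; adding -d and -s 8972 also confirms that jumbo frames survive the entire path. This is a sketch: vmk1 is a placeholder for whichever VMkernel interface carries the dvPgStorage traffic, and the -I option for selecting that interface is only present on newer ESXi builds.

    # From the ESXi shell: ping the filer's Vif1-240 address over the storage VMkernel port.
    # -d sets "do not fragment"; -s 8972 is the largest ICMP payload that fits in a 9000-byte MTU.
    vmkping -I vmk1 -d -s 8972 172.28.4.160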

Congratulations on setting up port aggregation!

However, here is one critical point: within the CB2000 (Cisco-compatible) switch it is possible to choose how load is distributed across the LAG. The options are round robin, src IP / dst IP, and src port / dst port. I wanted to use src port / dst port because, for iSCSI, that would give good usage of all uplink ports. But it did not work! I suspect the problem is at Layer 2: the NetGear switches, although they “know” about the LAG, seem to want traffic for an individual MAC address to stay on a given line. With the src IP / dst IP line-selection policy that the switch did accept, all of the uplinks in the LAG are still used when all of my ESXi hosts are active and load is balanced via DRS; however, once an ESXi host is “assigned” a particular uplink, that host will *always* use that uplink. I could not figure out a way around this. Grr…
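
For reference, on a Cisco-style switch the line-selection policy is a global setting along these lines (a sketch only; the exact keywords available on the CB2000 backplane switch may differ):

    ! What I wanted: hash on source/destination TCP or UDP port
    port-channel load-balance src-dst-port
    ! What the environment ended up requiring: hash on source/destination IP
    port-channel load-balance src-dst-ip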
