Recently we upgraded our data center switch. The upgrade was driven by a faulty power supply slot that prevented high input to the redundant power supplies. I will describe the configuration before and after.
The original configuration was a 6509 with a single Sup II MSFC, 10/100 48-port blades (two 6148 and one 6148A), and two 6408 fiber blades.
The upgrade replaced the chassis with a 6509E, added a second Sup II MSFC, and replaced the two 6148 blades with 6148A blades. The IOS version is 12.2(18)SXF17. The second Sup II is running in redundant mode and has no fiber connections on it.
Almost immediately after the upgrade, minor performance issues began to appear. VPN sessions from a 3020 Concentrator connected to FastEthernet ports began dropping RDP sessions to VM servers on directly attached blade centers, as well as to workstations on the LAN. Those workstations are on a 6509 connected via an EtherChannel fiber link. Additionally, monitoring systems began randomly dropping packets to routers directly connected to FastEthernet ports on blades 3 and 4. The problem manifests after 6 PM, when backup processes begin. These issues were not present prior to the chassis upgrade.
There are no reported errors, high CPU, or other issues, either in the logs or in SolarWinds NPM. I am asking for any advanced commands to assist in isolating and eliminating these issues. Is it possible that some new features need to be enabled on the Sup to utilize the E-series chassis and backplane? Thanks in advance for any assistance in identifying this issue.
Do the problems only happen during the backups? Is the problem isolated to the same VLAN(s) as the servers kicking off the backup? If so, from the information provided so far I would suspect unicast flooding. A quick way to confirm is to check for output drops and a uniform output rate across ports on the switch.
- show interfaces | incl drops|rate|is up
Output drops would explain the dropped packets impacting other services, and a nearly uniform output rate on most ports would confirm flooding in the VLAN. To determine the source/destination of the flooded traffic, attach a sniffer to an access port in the suspect VLAN(s); you will see the unicast flow(s) on the port, confirming the issue.
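If you need to scan the filtered output across a few hundred ports, a rough script can flag the "uniform output rate plus output drops" signature. This is only a sketch: the sample text, interface names, counter values, and the 5% tolerance are illustrative assumptions, not taken from the device above.

```python
import re

# Hypothetical sample of "show interfaces | incl drops|rate|is up" output.
SAMPLE = """\
FastEthernet3/1 is up, line protocol is up
  5 minute input rate 1000 bits/sec, 2 packets/sec
  5 minute output rate 48123000 bits/sec, 9100 packets/sec
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 153
FastEthernet3/2 is up, line protocol is up
  5 minute input rate 2000 bits/sec, 3 packets/sec
  5 minute output rate 48119000 bits/sec, 9098 packets/sec
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 201
"""

def parse(output):
    """Collect per-interface output rate and total output drops."""
    ports, current = {}, None
    for line in output.splitlines():
        m = re.match(r"^(\S+) is up", line)
        if m:
            current = m.group(1)
            ports[current] = {"out_rate": 0, "drops": 0}
            continue
        m = re.search(r"output rate (\d+) bits/sec", line)
        if m and current:
            ports[current]["out_rate"] = int(m.group(1))
        m = re.search(r"Total output drops: (\d+)", line)
        if m and current:
            ports[current]["drops"] = int(m.group(1))
    return ports

def suspects(ports, tolerance=0.05):
    """Flag ports whose output rate sits within `tolerance` of the peak rate.

    Many access ports clustered at nearly the same output rate is the
    flooding signature described above.
    """
    if not ports:
        return []
    peak = max(p["out_rate"] for p in ports.values())
    return [name for name, p in ports.items()
            if peak and abs(p["out_rate"] - peak) / peak <= tolerance]
```

With the sample above, `suspects(parse(SAMPLE))` flags both FastEthernet ports, since their output rates are within a fraction of a percent of each other while both show output drops.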
There is an excellent whitepaper on the subject that covers all the details.
The most common cause for unicast flooding is one of the following:
1. Asymmetric routing
2. Spanning-Tree instability
3. CAM table exhaustion
By far the most common cause is asymmetric routing, which may have been introduced by a change in the path after the chassis was swapped out. The best way to resolve unicast flooding due to asymmetric routing is to raise your MAC aging timer to match the ARP timeout (14,400 seconds by default).
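On this IOS train the aging timer is raised in global configuration per VLAN. The VLAN number below is a placeholder for your suspect VLAN, and the exact keyword form (hyphenated vs. spaced) can vary between releases:

- mac-address-table aging-time 14400 vlan 10

Verify the change with "show mac-address-table aging-time".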
If STP instability is the problem, you will see a high number of TCNs reported.
- show spanning-tree det | incl ieee|occurr|from|is exec
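If the TCN source turns out to be a host-facing port flapping, a common mitigation (a general best practice, not something confirmed for this network) is enabling portfast on edge ports so link transitions there do not generate TCNs. The interface below is a placeholder:

- interface FastEthernet3/1
- spanning-tree portfast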