10-19-2011 03:18 AM - edited 03-07-2019 02:54 AM
Hi,
We recently started having a problem where users in one area began complaining about file transfers (user-to-user copies, email attachments, etc.).
In our scenario, the data center has a 4507R-E that connects to Dept A via an X2 (LRM) port over 10G fiber; Dept B connects the same way over the second X2 (LRM) port.
Dept A has a localized network rack with three 3750-E switches stacked; the same goes for Dept B.
After the complaint was logged and we started investigating, the first thing I noticed was that pings from the core switch to the Dept B stack were dropping. This was my ping output:
DC_Core#ping 172.16.0.3 repeat 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 172.16.0.3, timeout is 2 seconds:
!!!!!!!!.!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!.!!!!!!!!!!!!!!!!!!!!!
!!!!.!!!!!!!!!!!.!!!!!!!!!!!!.!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!.!!!!!!!!.!!!!!!!!!!!!!!!
!!!!.!!!!!!!!!!!!.!!!!!!!.!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!.!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!.!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!.!.!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!.!!!!.!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 96 percent (968/1000), round-trip min/avg/max = 1/1/8 ms
DC_Core#
It breaks even if I specify the minimum datagram size of 36 bytes, which was disturbing. So, to verify whether it was a problem with the core itself, I ran the same test to the Dept A stack, which completed without a single miss, even with the maximum datagram size of 18024 bytes:
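Side note for anyone reproducing this: instead of repeating the ping at individual sizes, IOS extended ping can sweep a range of datagram sizes in one run. A rough sketch of the dialog (exact prompts vary by IOS version):

```
DC_Core#ping
Protocol [ip]:
Target IP address: 172.16.0.3
Repeat count [5]: 100
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
...
Sweep range of sizes [n]: y
Sweep min size [36]: 36
Sweep max size [18024]: 1500
Sweep interval [1]: 100
```

A sweep that drops packets at all sizes (as here) points away from MTU/fragmentation and toward loss on the link itself.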
DC_Core#ping 172.16.0.2 size 18024 repeat 1000
Type escape sequence to abort.
Sending 1000, 18024-byte ICMP Echos to 172.16.0.2, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (1000/1000), round-trip min/avg/max = 8/12/36 ms
DC_Core#
So obviously something is wrong with the Dept B stack, but I can't see anything on the switch panel or in our ManageEngine OpManager console.
Memory utilization is at 20% and CPU is below 10%.
I have already tried restarting the stack. How else can I resolve this? What else should I look for?
10-19-2011 03:22 AM
Post the output of the command "sh contr e ten".
10-19-2011 03:29 AM
Both the Dept A and Dept B stacks show utilization = 0:
Supplychain_SW#show controllers ten 3/0/1 utilization
Receive Bandwidth Percentage Utilization : 0
Transmit Bandwidth Percentage Utilization : 0
10-19-2011 03:38 AM
No, that's not what I asked for.
Post the output of the command "sh contr e ten".
10-19-2011 07:45 AM
Supplychain_SW#sh controllers ethernet-controller tenGigabitEthernet 3/0/1
Transmit TenGigabitEthernet3/0/1 Receive
687654406 Bytes 1433533223 Bytes
1333729 Unicast frames 1563175 Unicast frames
14462 Multicast frames 147813 Multicast frames
22128 Broadcast frames 157173 Broadcast frames
0 Too old frames 1391160299 Unicast bytes
0 Deferred frames 18534353 Multicast bytes
0 MTU exceeded frames 23358879 Broadcast bytes
0 1 collision frames 0 Alignment errors
0 2 collision frames 0 FCS errors
0 3 collision frames 0 Oversize frames
0 4 collision frames 0 Undersize frames
0 5 collision frames 0 Collision fragments
0 6 collision frames
0 7 collision frames 226959 Minimum size frames
0 8 collision frames 351354 65 to 127 byte frames
0 9 collision frames 247551 128 to 255 byte frames
0 10 collision frames 152273 256 to 511 byte frames
0 11 collision frames 61218 512 to 1023 byte frames
0 12 collision frames 826092 1024 to 1518 byte frames
0 13 collision frames 0 Overrun frames
0 14 collision frames 0 Pause frames
0 15 collision frames
0 Excessive collisions 0 Symbol error frames
0 Late collisions 0 Invalid frames, too large
0 VLAN discard frames 2714 Valid frames, too large
0 Excess defer frames 0 Invalid frames, too small
451421 64 byte frames 0 Valid frames, too small
171586 127 byte frames
236140 255 byte frames 0 Too old frames
93663 511 byte frames 0 Valid oversize frames
69714 1023 byte frames 0 System FCS error frames
347795 1518 byte frames 0 RxPortFifoFull drop frame
0 Too large frames
0 Good (1 coll) frames
0 Good (>1 coll) frames
Supplychain_SW#
10-19-2011 03:22 AM
I would start by looking at the interfaces between the core switch and the 3750 stack. Check for dropped packets. It could be that your fibre or optical interfaces need replacing.
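For a quick first pass on each end of the link, an output filter keeps you from eyeballing the whole counter dump (interface names here are the ones from this thread; adjust to yours):

```
Supplychain_SW#sh interfaces tenGigabitEthernet 3/0/1 | include error|drop|CRC
```

Non-zero and climbing input errors or CRCs on either side usually mean dirty or damaged fiber, a marginal optic, or a failing port.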
Jon
10-19-2011 03:34 AM
Hi Jon,
I saw a report from OpManager showing packet loss close to 2% on this link, while the others were close to 0%.
Here is the "show interfaces tenGigabitEthernet 3/0/1" output from the Dept B stack:
Supplychain_SW#sh interfaces tenGigabitEthernet 3/0/1
TenGigabitEthernet3/0/1 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is b4a4.e30b.a49d (bia b4a4.e30b.a49d)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 10Gb/s, link type is auto, media type is 10GBase-LRM
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:05, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/0 (size/max)
5 minute input rate 440000 bits/sec, 68 packets/sec
5 minute output rate 113000 bits/sec, 48 packets/sec
772650 packets input, 620471631 bytes, 0 no buffer
Received 96208 broadcasts (45326 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 45326 multicast, 0 pause input
0 input packets with dribble condition detected
592087 packets output, 218874279 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
Supplychain_SW#
What else should I check? We're really in a mess here; applications are starting to fail for the clients in that area. Any help would be appreciated.
Regards,
Zaid
10-19-2011 03:37 AM
Also check the 4500 end of the link.
Jon
10-19-2011 03:39 AM
Hi,
How is the CPU utilization on both switches?
Please rate the helpful posts.
Regards,
Naidu.
10-19-2011 05:20 AM
I deleted my previous post as I had given the wrong interface's details.
The CPU/memory utilization is fine, less than 20%.
Jon,
The 10-gig interface on the 4500 side does show errors:
DC_Core#sh int ten2/3
TenGigabitEthernet2/3 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet Port, address is 1cdf.0f7e.22e9 (bia 1cdf.0f7e.22e9)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
reliability 254/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 10Gb/s, link type is auto, media type is 10GBase-LRM
input flow-control is on, output flow-control is on
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:03, output never, output hang never
Last clearing of "show interface" counters never
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 108000 bits/sec, 68 packets/sec
5 minute output rate 639000 bits/sec, 101 packets/sec
61283541 packets input, 22021843469 bytes, 0 no buffer
Received 1789745 broadcasts (831233 multicasts)
0 runts, 0 giants, 0 throttles
938780 input errors, 938780 CRC, 0 frame, 0 overrun, 0 ignored
0 input packets with dribble condition detected
115444363 packets output, 94940540751 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier
0 output buffer failures, 0 output buffers swapped out
DC_Core#
Any suggestions on what to do next?
10-19-2011 07:19 AM
It looks like you may have a physical-layer problem.
Clear your counters and see if the input errors and CRCs come back.
If you can, swap out your 10Gig modules and see if that fixes it.
If your 10Gig fibers are patched through an LIU, change out the patch fibers too.
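The counter test, roughly (standard IOS commands; confirm on your release):

```
DC_Core#clear counters TenGigabitEthernet2/3
Clear "show interface" counters on this interface [confirm]
! let the link run under normal traffic for a few minutes, then recheck:
DC_Core#sh int ten2/3 | include input errors|CRC
```

If the CRC count climbs again from zero, the errors are live rather than historical, and the optic/fiber swap is the next step.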
03-19-2013 11:08 AM
The fix went in last week.
Short answer: upgrade the code to 15.0(2)SE.
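To confirm which release a stack is running before and after the upgrade (output format varies a bit by platform):

```
Supplychain_SW#show version | include Software|uptime
```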