10-19-2011 03:18 AM - edited 03-07-2019 02:54 AM
Hi,
We recently started having a problem where users in one area began complaining about file transfers (user-to-user copies, email attachments, etc.).
In our scenario, the data center has a 4507R-E that connects to Dept A via an X2 (LRM) port over 10G fiber; Dept B connects the same way over the second X2 (LRM) port.
Dept A has a localized network rack with three 3750-E switches stacked; the same goes for Dept B.
After the complaint was logged and we started investigating, the first thing I noticed was that pings from the core switch to the Dept B stack were dropping. This was my ping output:
DC_Core#ping 172.16.0.3 repeat 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 172.16.0.3, timeout is 2 seconds:
!!!!!!!!.!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!.!!!!!!!!!!!!!!!!!!!!!
!!!!.!!!!!!!!!!!.!!!!!!!!!!!!.!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!.!!!!!!!!.!!!!!!!!!!!!!!!
!!!!.!!!!!!!!!!!!.!!!!!!!.!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!.!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!.!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!.!.!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!.!!!!.!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 96 percent (968/1000), round-trip min/avg/max = 1/1/8 ms
DC_Core#
It breaks even if I specify the minimum datagram size of 36 bytes, which was disturbing. So, to verify whether it was a problem with the core itself, I ran the same test to the Dept A stack, which completed without a single miss, even with the maximum datagram size of 18024 bytes:
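Side note for anyone reproducing this: instead of repeating the ping at individual sizes, IOS extended ping can sweep a range of datagram sizes in one run. A rough sketch of the dialog (exact prompts vary by IOS version):

```
DC_Core#ping
Protocol [ip]:
Target IP address: 172.16.0.3
Repeat count [5]: 100
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
...
Sweep range of sizes [n]: y
Sweep min size [36]: 36
Sweep max size [18024]: 1500
Sweep interval [1]: 100
```

A sweep that drops packets at all sizes (as here) points away from MTU/fragmentation and toward loss on the link itself.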
DC_Core#ping 172.16.0.2 size 18024 repeat 1000
Type escape sequence to abort.
Sending 1000, 18024-byte ICMP Echos to 172.16.0.2, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (1000/1000), round-trip min/avg/max = 8/12/36 ms
DC_Core#
So obviously something is wrong with the Dept B stack, but I can't see anything on the switch panel or in our ManageEngine OpManager console.
Memory utilization is at 20% and CPU is below 10%.
I have already tried restarting the stack. How else can I resolve this? What else should I look for?
10-19-2011 03:22 AM
Post the output of the command "sh contr e ten".
10-19-2011 03:29 AM
Both the Dept A and Dept B stacks show utilization = 0:
Supplychain_SW#show controllers ten 3/0/1 utilization
Receive Bandwidth Percentage Utilization : 0
Transmit Bandwidth Percentage Utilization : 0
10-19-2011 03:38 AM
No, that's not what I asked for.
Post the output of the command "sh contr e ten".
10-19-2011 07:45 AM
Supplychain_SW#sh controllers ethernet-controller tenGigabitEthernet 3/0/1
Transmit TenGigabitEthernet3/0/1 Receive
687654406 Bytes 1433533223 Bytes
1333729 Unicast frames 1563175 Unicast frames
14462 Multicast frames 147813 Multicast frames
22128 Broadcast frames 157173 Broadcast frames
0 Too old frames 1391160299 Unicast bytes
0 Deferred frames 18534353 Multicast bytes
0 MTU exceeded frames 23358879 Broadcast bytes
0 1 collision frames 0 Alignment errors
0 2 collision frames 0 FCS errors
0 3 collision frames 0 Oversize frames
0 4 collision frames 0 Undersize frames
0 5 collision frames 0 Collision fragments
0 6 collision frames
0 7 collision frames 226959 Minimum size frames
0 8 collision frames 351354 65 to 127 byte frames
0 9 collision frames 247551 128 to 255 byte frames
0 10 collision frames 152273 256 to 511 byte frames
0 11 collision frames 61218 512 to 1023 byte frames
0 12 collision frames 826092 1024 to 1518 byte frames
0 13 collision frames 0 Overrun frames
0 14 collision frames 0 Pause frames
0 15 collision frames
0 Excessive collisions 0 Symbol error frames
0 Late collisions 0 Invalid frames, too large
0 VLAN discard frames 2714 Valid frames, too large
0 Excess defer frames 0 Invalid frames, too small
451421 64 byte frames 0 Valid frames, too small
171586 127 byte frames
236140 255 byte frames 0 Too old frames
93663 511 byte frames 0 Valid oversize frames
69714 1023 byte frames 0 System FCS error frames
347795 1518 byte frames 0 RxPortFifoFull drop frame
0 Too large frames
0 Good (1 coll) frames
0 Good (>1 coll) frames
Supplychain_SW#
10-19-2011 03:22 AM
I would start by looking at the interfaces between the core switch and the 3750 stack. Check for dropped packets. It could be that your fibre or optical interfaces need replacing.
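For a quick first pass on each end of the link, an output filter keeps you from eyeballing the whole counter dump (interface names here are the ones from this thread; adjust to yours):

```
Supplychain_SW#sh interfaces tenGigabitEthernet 3/0/1 | include error|drop|CRC
```

Non-zero and climbing input errors or CRCs on either side usually mean dirty or damaged fiber, a marginal optic, or a failing port.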
Jon
10-19-2011 03:34 AM
Hi Jon,
I saw a report from OpManager showing packet loss close to 2% on this link, while the others were close to 0%.
Here is the "show interfaces tenGigabitEthernet 3/0/1" output from the Dept B stack:
Supplychain_SW#sh interfaces tenGigabitEthernet 3/0/1
TenGigabitEthernet3/0/1 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is b4a4.e30b.a49d (bia b4a4.e30b.a49d)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 10Gb/s, link type is auto, media type is 10GBase-LRM
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:05, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/0 (size/max)
5 minute input rate 440000 bits/sec, 68 packets/sec
5 minute output rate 113000 bits/sec, 48 packets/sec
772650 packets input, 620471631 bytes, 0 no buffer
Received 96208 broadcasts (45326 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 45326 multicast, 0 pause input
0 input packets with dribble condition detected
592087 packets output, 218874279 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
Supplychain_SW#
What else should I check? We're really in a mess here; applications are starting to fail for the clients in that area. Any help would be appreciated.
Regards,
Zaid
10-19-2011 03:37 AM
Also check the 4500 end of the link.
Jon
10-19-2011 03:39 AM
Hi,
How is the CPU utilization on both switches?
Please rate the helpful posts.
Regards,
Naidu.
10-19-2011 05:20 AM
I deleted my previous post as I had given the wrong interface's details.
The CPU/memory utilization is fine, less than 20%.
Jon,
The 10-gig interface on the 4500 side does show errors:
DC_Core#sh int ten2/3
TenGigabitEthernet2/3 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet Port, address is 1cdf.0f7e.22e9 (bia 1cdf.0f7e.22e9)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
reliability 254/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 10Gb/s, link type is auto, media type is 10GBase-LRM
input flow-control is on, output flow-control is on
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:03, output never, output hang never
Last clearing of "show interface" counters never
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 108000 bits/sec, 68 packets/sec
5 minute output rate 639000 bits/sec, 101 packets/sec
61283541 packets input, 22021843469 bytes, 0 no buffer
Received 1789745 broadcasts (831233 multicasts)
0 runts, 0 giants, 0 throttles
938780 input errors, 938780 CRC, 0 frame, 0 overrun, 0 ignored
0 input packets with dribble condition detected
115444363 packets output, 94940540751 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier
0 output buffer failures, 0 output buffers swapped out
DC_Core#
Any suggestions on what to do next?
10-19-2011 07:19 AM
It looks like you may have a physical-layer problem.
Clear your counters and see if the input errors and CRCs come back.
If you can, swap out your 10Gig modules and see if that fixes it.
If your 10Gig fibers are patched through an LIU, change out the patch fibers too.
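The counter test, roughly (standard IOS commands; confirm on your release):

```
DC_Core#clear counters TenGigabitEthernet2/3
Clear "show interface" counters on this interface [confirm]
! let the link run under normal traffic for a few minutes, then recheck:
DC_Core#sh int ten2/3 | include input errors|CRC
```

If the CRC count climbs again from zero, the errors are live rather than historical, and the optic/fiber swap is the next step.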
03-19-2013 11:08 AM
The fix went in last week.
Short answer: upgrade the code to 15.0(2)SE.
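To confirm which release a stack is running before and after the upgrade (output format varies a bit by platform):

```
Supplychain_SW#show version | include Software|uptime
```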