FEX bouncing

mSumo
Level 1

Hello all,

 

I've been asked by a friend of mine to help troubleshoot an issue he is experiencing with Nexus. They are losing all devices on "Eth101/1/", FEX101. This has been occurring for some time but they only noticed it now; the frequency varies but averages once every 4-6 hours or so.

 

Attached are the log and some show command outputs I asked him to send me...

 

Some additional info I've checked with him so far:

  • the problem is only on FEX101, the other FEXes are OK
  • there is only one cable in the port-channel that goes from the N5k to the N2k
  • I'm not sure yet whether it is a direct link or whether there is a patch panel in that connection
  • the issue is occurring on the CORE-A switch. As far as we can tell for now, CORE-B does not have this problem at all (vPC)

 

Attached are two files; some shows (2nd-later) were performed a bit later to verify whether something is changing...

  • sh ver
  • sh fex
    • once the status was ONLINE SEQUENCE, the 2nd time it was ONLINE
  • sh fex 101 detail
    • found some strange logs here like "Deleting route to FEX", "Module disconnected", "Offlining Module", "LC insert failed at sequence 10 :  Im SAP" ... not sure what this means... but it doesn't look good
  • sh interface fex-fabric

  • sh port-channel summ
  • sh interface port-channel 101
  • sh run int eth1/1
  • sh run int port-channel 101

  • sh int e1/1
    • I've asked to clear counters and do the show again... Noticed that CRC and Input Errors are increasing
  • sh int port-channel 101
  • sh int e1/1 transceiver details

    • to check TX/RX Power ... but looks OK

I'm not very experienced with Nexus, but it looks to me like a possible cable/SFP issue?... However, it is strange that it bounces again and again after several hours....

 

Any idea whether anything else could cause it?

 

 

 

 

 

9 Replies

balaji.bandi
Hall of Fame

As you mentioned that there is no issue with the other FEXes, you are correct that this looks like a cable/SFP issue. Follow the steps below:

 

1. Check the SFP at both ends (try a new one).

2. Check the fibre patching from end to end.

3. Check whether the FEX is rebooting when it is lost from the parent device (possibly a power issue). I don't suspect this, but it is worth checking.

 

 

 

 

BB


Thanks for the confirmation...

About "3 - power issue": I would say that if there were a power issue on the 2k side, it would also cause the FEX to go down on CORE-B?

Yes, a power issue would cause it to show as dropped on both sides, as would a config issue or a difference between the two sides.

I've just received the results of some additional checks I asked for - see below. I also got a log from when Eth1/1 is UP and Po101 is UP, but FEX 101 is DOWN (attached). Could this state still be consistent with a cable/SFP issue?

 

What I asked for:

 

  1. check sh int Eth1/1 a few times over the day to see whether errors are increasing
    • they are increasing slowly
    • CRC and Input Errors are identical
  2. compare the configuration between CORE-A and CORE-B
    • the configuration matches on both sides
  3. sh int Eth1/1 transceiver details a few times over the day to see whether TX/RX is changing significantly
    • it is not. Quite stable
  4. bounce the Eth1/1 interface
    • didn't help... errors are still increasing
  5. check the cabling
    • there are patch panels
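
For anyone who wants to put a number on "increasing slowly" in check 1, here is a minimal Python sketch that turns a few counter readings into an errors-per-hour figure. The timestamps and CRC values are hypothetical (they are not from the attached logs), and the function name is made up for illustration. It assumes the counters were not cleared between the first and last reading.

```python
from datetime import datetime


def error_rate_per_hour(samples):
    """Return CRC errors per hour between the first and last sample.

    samples: list of (timestamp, crc_count) tuples, oldest first.
    """
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("samples must span a positive time interval")
    return (c1 - c0) / hours


# Hypothetical readings of the CRC counter from "sh int Eth1/1",
# taken after clearing counters; these values are NOT from the thread.
samples = [
    (datetime(2024, 1, 1, 9, 0), 0),
    (datetime(2024, 1, 1, 12, 0), 150),
    (datetime(2024, 1, 1, 15, 0), 330),
]

rate = error_rate_per_hour(samples)
print(f"{rate:.1f} CRC errors/hour")
```

Comparing this figure before and after swapping the cable or SFP gives a like-for-like check of whether the change helped.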

It is just strange to me that the ports are UP but the FEX is down... I'm quite curious about this issue right now :)

On E1/1 you have a bad cable on the FEX uplink. CRCs and input errors are a sign of a dodgy cable or SFP:

198301584712 input packets 209092495319978 bytes
101983157103 jumbo packets 0 storm suppression bytes
0 runts 0 giants 230040460 CRC 0 no buffer
230040460 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
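
To put those counters in proportion, a rough Python sketch that pulls the three relevant values out of the output above and works out what fraction of input packets had CRC errors. The helper name `counter` is made up for illustration, and the regex assumes each value is printed directly before its label, as it is in the lines quoted here.

```python
import re

# Counter lines as posted in the thread (from "show interface" on Eth1/1).
OUTPUT = """\
198301584712 input packets 209092495319978 bytes
0 runts 0 giants 230040460 CRC 0 no buffer
230040460 input error 0 short frame 0 overrun 0 underrun 0 ignored
"""


def counter(text, label):
    """Return the integer printed immediately before `label`, or None."""
    match = re.search(r"(\d+)\s+" + re.escape(label) + r"\b", text)
    return int(match.group(1)) if match else None


packets = counter(OUTPUT, "input packets")
crc = counter(OUTPUT, "CRC")
input_errors = counter(OUTPUT, "input error")

# Share of received packets that failed the CRC check.
crc_fraction = crc / packets
print(f"CRC errors: {crc} of {packets} input packets ({crc_fraction:.3%})")
print("all input errors are CRC errors:", crc == input_errors)
```

The CRC and input error counters being equal means every input error on this port is a CRC error, which fits a physical-layer fault (cable, patch panel or SFP) rather than a configuration problem.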

Also, if you're using LACP, apply the LACP command below to the PO vPCs:

interface port-channel101
  description FEX101:
  switchport
  switchport mode fex-fabric
  fex associate 101
  no lacp suspend-individual
  vpc 101

Also, that software is 7 years old. You should really upgrade; there have been multiple CVEs released against that image version.

thanks for your posts Mark.... very useful for me...

 

  1. LACP
    • I'm not 100% sure here.... the port is configured as shown below... "feature lacp" is configured globally, but from "show port-chan sum" I can see the PROTOCOL for 101 is NONE.... However, I guess that LACP is running by default? So I will try to configure the command you proposed

       

      interface Ethernet1/1
        switchport mode fex-fabric
        fex associate 101
        channel-group 101

  2. Will advise once again to replace SFP / cabling.... The ppl are just not very keen on doing it :)
  3. I'm aware of that.... In fact, I already advised to upgrade to the latest stable NXOS

Hi
No, you're right, it's just in "on" mode. I just have it added to all my POs, but if you're not specifically using LACP you're OK. The cabling, though, is the issue I'd suspect.

Before replacing anything, clear the counters so you get a fresh count when the new cable or SFP goes in.

Yes, sure, let us know how you get on. In case that's not the fix, we can look further into it.