
Spontaneous reboot of FEX N2K-C2348UPQ-10GE

kz-support
Level 1

Good day

We are using a Nexus 7700 (N77-C7706) with an attached FEX, model N2K-C2348UPQ-10GE, and we ran into an unexpected problem: removing an SFP module from the FEX caused that same FEX to reboot.

1. Oddities

a) The N77-F324FQ-25 modules in slots 1, 2 and 5 all report status ok, but the module in slot 1 has hardware version 1.2 (the others have 1.4), and the module in slot 2 did not pass online diagnostics.

show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
2    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
3    0      Supervisor Module-3                 N77-SUP3E          active *
4    0      Supervisor Module-3                 N77-SUP3E          ha-standby
5    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
6    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok

Mod  Sw               Hw
---  ---------------  ------
1    8.4(1)           1.2     
2    8.4(1)           1.4     
3    8.4(1)           1.0     
4    8.4(1)           1.0     
5    8.4(1)           1.4     
6    8.4(1)           1.9    

 

Mod  Online Diag Status
---  ------------------
1    Pass
2    Fail
3    Pass
4    Pass
5    Pass
6    Pass

b) Another oddity is a message about exceeding the recommended limit on the number of VLAN-port instances for STP, logged shortly after the FEX was enabled and came online:

2024 Sep 24 16:01:29 nx-ym-iva71 %$ VDC-1 %$ %FEX-2-NOHMS_ENV_FEX_ONLINE: FEX-114 On-line (Serial Number FOC25271BVQ)
2024 Sep 24 16:01:56 nx-ym-iva71 %$ VDC-1 %$ %STP-2-VLAN_PORT_LIMIT_EXCEEDED: The number of vlan-port instances exceeded [Rapid-PVST mode] recommended limit of 16384

======================================================

show system reset-reason fex

----- reset reason for FEX 114 ---

1) At 0 usecs after Unknown time
Reset Reason: Unknown (0)
Service (Additional Info):
Image Version: 8.4(1)

2) At 990975 usecs after Mon Nov 15 11:29:56 2021
Reset Reason: Kernel Reboot (1)
Service (Additional Info): Reload new image
Image Version: 7.0(6)N1(1)

3) At 0 usecs after Unknown time
Reset Reason: Unknown (0)
Service (Additional Info):
Image Version: 7.0(6)N1(1)

4) At 0 usecs after Unknown time
Reset Reason: Unknown (0)
Service (Additional Info):
Image Version: 8.4(1)

====================================

show version

Software
BIOS: version 1.44.0
kickstart: version 8.4(1)
system: version 8.4(1)
BIOS compile time: 05/17/2019
kickstart image file is: bootflash:///n7700-s3-kickstart-npe.8.4.1.bin
kickstart compile time: 6/30/2019 23:00:00 [07/06/2019 23:47:17]
system image file is: bootflash:///n7700-s3-dk9-npe.8.4.1.bin
system compile time: 6/30/2019 23:00:00 [07/07/2019 01:20:10]

 

Can you help us figure out the reasons?

2 Replies

AshSe
VIP

Hello @kz-support 

The spontaneous reboot of the FEX (N2K-C2348UPQ-10GE) and the associated oddities you are observing could be caused by a combination of hardware, software, and configuration issues. Let's break this down and analyze the potential causes and solutions:


1. FEX Reboot on SFP Removal

  • Possible Cause: The FEX rebooting when an SFP is removed could indicate a hardware or software bug. It might also be related to power or environmental issues, or a miscommunication between the parent Nexus 7700 and the FEX.
  • Action Plan:
    • Check Logs: Look for logs around the time of the reboot (show logging logfile, or show logging last <number-of-lines> for the most recent entries). Focus on any hardware or environmental warnings/errors.
    • Software Bug: The FEX is running NX-OS version 8.4(1), which may have known bugs. Check Cisco's Bug Search Tool for any bugs related to FEX reboots or SFP removal in this version.
    • Upgrade Software: Consider upgrading to a more stable and recommended NX-OS version. Version 8.4(1) is relatively old, and newer versions may have fixes for such issues.
    • Hardware Check: Ensure the FEX hardware is functioning properly. Run diagnostics against the FEX itself; on NX-OS this is normally show diagnostic result fex <FEX-ID> rather than the module form of the command.

2. Oddities in Module Hardware Versions

  • Observation: The N77-F324FQ-25 module in slot 1 has hardware version 1.2, while the others are 1.4. Additionally, the module in slot 2 failed online diagnostics.
  • Possible Cause: The hardware version mismatch could indicate older hardware or firmware on the module in slot 1. The failed diagnostics on slot 2 could point to a hardware issue or a software compatibility problem.
  • Action Plan:
    • Module in Slot 1:
      • Check if the hardware version 1.2 is supported with NX-OS version 8.4(1). If not, consider upgrading the module firmware or replacing the module.
    • Module in Slot 2:
      • Investigate why the module failed diagnostics. Run detailed diagnostics (show diagnostic result module 2 detail) and check for any hardware faults.
      • If the module is faulty, consider replacing it.
    • Firmware Consistency: Ensure all modules are running compatible firmware versions. If needed, upgrade the firmware.
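The two observations in this section can be checked mechanically once the show module tables are transcribed. The following is a minimal Python sketch, assuming the values are copied by hand from the output in the original post; the function names are hypothetical. Note that a differing hardware revision is not a fault by itself, so the first check only flags the slot for a compatibility lookup.

```python
from collections import Counter

# Values transcribed from the `show module` output in the original post.
# Only the three N77-F324FQ-25 line cards are compared for hardware version.
HW_VERSION = {1: "1.2", 2: "1.4", 5: "1.4"}
DIAG_STATUS = {1: "Pass", 2: "Fail", 3: "Pass", 4: "Pass", 5: "Pass", 6: "Pass"}


def odd_hw_versions(hw):
    """Return slots whose hardware version differs from the most common one."""
    common, _ = Counter(hw.values()).most_common(1)[0]
    return {slot: ver for slot, ver in hw.items() if ver != common}


def failed_diag(status):
    """Return the sorted list of slots whose online diag status is not Pass."""
    return sorted(slot for slot, result in status.items() if result != "Pass")


print("HW version outliers:", odd_hw_versions(HW_VERSION))
print("Failed online diag:", failed_diag(DIAG_STATUS))
```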

3. STP VLAN-Port Instance Limit Exceeded

  • Observation: The message indicates that the number of VLAN-port instances exceeded the recommended limit of 16,384 in Rapid-PVST mode.
  • Possible Cause: This is likely due to the number of VLANs and interfaces configured on the FEX and the parent switch. Each VLAN on each port creates a VLAN-port instance, and exceeding the limit can cause instability.
  • Action Plan:
    • Reduce VLAN-Port Instances:
      • Minimize the number of VLANs assigned to interfaces on the FEX.
      • Use VLAN pruning to limit VLANs on trunk ports.
    • Switch to MST: If possible, consider switching from Rapid-PVST to MST (Multiple Spanning Tree), which is more scalable and does not create a VLAN-port instance for each VLAN on each port.
    • Monitor STP Load: Use show spanning-tree summary totals to see how many ports are active in STP across all VLANs, and compare that figure with the recommended limit.
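To make the arithmetic behind the warning concrete, here is a minimal Python sketch. The port inventory is hypothetical (it is not taken from this switch) and only the 16384 figure comes from the syslog message; the point is that each VLAN carried on each port counts once.

```python
# Rough estimate of Rapid-PVST vlan-port instances.
# The port inventory below is HYPOTHETICAL, for illustration only.

RAPID_PVST_LIMIT = 16384  # recommended limit quoted in the syslog message


def count_vlan_port_instances(ports):
    """Sum the VLANs carried on each port.

    `ports` maps an interface name to the number of VLANs active on it
    (1 for an access port, the allowed-VLAN count for a trunk).
    """
    return sum(ports.values())


# Hypothetical example: 48 FEX host ports, each trunking 350 VLANs.
ports = {f"Ethernet114/1/{n}": 350 for n in range(1, 49)}

total = count_vlan_port_instances(ports)
print(total, "instances; over limit:", total > RAPID_PVST_LIMIT)

# Pruning each trunk to 300 VLANs brings the same ports back under the limit.
pruned = {name: 300 for name in ports}
pruned_total = count_vlan_port_instances(pruned)
print(pruned_total, "instances; over limit:", pruned_total > RAPID_PVST_LIMIT)
```

With these made-up numbers, a single FEX with wide-open trunks is enough to cross the threshold, which would explain why the message appears right after a FEX comes online.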

4. FEX Reset Reasons

  • Observation: The reset reasons for FEX 114 include "Unknown" and "Kernel Reboot". This indicates that the FEX experienced unexpected reboots, possibly due to software or hardware issues.
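  • Possible Cause: The only timestamped entry, "Kernel Reboot (1)", is dated November 2021, ran image 7.0(6)N1(1), and is annotated "Reload new image", which points to a reload for an image change rather than a crash. The "Unknown (0)" entries have no timestamp or additional info, so they do not identify a cause; an unrecorded reset like this could come from a power or hardware event, but the output alone cannot confirm that.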
  • Action Plan:
    • Check Logs: Look for crash logs or core dumps on the parent switch (show cores or dir bootflash:) and analyze them.
    • Upgrade Software: As mentioned earlier, upgrade to a more stable NX-OS version to address potential software bugs.
    • Hardware Check: Run diagnostics on the FEX and parent switch to rule out hardware issues.
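When comparing reset history across several FEXes, it can help to turn the text into records. Below is a minimal Python sketch that parses output shaped like the show system reset-reason fex listing in the original post. The regular expression is an assumption based only on that pasted sample (two of its entries are reproduced here), so real output with different spacing may need adjustments.

```python
import re

# Two entries copied from the output in the original post.
SAMPLE = """\
----- reset reason for FEX 114 ---

1) At 0 usecs after Unknown time
Reset Reason: Unknown (0)
Service (Additional Info):
Image Version: 8.4(1)

2) At 990975 usecs after Mon Nov 15 11:29:56 2021
Reset Reason: Kernel Reboot (1)
Service (Additional Info): Reload new image
Image Version: 7.0(6)N1(1)
"""

# One match per numbered entry; layout assumed from the sample above.
ENTRY = re.compile(
    r"^\d+\) At \d+ usecs after (?P<when>.+)\n"
    r"Reset Reason: (?P<reason>.+?) \((?P<code>\d+)\)\n"
    r"Service \(Additional Info\):[ \t]*(?P<info>.*)\n"
    r"Image Version: (?P<image>\S+)",
    re.MULTILINE,
)


def parse_reset_reasons(text):
    """Return one dict per reset entry found in `text`."""
    return [m.groupdict() for m in ENTRY.finditer(text)]


entries = parse_reset_reasons(SAMPLE)
for entry in entries:
    has_details = entry["when"] != "Unknown time" or entry["info"] != ""
    print(entry["reason"], "|", entry["image"], "| has details:", has_details)
```

Entries without a timestamp or additional info are the ones worth correlating with syslog, since the reset-reason table itself says nothing more about them.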

5. General Recommendations

  • Upgrade NX-OS: The current version 8.4(1) is relatively old and may have known bugs. Upgrade to a more recent and stable version recommended by Cisco for your hardware.
  • Check Compatibility: Ensure all hardware (FEX, line cards, supervisors) is compatible with the NX-OS version you are running.
  • Environmental Factors: Verify that the FEX and parent switch are operating within acceptable environmental conditions (temperature, power, etc.).
  • Cisco TAC: If the issue persists after the above steps, consider opening a case with Cisco TAC. Provide them with the logs, crash dumps, and other relevant information.

6. Next Steps

  • Perform the recommended actions above in a controlled manner.
  • Monitor the system after each change to determine if the issue is resolved.
  • Document any changes and their impact for future reference.

By addressing the potential hardware, software, and configuration issues, you should be able to resolve the spontaneous FEX reboot and related oddities.

 

Hope This Helps!!!

 

AshSe


Thank you for the suggestions.

Additionally, I found this message in the log:

2024 Sep 24 15:57:20 nx-fex %FEX-2-NOHMS_ENV_FEX_OFFLINE: FEX-114 Off-line (Serial Number F1S25271BVQ)
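Paired with the On-line message from the original post, this gives the length of the outage. Here is a minimal Python sketch of that calculation; both log lines are copied verbatim from this thread, while the regular expression and function name are illustrative assumptions. The month abbreviation is parsed with strptime, which expects an English locale.

```python
import re
from datetime import datetime

# Both lines are copied verbatim from this thread.
LINES = [
    "2024 Sep 24 15:57:20 nx-fex %FEX-2-NOHMS_ENV_FEX_OFFLINE: "
    "FEX-114 Off-line (Serial Number F1S25271BVQ)",
    "2024 Sep 24 16:01:29 nx-ym-iva71 %$ VDC-1 %$ %FEX-2-NOHMS_ENV_FEX_ONLINE: "
    "FEX-114 On-line (Serial Number FOC25271BVQ)",
]

# Timestamp at the start of the line, then the FEX online/offline mnemonic.
EVENT = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) .*"
    r"%FEX-2-NOHMS_ENV_FEX_(?P<state>ONLINE|OFFLINE): FEX-(?P<fex>\d+)"
)


def fex_events(lines):
    """Return sorted (timestamp, fex_id, state) tuples for FEX state messages."""
    events = []
    for line in lines:
        m = EVENT.search(line)
        if m:
            # Collapse padding spaces before single-digit days.
            stamp = " ".join(m["ts"].split())
            ts = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
            events.append((ts, int(m["fex"]), m["state"]))
    return sorted(events)


events = fex_events(LINES)
offline = next(ts for ts, fex, state in events if state == "OFFLINE")
online = next(ts for ts, fex, state in events if state == "ONLINE")
outage = online - offline
print("FEX-114 was offline for", outage)
```

Any other messages logged inside that window are the ones to look at first when searching for the trigger.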
