Beginner

Fabric Crossbar Errors - Typhoon A9K-24X10GE-TR

I inserted a new A9K-24X10GE-TR line card today, which prompted the following errors:

 

RP/0/RSP1/CPU0:Jul 21 22:26:39.997 : pfm_node_rp[357]: %PLATFORM-CROSSBAR-1-SERDES_ERROR_LNK1 : Set|fab_xbar[213105]|0x1017007|XBAR_0_Slot_3 
RP/0/RSP1/CPU0:Jul 21 22:26:50.015 : pfm_node_rp[357]: %PLATFORM-CROSSBAR-1-SERDES_ERROR_LNK1 : Clear|fab_xbar[213105]|0x1017007|XBAR_0_Slot_3

This error is repeated several times per hour, each time with ~10 seconds between the "Set" and "Clear" messages.
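To put numbers on the Set/Clear pattern instead of eyeballing the log, the pairs can be matched up with a short script. This is only a sketch: the regular expression is written against the two syslog lines quoted above, the function name `alarm_durations` is made up for illustration, and the `year` argument is an assumption because the log timestamps carry no year.

```python
import re
from datetime import datetime

# Pattern modelled on the two log lines quoted in this post; other
# platforms or releases may format these messages differently.
LOG_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}) : .*?"
    r"%PLATFORM-CROSSBAR-1-(?P<alarm>\w+) : "
    r"(?P<state>Set|Clear)\|[^|]*\|[^|]*\|(?P<link>\S+)"
)


def alarm_durations(lines, year=2000):
    """Pair each Set with the next Clear for the same alarm and link.

    Returns a list of ((alarm, link), seconds) tuples. `year` is an
    assumption, needed only because the timestamps omit it.
    """
    open_alarms = {}
    durations = []
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
        key = (m["alarm"], m["link"])
        if m["state"] == "Set":
            open_alarms[key] = ts
        elif key in open_alarms:
            durations.append((key, (ts - open_alarms.pop(key)).total_seconds()))
    return durations


if __name__ == "__main__":
    sample = [
        "RP/0/RSP1/CPU0:Jul 21 22:26:39.997 : pfm_node_rp[357]: "
        "%PLATFORM-CROSSBAR-1-SERDES_ERROR_LNK1 : "
        "Set|fab_xbar[213105]|0x1017007|XBAR_0_Slot_3",
        "RP/0/RSP1/CPU0:Jul 21 22:26:50.015 : pfm_node_rp[357]: "
        "%PLATFORM-CROSSBAR-1-SERDES_ERROR_LNK1 : "
        "Clear|fab_xbar[213105]|0x1017007|XBAR_0_Slot_3",
    ]
    for (alarm, link), seconds in alarm_durations(sample):
        print(alarm, link, seconds)
```

Fed a full syslog export, this would show how often the alarm fires and whether the roughly 10 second gap is constant.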

 

Immediately after "Set" message appears:

PORT    Remote Slot  Remote Inst    Logical ID  Status
======================================================
00      0/3/CPU0            02             1        Up
01      0/3/CPU0            01             1        Up
02      0/3/CPU0            01             0        Up
03      0/3/CPU0            00             0        Up
04      0/3/CPU0            00             1        Up
05      0/3/CPU0            03             1        Up
07      0/RSP1/CPU0         00             1        Up
08      0/3/CPU0            03             0        Up
09      0/RSP0/CPU0         00             1        Down
11      0/RSP1/CPU0         00             0        Up
12      0/RSP0/CPU0         00             0        Up
14      0/RSP0/CPU0         01             1        Up
15      0/RSP1/CPU0         01             1        Up
16      0/RSP0/CPU0         01             0        Up
17      0/RSP1/CPU0         01             0        Up
24      0/3/CPU0            02             0        Up

Immediately after the message clears:

PORT    Remote Slot  Remote Inst    Logical ID  Status
======================================================
00      0/3/CPU0            02             1        Up
01      0/3/CPU0            01             1        Up
02      0/3/CPU0            01             0        Up
03      0/3/CPU0            00             0        Up
04      0/3/CPU0            00             1        Up
05      0/3/CPU0            03             1        Up
07      0/RSP1/CPU0         00             1        Up
08      0/3/CPU0            03             0        Up
09      0/RSP0/CPU0         00             1        Up
11      0/RSP1/CPU0         00             0        Up
12      0/RSP0/CPU0         00             0        Up
14      0/RSP0/CPU0         01             1        Up
15      0/RSP1/CPU0         01             1        Up
16      0/RSP0/CPU0         01             0        Up
17      0/RSP1/CPU0         01             0        Up
24      0/3/CPU0            02             0        Up
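The only difference between the two captures is port 09 (towards 0/RSP0/CPU0, instance 00, logical ID 1), which is Down while the alarm is set and Up once it clears. A small helper makes that comparison repeatable across many captures. It is a sketch that assumes the five-column layout shown above; `parse_links` and `changed_links` are hypothetical names, not part of any Cisco tooling.

```python
def parse_links(text):
    """Parse a crossbar link-status table into {port: (slot, inst, logical_id, status)}.

    Assumes the five-column layout shown in this post; header and
    separator lines are skipped because they do not start with a port number.
    """
    links = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0].isdigit():
            port, slot, inst, logical_id, status = fields
            links[port] = (slot, inst, logical_id, status)
    return links


def changed_links(before, after):
    """Return {port: (row_before, row_after)} for every port whose row differs."""
    a, b = parse_links(before), parse_links(after)
    return {
        port: (a.get(port), b.get(port))
        for port in sorted(set(a) | set(b))
        if a.get(port) != b.get(port)
    }


if __name__ == "__main__":
    during_alarm = """\
PORT    Remote Slot  Remote Inst    Logical ID  Status
======================================================
07      0/RSP1/CPU0         00             1        Up
09      0/RSP0/CPU0         00             1        Down
"""
    after_clear = during_alarm.replace("Down", "Up")
    for port, (old, new) in changed_links(during_alarm, after_clear).items():
        print(port, old, "->", new)
```

If the same port flaps on every occurrence, that narrows the problem to one specific link rather than the crossbar as a whole.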

 

The line card also appears to be dropping traffic (noticeable drop for users). "show drops" reveals that "Egress Uc dq pkt-len-crc/RO-seq/len error drp" is increasing rapidly.

 

The line card was reseated, rebooted, and tested in two different slots (slot 0 and slot 3). The same issues appear in both slots immediately after the line card boots.

 

I'd assume that this is a bad line card that needs to be RMA'd, but would just like to confirm, as that is a last-resort option. Is there any possibility that this error stems from an issue with the software, the chassis, or some underlying fault in one or more RSPs? I've had several other (different) issues with the fabric crossbar / FIAs on Trident LCs before, so I'm unsure whether it's just bad luck or whether another component in my system is damaged.

 

IOS XR v5.3.4 base, ASR9010 with 2x RSP440-SE. FPD on LC is latest (for this release of IOS XR).

 

6 Replies
Beginner

Re: Fabric Crossbar Errors - Typhoon A9K-24X10GE-TR

Fabric links are trained at every initialisation to optimise the settings of programmable ASIC parameters. So this could be a HW fault, or a scenario in which SW could do a better job of training the link. Do you have the latest 5.3.4 Service Pack installed, or the equivalent individual SMUs? If so, this is very likely a HW fault and the line card should be replaced.
Beginner

Re: Fabric Crossbar Errors - Typhoon A9K-24X10GE-TR

Hi,

I am currently running only the base version of XR 5.3.4, with no SMUs or SPs installed. I don't want to install any SMUs or SPs unless there is at least a remote possibility that they will help.

Is there any way to verify (e.g. by looking for a specific keyword in a SMU or SP) that a SMU/SP contains an "update" specifically for link training on my line cards? Most SMU/SP upgrades require a router reboot with additional downtime, which I'd like to keep to a minimum at this time.

 

Thanks!

Beginner

Re: Fabric Crossbar Errors - Typhoon A9K-24X10GE-TR

CSCve85121 is one example of a SMU related to fabric monitoring and debugging. I'm fairly sure we posted some others as well. Running a 'vanilla' XR installation, without any SMUs or SPs, is not something we recommend. The concept of the SMU and SP was introduced in IOS XR precisely to help keep the installation up to date with important fixes, without the need to upgrade to a higher IOS XR release. To facilitate the install operation itself, we have delivered the CSM (Cisco Software Manager) Server platform, which significantly reduces the complexity of the network operator's task. Instead of performing the installation manually, the operator can simply watch the progress and review the pre- and post-install check logs. There's also an API to customise the pre- and post-install checks to suit your specific deployment. CSM Server is available for download at: https://software.cisco.com/download/home/282414851/type/284777134/release/4.0.
Beginner

Re: Fabric Crossbar Errors - Typhoon A9K-24X10GE-TR

Hi,

For debugging purposes, IOS XR was upgraded to 6.4.2 (latest supported on RSP440) with SP3. However, this error is still present. I assume this means bad hardware.

Is there any way to verify that this error is 100% due to a bad line card only? What can I check (or which debugging options can I use) to make sure that my RSPs or chassis are not bad as well?

Thanks!

Cisco Employee

Re: Fabric Crossbar Errors - Typhoon A9K-24X10GE-TR

The usual HW troubleshooting should help. Visually inspect the connectors on the LC back-end and inside the slot to see whether there is any observable physical damage. If none is observed, insert another LC into the slot and see whether the problem persists. Based on what you have written so far, I expect only the LC to be faulty.

Re: Fabric Crossbar Errors - Typhoon A9K-24X10GE-TR

Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
