A9K-RSP440-SE strange behavior

PK99
Level 1

We have a 9006-V2 in our lab. 

 

The 9006 currently has 1 x MOD80-SE and 1 x MOD80-TR.  In the MOD80-SE there is a MPA-20X1GE installed.

 

When running with RSP880-SE on IOS XR 6.5.3, everything works fine.

When running with RSP440-SE on IOS XR 6.5.3, we get the following errors:

RP/0/RSP0/CPU0:Feb 12 08:42:10.245 UTC: FABMGR[222]: %PLATFORM-FABMGR-2-FABRIC_LINK_DOWN_FAULT : (0/2/CPU0 XBAR 0) <--> (0/RSP0/CPU0 XBAR 1) fabric link is down
RP/0/RSP0/CPU0:Feb 12 08:42:10.257 UTC: FABMGR[222]: %PLATFORM-FABMGR-2-FABRIC_INTERNAL_FAULT : 0/2/CPU0 (slot 4) encountered fabric fault. Interfaces are going to be shutdown.
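
When collecting many of these messages across swap tests, it can help to pull out the two link endpoints automatically. The following is a minimal Python sketch, assuming the message layout matches the FABRIC_LINK_DOWN_FAULT line quoted above; the regex and the `parse_link_down` helper are illustrative names, not part of any Cisco tooling, and other releases may format the message differently.

```python
import re

# Pattern modelled only on the FABMGR line quoted in this thread (an assumption).
LINK_DOWN = re.compile(
    r"%PLATFORM-FABMGR-2-FABRIC_LINK_DOWN_FAULT : "
    r"\((?P<a_node>\S+) XBAR (?P<a_xbar>\d+)\) <--> "
    r"\((?P<b_node>\S+) XBAR (?P<b_xbar>\d+)\)"
)


def parse_link_down(line):
    """Return both link endpoints as (node, xbar) tuples, or None if no match."""
    m = LINK_DOWN.search(line)
    if m is None:
        return None
    return (
        (m["a_node"], int(m["a_xbar"])),
        (m["b_node"], int(m["b_xbar"])),
    )


SAMPLE = (
    "RP/0/RSP0/CPU0:Feb 12 08:42:10.245 UTC: FABMGR[222]: "
    "%PLATFORM-FABMGR-2-FABRIC_LINK_DOWN_FAULT : (0/2/CPU0 XBAR 0) <--> "
    "(0/RSP0/CPU0 XBAR 1) fabric link is down"
)

print(parse_link_down(SAMPLE))
```

Recording the endpoints per test run makes it easier to see whether the RSP side or the line card side of the link stays constant across failures.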

 

However, none of the interfaces ever go down. We then installed a different RSP440-SE, also running IOS XR 6.5.3, and got the same error. We moved the MOD80-SE to a different slot and saw the same thing.

 

The strange thing is this behavior doesn't exist with the RSP880-SE.

 

I looked at different bugs and there are different answers to this specific issue.

 

Has anyone experienced the same?

 

These 2 RSP440-SEs will power up a MOD400-SE with no issues and no errors. Of course you get the Rate Limit warning, but that's to be expected.

 

Now this is popping up with the RSP440-SE when the MOD80-SE is moved to a different slot:

 

fab_si[175]: %PLATFORM-STATS_INFRA-3-ERR_STR_1 : ErrStr:Unable to map stats infra shared mem

 

 


Hi PK99

 

RSP440 is not supported after 6.4.2, so perhaps this is your issue?

Please check this link, as well as the EoS notice for RSP440:

https://community.cisco.com/t5/service-providers-documents/ios-xr-release-strategy-and-deployment-recommendation/ta-p/3165422

 

Hope this helps

Mark

smilstea
Cisco Employee

While Mark is right that RSP440 is not supported by development or TAC beyond 6.4.2, there is nothing to prevent it from working (it's more of a support and troubleshooting issue). With that said, I am curious exactly which combinations were tried.

Is it always RSP0 that reports the link down to the same MOD80?

You have moved the same MOD80 between slots using the same RSP and the issue follows with the MOD80?

Using the same RSP if you swap the MOD80 with the other MOD80 you get no failure?

If you swap RSP440 with another RSP440 and keep the original MOD80 you still get the same failure?

 

Please note that MOD80 vs MOD400 and RSP440 vs RSP880 use different backplane pins/connectors so that may be why you see a difference when connecting a different generation card. So please perform the testing as I described above with the same type of card (TR vs SE does not matter for the pinout, the only difference is RAM and TCAM and CPU).

 

Sam

Thanks Sam, I will give it a try. Our lab only has 1 x MOD80-SE and 1 x MOD80-TR, so I can't try a different MOD80-SE. I will try moving the RSP440 around and see if I get the same behavior. I can rule out using RSP880 and MOD400 with said RSP440s. BTW, A9K-400G-DWDM-TR also works fine with RSP440, albeit with the RATE LIMIT warning, but I understand that the DWDM card is a Tomahawk card.

I am well aware that the RSP440 is not supported beyond 6.4.2, but our needs require a minimum of 6.5 and, like you said, the lack of support for 6.5 should have no bearing on the fact that I'm getting these errors.

 

Will test again and report back.

Sam, sure enough... our RSP slot 0 must be bad. I moved the RSP440-SE to slot RSP1 and got no fabric error. I did the same for the 2nd RSP440-SE and got no error. I tried the RSP880-SE in slot RSP0 and, sure enough, fabric error. I was going to try an RSP5, but if memory serves me correctly, MOD80 won't even work with RSP5.

 

I guess I never even considered that the RSP0 slot was bad. I'm guessing the fabric link from the RSP0 slot to the rest of the slots must be bad.
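
The swap tests above boil down to asking which factor the fault follows. Here is a minimal Python sketch of that reasoning. The trial records mirror the results reported in this thread, but the card labels `rsp440-a` and `rsp440-b` and the `fault_follows` helper are hypothetical names chosen for illustration.

```python
def fault_follows(trials):
    """Return factors where one value covers every faulty trial and no clean trial."""
    factors = [key for key in trials[0] if key != "fault"]
    suspects = {}
    for factor in factors:
        bad = {t[factor] for t in trials if t["fault"]}
        good = {t[factor] for t in trials if not t["fault"]}
        # Suspect only if a single value explains all faults and never appears fault-free.
        if len(bad) == 1 and not (bad & good):
            suspects[factor] = next(iter(bad))
    return suspects


# Illustrative records; labels for the two RSP440-SE cards are assumptions.
TRIALS = [
    {"rsp_card": "rsp440-a", "rsp_slot": "RSP0", "line_card": "MOD80-SE", "fault": True},
    {"rsp_card": "rsp440-b", "rsp_slot": "RSP0", "line_card": "MOD80-SE", "fault": True},
    {"rsp_card": "rsp440-a", "rsp_slot": "RSP1", "line_card": "MOD80-SE", "fault": False},
    {"rsp_card": "rsp440-b", "rsp_slot": "RSP1", "line_card": "MOD80-SE", "fault": False},
    {"rsp_card": "rsp880", "rsp_slot": "RSP0", "line_card": "MOD80-SE", "fault": True},
]

print(fault_follows(TRIALS))
```

With these records only the RSP slot qualifies: the line card appears in both faulty and clean trials, and no single RSP card accounts for every fault.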

I am, however, getting this annoying message on one of the RSP440-SEs, now in the RSP1 slot:

 

RP/0/RSP1/CPU0:ios#LC/0/1/CPU0:Feb 13 12:08:36.883 UTC: pfm_node_lc[308]: %FABRIC-FIA-1-SKT_SP0_INTR_BAD_CODE_0 : Set|fialc[168004]|0x103d000|SKT_SP0_INTR_BAD_CODE on FIA 0
LC/0/1/CPU0:Feb 13 12:08:36.886 UTC: pfm_node_lc[308]: %FABRIC-FIA-1-SKT_SP0_INTR_LANE_CRC_ERR_0 : Set|fialc[168004]|0x103d000|SKT_SP0_INTR_LANE_CRC_ERR on FIA 0
LC/0/1/CPU0:Feb 13 12:08:36.886 UTC: pfm_node_lc[308]: %FABRIC-FIA-1-SKT_SP0_INTR_CW_CRC_ERR_0 : Set|fialc[168004]|0x103d000|SKT_SP0_INTR_CW_CRC_ERR on FIA 0
LC/0/1/CPU0:Feb 13 12:08:46.887 UTC: pfm_node_lc[308]: %FABRIC-FIA-1-SKT_SP0_INTR_BAD_CODE_0 : Clear|fialc[168004]|0x103d000|SKT_SP0_INTR_BAD_CODE on FIA 0
LC/0/1/CPU0:Feb 13 12:08:46.887 UTC: pfm_node_lc[308]: %FABRIC-FIA-1-SKT_SP0_INTR_LANE_CRC_ERR_0 : Clear|fialc[168004]|0x103d000|SKT_SP0_INTR_LANE_CRC_ERR on FIA 0
LC/0/1/CPU0:Feb 13 12:08:46.888 UTC: pfm_node_lc[308]: %FABRIC-FIA-1-SKT_SP0_INTR_CW_CRC_ERR_0 : Clear|fialc[168004]|0x103d000|SKT_SP0_INTR_CW_CRC_ERR on FIA 0

 

The alarms end with "Clear" messages, so it must be cosmetic, but it's annoying because it's constantly popping up.
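
Each Clear in the log above follows a matching Set about ten seconds earlier, so counting and timing the pairs gives a rough measure of how often the alarm recurs. Below is a minimal Python sketch, assuming the pfm_node_lc line format quoted above; the regex and the `alarm_durations` helper are illustrative, and the year is a placeholder because the log lines do not carry one.

```python
import re
from datetime import datetime

# Pattern modelled only on the pfm_node_lc lines quoted in this thread (an assumption).
ALARM = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d\.\d{3}) UTC: pfm_node_lc\[\d+\]: "
    r"%(?P<name>\S+) : (?P<state>Set|Clear)\|"
)


def alarm_durations(lines):
    """Pair each Clear with the preceding Set of the same alarm; return (name, seconds)."""
    open_alarms = {}
    durations = []
    for line in lines:
        m = ALARM.search(line)
        if m is None:
            continue
        # Placeholder year: only the difference between timestamps is meaningful.
        ts = datetime.strptime("2000 " + m["ts"], "%Y %b %d %H:%M:%S.%f")
        if m["state"] == "Set":
            open_alarms[m["name"]] = ts
        elif m["name"] in open_alarms:
            started = open_alarms.pop(m["name"])
            durations.append((m["name"], (ts - started).total_seconds()))
    return durations


SAMPLE = [
    "LC/0/1/CPU0:Feb 13 12:08:36.883 UTC: pfm_node_lc[308]: "
    "%FABRIC-FIA-1-SKT_SP0_INTR_BAD_CODE_0 : Set|fialc[168004]|0x103d000|"
    "SKT_SP0_INTR_BAD_CODE on FIA 0",
    "LC/0/1/CPU0:Feb 13 12:08:46.887 UTC: pfm_node_lc[308]: "
    "%FABRIC-FIA-1-SKT_SP0_INTR_BAD_CODE_0 : Clear|fialc[168004]|0x103d000|"
    "SKT_SP0_INTR_BAD_CODE on FIA 0",
]

for name, seconds in alarm_durations(SAMPLE):
    print(name, seconds)
```

Running this over a longer capture shows whether the Set/Clear cycle is periodic or tied to specific events such as card insertion.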