
BGP Flaps ASR9000

Ali Muazzam
Level 1

Hi All,

I have been seeing a lot of BGP peer flap alerts on my ASR9010, and this syslog message is often logged:

%ROUTING-BGP-3-NBR_NSR_DISABLED : NSR disabled on neighbor x.x.x.x due to 'ip-tcp' detected the 'warning' condition 'NSR is down because the retransmission threshold exceeded (probably because downstream RP is not healthy)'

There appear to be no packet drops between the peers that could lead to this situation. We also recently upgraded to 5.3.3, but the problem persists. The far end is unlikely to be at fault, because the flaps are not limited to just one or two peers.

Can anybody suggest what else to look for?

Cheers!!
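
To check whether the NSR message really affects many peers rather than one or two, the syslog lines can be tallied per neighbor. The following is a minimal sketch, assuming the messages follow the exact layout quoted above; the regular expression, the function name `count_nsr_disabled`, and the sample addresses are illustrative assumptions, not anything provided by IOS XR.

```python
import re
from collections import Counter

# Regex modelled on the single message quoted in this thread (assumption:
# other IOS XR releases may format the line differently).
NSR_DISABLED = re.compile(
    r"%ROUTING-BGP-3-NBR_NSR_DISABLED\s*:\s*NSR disabled on neighbor "
    r"(?P<neighbor>\S+) due to '(?P<component>[^']+)' detected the "
    r"'(?P<severity>[^']+)' condition '(?P<reason>.*)'"
)


def count_nsr_disabled(lines):
    """Return a Counter mapping neighbor address to NSR-disabled events."""
    hits = Counter()
    for line in lines:
        match = NSR_DISABLED.search(line)
        if match:
            hits[match.group("neighbor")] += 1
    return hits


# Hypothetical sample lines using documentation addresses.
REASON = ("NSR is down because the retransmission threshold exceeded "
          "(probably because downstream RP is not healthy)")
sample = [
    "%ROUTING-BGP-3-NBR_NSR_DISABLED : NSR disabled on neighbor 192.0.2.1 "
    f"due to 'ip-tcp' detected the 'warning' condition '{REASON}'",
    "%ROUTING-BGP-3-NBR_NSR_DISABLED : NSR disabled on neighbor 198.51.100.7 "
    f"due to 'ip-tcp' detected the 'warning' condition '{REASON}'",
    "%ROUTING-BGP-3-NBR_NSR_DISABLED : NSR disabled on neighbor 192.0.2.1 "
    f"due to 'ip-tcp' detected the 'warning' condition '{REASON}'",
    "some unrelated log line",
]

for neighbor, events in count_nsr_disabled(sample).most_common():
    print(neighbor, events)
```

If the counts are spread across many neighbors, that supports the suspicion that the cause is local to the router rather than at any single far end.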

22 Replies

Hi Xander,

616 MDF_RPF_FAIL_DROP 36036 0
630 MDF_PUNT_POLICE_DROP 1178 0
637 MODIFY_PUNT_REASON_MISS_DROP 2 0
1292 PARSE_DROP_IN_UIDB_TCAM_MISS 8990 0
1298 PARSE_DROP_IN_UIDB_DOWN 9 0
1350 PARSE_L3_TAGGED_PUNT_DROP 263879 0

Tx Power: 0.87890 mW (-0.56061 dBm)
Rx Power: 0.15980 mW (-7.96423 dBm)
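
The mW and dBm figures above are two views of the same reading, related by dBm = 10 * log10(power in mW). A small sketch to cross-check the two optical power values; the function name `mw_to_dbm` is an illustrative assumption.

```python
import math


def mw_to_dbm(milliwatts: float) -> float:
    """Convert optical power from milliwatts to dBm."""
    return 10.0 * math.log10(milliwatts)


# Values copied from the Tx/Rx power lines in the post above.
for label, mw in (("Tx", 0.87890), ("Rx", 0.15980)):
    print(f"{label} Power: {mw:.5f} mW ({mw_to_dbm(mw):.5f} dBm)")
```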

Hi Xander,

Is there an official document describing the NSR process on the ASR9000? I need to learn more about the packet flow for NSR.

As you said, the active RSP hands its BGP TCP packets to the standby, and the standby transmits and receives the packets.

So the packet flow might look like this:

 

Egress

RP0 (Active): CPU -> FIA -> Switch Fabric ->

RP1 (Standby): FIA -> CPU -> FIA -> Switch Fabric ->

LC: Switch Fabric ASIC -> FIA -> NP -> Physical Interface

 

Ingress

LC: Physical Interface -> NP -> FIA -> Switch Fabric ASIC ->

RP1 (Standby): Switch Fabric -> FIA -> CPU -> FIA ->

RP0 (Active): Switch Fabric -> FIA -> CPU

 

Could you please confirm whether this is correct?

 

By the way, could you please confirm whether, if the active RSP fails to get the first TCP ACK, it takes back control of the session and retransmits the packet?

 

BR,

Your understanding of the Egress path is correct.

On the Ingress side, the packet is multicast within the switch fabric to both the active and the standby RSP.

The active RP controls the session, so if it doesn't receive the ACK, it will initiate the retransmission.

 

hth,

/Aleks

Hi Aleks,

 

Do you have the official document for this?

 

BR,

Actually not, because this level of implementation detail is typically not included in configuration guides (e.g. https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k-r6-2/ip-addressing/b-ip-addresses-configuration-guide-asr9000-62x/b-ipaddr-cg-asr9000-62x_chapter_01100.html).

We will try to create a Support Forums article soon that explains this.

 

/Aleks

Hi Aleks,

 

As you said, the active RP controls the session, so if it doesn't receive the ACK, it will initiate the retransmission.

 

Did you mean that if the active RSP fails to get the first TCP ACK, NSR will be disabled for that neighbor?

 

BR,

Hi Phongthep,

 

No, a single missing ACK is not sufficient to bring down the session. Standard TCP mechanisms apply. The point I was making is that the active RP fully controls the TCP session.

/Aleks
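
To illustrate why one missing ACK is not enough, here is a minimal sketch of generic TCP retransmission with exponential backoff. It is not the IOS XR implementation: the initial timeout, the cap, the retry limit, and the function names `backoff_schedule` and `gives_up` are all assumptions chosen for illustration, and the real retransmission threshold used for NSR is not stated in this thread.

```python
def backoff_schedule(initial_rto=1.0, max_rto=60.0, max_retransmits=5):
    """Return the wait in seconds before each retransmission attempt.

    All three defaults are illustrative assumptions, not IOS XR values.
    """
    schedule = []
    rto = initial_rto
    for _ in range(max_retransmits):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)  # double the timer, up to the cap
    return schedule


def gives_up(consecutive_missing_acks, max_retransmits=5):
    """True only once every allowed retransmission has also gone unanswered."""
    return consecutive_missing_acks > max_retransmits


print(backoff_schedule())  # waits before retransmissions 1..5
print(gives_up(1))         # one missing ACK: sender simply retransmits
print(gives_up(6))         # original send plus all retries unanswered
```

In this model a single lost ACK only triggers the first retransmission; the sender declares a problem only after the whole schedule has been exhausted, which matches the "threshold exceeded" wording of the syslog message.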

I work at a telecommunications company, and we ran into a problem very similar to this. In our case, it looked as if the keepalive packets from our neighbor never reached us, and when we ran sh bgp neighbor (IP), the NSR State: field never reached NSR Ready. The cause in our case was the MTU. After many tests we found that the MTU was being changed somewhere along the path to our peer, and once we adjusted our mtu ipv4 and mtu ipv6 to values that worked, the session never dropped again. I can't explain the details, but that is what happened.
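
For anyone chasing a similar MTU mismatch, the search for a working value can be sketched as a binary search over packet sizes, followed by the usual MSS arithmetic (MTU minus IP and TCP headers without options). This is only a sketch: `probe` is a hypothetical callable standing in for a real test such as a ping with the DF bit set, the search bounds are assumptions, and the 1492-byte limit below is simulated rather than measured.

```python
IPV4_HEADER = 20  # bytes, no options
IPV6_HEADER = 40  # bytes, fixed header
TCP_HEADER = 20   # bytes, no options


def tcp_mss(mtu, ipv6=False):
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def find_path_mtu(probe, low=576, high=9000):
    """Binary-search the largest size for which probe(size) returns True.

    probe is a hypothetical callable (for example, a wrapper around a
    DF-bit ping); this assumes probe(low) succeeds.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid fits, so search the upper half
        else:
            high = mid - 1  # mid is too large, so search the lower half
    return low


# Simulated path that silently drops anything above 1492 bytes.
simulated_limit = 1492
path_mtu = find_path_mtu(lambda size: size <= simulated_limit)
print(path_mtu, tcp_mss(path_mtu), tcp_mss(path_mtu, ipv6=True))
```

With a real probe in place of the lambda, the result gives a starting point for the mtu ipv4 and mtu ipv6 values to test on the interface.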