We are running ASR 9006s as backbone nodes. We have MSTAG access rings which connect different access switches throughout the network. We have been running this for a long time (3 years). After we upgraded from 4.1.0 to 4.3.3 we are experiencing some strange behavior on some of the VLANs running across the MSTAG rings.
Let me try to explain.
One of the VLANs is connected to two ASR 9ks in an access ring. This VLAN is configured between the two 9ks with bridge groups and bridge domains according to our standard setup for the backbone. Both routers have a BVI configured for this bridge domain, with identical configuration, but one of the BVIs is shut down.
Suddenly we are experiencing packet drops and are unable to ping end nodes in this specific VLAN, which is routed at the BVI. The problem is intermittent and seems to last for a couple of minutes before it starts to work again.
While troubleshooting this problem I noticed that if we disable the BVI on one router and enable it on the other, the problem disappears. Changing back (activating the BVI on the first 9k) makes the problem come back again after a while.
All pseudowire and MSTAG check commands indicate that the bridge domain and access ring are OK. Since we have been running this setup for a while, I am beginning to think this is a 4.3.3 bug. But why does the other router handle the situation?
This situation will require some troubleshooting and deeper investigation. It might be best to open a TAC case so we can do a screen share and verify the various points in the topology to see where the packet drops occur.
One thing you could check is whether STP convergence events are happening that temporarily bring down a link or put it in blocking instead of forwarding.
The other thing to verify is whether there are particular NP counters whose rate seems to be related to the drops.
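As a rough illustration of that comparison (a sketch, not an official tool): take two snapshots of the NP counters a known number of seconds apart while the ping loss is happening, and see which counters moved and how fast. The Python below assumes the snapshots are already available as name-to-value dictionaries; the function name, counter names, and values are all made up for the example, and counter wrap is not handled.

```python
def counter_rates(before, after, interval_s):
    """Return (name, packets_per_second) for counters that increased, highest rate first."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = []
    for name, new_value in after.items():
        # A counter missing from the first snapshot is treated as starting at 0.
        delta = new_value - before.get(name, 0)
        if delta > 0:
            rates.append((name, delta / interval_s))
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates


# Hypothetical counter names and values, captured 10 seconds apart.
before = {"EXAMPLE_DROP_A": 1000, "EXAMPLE_DROP_B": 50, "EXAMPLE_FWD": 900000}
after = {"EXAMPLE_DROP_A": 1600, "EXAMPLE_DROP_B": 50, "EXAMPLE_FWD": 950000}

for name, pps in counter_rates(before, after, 10):
    print(f"{name}: {pps:.1f} pps")
```

A drop counter whose rate is near zero while pings succeed and roughly matches the lost traffic while they fail is the one worth looking at more closely.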
The ping loss to the BVI can be caused by drops in hardware, in LPTS (control plane policing), or at the software level.
There are too many dependencies here to give a solid, definitive answer, which is why that troubleshooting session would be necessary.