01-23-2015 12:07 PM - edited 03-07-2019 10:21 PM
Hi Folks,
I observed an issue in testing. We have implemented a VSS solution on the 6880-X-LE. We have two MEC L3 POs upstream and thus two OSPF paths for our default route. After creating an event in the lab to put the boxes in Active - Active (pulled the VSL links off SW1) and then going through the process of restoring the VSS (put the VSL links back in), I noticed that my additional path did not restore.
From the restoration perspective, it went as follows:
SW1 - Active SW2 - Standby - pull VSL links
VSL PO goes down
SW1 - Active SW2 - Active (sub second traffic impact)
SW1 - enters recovery mode SW2 - Active
VSL links back in - VSL PO up
SW1 - reboots and comes up standby SW2 - Active
Routing table now only shows entry for PO200 and no routes can go out PO100.
6880#sh ip route
<cut>
0.0.0.0/0 [110/1000] via 192.168.0.253, 00:38:34, Port-channel200
1.0.0.0/24 is subnetted, 2 subnets
O 1.1.1.0 [110/151] via 10.86.50.253, 00:31:55, Port-channel200
O 1.1.2.0 [110/151] via 10.86.50.253, 00:31:55, Port-channel200
192.168.0.0/16 is variably subnetted, 59 subnets, 8 masks
O 192.168.1.0/25 [110/150] via 192.168.0.253, 00:31:55, Port-channel200
O 192.168.1.128/25 [110/350] via 192.168.0.253, 00:31:55, Port-channel200
O 192.168.3.0/25 [110/350] via 192.168.0.253, 00:31:55, Port-channel200
6880#sh ip ospf ne
Neighbor ID Pri State Dead Time Address Interface
192.168.0.253 0 FULL/ - 00:00:31 192.168.50.253 Port-channel200
192.168.0.251 0 FULL/ - 00:00:32 192.168.50.251 Port-channel100
Can anyone help?
Thanks,
Rash
01-23-2015 06:44 PM
Hi,
When VSS is up with one switch as active and the other one as stand-by, does the routing table show both po100 and 200?
Can you post the config?
HTH
01-27-2015 08:18 AM
Hi Reza,
I can't post the configuration. However, I can tell you that I have a TAC case open with Cisco, and they agree with me that this behavior requires further investigation. The disappearance of the PO in question seems to coincide with the error about a port-member not being compatible:
Jan 26 15:18:17 est: %EC-SW2_STBY-5-CANNOT_BUNDLE2: Te2/3/14 is not compatible with Te1/5/14 and will be suspended (speed of Te2/3/14 is 10G, Te1/5/14 is 1000M)
So what happens is that in PO100 you'll see only one bundled port, and the other is suspended. This causes OSPF to withdraw the PO from the routing table (a hypothesis, but reproducible).
In our trial production launch, I performed the same test as above using redundancy force-switchover, and you'll see both entries in the routing table. Once the other modules come up and I get a suspended port-member, that particular PO disappears from the routing table.
If you shut/no shut that PO, it returns as an available path in the routing table. In production we are not using GLC-T (as in the lab) but GLC-SX-MM. The bug that puts this member into a suspended state falls under this bug ID:
CSCur17071
What I missed initially is that the routing table removes that PO as an available path even though it still has an active port-member.
Cheers,
Rash
01-27-2015 08:37 AM
Rash
These are two 10Gbps links bundled into a port channel?
If so, that's strange, because 6800s calculate 10Gbps and above as an OSPF cost of 1. Even though you lost one of the members, it should still see both paths as equal in terms of cost, unless you have changed the auto-cost reference bandwidth.
When you lose the route, does it still show up in the OSPF database, and if so, what do the costs show for both paths?
Jon
01-27-2015 08:52 AM
Hi Jon,
The links are GE, not 10GE, so my port-channel is 2Gbps. As for the routes being in the database, I believe we saw them. I can quickly re-test in my lab environment. Are there specific troubleshooting commands that you feel I should run, or just the standard sh ip ospf database and show ip ospf interface | inc Cost?
Thanks.
01-27-2015 09:59 AM
Rash
I haven't used the 6800s, but I believe the same applies to Gbps interfaces, i.e. the cost for 1Gbps or 2Gbps would be the same.
If you want to test it, you would need to get to the stage where the route is not in the routing table and then do -
"sh ip ospf interface po<x>"
for both port channels and see what the cost of each is.
Then look at the OSPF database to see whether both routes are there.
I'm assuming you haven't used the "ip ospf cost .." command on any interfaces and that you haven't changed the OSPF reference bandwidth.
Note also that I'm not suggesting you do either, especially changing the reference bandwidth, because that would mean you had to update all devices in your network. The test is just to try and work out what is happening from a routing perspective.
Jon
01-27-2015 10:01 AM
Hi Jon,
Your thoughts were bang on. The "cost" changed. See the points below.
1) When both routes were present, the cost for both POs was 100.
2) After redundancy force switch-over
Cost becomes 50 for PO200
Cost becomes 100 for PO100
Po100 is the one with the suspended interface. The interface that is up on PO100 is te2/5/14, and it's seen as 1000Mb/s, whereas the suspended port is seen as 10Gb/s.
Both routes appear in the OSPF DB from the advertising router, but the metric is what changed: 100 --> 50.
I'm not sure why it changes. I restored it by OIR of the optic in the lab. Shut/no shut works in our production environment, as I'm using different optics there.
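One possible reading of the 100 --> 50 change: IOS derives the OSPF interface cost as reference bandwidth divided by interface bandwidth, with a minimum of 1, so a cost that halves points to a port-channel whose bandwidth doubled. The Python sketch below only illustrates that arithmetic and is not output from the switch. The 100000 Mbps reference bandwidth is an assumption, picked because it reproduces the costs seen here; the configured value is not stated in this thread.

```python
def ospf_cost(interface_mbps: int, reference_mbps: int = 100) -> int:
    """Cisco-style OSPF cost: reference bandwidth / interface bandwidth, minimum 1."""
    if interface_mbps <= 0:
        raise ValueError("interface bandwidth must be positive")
    return max(1, reference_mbps // interface_mbps)


# With the default reference bandwidth (100 Mbps), 1G, 2G and 10G all cost 1.
default_costs = {bw: ospf_cost(bw) for bw in (1000, 2000, 10000)}

# ASSUMPTION: reference bandwidth raised to 100000 Mbps (not stated in the thread).
ASSUMED_REF_MBPS = 100_000
one_member = ospf_cost(1000, ASSUMED_REF_MBPS)   # port-channel with one 1G member bundled
two_members = ospf_cost(2000, ASSUMED_REF_MBPS)  # port-channel with two 1G members bundled

print(default_costs)            # {1000: 1, 2000: 1, 10000: 1}
print(one_member, two_members)  # 100 50
```

Under that assumption, a cost of 100 matches a PO running on a single 1G member and a cost of 50 matches a PO with both 1G members bundled.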
Any thoughts?
Rash
01-27-2015 10:11 AM
Rash
I'm not sure why the cost of po200 becomes 50, especially as it is not affected in terms of any links being down.
I really don't understand that.
How is the router connected to the VSS system?
When everything is up and running, what does a "sh ip ospf neigh" show?
Jon
01-30-2015 11:01 AM
Hi Jon,
Everything is connected and the OSPF neighbors are up. My hypothesis seems to be correct: when I recover the down port-member, the OSPF costs match up between the two POs and are equal again. I see that both paths are now available in the RIB.
It threw me for a loop, but I believe this is the correct behavior. Only equal-cost paths are put into the RIB as available.
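The equal-cost behavior can be sketched as follows: among the candidate paths to a prefix, only those tied for the lowest cost are installed. This is an illustrative Python sketch, not IOS code. The function name and the path dictionary are hypothetical, and the numbers are the interface costs reported earlier in this thread rather than full route metrics.

```python
def select_ecmp_paths(paths: dict[str, int]) -> list[str]:
    """Return the interfaces whose cost equals the lowest cost (the ECMP set)."""
    if not paths:
        return []
    best = min(paths.values())
    return sorted(name for name, cost in paths.items() if cost == best)


# Costs seen while one member of Po100 was suspended.
degraded = select_ecmp_paths({"Port-channel100": 100, "Port-channel200": 50})

# Costs seen once both port-channels matched again.
recovered = select_ecmp_paths({"Port-channel100": 100, "Port-channel200": 100})

print(degraded)   # ['Port-channel200']
print(recovered)  # ['Port-channel100', 'Port-channel200']
```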
01-27-2015 10:04 AM
Side question,
Are there any specific SNMP traps for the dual-active fast-hello interface? I'd like to monitor this.