OSPF flapping while adding 9200 switches in Layer 2 network

Deepak Kumar
VIP Alumni

Hello Guys,

I am back here with a question. I am busy with office work as well as my Cisco magic number study, so time management is an issue for me. However, I am facing a strange network issue: the OSPF neighborship with the SD-WAN devices starts flapping as soon as I add new Cisco 9200L access switches. It happens with 9200L switches only; I also added 2960X switches and they cause no issue.

 

We observed that the problem occurs only when two configuration conditions are combined: 1. VLAN X (X = the VLAN ID used only for the OSPF neighborship with the SD-WAN device, in a P2P network) is a source of RSPAN, meaning a duplicate copy of VLAN X traffic is made for analysis by the security team. 2. VLAN X is allowed on the trunk ports of the newly added access switches (9200L switches). If we break either condition, that is, remove VLAN X from RSPAN or remove VLAN X from the trunk ports of the Cisco 9200L switches, OSPF becomes stable again. I tried different pairs of new 9200L switches. We have 30 other switches and are planning to add a few more 9200L switches. The issue occurs with every new Cisco 9200L. No routing is enabled on them, and no client is connected to any port on the newly added switches.

 

Let me add a point here: please don't debate whether allowing VLAN X on the access switch trunks is best practice or not. It is not a design requirement, but this is how the design currently is.

 

What we have observed so far is that the SD-WAN device stops receiving hello messages (the issue is with multicast hellos only) after the new switches are added. The immediate hello (unicast) is received by the SD-WAN device, which then tries to establish the neighborship again. The core switch (4506x) receives hello messages without any issue. I can see logs on the core switch showing that it can't see itself in the neighbor list of the received hello message, so it drops the neighborship and tries to establish it again.
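
The core switch log described above matches the OSPF hello two-way check (RFC 2328): a router that receives a hello which does not list its own router ID treats it as a 1-Way event and drops the neighbor back to Init. Below is a minimal Python sketch of that check only. The router IDs and the function name are hypothetical, and this is a simplified model, not any vendor's implementation.

```python
# Simplified model of the OSPF hello two-way check (RFC 2328, section 10.5).
# Router IDs and names below are illustrative assumptions only.

def next_state_on_hello(my_router_id, neighbors_in_hello, current_state):
    """Return the neighbor state after processing one received hello."""
    if my_router_id in neighbors_in_hello:
        # The neighbor has heard our hellos: 2-Way received.
        # From Down or Init we move up to (at least) 2-Way; higher states are kept.
        if current_state in ("Down", "Init"):
            return "2-Way"
        return current_state
    # We are not listed: 1-Way received, the neighbor falls back to Init.
    return "Init"

CORE = "10.0.0.1"  # hypothetical router ID of the core switch

# The SD-WAN device still hears the core: its hello lists the core.
print(next_state_on_hello(CORE, [CORE], "Full"))  # Full
# The SD-WAN device stopped hearing the core's multicast hellos, timed the
# core out, and now sends hellos without the core in the neighbor list.
print(next_state_on_hello(CORE, [], "Full"))      # Init
```

In this model the core never loses a hello itself; it resets the neighbor only because the other side has stopped listing it.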

We ran multiple sessions with Cisco and internally. We didn't notice any issue with STP, nor any other (unauthorized) device trying to establish a neighborship. DT has been disabled, and the 9200 switches have a fresh configuration (only AAA and defaults added).

 

The Firmware Version for 9200L is 16.12.4 and Core Switch is 3.8.x. 

 

Cisco TAC has been working on the case for the last 50 days, but still with no result. We also have a few limitations, because we have the same setup at almost 90 locations with the same firmware version, and this is the first location where we started migrating from static routing to OSPF. We can't change the firmware version until we get technical proof (due to IT policy), and the SD-WAN devices do not support the non-broadcast network type, so we can't change the hellos to unicast.

 

Have you faced the same type of issue? Do you have any idea?

Regards,
Deepak Kumar,
Don't forget to vote and accept the solution if this comment will help you!

Hello

post the output of:

debug ip ospf adj

 


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

Hi,

Short logs are attached in a text file.

Regards,
Deepak Kumar,

balaji.bandi
Hall of Fame

9200L: we don't have any in production, but we run the same code on many 9300s and do not see this issue.

In most of the deployments I have come across, the Cat 9200 is used as a Layer 2 switch with limited routes and limited functionality.

But I have never heard of this kind of issue with it. Do you have a high-level diagram of how these are connected?

9200L is 16.12.4 and Core Switch is 3.8.x. (What is the core switch here?) Is this in the same location?

I would like to see the configuration on both ends.

As @paul suggested, can we get debug logs from both ends to understand this better?

As a quick remediation, try a newer version of code at one test location and see if that fixes the issue.

 

 

BB

***** Rate All Helpful Responses *****


Giuseppe Larosa
Hall of Fame

Hello Deepak,

 

>> Let me add a point here that dont go with best practice for allowing VLAN X on access switches trunk is good or not. It is not a design requirement but currently, design is like this only.

 

But if removing VLAN X from the list of allowed VLANs towards the new C9200 avoids the OSPF flapping, and VLAN X is not needed there (if it is used only for OSPF peering with the SD-WAN), you already have a reasonable workaround.

What is the topology of VLAN X when the Cat9200 is inserted?

Is the new C9200 on the path between the core switch and the SD-WAN appliance in VLAN X?

If not, how could the C9200 cause the OSPF multicast hellos to be dropped in VLAN X?

I see a reasonable workaround and a lot of work done to understand the issue.

I would go with the workaround of not allowing VLAN X on the trunks to/from the new C9200.

 

 

Hope to help

Giuseppe

 

Hi,

 

The SD-WAN devices are directly connected to the core switch, and the new 9200 switches are just access switches. The C9200 switches are connected to different ports of the core switch and are in a different building.

 

If not, how could the C9200 cause the OSPF multicast hellos to be dropped in VLAN X?

Ans: This is a mystery right now. We are not sure what the root cause is. In the packet capture, we can't see any packet from the 9200 switches destined to 224.0.0.5 or 224.0.0.6, or to any unicast address.
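
When filtering a capture for the OSPF multicast hellos at Layer 2, it can help to know which destination MAC they use. Here is a minimal Python sketch of the standard IPv4 multicast to Ethernet MAC mapping (prefix 01:00:5e plus the low 23 bits of the group address); the function name is hypothetical.

```python
import ipaddress

def multicast_mac(group):
    """Map an IPv4 multicast group address to its Ethernet destination MAC."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF      # keep the low 23 bits of the group address
    mac = (0x01005E << 24) | low23    # fixed prefix 01:00:5e, next bit is 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

print(multicast_mac("224.0.0.5"))  # AllSPFRouters: 01:00:5e:00:00:05
print(multicast_mac("224.0.0.6"))  # AllDRouters:   01:00:5e:00:00:06
```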

 

I see a reasonable workaround and a lot of work done to understand the issue.

Ans: The client needs a technical explanation of why she should remove it and what is causing the issue. As the location is working fine with static routing, she needs a root cause. She told Cisco: "Give me a technical root cause, not a workaround."

I agree with you, and currently OSPF is restored with this workaround only :).

 

Is the new C9200 on the path between the core switch and the SD-WAN appliance in VLAN X?

No. The 9200 series switches are just access switches in a flat Layer 2 network.

 

 

 

Regards,
Deepak Kumar,

Hello

SD-WAN devices are directly connected to the Core switch and newly 9200 Switches are just an Access switch. C9200 switches are connected to different ports of the Core switch and are in a different building. 

 

Forgive me here, but if the OSPF devices are directly connected and not even attached to each other via the 9200 switches, are you saying that when you attach the 9200 to the network the OSPF peering begins to flap?

I don't see how this would have any bearing on the OSPF adjacencies unless the 9200 is somehow involved.

So the question is: how are you connecting the 9200 to the cores? Is it possible you've introduced an STP loop for the OSPF peering VLAN?

Can you post a topology diagram of this?

 

 

 



Kind Regards
Paul

Hi,

Here, the customer is using MST, and all VLANs are members of a single instance. The core switch is the RB. No STP loop was noticed. Only the one active SD-WAN device forms a neighborship with the core switch. No logs were found for the OSPF issue and no HA flapping was recorded on the SD-WAN devices either. We even tried shutting down the passive device.

I am attaching a sample design because I don't have permission to share the actual design picture.

We checked many times and went through a lot of logs: no STP event was noticed in debug or show output, and there was no MAC move notification on the 9200 or the core switch, apart from the interface up and down events while connecting the interfaces for the 9200 series switches.

 

Design - GNS3.jpg

 

Forgive me here, but if the OSPF devices are directly connected and not even attached to each other via the 9200 switches, are you saying that when you attach the 9200 to the network the OSPF peering begins to flap?

Yes, that is true. But it happens only under the conditions mentioned above: if I remove VLAN X from the RSPAN source or remove VLAN X from the C9200 trunk ports, it becomes stable again. The RSPAN destination is on another switch, a C2960X.

Regards,
Deepak Kumar,

Hello


@Deepak Kumar wrote:

 The customer is using MST, and all VLANs are members of a single instance. The core switch is the RB. No STP loop was noticed.

Remove VLAN X from c9200 Trunk ports,

Yes, it is true. But it is happening with the above-mentioned conditions as If I will remove VLAN X from the RSPAN Source or Remove VLAN X from c9200 Trunk ports,


FYI - you should NOT be removing any VLAN from any trunks. You are using MST, so STP runs per MST instance, NOT per VLAN; individual VLANs have no spanning tree of their own, it runs only on the instance.
If you start manually pruning VLANs from trunks when using MST, you can cause blackholing of traffic, because you have removed the VLAN from the trunk but MST is still forwarding on it!
Does your MST configuration have parity with all other devices running MST? Do you have multiple MSTIs and/or multiple MST/non-MST regions?
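
To illustrate the blackholing risk, here is a toy Python model under simplifying assumptions: MST picks the forwarding trunk per instance, while the allowed-VLAN list is set per trunk. The trunk names and VLAN numbers are hypothetical, and real MST behaviour has more states than this sketch.

```python
# Toy model: the forwarding decision is per MST instance (one flag per trunk
# here), but the allowed-VLAN list is configured independently on each trunk.

def vlan_reachable(vlan, trunks):
    """A VLAN passes only over a trunk that MST forwards on AND that allows it."""
    return any(t["mst_forwarding"] and vlan in t["allowed_vlans"] for t in trunks)

trunks = [
    # MST forwards on uplink-A, but VLAN 30 was manually pruned from it.
    {"name": "uplink-A", "mst_forwarding": True, "allowed_vlans": {10, 20}},
    # uplink-B still allows VLAN 30, but MST blocks it for the whole instance.
    {"name": "uplink-B", "mst_forwarding": False, "allowed_vlans": {10, 20, 30}},
]

print(vlan_reachable(10, trunks))  # True
print(vlan_reachable(30, trunks))  # False: VLAN 30 is blackholed
```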

 



Kind Regards
Paul

Hi,

We have a single MST domain in the network. Also, we are not using any VTP pruning :).

 

If you start manually pruning from trunks when using MST you can cause blackholing of traffic because you have negated it off the trunk but MST is still forwarding!

 

I agree with you, but we removed VLAN X from the trunks (for the C9200 switches) only as a workaround for the issue. As I mentioned in my question, we are otherwise not removing VLANs from any trunk port.

Regards,
Deepak Kumar,

Hello

So, given your topology: if you disable the trunk towards the 2960X, which leaves just the direct connection from the core to the SD-WAN active, do you have stable OSPF peering? And is that access port specified as an STP P2P port?



Kind Regards
Paul

Hi,

The trunks for the C2960 switches do not cause any issue for OSPF. The trunk ports for the C9200 switches are causing the issue, and if I remove VLAN X from the trunk ports that connect to the C9200, OSPF becomes stable. Yes, the ports remain P2P.

 

 

Regards,
Deepak Kumar,

Hello
I would definitely advise against pruning any VLAN from any trunk, and would focus on the RSPAN and the C9200s.
Regarding your RSPAN, how are you performing this? What direction is it running in? Mirroring a VLAN in both directions can cause a lot of traffic, even more so if you are including multiple VLANs.

I suggest using ingress only and a dedicated trunk for the SPAN session, not one that is used for all your client traffic, as the link can easily be overwhelmed by the duplicated traffic.

Is the Cisco core the STP root? If not, I would say it needs to be, based on your topology diagram. How is the core running: VSS, vPC, StackWise, etc.?

Are the C9200s stacked? What software image are they running?
Can you post their configuration?

You could try putting the C9200s in their own separate MST region, thus making those trunks to the Cisco core STP boundaries?

However, if you are reluctant to make any configuration changes, you could perform a physical test: disable or unplug each port on the 9200 one by one and see if the flapping stops. This will determine whether you have a misbehaving node attached to the 9200s.



Kind Regards
Paul

Hi Paul, 

Thanks for your reply. Let me answer all queries as:

 

Q: I would definitely advise against pruning any VLAN from any trunk, and would focus on the RSPAN and the C9200s.
Regarding your RSPAN, how are you performing this? What direction is it running in? Mirroring a VLAN in both directions can cause a lot of traffic, even more so if you are including multiple VLANs.

Ans: Currently, VLAN pruning is not enabled. RSPAN runs in both directions, and yes, we have multiple VLANs as well. The fact is that we had multiple testing windows to troubleshoot this issue, and during those windows there was only 40 to 50 Mbps of traffic on RSPAN. Second question: why does it stop flapping if I remove VLAN X from the trunk ports of the C9200 switches? The same setup has been working for the last year. Recently, I added 6 new switches (4 9200 and 2 2960) on the same day. Why are only the 9200 series switches causing this issue? I didn't remove VLAN X from the newly added 2960 switches, but it still works. Why does the issue start only when I add VLAN X on the trunk port of any one of the 9200 switches? Currently, nothing is connected to the C9200 switches. All ports are down.

 

 

Q: I suggest using ingress only and a dedicated trunk for the SPAN session, not one that is used for all your client traffic, as the link can easily be overwhelmed by the duplicated traffic.

Ans: RSPAN is a security team requirement. The client has a valid question: why should I change my design or configuration without a valid reason? I agree with the client. The same setup has been working for the last year.

 

Q: Is the Cisco core the STP root? If not, I would say it needs to be, based on your topology diagram. How is the core running: VSS, vPC, StackWise, etc.?

Ans: The core switch is the root bridge and runs as a VSS.

 

Q: Are the C9200s stacked? What software image are they running?
Can you post their configuration?

Ans: No, the 9200 switches are standalone. The software image is 16.12.4. Let me collect the running configuration today and I will share it with you.

 

Q: However, if you are reluctant to make any configuration changes, you could perform a physical test: disable or unplug each port on the 9200 one by one and see if the flapping stops. This will determine whether you have a misbehaving node attached to the 9200s.

Ans: All ports are down on the 9200 series switches. Currently, nothing is connected to the new switches because we are not sure what the root cause is, and it might affect some factory operations as well.

 

Let me ask a few non-technical and mixed questions here: 

1. As the RSPAN destination VLANs are Y and Z (because we have two RSPAN sessions), all copied data from the different VLANs is in VLANs Y and Z. So why does OSPF stop flapping if I remove VLAN X from the trunk ports of the C9200 switches? Is the OSPF hello packet alone causing an overload on the C9200 or the core switch? VLAN X carries only OSPF hello packets and some other OSPF packets, a few KB of multicast that can flood to all ports; the rest of the traffic is unicast, which is guaranteed not to flood as long as the MAC address table is stable and up to date, which it is. The Cisco team has already verified that the MAC address table for VLAN X is stable.

2. As per the logs, why are only the multicast hello packets from the core switch affected? A unicast hello message (immediate hello packet) gets delivered, and based on that hello the core tries to establish the neighborship/adjacency again. After that, all unicast and a few multicast exchanges complete successfully, such as the empty DBD, DBD packets, etc. The OSPF process reaches Full and then goes down again.

3. If the switch is overloaded, would increasing the hello and dead timers make OSPF stable (at least when there is no traffic during the testing windows)? It does not. We tried all possible hello and dead timer values. Currently, we have 2 and 8 seconds.
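
On question 3, a small Python simulation may help show why changing the timers alone would not be expected to stabilise the adjacency if the multicast hellos never arrive at all. This is a sketch under that assumption only: the function name and the one-second tick are hypothetical, and it models nothing but the neighbor inactivity (dead) timer.

```python
def seconds_until_dead(dead_interval, hello_interval, hello_delivered):
    """Simulate the inactivity timer in 1 s ticks.

    hello_delivered(t) says whether the hello sent at second t arrives.
    Returns the second at which the neighbor is declared dead, or None.
    """
    last_heard = 0
    for t in range(1, 10 * dead_interval + 1):
        if t % hello_interval == 0 and hello_delivered(t):
            last_heard = t              # a hello arrived, restart the timer
        if t - last_heard >= dead_interval:
            return t                    # dead interval elapsed without a hello
    return None

# Every multicast hello is lost: the neighbor dies after exactly one dead
# interval, whatever timer values are configured.
print(seconds_until_dead(8, 2, lambda t: False))    # 8
print(seconds_until_dead(40, 10, lambda t: False))  # 40
# Every hello arrives: the timer never expires within the simulated window.
print(seconds_until_dead(8, 2, lambda t: True))     # None
```

Longer timers only postpone the drop in this model, which is consistent with the flapping continuing at every hello/dead combination tried.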

 

Regards,
Deepak Kumar,

Hello


@Deepak Kumar wrote:

I added 6 new switches (4 9200 and 2 2960)on the same day. Why only 9200 Series of switches making this issue. I didn't remove VLAN X from the newly added 2960 switches but still, it is working. Why issue only started if Will adds VLAN X on the trunk port of any of one 9200 switches. Currently, nothing is connected to C9200 switches. All ports are down.

we not using any VTP Pruning

If I will remove VLAN X from the RSPAN Source or Remove VLAN X from c9200 Trunk ports, it will stable again


Okay, so the 9200s have been added along with additional 2960 switches. You have nothing active on the 9200s, only the trunk ports interconnected to the core VSS. So I assume you don't have RSPAN running on the C9200s?

 



Kind Regards
Paul