OSPF NSSA Switches Looping Traffic Back to Each Other

vishal agavane · 11-17-2017

Hi,

We have four core switches connected in a ring, as shown below.

(Core-switch-1 <---Vlan-1---> Core-switch-2 <---Vlan-2---> Core-switch-3 <---Vlan-3---> Core-switch-4 <---Vlan-4---> Core-switch-1)

We are running RSTP between these switches. Each core-to-core link has a dedicated VLAN, and the same VLAN is used for the OSPF neighborship; all four switches are in area 0. The reason we keep all core switches at Layer 2 for the OSPF neighborship is that we have 15 different vendor applications whose redundant servers are connected across core switches at different locations, and the application vendors want all of them in the same subnet and VLAN. We are using VRRP for gateway redundancy, and all servers belong to the same VLAN and subnet.

Core switches 1 and 2 are connected to a few access switches. Each access switch has dual uplinks to core switches 1 and 2, and these uplinks are Layer 3 interfaces. All of these access switches are in area 100, which we have configured as an NSSA, because we require all access switches to have only a default route in their routing tables.

NSSA configuration on the core switches:

router ospf 1
 network x.x.x.x y.y.y.y area 100
 area 100 nssa no-summary

NSSA configuration on the edge switches:

router ospf 1
 network x.x.x.x y.y.y.y area 100
 area 100 nssa
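To illustrate what "area 100 nssa no-summary" on the ABR should leave the access switches with, here is a simplified Python sketch of the LSA filtering. It is not Cisco's implementation: the LSA list and prefixes are hypothetical, and a router LSA is reduced to a single prefix for brevity. It encodes the textbook rules: an NSSA keeps type-4 and type-5 LSAs out, "no-summary" additionally keeps type-3 summaries out, and the ABR injects one type-3 default summary instead. Note that the access switches still learn area-100 intra-area routes (such as the uplink subnets) in addition to the default route, and that the "no-summary" keyword only matters on the ABR.

```python
from dataclasses import dataclass

# LSA type numbers (RFC 2328 / RFC 3101):
# 1 = router, 2 = network, 3 = summary, 4 = ASBR summary,
# 5 = AS external, 7 = NSSA external.
INTRA_AREA_TYPES = {1, 2, 7}


@dataclass(frozen=True)
class Lsa:
    lsa_type: int
    prefix: str  # simplification: one prefix per LSA


def totally_nssa_view(candidate_lsas):
    """Simplified LSDB view inside a totally NSSA area.

    candidate_lsas: LSAs that would reach area 100 if it were a normal area.
    Returns only those that survive under "area 100 nssa no-summary".
    """
    view = [lsa for lsa in candidate_lsas if lsa.lsa_type in INTRA_AREA_TYPES]
    # Summary (3), ASBR-summary (4) and external (5) LSAs are filtered;
    # the ABR originates a single type-3 default summary instead.
    view.append(Lsa(3, "0.0.0.0/0"))
    return view


# Hypothetical example LSAs
candidates = [
    Lsa(1, "10.100.1.0/30"),  # area-100 uplink subnet (intra-area)
    Lsa(3, "10.20.0.0/24"),   # area-0 server VLAN, inter-area from area 100
    Lsa(5, "192.0.2.0/24"),   # some external route
]
for lsa in totally_nssa_view(candidates):
    print(lsa.lsa_type, lsa.prefix)
```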

---------------------------

With the above topology and configuration we are facing one issue. When both uplinks from an edge access switch are connected to their respective core switches, we can ping all servers connected to the core switches. However, whenever we remove either uplink, we see ping loss to some of the servers in the network, although the VRRP IP address (gateway) is reachable in all cases. In the same situation, if we run a traceroute from a server that cannot be reached, we can see that the server sends packets to a core switch, the core switch sends them to another edge switch in area 100, that switch sends them back to the core switches, and the core switch again forwards them to yet another edge switch in area 100. This happens with many switches in area 100, and then the packet gets dropped.

If the area 100 switches are configured as an NSSA and have only a default route in their routing tables, why are the core switches sending packets to them? Is this normal behavior?

All switches in area 0 and area 100 are forming OSPF neighborships.

Do we need to configure "ip ospf network broadcast" and "ip ospf mtu-ignore" under the VLAN interfaces used to form the OSPF neighborships? Is any other specific command required on the trunk ports used to form OSPF neighborships?

Can you please suggest what troubleshooting I should do to find the root cause of this issue? Given the above topology and requirements, do we need a different OSPF configuration?

cofee · 11-18-2017

Hello Vishal,

Most likely you are using Ethernet ports to uplink the edge switches to the core switches, in which case the network type is broadcast by default anyway. And if the issue were related to MTU, the peering between OSPF neighbors would have been stuck in the database exchange (EXSTART/EXCHANGE) state and would never fully converge.
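To make the MTU point concrete: per RFC 2328 (section 10.6), a router rejects a Database Description packet whose Interface MTU field is larger than what it can accept on the receiving interface, which is why an MTU mismatch leaves neighbors stuck in EXSTART/EXCHANGE instead of forming a full adjacency. The Python below is a simplified sketch of that check; the function names and the mtu_ignore flag (standing in for "ip ospf mtu-ignore") are illustrative, and real adjacency state handling has many more steps.

```python
def accepts_dbd(received_mtu, local_mtu, mtu_ignore=False):
    """Simplified RFC 2328 10.6 check on a received Database Description packet."""
    if mtu_ignore:  # models "ip ospf mtu-ignore" on the receiving interface
        return True
    return received_mtu <= local_mtu


def adjacency_state(mtu_a, mtu_b, ignore_a=False, ignore_b=False):
    """Both sides must accept each other's DBDs before the adjacency can reach FULL."""
    a_ok = accepts_dbd(mtu_b, mtu_a, ignore_a)  # A receives B's DBD
    b_ok = accepts_dbd(mtu_a, mtu_b, ignore_b)  # B receives A's DBD
    return "FULL" if a_ok and b_ok else "stuck in EXSTART/EXCHANGE"


print(adjacency_state(1500, 1500))
print(adjacency_state(1500, 9000))
print(adjacency_state(1500, 9000, ignore_a=True))
```

Since the adjacencies in this thread do come up, the sketch mainly shows why "ip ospf mtu-ignore" is unlikely to be the fix here.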

You mentioned that when one of the redundant connections is removed, some of the servers connected to the edge switches lose connectivity, but the end hosts can still reach the default gateway. Did you log in to the edge switch and then the upstream core switch to make sure they have routes to both the destination address/prefix and the source address/prefix? You can also run a traceroute to see where the packet is being dropped.
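Checking the route on each hop amounts to following the forwarding decision switch by switch. The Python sketch below does that with made-up tables: the 10.20.0.0/24 server prefix, the next hops, and the "bad" route on CS-2 are assumptions for illustration, not the thread's actual routing tables. It shows how a core route pointing into area 100, combined with an access switch default route pointing back to the core, produces the ping-pong described in the original post until the TTL runs out. On the real switches the equivalent check is "show ip route <destination>" on each hop.

```python
import ipaddress


def lookup(table, dst):
    """Longest-prefix match of dst against one switch's table (prefix -> next hop)."""
    addr = ipaddress.ip_address(dst)
    best_net, best_hop = None, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


def trace(tables, start, dst, ttl=8):
    """Follow next hops from start; stop on delivery, missing route or TTL expiry."""
    path = [start]
    node = start
    while True:
        next_hop = lookup(tables[node], dst)
        if next_hop is None:
            return path, "no route"
        if next_hop == "local":  # destination directly connected here
            return path, "delivered"
        ttl -= 1  # simplification: each forwarding decision consumes one TTL
        if ttl == 0:
            return path, "TTL expired"
        node = next_hop
        path.append(node)


# Hypothetical tables after AS-1's uplink to CS-1 is removed
looping = {
    "AS-1": {"0.0.0.0/0": "CS-2"},
    "CS-2": {"10.20.0.0/24": "AS-2"},  # hypothetical route pointing into area 100
    "AS-2": {"0.0.0.0/0": "CS-2"},     # default route points back to the core
    "CS-1": {"10.20.0.0/24": "local"},
}
healthy = {**looping, "CS-2": {"10.20.0.0/24": "CS-1"}}

print(trace(looping, "AS-1", "10.20.0.50", ttl=6))
print(trace(healthy, "AS-1", "10.20.0.50"))
```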

Let me know if I misunderstood your question/topology.

vishal agavane · 11-19-2017

Thanks for your reply.

All servers are connected to the core switches. The core switches have uplinks to the access switches, configured as Layer 3 links (interface IP addresses, not VLANs).

I always see that OSPF does not send traffic over the Layer 2 link/VLAN between the core switches; it prefers the Layer 3 links between the core and access switches. For example, I have two access switches, both connected to the core switches with dual uplinks (AS-1 uplink-1 to CS-1 and uplink-2 to CS-2, and the same for the other access switch), and the two core switches are connected through a Layer 2 trunk/VLAN. If I remove uplink-1 (to CS-1) from access switch 1 and run tracert from a PC connected to AS-1, the traffic should go to CS-2, and CS-2 should use the trunk port to pass it to CS-1. Instead, tracert shows that the PC's request goes to CS-2, and CS-2 uses the AS-2 uplink to reach CS-1, which also increases latency. The CS-1/CS-2 neighborship is up over the VLAN/trunk port.

But if I connect my laptop directly to CS-1 and trace to the CS-2 loopback, I can see that the trunk is used for the traffic.

I have given priority 0 to all access switch uplinks and set priority 255 only on the VLAN interface of CS-1 (DR) and 254 on CS-2 (BDR). However, OSPF does not seem to consider the VLAN interface priority: it selects the DR based on the highest loopback IP address rather than the configured VLAN priority.

I tried setting the priority on the loopback interface, but I couldn't see it in the show run or "sh ip os ne" output, even though the "ip ospf priority" command is accepted on the loopback interface. As far as I know, if we set a router ID, the highest router ID would be selected as DR, but in my case that is not happening.

Current IOS running on core switches is "cat4500es8-universalk9-m release 03.09.00.E"

Your help to understand this issue would be appreciated.

cofee · 11-19-2017

After changing the OSPF priority, did you clear the OSPF process? Without resetting the adjacencies, the OSPF priority setting won't make any difference.

vishal agavane · 11-19-2017

Yes, we have cleared the OSPF process and restarted all switches in the network, but we still have no control over the DR/BDR selection.

I can't understand why the Layer 3 interfaces are preferred over the Layer 2 trunk/VLAN. We have 10 Gbps links between the core switches (trunk/VLAN), and all the links toward the access switches (Layer 3 interfaces) are 1 Gbps. Because of this behavior our network latency has increased, and I suspect the ping drops are caused by TTL expiry.

cofee · 11-20-2017

That's strange. OSPF routers configured with a priority of 0 don't participate in the DR/BDR election at all, so they never become DR/BDR. But remember that the election is not preemptive: routers with a non-zero priority may not end up with the desired DR/BDR roles if they are not booted in the right sequence. You should start with the switch that you want to be elected as DR.
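A simplified Python sketch of these election rules, with hypothetical router IDs and priorities (the real RFC 2328 procedure also elects the BDR first and has more detail): priority 0 is never eligible, the highest priority wins, the router ID only breaks ties, and a sitting DR is not preempted. It also reflects why a priority set on a loopback has no effect on this election: only the priority on the interface attached to the shared segment, here the VLAN 20 SVI, takes part.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class SegmentMember:
    name: str
    router_id: str
    priority: int  # "ip ospf priority" on the interface attached to this segment


def elect_dr(members, current_dr=None):
    """Simplified DR election on one broadcast segment (e.g. the VLAN 20 SVIs)."""
    eligible = [m for m in members if m.priority > 0]  # priority 0 never becomes DR
    if current_dr in {m.name for m in eligible}:
        return current_dr  # non-preemptive: a sitting DR keeps the role
    if not eligible:
        return None
    # Highest priority wins; the router ID only breaks ties.
    winner = max(eligible, key=lambda m: (m.priority, IPv4Address(m.router_id)))
    return winner.name


# Hypothetical VLAN 20 segment
vlan20 = [
    SegmentMember("CS-1", "10.255.0.1", 255),
    SegmentMember("CS-2", "10.255.0.2", 254),
    SegmentMember("CS-3", "10.255.0.9", 1),
    SegmentMember("CS-4", "10.255.0.10", 0),
]
print(elect_dr(vlan20))                     # fresh election on the segment
print(elect_dr(vlan20, current_dr="CS-3"))  # CS-3 came up first and is already DR
```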

Why isn't the 10 gig link preferred over the 1 gig interfaces? I think this has to do with the default reference bandwidth of 100 Mbps used by OSPF, which means it can't tell the difference between 100 Mbps, 1 gig, or 10 gig links: they all get a cost of 1. You will either have to change the reference bandwidth so OSPF can differentiate between a 1 gig and a 10 gig port, or manipulate the cost on the interface itself. If you decide to change the default reference bandwidth, it needs to be changed on all the OSPF routers in that domain.
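The arithmetic behind this: an interface's OSPF cost is the reference bandwidth divided by the interface bandwidth, rounded down, with a minimum of 1. The short Python sketch below uses 10000 Mbps purely as an example value for "auto-cost reference-bandwidth" (which takes Mbps). With the default 100 Mbps reference every link from 100 Mbps upward gets cost 1, while a 10000 Mbps reference separates 1 gig (cost 10) from 10 gig (cost 1). Keep in mind that on Catalyst switches a VLAN interface typically has its own bandwidth value that is not derived from the trunk port speed, so it is worth checking "show ip ospf interface brief" after the change, or setting "ip ospf cost" on the VLAN interface directly.

```python
def ospf_cost(reference_mbps, interface_mbps):
    """OSPF interface cost: reference / interface bandwidth, truncated, minimum 1."""
    return max(1, reference_mbps // interface_mbps)


# 100 = IOS default reference; 10000 = example "auto-cost reference-bandwidth 10000"
for reference in (100, 10000):
    costs = {speed: ospf_cost(reference, speed) for speed in (100, 1000, 10000)}
    print(f"reference {reference} Mbps -> {costs}")
```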

I hope this helps you.

Saurabh Gera · 11-20-2017

Are the server IP addresses and the switch-connected VLAN in the same range?

Is it feasible for you to upload a topology diagram?

vishal agavane · 11-20-2017

Are the server IP addresses and the switch-connected VLAN in the same range?

- Yes, the area 0 neighborships form over VLAN 20, and the same VLAN is used for server connectivity. All core switches are connected to each other over 10 Gbps trunk uplinks.

----------

Why isn't the 10 gig link preferred over the 1 gig interfaces? I think this has to do with the default reference bandwidth of 100 Mbps used by OSPF, which means it can't tell the difference between 100 Mbps, 1 gig, or 10 gig links: they all get a cost of 1. You will either have to change the reference bandwidth so OSPF can differentiate between a 1 gig and a 10 gig port, or manipulate the cost on the interface itself. If you decide to change the default reference bandwidth, it needs to be changed on all the OSPF routers in that domain.

- We have applied the auto reference bandwidth command, but no luck. How can we define the bandwidth/cost on the trunk port?

--------

If you refer to the attached topology: if we connect a laptop to AS-1 and remove the uplink going toward CS-1, traffic should normally go to CS-2 and then use the trunk port between CS-1 and CS-2 to reach CS-1. But if I tracert to the CS-1 loopback IP, I can see the traffic goes to CS-2, which sends it to AS-2, and through AS-2 it reaches CS-1. Sometimes the traffic goes to AS-1, comes back to CS-2, then CS-2 sends it to AS-3, and so on until it finally reaches CS-1. Why isn't the trunk port considered for sending traffic? The trunk port is a 10 Gbps uplink, while the rest of the access switch uplinks are 1 Gbps.
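For comparison, here is what a purely cost-based shortest-path computation would pick in this scenario: a small Python Dijkstra sketch over CS-1, CS-2, AS-1 and AS-2 with the AS-1 to CS-1 uplink removed. The costs are assumptions (10 per 1 Gbps uplink, as with a 10000 Mbps reference bandwidth, and a few trial values for the VLAN interface between CS-1 and CS-2), and the model ignores OSPF's area rules, under which intra-area paths are preferred over inter-area ones regardless of cost. Under cost alone, the detour through AS-2 only wins if the VLAN interface cost exceeds the two uplink hops combined (20 here), so the route CS-2 has actually installed for the CS-1 loopback is worth checking on the switch itself.

```python
import heapq


def build_topology(svi_cost, uplink_cost=10):
    """Undirected link costs; the AS-1 <-> CS-1 uplink is left out (removed)."""
    links = [
        ("CS-1", "CS-2", svi_cost),  # VLAN interface carried over the 10G trunk
        ("AS-1", "CS-2", uplink_cost),
        ("AS-2", "CS-1", uplink_cost),
        ("AS-2", "CS-2", uplink_cost),
    ]
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra returning (total_cost, path), or (None, []) if unreachable."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            break
        for nbr, cost in graph.get(node, {}).items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                prev[nbr] = node
                heapq.heappush(heap, (d + cost, nbr))
    if dst not in dist:
        return None, []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


for svi_cost in (1, 10, 25):
    print(svi_cost, shortest_path(build_topology(svi_cost), "CS-2", "CS-1"))
```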