Solved: HSRP

Kangalala · ‎08-24-2016

Hi guys,

I have set up HSRP between two Catalyst 4000 L3 switches (software cat4000-I9S-M), and it has been giving me problems for the past 2 weeks. I have the topology attached.

The HSRP lab works fine when the tracked interface on one switch goes down: the other switch takes over as the active one.

But when preempt makes the higher-priority switch take over again after the interface has come back up, users on certain networks stop accessing the Internet. For subnets on my switch CORE_2 it works fine, but not for subnets on CORE_1.
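The tracking and preempt behavior described in this post can be sketched as a small model. This is only an illustration, not the switches' actual logic: the class name, the priorities (110 and 100) and the track decrements are assumed values, not taken from the attached configs, and tie-breaking between equal priorities is not modeled.

```python
from dataclasses import dataclass


@dataclass
class HsrpRouter:
    """Hypothetical model of one HSRP group member."""
    name: str
    priority: int          # configured HSRP priority (assumed value)
    track_decrement: int   # subtracted while the tracked interface is down
    preempt: bool
    tracked_up: bool = True

    def effective_priority(self) -> int:
        # Interface tracking lowers the priority while the tracked link is down.
        if self.tracked_up:
            return self.priority
        return self.priority - self.track_decrement


def next_active(current_active: HsrpRouter, other: HsrpRouter) -> HsrpRouter:
    """Return the active router after one priority comparison."""
    # The other router only takes over if preempt is enabled on it
    # and its effective priority is strictly higher.
    if other.preempt and other.effective_priority() > current_active.effective_priority():
        return other
    return current_active


core1 = HsrpRouter("CORE_1", priority=110, track_decrement=20, preempt=True)
core2 = HsrpRouter("CORE_2", priority=100, track_decrement=20, preempt=True)

active = core1
core1.tracked_up = False              # tracked interface on CORE_1 goes down
active = next_active(active, core2)   # CORE_2 (100) beats CORE_1 (90)
after_failure = active.name

core1.tracked_up = True               # tracked interface comes back up
active = next_active(active, core1)   # CORE_1 (110) preempts CORE_2 (100)
after_restore = active.name
```

In this model the failover and the preempt both work as intended, which matches the HSRP state changes reported in the thread; the sketch says nothing about what happens to traffic on the devices upstream of the switches.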

Kangalala · ‎08-24-2016

My trace from a subnet that is using CORE_1 as its HSRP gateway, before the interface goes down:

C:\Users\lucas>tracert -d 8.8.8.8

Tracing route to 8.8.8.8 over a maximum of 30 hops

1    <1 ms     1 ms     1 ms 192.168.9.1
2     1 ms     1 ms     1 ms 10.0.100.1
3     1 ms     1 ms     1 ms 10.0.101.1
4     4 ms     2 ms     1 ms 192.168.1.1
5     *        *        *     Request timed out.
6    97 ms    78 ms    82 ms 10.16.16.3
7    74 ms   133 ms    92 ms 10.16.17.1
8   245 ms   217 ms   213 ms 41.72.61.70
9   281 ms   238 ms   273 ms 197.149.148.105
10   359 ms   313 ms   272 ms 197.149.151.4
11   337 ms   327 ms   327 ms 185.148.112.22
12   324 ms   353 ms   357 ms 193.136.250.20
13   321 ms   293 ms   317 ms 216.239.49.242
14   320 ms   357 ms   347 ms 209.85.245.237
15   293 ms   307 ms   287 ms 216.239.57.227
16   388 ms   362 ms   343 ms 216.239.62.153
17     *        *        *     Request timed out.
18   436 ms   357 ms   364 ms 8.8.8.8

Trace complete.

After the interface goes down and comes back up, I get HSRP messages confirming that CORE_1 is the active switch for those subnets, but I am still not able to reach the Internet.

CORE_1_TESTE#
01:25:51: %HSRP-6-STATECHANGE: Vlan6 Grp 5 state Standby -> Active
01:25:51: %HSRP-6-STATECHANGE: Vlan7 Grp 6 state Standby -> Active
01:25:51: %HSRP-6-STATECHANGE: Vlan8 Grp 7 state Standby -> Active
01:25:51: %HSRP-6-STATECHANGE: Vlan10 Grp 9 state Standby -> Active
01:25:51: %HSRP-6-STATECHANGE: Vlan12 Grp 18 state Standby -> Active
01:25:51: %HSRP-6-STATECHANGE: Vlan19 Grp 11 state Standby -> Active
01:25:51: %HSRP-6-STATECHANGE: Vlan24 Grp 23 state Standby -> Active
01:25:51: %HSRP-6-STATECHANGE: Vlan17 Grp 205 state Standby -> Active
01:25:51: %HSRP-6-STATECHANGE: Vlan45 Grp 45 state Standby -> Active

trace results

C:\Users\lucas>tracert -d 8.8.8.8

Tracing route to 8.8.8.8 over a maximum of 30 hops

1     1 ms    18 ms    20 ms 192.168.9.1
2    <1 ms     1 ms     1 ms 10.0.100.1
3     1 ms     1 ms     1 ms 10.0.101.1
4     2 ms     2 ms     1 ms 192.168.1.1
5     *        *        *     Request timed out.
6    98 ms   157 ms    97 ms 10.16.16.3
7   117 ms   127 ms    98 ms 10.16.17.1
8   156 ms   113 ms   142 ms 41.72.61.70
9   190 ms   192 ms   217 ms 197.149.148.105
10   293 ms   352 ms   288 ms 197.149.151.4
11   268 ms   298 ms   336 ms 185.148.112.22
12   329 ms   312 ms   337 ms 193.136.250.20
13   327 ms   358 ms     *     216.239.49.242
14   209 ms   217 ms   227 ms 209.85.245.237
15   298 ms   222 ms   232 ms 216.239.57.227
16   357 ms   312 ms     *     216.239.62.153
17     *        *        *     Request timed out.
18     *        *        *     Request timed out.
19     *        *        *     Request timed out.
20     *        *        *     Request timed out.
21     *        *        *     Request timed out.
22     *        *        *     Request timed out.
23     *        *        *     Request timed out.
24     *        *        *     Request timed out.
25     *        *        *     Request timed out.
26     *        *        *     Request timed out.
27     *        *        *     Request timed out.
28     *        *        *     Request timed out.
29     *        *        *     Request timed out.
30     *        *        *     Request timed out.

Trace complete.
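When comparing two long tracert outputs like the ones above by eye, a small script can point at the first hop where they differ. This is only a sketch: the function names are made up for illustration, and it assumes Windows `tracert -d` output in the format shown above. The sample lines are hops 16 to 18 copied from the two traces in this post.

```python
import re
from typing import Dict, Optional

# Matches a hop line ending in an IPv4 address or "Request timed out."
HOP_LINE = re.compile(
    r"^\s*(\d+)\s+.*?\s(\d{1,3}(?:\.\d{1,3}){3}|Request timed out\.)\s*$"
)


def parse_tracert(text: str) -> Dict[int, Optional[str]]:
    """Map hop number to the responding IP, or None if the hop timed out."""
    hops: Dict[int, Optional[str]] = {}
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if match is None:
            continue  # header, prompt, or blank line
        hop, target = int(match.group(1)), match.group(2)
        hops[hop] = None if target.startswith("Request") else target
    return hops


def first_difference(a: Dict[int, Optional[str]],
                     b: Dict[int, Optional[str]]) -> Optional[int]:
    """Return the lowest hop number where the two traces disagree."""
    for hop in sorted(set(a) | set(b)):
        if a.get(hop) != b.get(hop):
            return hop
    return None


before = """\
16   388 ms   362 ms   343 ms 216.239.62.153
17     *        *        *     Request timed out.
18   436 ms   357 ms   364 ms 8.8.8.8
"""
after = """\
16   357 ms   312 ms     *     216.239.62.153
17     *        *        *     Request timed out.
18     *        *        *     Request timed out.
"""

diverges_at = first_difference(parse_tracert(before), parse_tracert(after))
```

For these sample lines the traces agree on hops 16 and 17 and first differ at hop 18, where the second trace no longer gets a reply from 8.8.8.8.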

milan.kulik · ‎08-25-2016

Hi,

I would not say "...users from certain network stop accessing the Internet."

As the hosts #8 - 16 in your tracert outputs are using public IP addresses, they are within the Internet, aren't they?

So the users should be able to reach some Internet destinations, I guess?

I'm just guessing, but maybe there is some issue between your Core switches and your FW, and the packets returning from the Internet are sent from the FW to the other Core switch?

Best regards,

Milan

Kangalala · ‎08-25-2016

Could it be a loop happening on the network? But it does not happen for the subnets on my Core_2: when the track interface goes down and restores, all works fine. I'm running OSPF as the routing protocol on my 2 core switches and upwards in the topology. On my firewall, the 2 interfaces facing the Core switches are both configured as inside.

milan.kulik · ‎08-25-2016

Hi,

what are the exact symptoms?

When the track interface goes Down on Core1 switch, HSRP moves the Active interface to Core2 and users are able to connect to the Internet with no problem?

But when the track interface goes Up again, the users are not able to connect to the Internet at all? Or to some destinations only? Or does the connection recover after some time again?

How is the routing between your FW and the Core switches done in detail?

When the track interface goes Down on Core1, is the traffic for the subnets behind Core1 switch forwarded from the FW to Core2? Or still to Core1?

Are there any suspicious messages visible in the FW log?

Best regards,

Milan

Kangalala · ‎08-25-2016

1. When Core 2 assumes the active role for the subnets under Core 1, all works fine and users are still able to access the Internet. But when Core 1 restores and takes back the active role, users from those subnets get timeouts and stop accessing the net.

2. OSPF runs between the devices; the firewall and the 2 core switches exchange routes just fine.

3. No log messages at all.

I'm starting to think it has to do with the image, because the subnets on CORE 2 work just fine whether it is active or standby.

Kangalala · ‎08-25-2016

I'm using Cat images on my layer 3 switches. I was wondering whether I could use the same image on both switches, because they are currently running different ones. I'm starting to suspect a bug.

milan.kulik · ‎08-26-2016

Hi,

ad 1. ...when Core 1 restores and assumes back the role as active users from subnet get time out and stop accessing the net.

So the users are not able to connect to any Internet destination at that time?

And it does not recover after some time?

ad 2.

I can see only

router ospf 1
log-adjacency-changes
network 10.0.10.0 0.0.0.255 area 0
network 192.168.5.0 0.0.0.255 area 0
network 192.168.7.0 0.0.0.255 area 0
network 192.168.9.0 0.0.0.255 area 0

in Core 1 config but more network ... commands in Core 2.

Shouldn't that be the same in both configs?

ad 3. Sure, it's better to run the same image on both switches.

BR,

Milan

Kangalala · ‎08-26-2016

Hi Milan ,

I'm running tests on subnet 7, which is why I only added a few subnets to OSPF. I will upgrade the switches to the same IOS and try again.

Kangalala · ‎08-26-2016

Do you think it could be asymmetric routing happening on my firewall? I'm using an ASA 5520 (8.4). How could I go about making sure it is not asymmetric routing, given my topology?

milan.kulik · ‎08-26-2016

Yes, that was one of the possibilities.

Or a problem with some next-hop MAC address.

That's why I was asking several times:

When the track interface goes Up again, the users are not able to connect to the Internet at all? Or to some destinations only? Or does the connection recover after some time again?

BR,

Milan

Kangalala · ‎08-30-2016

It was definitely something wrong between the firewall and my core switches: the firewall's stateful inspection. So I had to change my topology so that the firewall has only 1 inside interface and still connects to both Core switches. All is working fine now, and thanks for your support, Milan.

New topology attached
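One way this class of problem can arise is sketched below as a toy model. It is not the ASA's actual implementation: the class, the interface names (`inside_core1`, `inside_core2`, `inside`) and the simplified flow key are assumptions made for illustration. The idea is that a stateful firewall ties each connection to the interface it was built on, so traffic for the same flow that shows up on a different inside interface after an HSRP switchover no longer matches the state.

```python
from typing import Dict, Tuple

# Simplified flow key: (inside host, outside destination). Real firewalls
# track far more (ports, protocol, TCP state); this is an assumption.
Flow = Tuple[str, str]


class StatefulFirewall:
    """Toy model: each connection is pinned to the inside interface it used first."""

    def __init__(self) -> None:
        self.connections: Dict[Flow, str] = {}

    def from_inside(self, interface: str, flow: Flow) -> str:
        pinned = self.connections.get(flow)
        if pinned is None:
            # First packet of the flow: create state on this interface.
            self.connections[flow] = interface
            return "forward"
        # Later packets must arrive on the interface that owns the state.
        return "forward" if pinned == interface else "drop"


flow = ("192.168.9.10", "8.8.8.8")  # hypothetical host behind the core switches

# Two inside interfaces, one per core switch (the original topology).
two_inside = StatefulFirewall()
via_core2 = two_inside.from_inside("inside_core2", flow)   # CORE_2 is active
via_core1 = two_inside.from_inside("inside_core1", flow)   # CORE_1 preempts

# One shared inside interface (the changed topology).
one_inside = StatefulFirewall()
first = one_inside.from_inside("inside", flow)
second = one_inside.from_inside("inside", flow)  # same interface after preempt
```

In this model the second packet is dropped with two inside interfaces and forwarded with one, which is consistent with the fix reported above of giving the firewall a single inside interface.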