12-22-2009 03:59 AM - edited 03-06-2019 09:02 AM
I have a strange problem with HSRP running on two Catalyst 4506's
The set up is as follows:
Two 4506's connected by an Etherchannel are hosting VLAN interfaces for all our VLAN's
Each VLAN has an interface on each 4506
These interfaces are paired together using HSRP using a single virtual address
The idea is if one of the 4506's goes down, the other will still keep routing between the VLAN's
This has worked up until yesterday, when one of the 4506's was rebooted in a moment of madness.
Since then, one of the HSRP setups has stopped working. Only one, all the others are working fine. Below is some config info:
Switch 1
--------------
interface Vlan20
ip address 10.44.2.1 255.255.255.0
ip helper-address 10.44.7.12
standby 20 ip 10.44.2.3
standby 20 timers 10 30
standby 20 priority 110
standby 20 preempt
end
#sh standby vlan20
Vlan20 - Group 20
State is Active
2 state changes, last state change 17:39:22
Virtual IP address is 10.44.2.3
Active virtual MAC address is 0000.0c07.ac14
Local virtual MAC address is 0000.0c07.ac14 (v1 default)
Hello time 10 sec, hold time 30 sec
Next hello sent in 8.628 secs
Preemption enabled
Active router is local
Standby router is unknown
Priority 110 (configured 110)
IP redundancy name is "hsrp-Vl20-20" (default)
#sh ip route
......
Gateway of last resort is 10.44.0.1 to network 0.0.0.0
10.0.0.0/24 is subnetted, 7 subnets
C 10.44.6.0 is directly connected, Vlan60
C 10.44.7.0 is directly connected, Vlan70
C 10.44.4.0 is directly connected, Vlan40
C 10.44.5.0 is directly connected, Vlan50
C 10.44.2.0 is directly connected, Vlan20
C 10.44.0.0 is directly connected, Vlan1
C 10.44.1.0 is directly connected, Vlan10
S* 0.0.0.0/0 [1/0] via 10.44.0.1
Switch 2
-----------
interface Vlan20
ip address 10.44.2.2 255.255.255.0
ip helper-address 10.44.7.12
standby 20 ip 10.44.2.3
end
#sh standby vlan20
Vlan20 - Group 20
State is Active
2 state changes, last state change 16:15:51
Virtual IP address is 10.44.2.3
Active virtual MAC address is 0000.0c07.ac14
Local virtual MAC address is 0000.0c07.ac14 (v1 default)
Hello time 3 sec, hold time 10 sec
Next hello sent in 2.660 secs
Preemption disabled
Active router is local
Standby router is unknown
Priority 100 (default 100)
IP redundancy name is "hsrp-Vl20-20" (default)
#sh ip route
Gateway of last resort is 10.44.0.4 to network 0.0.0.0
10.0.0.0/24 is subnetted, 7 subnets
C 10.44.6.0 is directly connected, Vlan60
C 10.44.7.0 is directly connected, Vlan70
C 10.44.4.0 is directly connected, Vlan40
C 10.44.5.0 is directly connected, Vlan50
C 10.44.2.0 is directly connected, Vlan20
C 10.44.0.0 is directly connected, Vlan1
C 10.44.1.0 is directly connected, Vlan10
S* 0.0.0.0/0 [1/0] via 10.44.0.4
For all the other VLANs I can ping all three addresses associated with the interface from either switch. For VLAN 20 I can only ping the local and virtual addresses, not the address of the interface on the other switch. Other switches connected to the 4506's can also ping all three addresses.
The Etherchannel between the switches is a trunk that allows all the VLANs to pass traffic across it, including VLAN 20.
I have tried changing the timers on switch one to match those on switch two but this made no difference.
All my trunks are working and passing traffic.
The main problem this is causing is that now some of the workstations on VLAN 20 are unable to route off it to other VLANs and workstations on other VLAN's are unable to connect to certain workstations on VLAN20. Before yesterday, this all worked correctly.
Does anyone have any ideas?
This is confusing me and any help would be appreciated
12-22-2009 05:01 AM
In reply to nik.sharp:
I appreciate you said you changed the timers, but are your other VLANs, which are working, using the same timers on both switches? Having the timers set the way you do on VLAN 20 would indeed cause problems. When you changed the timers, did you do a shut/no shut on the interface?
Jon
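A minimal sketch of what matching the Switch 2 side to Switch 1 could look like, assuming the intent is for Switch 1 (priority 110) to stay the preferred active router and for both sides to use the 10/30 timers from the Switch 1 config in the original post. Whether preempt is wanted on Switch 2 is an assumption. Note that an HSRP router with no timers configured normally learns them from the active router's hellos, so the 3/10 values shown on Switch 2 may be a symptom of not hearing Switch 1 rather than the cause.

```
! Switch 2, illustrative only; apply in a maintenance window
interface Vlan20
 standby 20 timers 10 30
 standby 20 preempt
end
! Then, on both switches, check that each side sees the other
show standby brief
```

If both switches still report "Active" with an unknown standby after this, the timers are unlikely to be the root cause.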
12-22-2009 06:54 AM
Jon,
All the other VLANs are working and have their timers set to 10 30 on both interfaces. I assume this is because they can see each other.
I believe I did do a shut/no shut, but you have now sown a seed of doubt. I will try this later today when I can shut the interface; it is not possible now as it would disrupt production.
I will post back when I have done this
Thanks
Nik
12-22-2009 07:36 AM
Dear Nik,
Have you run a debug on the switches to see if the HSRP hellos are getting there? It could be a spanning tree issue.
Are you using a different HSRP group number on each VLAN?
Can you provide the following outputs from both switches:
# sh run | inc spanning
# sh vtp status
Try enabling preemption on both switches for testing, and set the same timers as you have on the other VLANs.
It might also be an IP conflict with a device on the network. Check the switch logs as well.
Regards,
Anser
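One way the debug suggested above might be run is sketched here, assuming a Telnet/SSH session (hence `terminal monitor`) and standard IOS HSRP debugging. The output covers every HSRP group on the switch, so it can be noisy on a box with many VLANs.

```
! Run on each switch in turn, ideally outside production hours
terminal monitor
debug standby packets
! Watch for "Hello in" lines on Vl20 from the peer's address
! (10.44.2.2 when on Switch 1, 10.44.2.1 when on Switch 2)
undebug all
show logging
```

If one switch logs only "Hello out" for Vl20 and never "Hello in", the hellos are being lost somewhere on the L2 path in that direction.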
12-22-2009 08:28 AM
Dear Anser
Thanks for the suggestions. All the VLANS use different HSRP numbers and I have checked for IP address conflicts.
I have run debug on both switches and enclose the output, as well as the other output requested. From the debug it looks like Switch 2 is not getting the HSRP messages from Switch 1 for this VLAN, and only this VLAN: on Switch 2 there are only hello outs for VLAN 20.
Switch 1
1d04h: HSRP: Vl20 Grp 20 Hello in 10.44.2.2 Active pri 100 vIP 10.44.2.3
1d04h: HSRP: Vl20 Grp 20 Coup out 10.44.2.1 Active pri 110 vIP 10.44.2.3
1d04h: HSRP: Vl20 Grp 20 Hello out 10.44.2.1 Active pri 110 vIP 10.44.2.3
#sh vtp status
VTP Version : running VTP2
Configuration Revision : 24
Maximum VLANs supported locally : 1005
Number of existing VLANs : 14
VTP Operating Mode : Server
VTP Domain Name : Denby
VTP Pruning Mode : Enabled
VTP V2 Mode : Enabled
VTP Traps Generation : Disabled
MD5 digest : 0x92 0xCD 0x63 0x6F 0xEA 0x3E 0x9A 0x47
Configuration last modified by 10.44.0.4 at 12-22-09 01:18:26
Local updater ID is 10.44.0.4 on interface Vl1 (lowest numbered VLAN interface found)
#sh run | inc spanning
spanning-tree mode rapid-pvst
spanning-tree portfast bpduguard default
spanning-tree portfast bpdufilter default
spanning-tree extend system-id
spanning-tree backbonefast
spanning-tree vlan 1,20,60 priority 28672
spanning-tree vlan 10,40,50,70 priority 24576
spanning-tree vlan 1,10,20,30,40,50,60,70 forward-time 9
spanning-tree vlan 1,10,20,30,40,50,60,70 max-age 12
spanning-tree portfast
spanning-tree bpduguard enable
Switch 2
31w6d: HSRP: Vl20 Grp 20 Hello out 10.44.2.2 Active pri 100 vIP 10.44.2.3
#sh vtp status
VTP Version : running VTP2
Configuration Revision : 24
Maximum VLANs supported locally : 1005
Number of existing VLANs : 14
VTP Operating Mode : Client
VTP Domain Name : Denby
VTP Pruning Mode : Enabled
VTP V2 Mode : Enabled
VTP Traps Generation : Disabled
MD5 digest : 0x92 0xCD 0x63 0x6F 0xEA 0x3E 0x9A 0x47
Configuration last modified by 10.44.0.4 at 12-22-09 01:18:26
#sh run | inc spanning
spanning-tree mode rapid-pvst
spanning-tree portfast bpduguard default
spanning-tree portfast bpdufilter default
spanning-tree extend system-id
spanning-tree backbonefast
spanning-tree vlan 1,20,60 priority 24576
spanning-tree vlan 10,40,50,70 priority 28672
spanning-tree vlan 1,10,20,30,40,50,60,70 forward-time 9
spanning-tree vlan 1,10,20,30,40,50,60,70 max-age 12
spanning-tree portfast
spanning-tree bpduguard enable
spanning-tree portfast
This seems to have identified the problem, not sure what the solution is.
Regards
Nik
12-22-2009 08:57 AM
Can you provide two more outputs from both switches:
#sh spanning-tree vlan 20
#sh spanning-tree inconsistentports | inc 20
12-22-2009 09:11 AM
Switch 1
#sh spann vlan 20
VLAN0020
Spanning tree enabled protocol rstp
Root ID Priority 24596
Address 0022.90a4.efc0
Cost 3
Port 642 (Port-channel2)
Hello Time 2 sec Max Age 12 sec Forward Delay 9 sec
Bridge ID Priority 28692 (priority 28672 sys-id-ext 20)
Address 0022.55bf.4a00
Hello Time 2 sec Max Age 12 sec Forward Delay 9 sec
Aging Time 300
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Po2 Root FWD 3 128.642 P2p
Po3 Desg FWD 3 128.643 P2p
Po4 Desg FWD 3 128.644 P2p
#sh spann incon | inc 20
#
Switch 2
#sh spann vlan 20
VLAN0020
Spanning tree enabled protocol rstp
Root ID Priority 24596
Address 0022.90a4.efc0
This bridge is the root
Hello Time 2 sec Max Age 12 sec Forward Delay 9 sec
Bridge ID Priority 24596 (priority 24576 sys-id-ext 20)
Address 0022.90a4.efc0
Hello Time 2 sec Max Age 12 sec Forward Delay 9 sec
Aging Time 300
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Po2 Desg FWD 3 128.642 P2p
Po3 Desg FWD 3 128.643 P2p
Po4 Desg FWD 3 128.644 P2p
#
I may not be able to pick this up until the 24th now, so there may be a time lag if more info is needed.
12-22-2009 11:38 AM
Hi Nik
One thing I noticed is that core Switch 1 is configured as a VTP server and core Switch 2 as a VTP client. Is Switch 1 the only VTP server on the network? With a server-client VTP architecture, it is advisable to configure both core switches as servers, so that if one goes down you still have a VTP server on which you can add VLANs if required.
I think the HSRP messages are clear:
Active router is local
Standby router is unknown
As you said, since the hellos are being lost, both switches see themselves as active. Do you monitor the bandwidth on the trunk between the switches? We have had similar cases where HSRP hellos failed because of high utilisation on the L2 trunk caused by spanning tree loops. Even in that case, is VLAN 20 the only VLAN that lost connectivity after the switch came up? By the way, did you say Switch 1 went down and VLAN 20 wasn't reachable from Switch 2? I just want to confirm the exact issue.
I think you should follow what Glen suggested and look for spanning tree blockages on the network. Also, do you see anything suspicious in the "show log" output from your 4500 switches?
Regards
Raj
12-22-2009 09:13 AM
Things to check. Do a "show standby brief" and see if each switch sees the other side. Which side is the active side? Make sure all your spanning tree roots are on the HSRP active side, with the standby side as secondary.
If there is a particular switch hung off these that is having a problem, check the root on that switch for VLAN 20 and see if it is the correct uplink to the 4500s. On your lower switches, do a "show spanning-tree blockedports" and see if there are any blocked ports that maybe should not be blocked.
Also, in most of our implementations we use the preempt command on both sides in order to have it work correctly; I see it is off on one side of your config.
Both sides think they are the active router, which suggests there is no working L2 path for VLAN 20 between your 4500s, either over the trunk between them or through an access switch hung off the 4500s. I would look around and see if VLAN 20 is blocked via spanning tree somewhere. I would also verify that VLAN 20 is allowed on the trunk between the 4500s on both sides of that link.
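The checks above could be run roughly as follows. This is a sketch using standard IOS show commands, not output from this network. Judging by the spanning tree output posted earlier, Po2 looks like the link between the two 4506s, but that is an assumption.

```
! On both 4506s
show standby brief
show etherchannel summary
show interfaces trunk
show spanning-tree vlan 20
! On the access switches hung off the 4506s
show spanning-tree blockedports
```

In the "show interfaces trunk" output, VLAN 20 should be listed for the inter-switch port-channel under "Vlans allowed on trunk", "Vlans allowed and active in management domain" and "Vlans in spanning tree forwarding state and not pruned". The last list is worth a close look here because the posted "sh vtp status" shows VTP pruning enabled.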
02-02-2010 03:05 AM
This problem has now been resolved by powering off both core switches, then powering on Switch 1 first, allowing it to come up completely, and then bringing up Switch 2. Once this was done, everything started working correctly.
Thanks for all the suggestions and advice, much appreciated. Unfortunately, I will never know what was causing this.
Nik
02-02-2010 06:34 AM
Nik
Thanks for letting the group know. It's quite strange, but that's how some troubleshooting ends ;) Let us know if you have any more queries.
Raj
02-02-2010 12:06 PM
Can you please post the output of "show interface vlan 20" from both switches after the reboot?