Re: HSRP problem on 4506 switches

nik.sharp · ‎12-22-2009

I have a strange problem with HSRP running on two Catalyst 4506s.

The setup is as follows:

Two 4506s connected by an EtherChannel host the VLAN interfaces for all our VLANs.

Each VLAN has an interface on each 4506.

These interfaces are paired using HSRP, sharing a single virtual address.

The idea is that if one of the 4506s goes down, the other will keep routing between the VLANs.

This worked until yesterday, when one of the 4506s was rebooted in a moment of madness.

Since then, one of the HSRP groups has stopped working. Only one; all the others are working fine. Below is some config info:

Switch 1

--------------

interface Vlan20

ip address 10.44.2.1 255.255.255.0

ip helper-address 10.44.7.12

standby 20 ip 10.44.2.3

standby 20 timers 10 30

standby 20 priority 110

standby 20 preempt

end

#sh standby vlan20

Vlan20 - Group 20

State is Active

2 state changes, last state change 17:39:22

Virtual IP address is 10.44.2.3

Active virtual MAC address is 0000.0c07.ac14

Local virtual MAC address is 0000.0c07.ac14 (v1 default)

Hello time 10 sec, hold time 30 sec

Next hello sent in 8.628 secs

Preemption enabled

Active router is local

Standby router is unknown

Priority 110 (configured 110)

IP redundancy name is "hsrp-Vl20-20" (default)

#sh ip route

......

Gateway of last resort is 10.44.0.1 to network 0.0.0.0

10.0.0.0/24 is subnetted, 7 subnets

C 10.44.6.0 is directly connected, Vlan60

C 10.44.7.0 is directly connected, Vlan70

C 10.44.4.0 is directly connected, Vlan40

C 10.44.5.0 is directly connected, Vlan50

C 10.44.2.0 is directly connected, Vlan20

C 10.44.0.0 is directly connected, Vlan1

C 10.44.1.0 is directly connected, Vlan10

S* 0.0.0.0/0 [1/0] via 10.44.0.1

Switch 2

-----------

interface Vlan20

ip address 10.44.2.2 255.255.255.0

ip helper-address 10.44.7.12

standby 20 ip 10.44.2.3

end

#sh standby vlan20

Vlan20 - Group 20

State is Active

2 state changes, last state change 16:15:51

Virtual IP address is 10.44.2.3

Active virtual MAC address is 0000.0c07.ac14

Local virtual MAC address is 0000.0c07.ac14 (v1 default)

Hello time 3 sec, hold time 10 sec

Next hello sent in 2.660 secs

Preemption disabled

Active router is local

Standby router is unknown

Priority 100 (default 100)

IP redundancy name is "hsrp-Vl20-20" (default)

#sh ip route

Gateway of last resort is 10.44.0.4 to network 0.0.0.0

10.0.0.0/24 is subnetted, 7 subnets

C 10.44.6.0 is directly connected, Vlan60

C 10.44.7.0 is directly connected, Vlan70

C 10.44.4.0 is directly connected, Vlan40

C 10.44.5.0 is directly connected, Vlan50

C 10.44.2.0 is directly connected, Vlan20

C 10.44.0.0 is directly connected, Vlan1

C 10.44.1.0 is directly connected, Vlan10

S* 0.0.0.0/0 [1/0] via 10.44.0.4

For all the other VLANs I can ping all three addresses associated with the interface from either switch. For VLAN 20 I can only ping the local and virtual addresses, not the address of the interface on the other switch. Other switches connected to the 4506s can also ping all three addresses.

The EtherChannel between the switches is a trunk that allows all the VLANs to pass traffic across it, including VLAN 20.

I have tried changing the timers on Switch 1 to match those on Switch 2, but this made no difference.

All my trunks are working and passing traffic.

The main problem this is causing is that some of the workstations on VLAN 20 are now unable to route off it to other VLANs, and workstations on other VLANs are unable to connect to certain workstations on VLAN 20. Before yesterday, this all worked correctly.

Does anyone have any ideas?

This is confusing me and any help would be appreciated.

Jon Marshall · ‎12-22-2009

I appreciate you said you changed the timers, but are your other VLANs, the ones that are working, using the same timers on both switches? Having mismatched timers, as you do on VLAN 20 (10/30 on Switch 1, 3/10 on Switch 2), would indeed cause problems. When you changed the timers, did you do a shut/no shut on the interface?
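
For reference, a minimal sketch of matching Switch 2 to the 10/30 timers configured on Switch 1 and then bouncing the SVI. It assumes the change is made in a maintenance window, since shutting Vlan20 interrupts routing for that VLAN on this switch. The priority and preempt settings are deliberately left as posted.

```
! Switch 2 (sketch): use the same hello/hold timers as Switch 1
configure terminal
interface Vlan20
 standby 20 timers 10 30
 ! bounce the SVI so the HSRP group restarts (disruptive on this switch)
 shutdown
 no shutdown
end
! then confirm what the group now reports
show standby vlan20
```

This only removes the timer mismatch as a variable; it does not address any Layer 2 problem between the two switches.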

Jon

nik.sharp · ‎12-22-2009

Jon,

All the other VLANs are working and have their timers set to 10 30 on both interfaces. I assume this is because they can see each other.

I believe I did do a shut/no shut, but you have now sown a seed of doubt. I will try this later today when I can shut the interface; it is not possible now as it would disrupt production.

I will post back when I have done this.

Thanks

Nik

Muhammad Anser Khan · ‎12-22-2009

Dear Nik,

Have you run a debug on the switches to see if the HSRP hellos are getting through? It could be a spanning tree issue.

Are you using a different HSRP group number on each VLAN?

Can you provide the following outputs from both switches:

# sh run | inc spanning

# sh vtp status

Try enabling preemption on both switches for testing, and set the same timers as you have on the other VLANs.

It might also be an IP conflict with another device on the network. Check the switch logs as well.
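
As a rough sketch, the checks suggested above could be run like this on each switch. `debug standby packets` is assumed to be available on the IOS release in use; debugging adds CPU load, so it is worth turning off as soon as a few hello intervals have been captured.

```
! show debug output on a telnet/SSH session (not needed on the console)
terminal monitor
! log HSRP packets sent and received
debug standby packets
! ... wait for a few hello intervals, then stop all debugging
undebug all
! the outputs requested above
show running-config | include spanning
show vtp status
! recent HSRP state changes or duplicate-address messages, if any were logged
show logging
```

If "Hello in" lines for a group show up on only one of the two switches, that would suggest the hellos are being lost in one direction.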

Regards,

Anser

nik.sharp · ‎12-22-2009

Dear Anser

Thanks for the suggestions. All the VLANs use different HSRP group numbers and I have checked for IP address conflicts.

I have run debug on both switches and enclose the output, as well as the other output requested. From the debug it looks like Switch 2 is not getting the HSRP messages from Switch 1 for this VLAN, and only this VLAN: Switch 2 shows only Hello out messages for VLAN 20.

Switch 1

1d04h: HSRP: Vl20 Grp 20 Hello in 10.44.2.2 Active pri 100 vIP 10.44.2.3

1d04h: HSRP: Vl20 Grp 20 Coup out 10.44.2.1 Active pri 110 vIP 10.44.2.3

1d04h: HSRP: Vl20 Grp 20 Hello out 10.44.2.1 Active pri 110 vIP 10.44.2.3

#sh vtp status

VTP Version : running VTP2

Configuration Revision : 24

Maximum VLANs supported locally : 1005

Number of existing VLANs : 14

VTP Operating Mode : Server

VTP Domain Name : Denby

VTP Pruning Mode : Enabled

VTP V2 Mode : Enabled

VTP Traps Generation : Disabled

MD5 digest : 0x92 0xCD 0x63 0x6F 0xEA 0x3E 0x9A 0x47

Configuration last modified by 10.44.0.4 at 12-22-09 01:18:26

Local updater ID is 10.44.0.4 on interface Vl1 (lowest numbered VLAN interface found)

#sh run | inc spanning

spanning-tree mode rapid-pvst

spanning-tree portfast bpduguard default

spanning-tree portfast bpdufilter default

spanning-tree extend system-id

spanning-tree backbonefast

spanning-tree vlan 1,20,60 priority 28672

spanning-tree vlan 10,40,50,70 priority 24576

spanning-tree vlan 1,10,20,30,40,50,60,70 forward-time 9

spanning-tree vlan 1,10,20,30,40,50,60,70 max-age 12

spanning-tree portfast

spanning-tree bpduguard enable

Switch 2

31w6d: HSRP: Vl20 Grp 20 Hello out 10.44.2.2 Active pri 100 vIP 10.44.2.3

#sh vtp status

VTP Version : running VTP2

Configuration Revision : 24

Maximum VLANs supported locally : 1005

Number of existing VLANs : 14

VTP Operating Mode : Client

VTP Domain Name : Denby

VTP Pruning Mode : Enabled

VTP V2 Mode : Enabled

VTP Traps Generation : Disabled

MD5 digest : 0x92 0xCD 0x63 0x6F 0xEA 0x3E 0x9A 0x47

Configuration last modified by 10.44.0.4 at 12-22-09 01:18:26

#sh run | inc spanning

spanning-tree mode rapid-pvst

spanning-tree portfast bpduguard default

spanning-tree portfast bpdufilter default

spanning-tree extend system-id

spanning-tree backbonefast

spanning-tree vlan 1,20,60 priority 24576

spanning-tree vlan 10,40,50,70 priority 28672

spanning-tree vlan 1,10,20,30,40,50,60,70 forward-time 9

spanning-tree vlan 1,10,20,30,40,50,60,70 max-age 12

spanning-tree portfast

spanning-tree bpduguard enable

spanning-tree portfast

This seems to have identified the problem, but I am not sure what the solution is.

Regards

Nik

Muhammad Anser Khan · ‎12-22-2009

Can you provide two more outputs from both switches:

#sh spanning-tree vlan 20

#sh spanning-tree inconsistentports | inc 20

nik.sharp · ‎12-22-2009

Switch 1

#sh spann vlan 20

VLAN0020

Spanning tree enabled protocol rstp

Root ID Priority 24596

Address 0022.90a4.efc0

Cost 3

Port 642 (Port-channel2)

Hello Time 2 sec Max Age 12 sec Forward Delay 9 sec

Bridge ID Priority 28692 (priority 28672 sys-id-ext 20)

Address 0022.55bf.4a00

Hello Time 2 sec Max Age 12 sec Forward Delay 9 sec

Aging Time 300

Interface Role Sts Cost Prio.Nbr Type

---------------- ---- --- --------- -------- --------------------------------

Po2 Root FWD 3 128.642 P2p

Po3 Desg FWD 3 128.643 P2p

Po4 Desg FWD 3 128.644 P2p

#sh spann incon | inc 20

#

Switch 2

#sh spann vlan 20

VLAN0020

Spanning tree enabled protocol rstp

Root ID Priority 24596

Address 0022.90a4.efc0

This bridge is the root

Hello Time 2 sec Max Age 12 sec Forward Delay 9 sec

Bridge ID Priority 24596 (priority 24576 sys-id-ext 20)

Address 0022.90a4.efc0

Hello Time 2 sec Max Age 12 sec Forward Delay 9 sec

Aging Time 300

Interface Role Sts Cost Prio.Nbr Type

---------------- ---- --- --------- -------- --------------------------------

Po2 Desg FWD 3 128.642 P2p

Po3 Desg FWD 3 128.643 P2p

Po4 Desg FWD 3 128.644 P2p

#

I may not be able to pick this up again until the 24th, so there may be a delay if more info is needed.

sachinraja · ‎12-22-2009

Hi Nik

One thing I noticed is that core Switch 1 is configured as a VTP server and core Switch 2 as a VTP client. Is core 1 the only VTP server on the network? With server-client VTP architectures, it is advisable to have both core switches configured as servers, so that even if one goes down you still have a VTP server where you can add VLANs if required.

I think the HSRP messages are clear:

Active router is local

Standby router is unknown

As you said, since the hellos were lost, both switches see themselves as Active. Do you monitor the bandwidth on the trunk between the switches? We have had similar cases where HSRP hellos failed because of high utilization on the L2 trunk caused by spanning tree loops. Even in that case, is VLAN 20 the only VLAN that lost connectivity after the switch came up? By the way, did you say Switch 1 went down and VLAN 20 wasn't reachable from Switch 2? I just wanted to confirm the exact issue.

I think you should follow what Glen suggested and look for spanning tree blocking on the network. Also, do you see anything suspicious in the "show log" output from your 4500 switches?
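
A quick sketch of how the trunk load and the logs could be checked. Treating Port-channel2 as the link between the two cores is an assumption based on it being the VLAN 20 root port on Switch 1, and the log prefixes in the filter are the ones I would expect for HSRP and spanning tree messages; adjust both to your setup.

```
! traffic rates on the inter-switch EtherChannel (Po2 is an assumption)
show interfaces Port-channel2 | include rate
! state of the bundle and its member ports
show etherchannel summary
! HSRP and spanning tree messages logged since the reboot
show logging | include STANDBY|HSRP|SPANTREE
```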

Regards

Raj

glen.grant · ‎12-22-2009

Things to check. Do a "show stand brief" and see if each switch sees the other side. Which side is the active side? Make sure all your spanning tree roots are on the HSRP active side, with the standby side as secondary. If a particular switch hung off these is having a problem, check the root port on that switch for VLAN 20 and see if it is the correct uplink to the 4500s. On your lower switches, do a "show spanning-tree blockedports" and see if there are any blocked ports that maybe should not be blocked. Also, in most of our implementations we use the preempt command on both sides in order to have it work correctly; I see it is off on one side of your config. Both sides think they are the active router, so there is no L2 path for VLAN 20 between your 4500s, either over the trunk between them or through an access switch hung off the 4500s. I would look around and see if VLAN 20 is blocked by spanning tree somewhere. I would also verify that VLAN 20 is allowed on the trunk between the 4500s on both sides of that link.
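
The checks above map roughly onto the commands below, as a sketch rather than a complete procedure. Since the posted "sh vtp status" shows VTP pruning enabled, the "not pruned" VLAN list in the "show interfaces trunk" output is worth reading for VLAN 20 on both ends of the inter-switch trunk. "show interfaces pruning" is assumed to be available on your release.

```
! does each switch see a standby peer for group 20?
show standby brief
! root bridge and port roles for VLAN 20
show spanning-tree vlan 20
! on the access switches: ports currently blocked by spanning tree
show spanning-tree blockedports
! is VLAN 20 allowed, active, and not pruned on the inter-switch trunk?
show interfaces trunk
show interfaces pruning
! does VLAN 20 exist and is it active on this switch?
show vlan id 20
```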

nik.sharp · ‎02-02-2010

This problem has now been resolved by powering off both the core switches, then powering on Switch 1 first, allowing it to come up completely, and then bringing up Switch 2. When this was done everything started working correctly.

Thanks for all the suggestions and advice, much appreciated. Unfortunately, I will never know what was causing this.

Nik

sachinraja · ‎02-02-2010

Nik

Thanks for letting the group know. It's quite strange, but that's how some troubleshooting ends ;) Let us know if you have any more queries.

Raj

bret · ‎02-02-2010

Can you please post the output of "show inter vlan 20" from both switches after the reboot?