cer43tcent · ‎10-15-2015

Hello. I have two 6500 chassis.

Core A has VS-S720-10G running ADVENTERPRISEK9_WAN-M Version 12.2(33)SXI12

Core B has VS-SUP2T-10G running s2t54-IPBASEK9-M, Version 15.1(2)SY1

They are running HSRP between them. Before I took preemption off Core B, a VLAN looked like this on the two:

CoreA

vlan3

standby 3 preempt

standby 3 priority 120

CoreB

vlan3

standby 3 preempt

standby 3 priority 100

I was originally going to ask whether anyone has seen this work successfully. However, while posting, in about 5 seconds Core B went active and then Core A regained active again. I'm starting to think it's an IOS bug on Core A???

Zach S · ‎10-15-2015

As far as HSRP is concerned you'd only want preempt on your CoreA switch. If it goes down, and CoreB becomes active, then preempt on A will allow it to take back the active status. If B goes down, then A will be active and stay active and when B comes back up nothing will change.

CoreB shouldn't try to take active status with a lower priority, even if it has preempt configured. That would point to configuration issues somewhere other than your basic standby config here. Do you have any tracks for decrementing priority?
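
The rule described above can be written down as a small check. This is only a sketch in Python, not how IOS implements it: the function name `takes_over` and its parameters are made up for illustration, and ties on priority (which HSRP breaks by IP address) are deliberately left out.

```python
def takes_over(challenger_priority: int, challenger_preempt: bool,
               active_priority: int) -> bool:
    """Return True if a router that hears an existing Active should take over.

    Simplified assumption: a router only displaces a working Active when it
    has preempt configured AND its priority is strictly higher.
    """
    return challenger_preempt and challenger_priority > active_priority


# CoreB (priority 100, preempt) hears CoreA (priority 120) as Active.
print(takes_over(100, True, 120))   # False: lower priority never preempts

# CoreA (priority 120, preempt) comes back while CoreB (100) is Active.
print(takes_over(120, True, 100))   # True: CoreA retakes the Active role

# Same situation, but with preempt removed from CoreA.
print(takes_over(120, False, 100))  # False: CoreB stays Active
```

Under this rule, CoreB going active while CoreA is still up and sending hellos would not be explained by the standby configuration alone.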

cer43tcent · ‎10-15-2015

Thanks for the reply.

No I don't have any tracks configured.

A little more detail on the situation. As you can see, the cores differ in SUP modules, as well as in a couple of other modules, because we recently upgraded modules on Core A only, and it also has a different IOS. Prior to this upgrade they were identical as far as hardware and IOS.

What do you think? I'll do a show run all and see if any differences in HSRP stand out.

Zach S · ‎10-15-2015

The difference in supervisors shouldn't cause any issues. HSRP is a Cisco protocol, so you would figure that it's standard across Cisco devices. They should be multicasting hellos to each other stating HSRP information.

Are the two switches directly connected to each other?

cer43tcent · ‎10-15-2015

I would like to add that I've noticed in CoreA's log that each time this random HSRP state change occurs (it's happened 3 times now), the message logged just before the state changes begin differs:

1st time

Oct 15 07:41:50: %SYS-2-MALLOCFAIL: Memory allocation of 18228 bytes failed from 0x482E6D0, alignment 32
Pool: I/O Free: 240 Cause: Not enough free memory
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Pool Manager", ipl= 0, pid= 10
-Traceback= 6BFF7B4z 6C17EC4z 482E6D4z 6C31AE8z 6C31D60z 50BEBE0z 50B8474z

2nd and 3rd time

Oct 15 12:03:41: %SSH-4-SSH2_UNEXPECTED_MSG: Unexpected message type has arrived. Terminating the connection from x.x.x.x <--this is my PC IP address
Oct 15 12:04:48: %ICC-3-BUFFER_FAIL: Failed to get buffer

cer43tcent · ‎10-15-2015

Last reply for now: I noticed I had the IOSs flip-flopped.

CoreA has the latest and Core B has the older

Peter Paluch · ‎10-15-2015

Dear friends,

Please allow me to join and share a couple of comments.

Zach, you have written: "As far as HSRP is concerned you'd only want preempt on your CoreA switch.". Your analysis of what would happen if the switches were configured as you suggested is of course correct. However, personally, I am a staunch opponent of having a mix of preempt and non-preempt HSRP routers in a single standby group. Having all routers in a standby group configured with preemption ensures that the behavior is deterministic and predictable. Having only a subset of routers configured with preemption complicates the understanding of how the group exactly reacts to changes in routers' presence or priorities, and in certain cases, the resulting state is determined by chance - certainly an unpleasant thing to have. In addition, if there was object tracking configured on the switches then the preemption would be mandatory. It is noteworthy to mention that in VRRP, preemption is on by default, and it's a good thing.

To cer43tcent: The SYS-2-MALLOCFAIL and ICC-3-BUFFER_FAIL logging messages suggest that you are low on memory. Either it has been exhausted by running processes, or your device has been running so long without a reboot that the memory has become extremely fragmented (unused blocks interspersed with allocated blocks), and none of the free blocks is large enough to accommodate the needs of the process that calls for additional contiguous memory space. Unfortunately, the only remedy is to reload the device (IOS does not have a concept of virtual memory and has no ability to defragment, or consolidate, the memory contents). It is quite possible that the HSRP flaps are caused by the IOS being unable to get a free memory block for the HSRP process.
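
To illustrate the fragmentation point, here is a toy model in Python. It is not IOS's allocator: the pool is just a list of flags, and the names `largest_free_block` and `can_allocate` are invented for this sketch. It only shows how a request for contiguous space can fail even though plenty of memory is free in total.

```python
def largest_free_block(pool):
    """pool is a list of booleans: True = allocated unit, False = free unit.

    Returns the length of the longest run of adjacent free units.
    """
    best = run = 0
    for used in pool:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def can_allocate(pool, size):
    """A contiguous request succeeds only if one free run is long enough."""
    return largest_free_block(pool) >= size


# Both pools have 8 of 16 units free; only the layout differs.
fragmented = [i % 2 == 0 for i in range(16)]   # used, free, used, free, ...
compact = [True] * 8 + [False] * 8             # all free space in one run

print(fragmented.count(False), largest_free_block(fragmented))  # 8 1
print(compact.count(False), largest_free_block(compact))        # 8 8
print(can_allocate(fragmented, 4))  # False
print(can_allocate(compact, 4))     # True
```

A reload corresponds to starting again from an empty pool, which is why it clears the condition until the memory fills up or fragments again.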

Best regards,
Peter

Jon Marshall · ‎10-15-2015

Hi Peter

Can you give an example of where not having preemption on both routers leads to the state being determined by chance?

I am not referring to where you are tracking an interface because as you rightly say it is needed there.

Not trying to start/win an argument just interested in case there are certain scenarios I am unaware of.

Completely agree about memory issues.

Jon

Peter Paluch · ‎10-15-2015

Hi Jon,

"Can you give an example of where not having preemption on both routers leads to the state being determined by chance?"

It would need to involve more than two routers, obviously. Assume you have three routers: R1 (150, P), R2 (120, NP), R3 (100, NP) - the first number in the parentheses is the priority, the second letter(s) refer to the router being Preempt or NotPreempt.

Assume that R1 is disconnected for a prolonged time. If R2 and R3 booted at roughly the same time, R2 would become Active and R3 would become Standby. However, if R3 boots up significantly sooner than R2, then the roles will be reversed: R3 will be Active, and R2 will be Standby, and this is the temporal dependency I do not like about it.

If R1 comes in after these roles have been established, it will retake the Active role because it's configured for preemption. Now, the Standby role has to be established again, but because the Standby role is always preemptible, R2 will always become the Standby. This would be true even if the order of booting was: first R3, then, after long enough, R1, and then finally R2. This goes against the expected behavior of Active whose preemptibility can be configured, and adds to the confusion.
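
The three-router scenario above can be replayed with a small simulation. This is a rough Python sketch under simplifying assumptions, not a model of the real HSRP state machine: routers passed to one `boot()` call are treated as booting at the same time, the Standby is simply the highest-priority router that is not Active, and timers, hellos, and IP-address tie-breaks are ignored. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str
    priority: int
    preempt: bool


class HsrpGroup:
    def __init__(self):
        self.present = []
        self.active = None

    def boot(self, *routers):
        """Bring up one or more routers at roughly the same time."""
        self.present.extend(routers)
        if self.active is None:
            # Nobody is Active yet: highest priority among those present wins.
            self.active = max(self.present, key=lambda r: r.priority)
            return
        # An Active exists: only preempt-enabled, higher-priority routers
        # are allowed to take the role away from it.
        challengers = [r for r in routers
                       if r.preempt and r.priority > self.active.priority]
        if challengers:
            self.active = max(challengers, key=lambda r: r.priority)

    def roles(self):
        """Return (active_name, standby_name); Standby is always contested."""
        others = [r for r in self.present if r != self.active]
        standby = max(others, key=lambda r: r.priority) if others else None
        return (self.active.name if self.active else None,
                standby.name if standby else None)


R1 = Router("R1", 150, True)
R2 = Router("R2", 120, False)
R3 = Router("R3", 100, False)

together = HsrpGroup()
together.boot(R2, R3)
print(together.roles())      # ('R2', 'R3')

r3_first = HsrpGroup()
r3_first.boot(R3)
r3_first.boot(R2)
print(r3_first.roles())      # ('R3', 'R2')  roles depend on boot order

r3_first.boot(R1)
print(r3_first.roles())      # ('R1', 'R2')  R1 preempts, R2 stays Standby
```

If R2 and R3 are given `preempt=True` in this sketch, both boot orders end with R2 as Active and R3 as Standby, which is the deterministic behavior argued for here.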

So I would argue that when looking at a configuration you should only see what needs to be there and anything else just leads to confusion.

Ordinarily, I agree that I do not like seeing commands in the configuration just for the sake of them being there. But notice that here, it is not really about commands that could make the configuration more complex and confusing, but rather about their absence, which leads to different devices using different rules to determine their behavior - and that's what's confusing to me. If I know that all routers behave according to the same logic and there are no differences in their decision process, it is more consistent and comprehensible to me than having to keep in mind that some devices are following a different rule than the others, and trying to view the network from each device's perspective independently.

I am not imposing myself here - it's my personal feeling, and I absolutely understand that others may feel differently.

Best regards,
Peter

Jon Marshall · ‎10-15-2015

Hi Peter

Thanks for that and you are right in what you say.

I was only thinking of a pair of L3 switches because that is really all I have ever used with HSRP and completely overlooked running more than two L3 devices.

Makes perfect sense in that scenario, and in fact it would make more sense from the configuration point of view as well, i.e. you know which router is going to be active at any one time.

Jon

Peter Paluch · ‎10-15-2015

Hi Jon,

Thank you very much for the rating - and as always, this has been a sheer pleasure :)

Best regards,
Peter

HSRP and Preempt