01-05-2019 04:10 AM - edited 03-08-2019 04:57 PM
Hello,
In the debug output below, what are priorities 135 and 167, and why and how are they used in the GLBP AVF election?
What I know is that weighting is used in the GLBP AVF election (in my case all devices have the default weight).
But I couldn't clearly understand what the debug output says.
Could someone please help?
GLBP: Fa0/1 8.1 Active: i/Hello rcvd from higher pri Active router (167/10.0.0.6)
GLBP: Fa0/1 8.3 Active: i/Hello rcvd from lower pri Active router (135/10.0.0.5)
Please find the attached photo, which shows the debug output.
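For readers who want to pull the fields out of debug lines like the two quoted above, here is a minimal sketch. The pattern is inferred only from the lines shown in this thread, so other IOS releases may format the output differently; the function name `parse_glbp_hello` and the field names are illustrative, not Cisco terminology.

```python
import re

# Pattern inferred from the debug lines quoted in this thread (an assumption,
# not a documented format). "8.1" is read as group 8, forwarder 1.
DEBUG_RE = re.compile(
    r"GLBP: (?P<interface>\S+) (?P<group>\d+)\.(?P<forwarder>\d+) "
    r"(?P<local_state>\w+): i/Hello rcvd from (?P<relation>higher|lower) pri "
    r"(?P<peer_state>\w+) router \((?P<priority>\d+)/(?P<peer_ip>[\d.]+)\)"
)


def parse_glbp_hello(line):
    """Return the fields of one 'i/Hello rcvd' debug line, or None if it does not match."""
    match = DEBUG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared as numbers.
    for key in ("group", "forwarder", "priority"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    sample = ("GLBP: Fa0/1 8.1 Active: i/Hello rcvd from higher pri "
              "Active router (167/10.0.0.6)")
    print(parse_glbp_hello(sample))
```

Parsing the lines this way makes it easier to see that the number in parentheses travels with a specific forwarder (8.1, 8.3) rather than with the group as a whole.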
01-05-2019 04:31 AM
Hello,
I'm not sure exactly what you are asking, but priority is used for the AVG election (the highest-priority router becomes the active AVG). The AVF uses weight.
01-05-2019 04:44 AM
Hello Sivam,
Greetings,
To be honest, I couldn't fully understand your question!
But if you provide the output of #show glbp and #show glbp brief, we can try to interpret the logs for you.
Each router in a GLBP group sends hellos to the other GLBP peers to show that it is alive, so hello messages like the ones appearing in the logs are normal,
while the AVG election is won by the highest priority, or by the highest IP address if the priorities have not been changed, as you mentioned!
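The AVG rule stated above (highest priority, then highest IP address) can be sketched as follows. This is a simplified illustration that assumes all routers start at the same moment; it does not model preemption or boot order, and the function name `elect_avg` is hypothetical.

```python
import ipaddress


def elect_avg(routers):
    """Pick the AVG from a list of (ip, priority) pairs.

    Simplified sketch: highest priority wins, ties are broken by the highest
    IP address. Boot order and preemption are deliberately ignored.
    """
    winner = max(routers, key=lambda r: (r[1], ipaddress.ip_address(r[0])))
    return winner[0]


if __name__ == "__main__":
    # All priorities left at the default of 100: the highest IP address wins.
    print(elect_avg([("10.0.0.5", 100), ("10.0.0.6", 100)]))
```

Using `ipaddress.ip_address` for the tie-break compares addresses numerically, so 10.0.0.10 correctly ranks above 10.0.0.9.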
Please give us more information so that we can help you!
Please don't forget to rate any helpful responses!
Bst Rgds,
Andrew Khalil
01-05-2019 07:15 AM
Hello,
Thanks for the reply.
Can you see the numbers 135 and 167 in the debug output?
The GLBP priority is 100,
so why do we see these numbers (135 and 167)?
See the attached packet capture.
01-05-2019 09:11 AM - edited 01-05-2019 09:12 AM
Hello Sivam,
Now I understand your question, and it's a really good one!
In fact, this pattern (135/x.x.x.x < 167/y.y.y.y)
appears regularly in the GLBP logs, and it does not represent the real configured priorities: whatever priority you configure, the debugging logs keep showing the same values!
Here is a link to a capture file:
http://packetlife.net/captures/protocol/glbp/
Please check it! It covers an entire GLBP operation for 3 routers (routers 1, 2 and 3) that participate in a GLBP election. R1 becomes the AVG because it has the highest priority (200), and R3 becomes the standby. All three routers become AVFs.
Most importantly: when do these logs appear, and what do they mean?
The capture shows that GLBP uses two TLVs (type-length-value) in the hello packets sent to multicast 224.0.0.102. The first TLV carries information about the AVG and shows the real configured priority, or 100 if it is left at the default as in your capture. The second TLV carries information about the AVF, and by default it is 167 for all routers once they enter the election. (So this is the first time the number "167" appears: in the election.)
The second time this number appears, together with the second number 135, is when the Active AVG or the Standby fails.
Let me show you some debugging logs from 2 routers running GLBP when the active router fails.
Note that 10.1.10.4 was the active, 10.1.10.3 was the standby, and 10.1.10.2 was in the listen state!
My own interpretation is this: 167 is given to the second TLV for all routers at the beginning, before the election. Once the election has taken place, the second TLV is no longer found in the periodic hello packets at all. But once the active and/or the standby fails, an election takes place and these values appear again, this time as 135 for the standby and 167 for the active. This suggests that after the first election the routers learned which one is the active and gave it the higher number 167, and which one is the standby and gave it the lower number!
The most critical point is that however you change the configured priority, the active one will take 167 while the standby one will take 135!
I know it's a little confusing; I was even confused from time to time while answering your good question!
I hope my reply is helpful enough to earn your helpful rating. Also, please mark the reply as a solution if it answers your question!
Bst Rgds
Andrew Khalil
01-05-2019 11:36 AM - edited 01-05-2019 11:39 AM
Thank you very much for your excellent reply
01-05-2019 11:45 AM
Dear @sivam siva
You are welcome. In fact I should thank you, because you drew my attention to analyzing the GLBP behavior. I will keep analyzing it for a while longer, because I could not find any documentation covering these values 135/167!
Finally, I am happy to help you!
Bst Rgds,
Andrew Khalil
01-07-2019 02:11 AM - edited 01-07-2019 02:14 AM
Hello @Andrew Khalil
Just to correct you: as per my tests, devices begin with priority 135 in the type 2 (req/res) message.
That becomes 167 when the device takes the active role.
Finally, all the active AVFs start sending hellos with 167.
I have another doubt!
In the beginning the active router sends a type 2 (request/response) message for all AVFs with priority 135. When that message was received by the routers in the Listen state, I got the debug output below.
GLBP: Fa0/1 8.2 Listen: i/Hello rcvd from lower pri Active router (135/10.0.0.5)
My questions are:
1. How can the Listen state router have the higher (167) priority?
2. How can the Active router send a message with priority 135?
I tested by turning on all the devices simultaneously. It is still confusing.
Please help me with one more question.
As I said, I turned on all routers "simultaneously",
and also "one by one" in the next test, but every time the same router became the backup (secondary) AVF. Why did it not become the active AVF even when I turned it on earlier than the others? Again, I am unsure how the AVF election happens when the weights are the same and the load-balancing method is round-robin.
I hope you understand my question. Could someone please help?
01-07-2019 03:50 AM
Hello @Andrew Khalil
Just to correct you: as per my tests, devices begin with priority 135 in the type 2 (req/res) message.
That becomes 167 when the device takes the active role.
Finally, all the active AVFs start sending hellos with 167.
I have another doubt!
In the beginning the active router sends a type 2 (request/response) message for all AVFs with priority 135. When that message was received by a router in the Listen state, I got the debug output below.
GLBP: Fa0/1 8.3 Listen: i/Hello rcvd from lower pri Active router (135/10.0.0.5)
My questions are:
1. How can the Listen state router have the higher (167) priority?
2. How can the Active router send a message with priority 135?
I tested by turning on all the devices simultaneously. It is still confusing.
Please help me with one more question.
As I said, I turned on all routers simultaneously,
and also one by one in the next tests, but every time the same router became the backup (secondary) AVF. Why did it not become the active AVF even when I turned it on earlier than the others? Again, I am unsure how the AVF election happens when the weights are the same and the load-balancing method is round-robin.
I hope you understand. Could someone please help?
01-07-2019 04:34 AM
Hello @sivam siva
Happy to get more feedback from you!
Actually, all routers should send this message type with the value 167; it is strange to see 135 being sent.
But let me test once again using the same procedure you described, and then I will get back to you.
Bst Rgds,
Andrew Khalil
01-07-2019 06:14 AM - edited 01-07-2019 06:24 AM
Thanks once again @Andrew Khalil
Before you test, don't forget to run the "debug glbp packets" and "debug glbp events" commands and to configure SPAN to capture packets during the election, so that you can analyze incoming and outgoing packets. I know you are probably aware of this already; it is just a suggestion.
I did this myself but got confused.
Unfortunately no one at my institute could explain it. I am waiting for your reply.
01-07-2019 11:03 AM - edited 01-07-2019 11:12 AM
Hello @sivam siva,
Actually, now I can say that I have solved this for you 100%, and I would like to thank you once again for raising the question again, especially after your tests and notes! I wasn't accurate enough in my first reply, but now, after deeper analysis and debugging, I can confirm that the behavior is well understood!
Let me share how I tested the GLBP process in order to find a reasonable answer to your question. Consider the following topology (I used 4 real, factory-reset L3 switches (C3640-IK9O3S-M), Version 12.4(13)):
The configuration applied on SW1, SW2 and SW3:
(config)#int f0/0
(config-if)#no switchport
(config-if)#ip add 10.1.1.X 255.255.255.0
(config-if)#glbp 1 ip 10.1.1.1
(config-if)#no glbp 1 preempt
(config-if)#no shutdown
where X is the last octet of each interface IP address as per the topology diagram.
Now let me tell you the chronological sequence I used to examine and explain the idea behind the 2 values 167/135:
1- SW1 is brought up and configured.
2- SW2 is brought up and configured.
3- SW3 is brought up and configured.
4- SW1 goes down.
Now, according to the attached logs taken from all the switches, we can break the process into 10 steps as follows.
But first let me remind you that GLBP balances the load across a maximum of 4 routers per VG (group), so it uses up to 4 VFs (forwarders). The group is labeled by its group number; here our group is 1, so it is labeled 1. There are only 3 forwarders here, as our topology has only 3 routers in the group, and since forwarders are labeled "group_no.1,2,3,4", in our example they are labeled 1.1, 1.2 and 1.3.
1- When SW1 is up and configured, it becomes Active for VG1 (using the default priority, as I didn't configure any priorities) and Active for VF1.1 (using the default priority for the VF, which is 167). Note: at this moment there are no other VFs.
2- When SW2 is up and configured, it becomes Standby for VG1 (using the default priority, as I didn't configure any priorities. You may wonder why it became Standby instead of Active although it has a higher IP address. The answer is not only that preempt is disabled: even if it were enabled, SW2 would be the Standby, as SW1 was up first, and this exception applies only to the Active switch!) and Active for VF1.2 (using the default priority for the VF, which is 167). Note: at this moment there are no VFs other than 1.1 and 1.2.
3- When SW3 is up and configured, it becomes Standby for VG1 (as it has a higher IP address) and Active for VF1.3 (using the default priority for the VF, which is 167), while SW2 now becomes Listen for VG1 and is still Active for VF1.2.
4- When SW1 goes down, SW2, which is in the Listen state, is the first to notice that there are no hellos from the Active for VG1. It moves from the Listen to the Speak state and sends this hello:
It then notices that the Active for VF1.1 (SW1) has also expired, so it starts to move from the Listen to the Active state for that VF and sends one hello for VF1.1 with priority 135. This means: "this is not a VF that the Active router assigned to me; it was the responsibility of a failed router, and I am offering to serve it":
When SW3 gets these messages, it responds with the following message:
confirming that it has already learned about this VF with a higher priority, so the offer is not accepted!
So SW2 receives this message:
Then, once SW3 notices that there are no more hellos from SW1 because the hold time has expired, it starts to move from the Standby state to the Active one and does exactly the same as SW2, offering its services to take care of VF1 by sending this message:
As SW3 has changed to Active, SW2's hold time for hellos from the Standby expires, so it again sends a hello to change from Speak to Standby! After becoming Standby, it keeps sending this hello:
which means that it is Standby for VG1 and primary Active for VF2.
Meanwhile the Active for VG1 keeps sending this hello:
which means that it is Active for VG1, secondary Active for VF1 and primary Active for VF3.
Note that I have attached the debugging logs of each switch for each step, so that you can correlate them to check!
The conclusion is that the VFs are initially assigned with priority 167, but when one of the routers fails, some other router has to take care of the failed VF, so it takes it over but marks it with a lower priority!
I also tested failing SW3 to see whether I would get further VF priority values lower than 135, but I found that SW2 generated this message:
which means that 167 marks the primary VF assigned to the router, while 135 marks an extra VF that it takes care of because 1, 2 or 3 other routers have failed!
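The interpretation above can be condensed into a small model. This is only a sketch of the behavior as observed in this lab, not of the GLBP state machine itself: the constant names, the function names and the data layout are all hypothetical, and the new owner of a failed forwarder is passed in by hand rather than elected.

```python
PRIMARY = 167    # value observed for a VF the router owned from the start
SECONDARY = 135  # value observed for a VF taken over from a failed router


def build_forwarders(routers):
    """Give each router its own forwarder number, in the order the routers came up."""
    return {
        number: {"owner": name, "priority": PRIMARY}
        for number, name in enumerate(routers, start=1)
    }


def fail_router(forwarders, failed, new_owner):
    """Hand every VF of the failed router to new_owner, marked with the lower value."""
    for vf in forwarders.values():
        if vf["owner"] == failed:
            vf["owner"] = new_owner
            vf["priority"] = SECONDARY
    return forwarders


if __name__ == "__main__":
    vfs = build_forwarders(["SW1", "SW2", "SW3"])
    # SW1 fails; in the test above SW3 (the new Active for VG1) ended up serving VF1.
    fail_router(vfs, failed="SW1", new_owner="SW3")
    for number, vf in vfs.items():
        print(f"VF1.{number}: {vf['owner']} priority {vf['priority']}")
```

In this model SW3 ends up with two forwarders at different values (its own VF at 167, the inherited one at 135), which matches the "primary Active" and "secondary Active" wording used above.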
That's it! :)
I do apologize for the length, but there was no other way to explain it, and also for the late reply, as the testing took a long time!
I hope I have explained it clearly enough that you can feel confident about the information.
Please don't forget to rate EACH helpful response and to mark it as a solution!
Bst Rgds,
Andrew Khalil
01-10-2019 10:22 AM
Excellent!
Glad to get a reply from such a logical mind,
and I would be thankful for one more reply.
I have one small doubt.
As you said, SW2 first takes VF1 (when it detects through timer expiry that SW1 has failed) and sends out a hello with 135.
When SW3 receives it, it decides that 135 is lower than what it has learned for VF1 (at this point VF1 has not yet expired in its cache), so the hello is simply ignored. Then, after SW3 learns that 10.1.1.11 has expired, it tries to take the Active role for both the VG and VF1, and it does take it. Now both SW2 and SW3
are Active for VF1, right?
How did SW2 give the Active VF1 role to SW3?
01-12-2019 01:08 AM
@sivam siva, thanks a lot for your kind words!
Again, I am the one who should thank you for your question!
You said: "As you said,first SW2 takes a VF1 (when it detects SW1 fails in terms of timer expire ) and sends out hello with 135"
My reply: SW2 didn't take it! It asked to take it, but once it got the rejecting message from SW3, it realized that it was not allowed. At this moment there is no router Active for VF1 YET!
I hope my reply is helpful enough to earn your helpful rating and be marked as a solution!
Thanks in advance!
Bst Rgds!
Andrew Khalil
01-14-2019 12:33 AM
Hello,
Thanks for the reply, I got the point.
I am happy to tell you that I have found two more priority values in GLBP:
160 - sent by the Active VF when it goes down (similar to the HSRP Resign message)
39 - sent by the Active VF when its weight drops below the lower threshold value
Please see the attached photo. Have you noticed this?
Even if the weight values differ slightly between the routers, it load-balances using only this priority value, am I right?
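As a quick reference, the four forwarder priority values reported in this thread can be collected in a lookup table. These meanings are the posters' lab observations, not documented Cisco definitions, and the names `OBSERVED_VF_PRIORITIES` and `describe_vf_priority` are illustrative.

```python
# Meanings as reported by the posters in this thread (empirical observations,
# not taken from Cisco documentation).
OBSERVED_VF_PRIORITIES = {
    167: "primary Active VF (the forwarder the router owned from the start)",
    135: "VF requested or taken over from another router",
    160: "Active VF going down (compared in the thread to an HSRP Resign)",
    39: "Active VF whose weight dropped below the lower threshold",
}


def describe_vf_priority(value):
    """Return the thread's interpretation of a VF priority seen in debug output."""
    return OBSERVED_VF_PRIORITIES.get(value, "not reported in this thread")


if __name__ == "__main__":
    for value in (167, 135, 160, 39, 100):
        print(value, "->", describe_vf_priority(value))
```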