11-10-2011 07:08 AM
Dear friends,
I have stumbled across a different behavior of the neighbor send-label command in BGP in IOS versions 12.4(24)T4 up to and including 12.4(24)T6, and I wanted to find out whether it is a bug or just a new behavior I am not yet aware of.
Consider the following scenario: routers X, Y and Z are peered in BGP according to the exhibit. Router X is in AS 2; routers Y and Z are in AS 1. X and Y are peered using their physical interface addresses, while Y and Z are peered using their loopback addresses. Each peering is configured with neighbor send-label.
The BGP configuration on router Y is as follows:
Y# show run | sec router bgp
router bgp 1
 bgp log-neighbor-changes
 neighbor 10.1.255.1 remote-as 1
 neighbor 10.1.255.1 update-source Loopback0
 neighbor 192.168.1.2 remote-as 2
 !
 address-family ipv4
  redistribute ospf 1
  neighbor 10.1.255.1 activate
  neighbor 10.1.255.1 send-label
  neighbor 192.168.1.2 activate
  neighbor 192.168.1.2 send-label
  no auto-summary
  no synchronization
 exit-address-family
Router Y is receiving a set of routes from X, in particular:
Y# show ip bgp regexp _2
BGP table version is 22, local router ID is 10.1.255.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.2.12.0/24     192.168.1.2              4             0 2 ?
*> 10.2.23.0/24     192.168.1.2              3             0 2 ?
*> 10.2.34.0/24     192.168.1.2              2             0 2 ?
*> 10.2.45.0/24     192.168.1.2              0             0 2 ?
*> 10.2.255.1/32    192.168.1.2              5             0 2 ?
*> 10.2.255.2/32    192.168.1.2              4             0 2 ?
*> 10.2.255.3/32    192.168.1.2              3             0 2 ?
*> 10.2.255.4/32    192.168.1.2              2             0 2 ?
*> 10.2.255.5/32    192.168.1.2              0             0 2 ?
The output of show ip bgp labels on router Y, however, is rather interesting:
Y# show ip bgp labels
Network Next Hop In label/Out label
10.2.12.0/24 192.168.1.2 nolabel/16
10.2.23.0/24 192.168.1.2 nolabel/17
10.2.34.0/24 192.168.1.2 nolabel/18
10.2.45.0/24 192.168.1.2 nolabel/imp-null
10.2.255.1/32 192.168.1.2 nolabel/19
10.2.255.2/32 192.168.1.2 nolabel/20
10.2.255.3/32 192.168.1.2 nolabel/21
10.2.255.4/32 192.168.1.2 nolabel/22
10.2.255.5/32 192.168.1.2 nolabel/imp-null
Note that while the routes are received with MPLS labels, router Y does not seem to allocate any local labels for them, although all these routes are further advertised to router Z via iBGP.
On router Z, the results are also confusing. First, the networks received from router Y are still learned with the original next hop 192.168.1.2 instead of 10.1.255.5 (using send-label on router Y should imply next-hop-self):
Z# show ip route bgp
10.0.0.0/8 is variably subnetted, 18 subnets, 2 masks
B 10.2.12.0/24 [200/4] via 192.168.1.2, 00:26:28
B 10.2.23.0/24 [200/3] via 192.168.1.2, 00:26:28
B 10.2.45.0/24 [200/0] via 192.168.1.2, 00:26:28
B 10.2.34.0/24 [200/2] via 192.168.1.2, 00:26:28
B 10.2.255.5/32 [200/0] via 192.168.1.2, 00:26:28
B 10.2.255.4/32 [200/2] via 192.168.1.2, 00:26:28
B 10.2.255.3/32 [200/3] via 192.168.1.2, 00:26:28
B 10.2.255.2/32 [200/4] via 192.168.1.2, 00:26:28
B 10.2.255.1/32 [200/5] via 192.168.1.2, 00:26:28
Checking show ip bgp labels on router Z reveals another interesting behavior: although Y shows that it has not allocated any labels itself, it has in fact advertised the eBGP routes to Z with the original labels allocated by X (compare the Out label values in the previous and current output):
Z# show ip bgp labels
Network Next Hop In label/Out label
10.2.12.0/24 192.168.1.2 nolabel/16
10.2.23.0/24 192.168.1.2 nolabel/17
10.2.34.0/24 192.168.1.2 nolabel/18
10.2.45.0/24 192.168.1.2 nolabel/imp-null
10.2.255.1/32 192.168.1.2 nolabel/19
10.2.255.2/32 192.168.1.2 nolabel/20
10.2.255.3/32 192.168.1.2 nolabel/21
10.2.255.4/32 192.168.1.2 nolabel/22
10.2.255.5/32 192.168.1.2 nolabel/imp-null
An ironic fact is that on router Y, the labels 16-22 are already allocated for different internal networks by LDP. If router Z uses the labels as advertised by router Y, this will cause the packets to be heavily misrouted from router Y to completely different destinations:
Y# show mpls forwarding-table
Local  Outgoing     Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC  or Tunnel Id      Switched      interface
16     Pop Label    192.168.1.2/32    0             Fa0/0      192.168.1.2
17     Pop Label    10.1.255.4/32     0             Fa0/1      10.1.45.4
18     20           10.1.255.3/32     0             Fa0/1      10.1.45.4
19     19           10.1.255.2/32     0             Fa0/1      10.1.45.4
20     18           10.1.255.1/32     0             Fa0/1      10.1.45.4
21     16           10.1.12.0/24      0             Fa0/1      10.1.45.4
22     17           10.1.23.0/24      0             Fa0/1      10.1.45.4
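A label only has meaning in the label space of the LSR that assigned it, so the same value 20 can point to two unrelated prefixes. The Python sketch below is a simplified model, not an IOS mechanism: it copies X's bindings (as seen in the Out label column on Y) and Y's LDP bindings (from the forwarding table above) into hypothetical per-router dictionaries and resolves label 20 at each router.

```python
# Simplified per-LSR label tables; values copied from the outputs above.
# Each LSR maps its own *local* (incoming) labels to a prefix.
LOCAL_BINDINGS = {
    # Router X's bindings as advertised to Y via eBGP (Y's "Out label" column)
    "X": {16: "10.2.12.0/24", 17: "10.2.23.0/24", 18: "10.2.34.0/24",
          19: "10.2.255.1/32", 20: "10.2.255.2/32", 21: "10.2.255.3/32",
          22: "10.2.255.4/32"},
    # Router Y's LDP-assigned local labels (show mpls forwarding-table on Y)
    "Y": {16: "192.168.1.2/32", 17: "10.1.255.4/32", 18: "10.1.255.3/32",
          19: "10.1.255.2/32", 20: "10.1.255.1/32", 21: "10.1.12.0/24",
          22: "10.1.23.0/24"},
}


def resolve(lsr: str, label: int) -> str:
    """Return the prefix that `lsr` associates with an incoming `label`."""
    return LOCAL_BINDINGS[lsr].get(label, "no binding")


# Z pushes label 20 for 10.2.255.2/32, i.e. the value X assigned.
print(resolve("X", 20))  # 10.2.255.2/32: correct if X reads the label
print(resolve("Y", 20))  # 10.1.255.1/32: misrouted if Y reads the label
```

Which router ends up reading label 20 therefore depends on where the transport LSP toward the BGP next hop 192.168.1.2 ends.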
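So, there are two suspicious facts about the behavior of router Y:
1. It does not set itself as the next hop when advertising the labeled eBGP routes to Z over iBGP.
2. It does not allocate any local labels for these routes, yet it advertises them to Z with the labels allocated by X.
An interesting fact is that after adding the command neighbor 10.1.255.1 next-hop-self to router Y's configuration, the behavior becomes correct again: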
Y(config)# router bgp 1
Y(config-router)# address-family ipv4
Y(config-router-af)# neighbor 10.1.255.1 next-hop-self
Y(config-router-af)# do show ip bgp label
Network Next Hop In label/Out label
10.2.12.0/24 192.168.1.2 24/16
10.2.23.0/24 192.168.1.2 25/17
10.2.34.0/24 192.168.1.2 31/18
10.2.45.0/24 192.168.1.2 27/imp-null
10.2.255.1/32 192.168.1.2 26/19
10.2.255.2/32 192.168.1.2 28/20
10.2.255.3/32 192.168.1.2 29/21
10.2.255.4/32 192.168.1.2 30/22
10.2.255.5/32 192.168.1.2 32/imp-null
On Z:
Z# show ip bgp labels
Network Next Hop In label/Out label
10.2.12.0/24 10.1.255.5 nolabel/24
10.2.23.0/24 10.1.255.5 nolabel/25
10.2.34.0/24 10.1.255.5 nolabel/31
10.2.45.0/24 10.1.255.5 nolabel/27
10.2.255.1/32 10.1.255.5 nolabel/26
10.2.255.2/32 10.1.255.5 nolabel/28
10.2.255.3/32 10.1.255.5 nolabel/29
10.2.255.4/32 10.1.255.5 nolabel/30
10.2.255.5/32 10.1.255.5 nolabel/32
Router Y is a 2811, currently running 2800 Software (C2800NM-ADVIPSERVICESK9-M), Version 12.4(24)T6. I originally came across this behavior with 12.4(24)T4. I have confirmed that this behavior is not present with ADVENTERPRISEK9-M 12.4(22)T, so if this is a bug, it must have been "added" in some intermediate version.
I currently have no way of testing newer IOS versions from the 15.x series, as the router does not have the inordinate 512 MB of RAM they require, so I apologize for not testing this behavior on the most recent releases.
Did anyone experience similar behavior? Is this really a bug? Will this be corrected in 12.4T train yet? Thank you for all suggestions!
Best regards,
Peter
11-11-2011 03:04 AM
Hi Peter,
What a fantastic problem description you got us!!
I wish I got similar ones when I was in the TAC!!!
Anyway, I have the impression you have hit one of those 'gray areas' where a given behavior is consistent (and apparently correct) across various releases until it turns out that it is not the expected one. I think this is your case too.
From my research I found out that:
1. 'Using send-label on router Y should imply next-hop-self' (i.e. on iBGP sessions) was not the intended behavior for IOS, even though it is the expected behavior on IOS-XR (I could not confirm this myself; I found this statement in an email exchange between BGP developers).
2. Apparently IOS has had this behavior for a long time across many releases, even though it was never documented (by the way, did you find it documented anywhere?).
3. Various internal bugs addressed the issue from different perspectives; all of them were finally duplicated into an external one whose release notes are quite incomplete and misleading (I gotta admit that...) as they only mention CSC scenarios whereas the issue affects various BGP implementations.
Among the internal bugs, the following already hint at the solution in their titles:
CSCsi18597 IPv4+labels: router always does next-hop-self when send-label is enabled
CSCsq49865 missing labels in mpls forwarding table
CSCsu33177 TEA bgp next hop will be broken after neighor send-label config
I have to mention them since, as I wrote, the external one which fixed the issue is quite misleading.
Here it is anyway:
CSCek55668 bgp next hop will be broken after neighor send-label config
In conclusion, I think that 12.4(24)T4 and 12.4(24)T6 simply have the new and correct behavior, which is not to have next-hop-self implicitly enabled on iBGP sessions. If you want it, you have to configure it explicitly (as you apparently did in your case).
In previous releases (many trains were affected), next-hop-self was somehow implicit, as you noticed. From what I see in the bug notes, 15.0 and 15.1 in their latest rebuilds have the 'new' and correct behavior.
Let's see if some other BGP expert has anything to add to this or let me know if you have comments on this.
Riccardo
11-11-2011 05:44 AM
Hello Riccardo,
Thank you for your informative reply, I appreciate that immensely!
As for whether send-label should imply next-hop-self: I do not remember seeing it stipulated in any official Cisco documentation, but the books about MPLS I have read take it for granted. In any case, the behavior seen in older IOS versions (send-label implying next-hop-self) is generally desirable: it makes sure that the LSP towards the appropriate BGP ASBR is chosen, and it prevents the topmost transport label from being popped prematurely by PHP. Things can work both with and without next-hop-self being implied, although implying it increases the chances of MPLS labeling working properly. Either way, I wish the documentation clearly stated that the behavior is changing, so that all IOS versions behave identically.
What is more serious, however, is the part with the label mappings. In my example, router Y did not create any local label bindings for the received eBGP routes. When Y subsequently advertised the networks to router Z, it merely reused, i.e. copied, the label values received from router X, in effect confusing outgoing label values with incoming label values. This is outright incorrect behavior: the same label values in router Y's LFIB already correspond to different destinations, so traffic will be misrouted and blackholed. Why configuring next-hop-self on router Y corrected these label bindings is beyond my comprehension: a modification of the next-hop attribute should have no influence on the local assignment of labels!
Sadly, I do not have a TAC contract, so I cannot submit this as a bug for investigation.
Thank you once more, Riccardo, and to anyone willing to share his/her views on this issue!
Best regards,
Peter
11-11-2011 09:16 AM
Hi Peter,
on Monday I will ask a BGP guru to have a look at this.
Riccardo
11-11-2011 03:27 PM
Riccardo,
Thank you so much! I will be eagerly watching this thread for any new information. Thanks again, your help is very, very much appreciated!
Best regards,
Peter
11-14-2011 05:30 AM
Hello Peter,
>> Why configuring the next-hop-self on router Y corrected these label bindings is beyond my comprehension - a particular modification of a next-hop attribute should have no influence on local assignment of labels!
Because it corresponds to an MPLS LSP segment that starts on the ASBR, router RY in your case.
With labeled BGP, a router cannot perform the label swap operation on its own: the swap is triggered (emulated) by the change of the next hop.
I have looked at my own tests of BGP with labels: I was using next-hop-self towards iBGP neighbors, and my devices were C7200 and C7500.
It is interesting to see that some OS corrections may break our habits, as explained by Simone.
Hope to help
Giuseppe
11-15-2011 04:00 AM
Hello Giuseppe,
Thank you very much for your answer. I am not sure I understand it correctly, so please let me re-explain my main point and ask for your kind advice.
Issue 1:
All routers are configured with send-label; none of them is configured with next-hop-self. Router Y receives labeled BGP routes from router X, and show ip bgp labels displays the following table:
Y# show ip bgp labels
Network Next Hop In label/Out label
10.2.12.0/24 192.168.1.2 nolabel/16
10.2.23.0/24 192.168.1.2 nolabel/17
10.2.34.0/24 192.168.1.2 nolabel/18
10.2.45.0/24 192.168.1.2 nolabel/imp-null
10.2.255.1/32 192.168.1.2 nolabel/19
10.2.255.2/32 192.168.1.2 nolabel/20
10.2.255.3/32 192.168.1.2 nolabel/21
10.2.255.4/32 192.168.1.2 nolabel/22
10.2.255.5/32 192.168.1.2 nolabel/imp-null
Note that while router Y knows the remote bindings for these networks (the "Out label" column), it has not created any local label bindings for them (the "In label" column shows nolabel for all networks). I assume this is done to avoid assigning local labels to BGP routes that may eventually be routed through a different ASBR and possibly be misinterpreted en route. In other words, a local label binding has local significance only: if there is no guarantee that packets will go through Y (without next-hop-self), local label bindings on Y should be neither created nor advertised. Am I correct in this line of reasoning?
Issue 2:
With the same configuration, router Y has advertised the BGP routes to router Z, but it has kept the label bindings it learned from X. That is, Y has not created any local bindings itself; it just "forgot" to remove the learned label bindings when advertising the routes to router Z:
Z# show ip bgp labels
Network Next Hop In label/Out label
10.2.12.0/24 192.168.1.2 nolabel/16
10.2.23.0/24 192.168.1.2 nolabel/17
10.2.34.0/24 192.168.1.2 nolabel/18
10.2.45.0/24 192.168.1.2 nolabel/imp-null
10.2.255.1/32 192.168.1.2 nolabel/19
10.2.255.2/32 192.168.1.2 nolabel/20
10.2.255.3/32 192.168.1.2 nolabel/21
10.2.255.4/32 192.168.1.2 nolabel/22
10.2.255.5/32 192.168.1.2 nolabel/imp-null
Note that the outgoing labels on Z are exactly the same as on router Y. This is, in my opinion, a bug. Take, for example, the route towards 10.2.255.2. The bottom label will be 20, and the top label will be a label towards 192.168.1.2. In my particular topology, PHP will correctly pop this transport label before router Y, and Y will receive a packet labeled with label 20. However, on Y, label 20 is not a mapping assigned to 10.2.255.2, as BGP has not created any local bindings itself; instead, label 20 corresponds to a totally different network somewhere inside the cloud between routers Y and Z, as evidenced by the following output on Y:
Y# show mpls forwarding-table
Local  Outgoing     Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC  or Tunnel Id      Switched      interface
16     Pop Label    192.168.1.2/32    0             Fa0/0      192.168.1.2
17     Pop Label    10.1.255.4/32     0             Fa0/1      10.1.45.4
18     20           10.1.255.3/32     0             Fa0/1      10.1.45.4
19     19           10.1.255.2/32     0             Fa0/1      10.1.45.4
20     18           10.1.255.1/32     0             Fa0/1      10.1.45.4
21     16           10.1.12.0/24      0             Fa0/1      10.1.45.4
22     17           10.1.23.0/24      0             Fa0/1      10.1.45.4
So the mere fact that BGP on Y did not create local bindings is somewhat understandable. However, the fact that it retained the remote label bindings learned from X and advertised them unchanged to Z is, in my opinion, a grave bug. What is your opinion on this?
Thank you very much!
Best regards,
Peter
11-17-2011 12:30 AM
Hello Peter,
thanks for your kind remarks.
Labeled BGP should be used for inter-AS scenarios or for scalable Carrier Supporting Carrier (CSC) scenarios.
I explored the former in more depth, both for study and as a possible migration solution for merging two networks.
RFC 3107 is the first RFC about labeled BGP, and it explains that:
the label is an integral part of a new type of NLRI in MP-BGP, with SAFI=4;
multiple labels can be carried, each taking a 3-octet field in the labeled NLRI.
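To make these two points concrete, here is a minimal Python sketch of the labeled NLRI layout described in RFC 3107, Section 3: a <length, label(s), prefix> triple in which each label occupies 3 octets, with the 20-bit label value in the high-order bits and the Bottom of Stack flag in the lowest bit. It only builds the NLRI bytes for one IPv4 route, not a full MP_REACH_NLRI attribute, and the function names are illustrative.

```python
import ipaddress


def encode_label(value: int, bottom_of_stack: bool) -> bytes:
    """One RFC 3107 label field: 3 octets, label value in the high-order
    20 bits, Bottom of Stack in the lowest bit (the 3 bits between are 0)."""
    if not 0 <= value < 2 ** 20:
        raise ValueError("an MPLS label must fit in 20 bits")
    return ((value << 4) | int(bottom_of_stack)).to_bytes(3, "big")


def encode_labeled_nlri(prefix: str, labels: list[int]) -> bytes:
    """Encode <length, label(s), prefix> for one IPv4 labeled-unicast route."""
    net = ipaddress.IPv4Network(prefix)
    label_field = b"".join(
        encode_label(value, bottom_of_stack=(i == len(labels) - 1))
        for i, value in enumerate(labels))
    # Only the octets covering the prefix length are carried.
    prefix_field = net.network_address.packed[:(net.prefixlen + 7) // 8]
    # The length octet counts the bits of the label field plus the prefix.
    length_bits = 24 * len(labels) + net.prefixlen
    return bytes([length_bits]) + label_field + prefix_field


# 10.2.255.2/32 with X's label 20, as carried in the update from X to Y
print(encode_labeled_nlri("10.2.255.2/32", [20]).hex())  # 380001410a02ff02
```

Because the label sits inside the NLRI itself, a speaker relaying the route forwards these bytes as they are unless it rewrites the whole NLRI, which ties in with the next-hop discussion that follows.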
As I wrote in my previous post, the job of BGP is to join the LSP segments that are created in each AS. This may require more labels (a deeper label stack) so that the PE loopbacks of provider A are seen in provider B's network via provider B's ASBR. All LSPs towards the PE nodes of provider A are then pushed into the LSP towards provider B's ASBR (hence the increase in label stack depth), which can be built by LDP or RSVP-TE, just to name two.
ASBR nodes are required to perform non-trivial label swap operations that can also change the label stack depth; for example, they may need to change two labels at once.
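The swap-plus-push described above can be sketched in Python as follows. This is a simplified data-plane model under stated assumptions, not IOS internals: the list's last element is the top of the stack, the speaker has set itself as BGP next hop (so it owns the incoming label), and when its own BGP next hop is not directly connected it pushes an LDP or RSVP-TE label on top. Router Y's entries (24 to 16, 27 to imp-null) come from the next-hop-self output earlier in the thread; the other table, its address and its label values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

IMPLICIT_NULL = 3  # reserved label value: pop, push nothing in its place


@dataclass
class BgpLabelEntry:
    out_label: int            # label advertised by our BGP next hop
    bgp_next_hop: str
    igp_label: Optional[int]  # LDP/RSVP-TE label toward the BGP next hop,
                              # None when the next hop is directly connected


def forward(stack: list[int], table: dict[int, BgpLabelEntry]) -> list[int]:
    """Handle the top label (stack[-1]) at a labeled-BGP speaker that set
    itself as next hop and therefore owns the incoming label."""
    entry = table[stack[-1]]
    new_stack = stack[:-1]                   # pop our own BGP label (copy)
    if entry.out_label != IMPLICIT_NULL:
        new_stack.append(entry.out_label)    # swap: push the next hop's label
    if entry.igp_label is not None:
        new_stack.append(entry.igp_label)    # push transport label: depth + 1
    return new_stack


# Router Y after next-hop-self (values from show ip bgp labels on Y);
# the next hop 192.168.1.2 is directly connected, so no transport label.
Y_TABLE = {
    24: BgpLabelEntry(16, "192.168.1.2", None),             # 10.2.12.0/24
    27: BgpLabelEntry(IMPLICIT_NULL, "192.168.1.2", None),  # 10.2.45.0/24
}
# Hypothetical ASBR whose BGP next hop (10.9.9.9) sits behind an LDP LSP;
# label 1000 stands for an inner (e.g. VPN) label that is left untouched.
REMOTE_NH_TABLE = {40: BgpLabelEntry(55, "10.9.9.9", 300)}

print(forward([24], Y_TABLE))                # [16]
print(forward([27], Y_TABLE))                # []
print(forward([1000, 40], REMOTE_NH_TABLE))  # [1000, 55, 300]
```

In the last case one incoming label is replaced by two outgoing ones, which is the kind of stack-depth change mentioned above.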
Since the label is an integral part of the NLRI, it can be modified only when the BGP next-hop attribute is changed. This is how the implementation has been designed, because it is what is needed.
As you have noted, each node has its own label space, and propagating RX's label choices to RZ is indeed not a good job, as RY's label choices are clearly different.
I agree with Riccardo: the behaviour is now correct in IOS.
Proposal:
A warning message could be added when configuring neighbor send-label to remind us of the need for next-hop-self, just as we are reminded that the IP address will be removed when we put an interface into a VRF.
Hope to help
Giuseppe
11-29-2011 02:43 AM
Hello Giuseppe,
I apologize for my late reply. I have read RFC 3107 and found some indications that corroborate your point of view. Namely, Section 3 mandates:
The label(s) specified for a particular route (and associated with its address prefix) must be assigned by the LSR which is identified by the value of the Next Hop attribute of the route. When a BGP speaker redistributes a route, the label(s) assigned to that route must not be changed (except by omission), unless the speaker changes the value of the Next Hop attribute of the route.
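The rule quoted from Section 3 can be expressed as a small decision function. The following Python sketch is a simplified model, not IOS code: a speaker relaying a labeled route either leaves the next hop and label untouched, or sets itself as next hop and must then advertise a label from its own label space, installing a swap entry toward the old next hop. The data class, the allocator and the LFIB dictionary are hypothetical; the prefix, next hops and labels (20 from X, 28 on Y) come from the outputs earlier in the thread.

```python
from collections.abc import Iterator
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class LabeledRoute:
    prefix: str
    next_hop: str
    label: int


def readvertise(route: LabeledRoute, self_address: str, next_hop_self: bool,
                allocator: Iterator[int],
                lfib: dict[int, tuple[int, str]]) -> LabeledRoute:
    """Re-advertise a labeled route following the RFC 3107, Section 3 rule."""
    if not next_hop_self:
        # NEXT_HOP unchanged: the label still belongs to route.next_hop,
        # so it is relayed exactly as received.
        return route
    # NEXT_HOP changed to ourselves: advertise a label from our own space
    # and install the swap "local label -> received label via old next hop".
    local_label = next(allocator)
    lfib[local_label] = (route.label, route.next_hop)
    return LabeledRoute(route.prefix, self_address, local_label)


# Route 10.2.255.2/32 as Y receives it from X (label 20, next hop 192.168.1.2).
from_x = LabeledRoute("10.2.255.2/32", "192.168.1.2", 20)
y_lfib: dict[int, tuple[int, str]] = {}

# Without next-hop-self, Z gets X's next hop and X's label.
print(readvertise(from_x, "10.1.255.5", False, count(28), y_lfib))
# LabeledRoute(prefix='10.2.255.2/32', next_hop='192.168.1.2', label=20)

# With next-hop-self, Z gets Y's loopback and a label allocated by Y.
# count(28) only reproduces the value seen on Y; IOS picks from its own pool.
print(readvertise(from_x, "10.1.255.5", True, count(28), y_lfib))
# LabeledRoute(prefix='10.2.255.2/32', next_hop='10.1.255.5', label=28)
print(y_lfib)  # {28: (20, '192.168.1.2')}
```

The first branch is exactly what router Y did without next-hop-self; the second matches the 28/20 entry Y showed once next-hop-self was configured.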
These paragraphs say that the labels are valid with respect to the LSR identified by the Next Hop attribute, and that the labels may not be changed unless the speaker changes the Next Hop.
>> I agree with Riccardo the behaviour is now correct in IOS.
I must honestly say that I do not think the behavior is now correct, because in general it cannot work properly. You see, what I object to is router Y simply keeping the labels received from router X when advertising routes to Z. In fact, this would work only if routers X and Y were peered in eBGP using their loopback addresses, which further complicates the inter-AS peering (static routes would be needed between X and Y to mutually reach these loopbacks, and they would have to be redistributed into the IGPs of the corresponding ASes).
Imagine that you activated a BGP peer in address-family vpnv4 but the command neighbor send-community extended were not added automatically, contrary to the current IOS behavior. Understandably, the exchange of VPNv4 prefixes would then be impossible, because without extended communities the set of RTs could not be carried with each VPNv4 route. I believe this behavior would be highly objectionable. In the same way, in my opinion, allowing a neighbor to receive labeled routes without implying next-hop-self is strongly objectionable, because apart from specific scenarios this configuration will behave incorrectly.
Best regards,
Peter
12-05-2011 05:56 AM
Hi Peter,
I just got the confirmation that the behavior you see now is the correct one and of course RFC3107 confirms it.
The rationale behind it is that when an LSR assigns a label, it starts 'attracting' traffic towards the prefixes it assigned the labels for, since it advertises that it is in the path. That is the reason why you do not configure next-hop-self on RRs: otherwise they would attract all the traffic in the network (bringing them to their knees), whereas they should not be in the traffic path.
So the implicit next-hop-self behavior you previously saw is indeed buggy.
By the way, in IOS-XR the default behaviour has also changed, and we now need to explicitly configure next-hop-self if we want the LSR to assign local labels to prefixes.
The internal bug which introduced it is "CSCtk53821 BGP IAS functionality now requires explicit next-hop-self config"
regards,
Riccardo
12-07-2011 10:05 AM
Hello Riccardo,
I had to think things over at least twice. Huge thanks to you, Luc and Giuseppe for not being swayed by my (usually) persuasive arguments. I seem to finally get the idea behind all of this, and I appreciate the logic in what you and Giuseppe told me. Thank you very much!
I now see the flaw in my logic: I incorrectly assumed that just because a BGP router advertises labeled routes, the labels must be assigned by the advertising router and related to it. Wrong! The labels are related to the router identified by the NEXT_HOP attribute, as it is this router that originated the label mappings in the first place; other BGP routers may simply be relaying these labeled routes. Unless a BGP speaker changes the NEXT_HOP to itself, it is not allowed to modify the label mappings. Knowing when to remove the labels and advertise plain IPv4 networks is too difficult to perform reliably, so it is no surprise that my router Y simply relayed the labeled routes to router Z without removing the labels.
Once again, huge, huge thanks to you, Riccardo and Giuseppe!
Best regards,
Peter
12-07-2011 10:52 PM
Hi all,
I had a big issue with the next hop and send-label together from the ASBR to the route reflector on the iBGP IPv4 peering: somehow send-label was not forwarding the labels, and I had to look for a different solution. This was nearly 4 years ago.
regards,
Skanda