04-25-2014 05:52 PM
I have a simple ring network of four 3600Xs with a 10-gig IP/MPLS backbone between all units (with OSPF running in the core). Per the 3600 design guide, I turned on IPFRR under OSPF for fast reroute of traffic around faults. I have an L3VPN on the 3600s that I'm using to test. FRR works quite well when the repair route is an ECMP (equal-cost multipath) route: I don't even notice an interruption in ping between L3VPN sites when an 'active' link goes down.
The issue arises when the repair route is a remote-LFA (loop-free alternate) MPLS tunnel. I've done a few tests, and the failover time when the repair route is a remote-LFA tunnel is the same as when FRR isn't turned on at all. It's just the normal route convergence time, and there is a significant traffic interruption (compared to FRR when an ECMP route is the repair route).
The thing is, I'm not quite sure how to even diagnose this. I was thinking that maybe the remote-LFA tunnel was using the link that failed, so in essence it was 'down' as well, hence the traffic interruption while routing fully converged. But I looked at the remote-LFA interfaces, and as far as I understand them, they are taking the right path out of the router (that is, away from the link that would fail in order to activate the remote-LFA route).
Are there any resources or tips to help troubleshoot why these remote-LFA tunnel repair routes don't seem to be working well?
04-28-2014 10:05 AM
Hi,
While identifying the PQ node (to which the LDP tunnel will be established), rLFA makes sure that the tunnel does not go over the link being protected. So I don't think that is the reason.
Do you see the back path installed in the RIB/FIB table? Just to make sure it is not taking the failing link, can you check that the path to reach the backup tunnel is not over the failing link?
-Nagendra
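The PQ-node selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topology is a hypothetical 4-node ring (R1-R2-R3-R4-R1) with unit, symmetric costs rather than the poster's actual network, the function names are invented for this sketch, and the P-space/Q-space rules are a simplified reading of RFC 7490, not how any particular platform implements them.

```python
import heapq

# Hypothetical 4-node ring with unit, symmetric costs (an assumption,
# not the poster's real topology): R1-R2-R3-R4-R1.
RING = {
    "R1": {"R2": 1, "R4": 1},
    "R2": {"R1": 1, "R3": 1},
    "R3": {"R2": 1, "R4": 1},
    "R4": {"R3": 1, "R1": 1},
}

def spf(graph, root):
    """Dijkstra: shortest distance from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def on_shortest_path(dist, graph, src, dst, link):
    """True if ANY equal-cost shortest path src->dst crosses the link.

    Assumes a connected graph with symmetric costs.
    """
    a, b = link
    cost = graph[a][b]
    best = dist[src][dst]
    return (dist[src][a] + cost + dist[b][dst] == best
            or dist[src][b] + cost + dist[a][dst] == best)

def p_space(dist, graph, root, link):
    """Nodes root can reach with no shortest path crossing the link."""
    return {n for n in graph
            if n != root and not on_shortest_path(dist, graph, root, n, link)}

def q_space(dist, graph, dest, link):
    """Nodes that can reach dest with no shortest path crossing the link."""
    return {n for n in graph
            if n != dest and not on_shortest_path(dist, graph, n, dest, link)}

def pq_nodes(graph, s, e):
    """Candidate PQ nodes for protecting link s-e, using extended P-space."""
    dist = {n: spf(graph, n) for n in graph}
    link = (s, e)
    extended_p = set()
    for nbr in graph[s]:
        if nbr != e:  # only neighbors not across the protected link
            extended_p |= {nbr} | p_space(dist, graph, nbr, link)
    extended_p -= {s, e}
    return extended_p & q_space(dist, graph, e, link)

print(sorted(pq_nodes(RING, "R1", "R2")))
```

In this toy ring, protecting the R1-R2 link yields R3 as the only PQ node, and it is reached through R4, that is, away from the protected link. Note that R1's own (non-extended) P-space contains only R4, because one of the two equal-cost paths from R1 to R3 crosses the protected link.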
04-28-2014 12:23 PM
Thanks for the reply, Nagendra. When you ask if I've seen the back path installed in RIB/FIB, I'm not exactly sure what you mean. I do see repair paths referencing remote LFAs on both the 3600 that would be the source and the one that would be the destination of the test traffic. Like this:
* 172.16.0.3, from 10.10.10.3, 01:55:50 ago, via TenGigabitEthernet0/2
Route metric is 2, traffic share count is 1
Repair Path: 10.10.10.4, via MPLS-Remote-Lfa40
and on the other router:
* 172.16.0.2, from 10.10.10.1, 01:56:34 ago, via TenGigabitEthernet0/1
Route metric is 2, traffic share count is 1
Repair Path: 10.10.10.2, via MPLS-Remote-Lfa32
If you're looking for some specific command output, let me know.
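For anyone who wants to check many route entries at once, here is a rough Python sketch that pulls the primary and repair paths out of text shaped like the output quoted above. The regular expressions are assumptions based only on the three-line format shown in this post (they are not a documented output format), and the function name `parse_paths` is made up for this example.

```python
import re

# Patterns inferred from the quoted output only; other software
# releases may format these lines differently.
PRIMARY = re.compile(
    r"^\*?\s*(?P<next_hop>[^,\s]+), from (?P<source>[^,\s]+), "
    r".*via (?P<interface>\S+)$"
)
REPAIR = re.compile(r"Repair Path: (?P<next_hop>[^,\s]+), via (?P<interface>\S+)")

def parse_paths(text):
    """Return one dict per primary path, with its repair path if present."""
    paths = []
    for raw in text.splitlines():
        line = raw.strip()
        match = PRIMARY.match(line)
        if match:
            paths.append({
                "next_hop": match["next_hop"],
                "interface": match["interface"],
                "repair": None,
            })
            continue
        match = REPAIR.search(line)
        if match and paths:
            # Attach the repair path to the most recent primary path.
            paths[-1]["repair"] = (match["next_hop"], match["interface"])
    return paths

SAMPLE = """\
* 172.16.0.3, from 10.10.10.3, 01:55:50 ago, via TenGigabitEthernet0/2
Route metric is 2, traffic share count is 1
Repair Path: 10.10.10.4, via MPLS-Remote-Lfa40
"""

for path in parse_paths(SAMPLE):
    status = path["repair"] if path["repair"] else "NO REPAIR PATH"
    print(path["next_hop"], path["interface"], status)
```

Running it against the output from both routers makes it easy to spot a direction that has no repair path. It does not show which physical interface the remote-LFA tunnel itself leaves on; that still has to be checked on the router.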
04-28-2014 03:43 PM
Hi,
Can you check if 10.10.10.4 is using Teng0/2 as its egress interface (and 10.10.10.2 is using Teng0/1)?
If not, then this is the expected behaviour and the programming is as expected.
A few common misunderstandings I have seen in the past from different people are below:
1. rLFA is applied in one direction (which will work fine) while there is no rLFA for the return traffic. This creates the impression that traffic waits for convergence (which is indeed the case for the return packets).
2. How are you simulating the link failure? With an interface shut on the router where you want to trigger LFA?
-Nagendra
04-29-2014 09:32 AM
Nagendra, I'm not completely sure what you are asking, but 10.10.10.4 does use t0/2 to get to 10.10.10.3 and 10.10.10.2 uses t0/1 to get to 10.10.10.1.
1. It does appear that there are rLFAs, and proper ones, in both directions, so I don't think this is the problem.
2. I am physically pulling the cable to simulate a link down.