04-13-2010 05:09 AM
Hi,
attached are the topology and the depicted situation (with the output of show commands).
Accidentally, a static route that was supposed to be on PE1 was configured on PE2, and thus a loop was created
(the next hop for network 11.11.11.0/24 is 10.10.10.10, which is directly connected to PE1).
My question is: why is only PE2's CPU 100% utilized? It is a Cisco 7600, so it should forward in hardware.
PE1, P1, and P2 are at 1% CPU utilization.
Maybe it started to forward packets for network 11.11.11.0/24 in process-switching mode instead of CEF, but again, why isn't it switched in CEF?
Does anyone have an idea why this is happening?
(There are a lot of VPNs, static routes, and traffic (in this and other VPNs) in my network and everything works fine, so the MPLS network is properly configured.)
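For clarity, the offending route would have looked something like this (a sketch; the VRF name VPN_1 and the /24 mask are taken from the show commands quoted below, the rest is assumed):

```
! This static route was intended for PE1 but was accidentally entered on PE2.
! 10.10.10.10 sits on a subnet directly connected to PE1, so PE2 has to recurse
! through MP-BGP and send the traffic back to PE1, closing the loop.
ip route vrf VPN_1 11.11.11.0 255.255.255.0 10.10.10.10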
Thanks in advance,
A.
04-13-2010 02:45 PM
Hello Antonio,
from the picture you have attached we can see that PE1 is sending traffic for 11.11.11.0 to PE2; we can tell this from the label stack, where the inner label 585 is the same as the local label on PE2.
PE2 uses recursion to resolve the next hop and finds out that it has to send the traffic back to PE1, which advertises net 10.10.10.0/24 in MP-BGP.
So PE2 is the only one advertising net 11.11.11.0/24 in MP-BGP (advertised by BGP 65001 in sh ip route vrf VPN_1 11.11.11.0), but with a next hop known via MP-BGP, actually iBGP, from PE1 (from sh ip route vrf VPN_1 10.10.10.10 we see AD 200 with a global IP next hop of 192.168.100.1).
Probably CEF detects the inconsistency and sends all packets to the RP, causing the high CPU, since the stats are at 0.
The CEF entry exists.
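To double-check that recursion on PE2, commands along these lines should show the resolved next hop and the imposed label stack (a sketch; exact output format varies by IOS release):

```
PE2# show ip cef vrf VPN_1 11.11.11.0 255.255.255.0 detail
PE2# show ip route vrf VPN_1 10.10.10.10
PE2# show mpls forwarding-table labels 585 detail
```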
I agree that I would expect symmetric behaviour, but PE2 is the one pointing to a far next hop.
Have you had a chance to get a sh proc cpu sorted on PE2 before fixing the configuration error?
What IOS image is running on PE2 and PE1? Is it the same or different?
Are both devices configured the same way regarding MPLS TTL propagation?
Traffic should be sent from PE2 to PE1 to PE2 until the TTL expires, and someone has to send an ICMP time-exceeded message to the source of the original packet.
We can guess that the behaviour is deterministic: given an initial TTL, all packets will expire after N loops on the same node.
Maybe PE2 is the node that is charged with sending the ICMP TTL-expired message for each packet,
or you may have hit a SW bug.
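For the last two points, a quick way to check both boxes (commands as I would expect them on 12.2SX):

```
PE1# show version | include IOS
PE2# show version | include IOS
PE1# show running-config | include propagate-ttl
PE2# show running-config | include propagate-ttl
```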
Hope to help
Giuseppe
04-15-2010 05:38 AM
Hello Giuseppe,
"Probably CEF detects the inconsistency and sends all packets to RP causing high cpu because stats are at 0. The CEF entry exists."
I issued on PE2: sh ip cef inconsistency records detail
Consistency checker master control: enabled
Table consistency checker state:
lc-detect: enabled
0/0/0/0 queries sent/ignored/checked/iterated
scan-lc: enabled [83 prefixes checked every 60s]
0/0/0/0 queries sent/ignored/checked/iterated
scan-rp: enabled [83 prefixes checked every 60s]
583808/0/0/0 queries sent/ignored/checked/iterated
scan-rib: enabled [1000 prefixes checked every 60s]
1652900/0/1652900/0 queries sent/ignored/checked/iterated
scan-hw-sw: disabled
0/0/0/0 queries sent/ignored/checked/iterated
scan-sw-hw: disabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-rib: enabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-rp: enabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-lc: enabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-hw-sw: disabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-sw-hw: disabled
0/0/0/0 queries sent/ignored/checked/iterated
Inconsistency error messages are disabled
Inconsistency auto-repair is disabled
Inconsistency auto-repair runs: 0
Inconsistency statistics: 0 confirmed, 0/16 recorded
Table test modes:
Insert mode: normal
Shouldn't any inconsistencies be shown here? And if there is a CEF inconsistency, are packets then forwarded via process switching?
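The last lines of that output show why nothing is recorded: error messages and auto-repair are both disabled. If I remember the 12.2S syntax correctly (worth verifying on your release), they can be enabled with something like:

```
! assumed 12.2S-style keywords; check the parser on your image
PE2(config)# ip cef table consistency-check error-message
PE2(config)# ip cef table consistency-check auto-repair
```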
"Have you had a chance to get a sh proc cpu sorted on PE2 before fixing the configuration error?"
Yes I have; before fixing the configuration error:
sh proc cpu sorted
CPU utilization for five seconds: 48%/48%; one minute: 46%; five minutes: 40%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
26 3920 96958 40 0.07% 0.02% 0.00% 0 HC Counter Timer
1 0 32 0 0.00% 0.00% 0.00% 0 Chunk Manager
2 112 78569 1 0.00% 0.00% 0.00% 0 Load Meter
3 4336 812792 5 0.00% 0.00% 0.00% 0 OSPF Router 1
4 4 615 6 0.00% 0.00% 0.00% 0 TACACS+
5 1089080 72851 14949 0.00% 0.33% 0.28% 0 Check heaps
6 0 1 0 0.00% 0.00% 0.00% 0 Pool Manager
7 0 2 0 0.00% 0.00% 0.00% 0 Timers
8 9896 30906 320 0.00% 0.00% 0.00% 0 ARP Input
9 0 1 0 0.00% 0.00% 0.00% 0 AAA_SERVER_DEADT
10 0 2 0 0.00% 0.00% 0.00% 0 AAA high-capacit
11 36 68 529 0.00% 0.00% 0.00% 0 Entity MIB API
12 0 1 0 0.00% 0.00% 0.00% 0 IFS Agent Manage
13 8 6549 1 0.00% 0.00% 0.00% 0 IPC Dynamic Cach
14 4 49 81 0.00% 0.00% 0.00% 0 PF_Split Sync Pr
15 76 392585 0 0.00% 0.00% 0.00% 0 IPC Periodic Tim
16 104 392583 0 0.00% 0.00% 0.00% 0 IPC Deferred Por
17 140212 13383 10476 0.00% 0.01% 0.00% 0 IPC Seat Manager
18 0 1 0 0.00% 0.00% 0.00% 0 IPC Stdby Update
19 0 2 0 0.00% 0.00% 0.00% 0 DDR Timers
20 0 2 0 0.00% 0.00% 0.00% 0 Dialer event
After fixing the error it goes down to 1%/0%.
"What IOS image is running on PE2 and PE1? Is it the same or different?"
It is different: PE1 is 12.2(18)SXF7 and PE2 is 12.2(18)SXF.
"Are both devices configured the same way regarding MPLS TTL propagation?" Yes, on both routers "no mpls ip propagate-ttl" is configured.
But I did some further testing. I made the inverse configuration: this time I configured the static route on PE1 with a next hop that is connected to PE2, and now PE1's CPU goes to 100%.
I also reproduced this in the lab: if "mpls ip propagate-ttl" is in the configuration then the router's CPU works fine, but if "no mpls ip propagate-ttl" is there then CPU utilization goes up to 100%.
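For anyone who finds this thread later, the knob in question is a global configuration command; disabling it makes the router write 255 into the label TTL at imposition instead of copying the IP TTL (commonly done to hide the core hops from customer traceroutes):

```
! default: copy the IP TTL into the MPLS TTL at label imposition
mpls ip propagate-ttl
! alternative: set the MPLS TTL to 255 at imposition, hiding core hops
no mpls ip propagate-ttl
```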
Thank you, Giuseppe,
A.