04-13-2010 05:09 AM
Hi,
attached are the topology and the depicted situation (with the output of show commands).
Accidentally, a static route that was supposed to be on PE1 was configured on PE2, and thus a loop was created
(the next hop for network 11.11.11.0/24 is 10.10.10.10, which is directly connected to PE1).
My question is: why is only PE2's CPU 100% utilized? It is a Cisco 7600, so it should forward in hardware.
PE1, P1, and P2 are at 1% CPU utilization.
Maybe it started to forward packets for network 11.11.11.0/24 in process-switching mode instead of CEF, but again, why isn't it switched in CEF?
Does anyone have an idea why this is happening?
(There are a lot of VPNs, static routes, and traffic (in this and other VPNs) in my network and everything works fine, so the MPLS network is properly configured.)
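For clarity, the offending route would have looked something like this (a sketch; the VRF name VPN_1 and the /24 mask are taken from the show commands quoted below, the rest is assumed):

```
! This static route was intended for PE1 but was accidentally entered on PE2.
! 10.10.10.10 sits on a subnet directly connected to PE1, so PE2 has to recurse
! through MP-BGP and send the traffic back to PE1, closing the loop.
ip route vrf VPN_1 11.11.11.0 255.255.255.0 10.10.10.10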
Thanks in advance,
A.
04-13-2010 02:45 PM
Hello Antonio,
from the picture you have attached we can see that PE1 is sending traffic for 11.11.11.0 to PE2; we can tell this from the label stack, where the inner label 585 is the same as the local label on PE2.
PE2 uses recursion to resolve the next hop and finds out that it has to send the traffic back to PE1, which advertises net 10.10.10.0/24 in MP-BGP.
So PE2 is the only one advertising net 11.11.11.0/24 in MP-BGP (advertised by BGP 65001 in sh ip route vrf VPN_1 11.11.11.0), but with a next hop known via MP-BGP, actually iBGP, from PE1 (from sh ip route vrf VPN_1 10.10.10.10 we see AD 200 with a global IP next hop of 192.168.100.1).
Probably CEF detects the inconsistency and sends all packets to the RP, causing the high CPU, since the stats are at 0.
The CEF entry exists.
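To double-check that recursion on PE2, commands along these lines should show the resolved next hop and the imposed label stack (a sketch; exact output format varies by IOS release):

```
PE2# show ip cef vrf VPN_1 11.11.11.0 255.255.255.0 detail
PE2# show ip route vrf VPN_1 10.10.10.10
PE2# show mpls forwarding-table labels 585 detail
```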
I agree that I would expect symmetric behaviour, but PE2 is the one pointing to a far next hop.
Have you had a chance to get a sh proc cpu sorted on PE2 before fixing the configuration error?
What IOS image is running on PE2 and PE1? Is it the same or different?
Are both devices configured the same way regarding MPLS TTL propagation?
Traffic should be sent from PE2 to PE1 to PE2 until the TTL expires, and someone has to send an ICMP time-exceeded message to the source of the original packet.
We can guess that the behaviour is deterministic: given an initial TTL, all packets will expire after N loops on the same node.
Maybe PE2 is the node that is charged with sending the ICMP TTL-expired message for each packet,
or you may have hit a SW bug.
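For the last two points, a quick way to check both boxes (commands as I would expect them on 12.2SX):

```
PE1# show version | include IOS
PE2# show version | include IOS
PE1# show running-config | include propagate-ttl
PE2# show running-config | include propagate-ttl
```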
Hope to help
Giuseppe
04-15-2010 05:38 AM
Hello Giuseppe,
"Probably CEF detects the inconsistency and sends all packets to RP causing high cpu because stats are at 0. The CEF entry exists."
I issued on PE2: sh ip cef inconsistency records detail
Consistency checker master control: enabled
Table consistency checker state:
lc-detect: enabled
0/0/0/0 queries sent/ignored/checked/iterated
scan-lc: enabled [83 prefixes checked every 60s]
0/0/0/0 queries sent/ignored/checked/iterated
scan-rp: enabled [83 prefixes checked every 60s]
583808/0/0/0 queries sent/ignored/checked/iterated
scan-rib: enabled [1000 prefixes checked every 60s]
1652900/0/1652900/0 queries sent/ignored/checked/iterated
scan-hw-sw: disabled
0/0/0/0 queries sent/ignored/checked/iterated
scan-sw-hw: disabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-rib: enabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-rp: enabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-lc: enabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-hw-sw: disabled
0/0/0/0 queries sent/ignored/checked/iterated
full-scan-sw-hw: disabled
0/0/0/0 queries sent/ignored/checked/iterated
Inconsistency error messages are disabled
Inconsistency auto-repair is disabled
Inconsistency auto-repair runs: 0
Inconsistency statistics: 0 confirmed, 0/16 recorded
Table test modes:
Insert mode: normal
Shouldn't any inconsistencies be shown here? And if there is a CEF inconsistency, are packets then forwarded via process switching?
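The last lines of that output show why nothing is recorded: error messages and auto-repair are both disabled. If I remember the 12.2S syntax correctly (worth verifying on your release), they can be enabled with something like:

```
! assumed 12.2S-style keywords; check the parser on your image
PE2(config)# ip cef table consistency-check error-message
PE2(config)# ip cef table consistency-check auto-repair
```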
"Have you had a chance to get a sh proc cpu sorted on PE2 before fixing the configuration error?"
Yes I have; before fixing the configuration error:
sh proc cpu sorted
CPU utilization for five seconds: 48%/48%; one minute: 46%; five minutes: 40%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
26 3920 96958 40 0.07% 0.02% 0.00% 0 HC Counter Timer
1 0 32 0 0.00% 0.00% 0.00% 0 Chunk Manager
2 112 78569 1 0.00% 0.00% 0.00% 0 Load Meter
3 4336 812792 5 0.00% 0.00% 0.00% 0 OSPF Router 1
4 4 615 6 0.00% 0.00% 0.00% 0 TACACS+
5 1089080 72851 14949 0.00% 0.33% 0.28% 0 Check heaps
6 0 1 0 0.00% 0.00% 0.00% 0 Pool Manager
7 0 2 0 0.00% 0.00% 0.00% 0 Timers
8 9896 30906 320 0.00% 0.00% 0.00% 0 ARP Input
9 0 1 0 0.00% 0.00% 0.00% 0 AAA_SERVER_DEADT
10 0 2 0 0.00% 0.00% 0.00% 0 AAA high-capacit
11 36 68 529 0.00% 0.00% 0.00% 0 Entity MIB API
12 0 1 0 0.00% 0.00% 0.00% 0 IFS Agent Manage
13 8 6549 1 0.00% 0.00% 0.00% 0 IPC Dynamic Cach
14 4 49 81 0.00% 0.00% 0.00% 0 PF_Split Sync Pr
15 76 392585 0 0.00% 0.00% 0.00% 0 IPC Periodic Tim
16 104 392583 0 0.00% 0.00% 0.00% 0 IPC Deferred Por
17 140212 13383 10476 0.00% 0.01% 0.00% 0 IPC Seat Manager
18 0 1 0 0.00% 0.00% 0.00% 0 IPC Stdby Update
19 0 2 0 0.00% 0.00% 0.00% 0 DDR Timers
20 0 2 0 0.00% 0.00% 0.00% 0 Dialer event
After fixing the error it goes down to 1%/0%.
"What IOS image is running on PE2 and PE1? Is it the same or different?"
It is different: PE1 is 12.2(18)SXF7 and PE2 is 12.2(18)SXF.
"Are both devices configured the same way regarding MPLS TTL propagation?" Yes, on both routers "no mpls ip propagate-ttl" is configured.
But I did some further testing. I made the inverse configuration: this time I configured the static route on PE1 with a next hop that is connected to PE2, and now PE1's CPU goes to 100%.
I also reproduced this in the lab: if "mpls ip propagate-ttl" is in the configuration then the router's CPU works fine, but if "no mpls ip propagate-ttl" is there then CPU utilization goes up to 100%.
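For anyone who finds this thread later, the knob in question is a global configuration command; disabling it makes the router write 255 into the label TTL at imposition instead of copying the IP TTL (commonly done to hide the core hops from customer traceroutes):

```
! default: copy the IP TTL into the MPLS TTL at label imposition
mpls ip propagate-ttl
! alternative: set the MPLS TTL to 255 at imposition, hiding core hops
no mpls ip propagate-ttl
```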
Thank you, Giuseppe,
A.