04-29-2019 03:45 PM
Experts
Please I need some help with this:
I have a Cisco 6509 (CORE 1) connected to another 6509 (CORE 2). On March 14 it lost adjacency with the other networks: all EIGRP neighbors went down for a while and then came back up.
A TAC case was opened, and TAC indicated that the cause was possibly high CPU usage. They recommended applying an EEM script to capture outputs the next time the CPU runs high. Their analysis found that the switch had not received hello packets from each of its neighbors within the default hold time, either because the packets were lost in transit or because they never arrived. What caught our attention was that the switch lost all of its neighbors, which are established over different interfaces. From this, TAC inferred that the switch did receive the hello packets but the CPU did not process them, causing the adjacencies to flap.
TAC also reviewed the "show tech" output from the switch that showed the EIGRP flapping and said that the only thing they could see that might cause the problem was some CPU spikes. If the CPU is high, the switch cannot handle EIGRP packets properly.
The problem occurred again and a new TAC case was opened. TAC mentioned that the first difference they saw is that the other 6509 (CORE 2) is protected at the control-plane (CPU) level. They also noted that only CORE 1 was affected, even though both switches run the same IOS, so it would make sense to apply control-plane protection on CORE 1 as well.
Everything that was recommended has been applied and the problem still persists. Thanks in advance to anyone who can guide me with this problem.
Log
%DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.26 (GigabitEthernet2/24) is down: holding time expired
%DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.22 (GigabitEthernet2/23) is down: holding time expired
%DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.254 (Port-channel10) is down: holding time expired
%DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.206 (Port-channel108) is up: new adjacency…..
Mar 14 07:53:48: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.206 (Port-channel108) is down: holding time expired
Mar 14 07:53:48: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.218 (GigabitEthernet2/12) is down: holding time expired
Mar 14 07:53:52: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.218 (GigabitEthernet2/12) is up: new adjacency
Mar 14 07:54:01: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Standby -> Active
Mar 14 07:54:01: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Active -> Speak
Mar 14 07:54:05: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.202 (Port-channel107) is down: holding time expired
Mar 14 07:54:05: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.2 (GigabitEthernet1/23) is down: holding time expired
Mar 14 07:54:06: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.2 (GigabitEthernet1/23) is up: new adjacency
Mar 14 07:54:11: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Speak -> Standby
Mar 14 07:54:12: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Standby -> Active
Mar 14 07:54:17: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Active -> Speak
Mar 14 07:54:21: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.2 (GigabitEthernet1/23) is down: holding time expired
Mar 14 07:54:22: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.26 (GigabitEthernet2/24) is down: holding time expired
Mar 14 07:54:26: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.2 (GigabitEthernet1/23) is up: new adjacency
Mar 14 07:54:28: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.22 (GigabitEthernet2/23) is down: peer restarted
Mar 14 07:54:29: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Speak -> Standby
Mar 14 07:54:31: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.6 (GigabitEthernet1/24) is down: peer restarted
Mar 14 07:54:32: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.14 (GigabitEthernet2/22) is down: peer restarted
Mar 14 07:54:37: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.22 (GigabitEthernet2/23) is up: new adjacency
Mar 14 07:54:40: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Standby -> Active
Mar 14 07:54:41: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.2 (GigabitEthernet1/23) is down: holding time expired
Mar 14 07:54:43: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Active -> Speak
Mar 14 07:54:43: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.202 (Port-channel107) is up: new adjacency
Mar 14 07:54:46: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.85 (Port-channel103) is down: peer restarted
Mar 14 07:54:49: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.218 (GigabitEthernet2/12) is down: holding time expired
Mar 14 07:54:49: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.26 (GigabitEthernet2/24) is up: new adjacency
Mar 14 07:54:51: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.2 (GigabitEthernet1/23) is up: new adjacency
Mar 14 07:54:52: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.206 (Port-channel108) is up: new adjacency
Mar 14 07:54:54: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Speak -> Standby
Mar 14 07:54:59: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.254 (Port-channel10) is down: holding time expired
Mar 14 07:55:00: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.218 (GigabitEthernet2/12) is up: new adjacency
Mar 14 07:55:01: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.6 (GigabitEthernet1/24) is up: new adjacency
Mar 14 07:55:07: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.206 (Port-channel108) is down: holding time expired
Mar 14 07:55:16: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.6 (GigabitEthernet1/24) is down: holding time expired
Mar 14 07:55:18: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.22 (GigabitEthernet2/23) is down: holding time expired
Mar 14 07:55:18: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.14 (GigabitEthernet2/22) is up: new adjacency
Mar 14 07:55:20: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.6 (GigabitEthernet1/24) is up: new adjacency
Mar 14 07:55:21: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.26 (GigabitEthernet2/24) is down: holding time expired
Mar 14 07:55:22: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Standby -> Active
Mar 14 07:55:22: %HSRP-5-STATECHANGE: Vlan498 Grp 198 state Active -> Speak
Mar 14 07:55:22: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.22 (GigabitEthernet2/23) is up: new adjacency
Mar 14 07:55:22: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.254 (Port-channel10) is up: new adjacency
Mar 14 07:55:22: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.85 (Port-channel103) is up: new adjacency
Mar 14 07:55:23: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.10 (GigabitEthernet1/22) is down: peer restarted
Mar 14 07:55:24: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.26 (GigabitEthernet2/24) is up: new adjacency
Mar 14 07:55:26: %SEC-6-IPACCESSLOGS: list 99 denied 10.135.197.13 68 packets
Mar 14 07:55:26: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.0.206 (Port-channel108) is up: new adjacency
Mar 14 07:55:27: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.199.1.10 (GigabitEthernet1/22) is up: new adjacency
6509_ER_CORE1#sh processes cpu history
[ASCII graphs from the command output; column alignment was lost, so only the surviving rows are summarized.]
CPU% per second (last 60 seconds): only the 10% row is marked.
CPU% per minute (last 60 minutes): average (#) at the 10% row; maximum (*) reaches the 20% row in some minutes.
CPU% per hour (last 72 hours): average (#) at the 10% row for every hour; maximum (*) reaches the 20% row for every hour, the 50-70% rows in several hours, and the 100% row in one hour.
* = maximum CPU% # = average CPU%
EEM Recommended
event manager applet HIGH_CPU
 event syslog pattern "%DUAL-5-NBRCHANGE: EIGRP-IPv4"
 action 1.01 syslog msg "High CPU DETECTED: Total CPU Utilization is over 90%"
 action 1.02 cli command "enable"
 action 1.03 cli command "term length 0"
 action 1.04 cli command "debug netdr cap rx"
 action 1.05 cli command "show netdr cap | append sup-bootdisk:HIGH_CPU.txt"
 action 1.06 cli command "show proc cpu sort | append sup-bootdisk:HIGH_CPU.txt"
 action 1.07 cli command "Show users | append sup-bootdisk:HIGH_CPU.txt"
 action 1.08 cli command "Show proc cpu history | append sup-bootdisk:HIGH_CPU.txt"
 action 1.09 cli command "show logging | append sup-bootdisk:HIGH_CPU.txt"
 action 1.10 cli command "show spanning-tree detail | append sup-bootdisk:HIGH_CPU.txt"
 action 1.11 cli command "show ip traffic | append sup-bootdisk:HIGH_CPU.txt"
 action 1.12 cli command "show clock | append sup-bootdisk:HIGH_CPU.txt"
 action 1.13 cli command "undebug all"
 action 1.14 cli command "term length 24"
 action 1.15 cli command "exit"
Control Plane Config in 6509 (CORE 2)
policy-map copp-system-policy
 class copp-system-class-undesirable
  police cir 32000 conform-action transmit exceed-action transmit
 class copp-system-class-critical
  police cir 1024000 conform-action transmit exceed-action transmit
 class copp-system-class-important
  police cir 1024000 conform-action transmit exceed-action transmit
 class copp-system-class-management
  police cir 1024000 conform-action transmit exceed-action transmit
 class copp-system-class-normal
  police cir 512000 conform-action transmit exceed-action transmit
 class copp-system-class-monitoring
  police cir 128000 conform-action transmit exceed-action transmit
 class copp-system-rest-ip
  police cir 32000 conform-action transmit exceed-action transmit
 class class-default
  police cir 64000 conform-action transmit exceed-action transmit
!
class-map match-any copp-system-class-important
 match access-group name copp-system-acl-hsrp
class-map match-any copp-system-rest-ip
 match access-group name copp-system-rest-ip
class-map match-any copp-system-class-undesirable
 match access-group name copp-system-acl-undesirable
class-map match-any copp-system-class-critical
 match access-group name copp-system-acl-eigrp
class-map match-any copp-system-class-monitoring
 match access-group name copp-system-acl-icmp
 match access-group name copp-system-acl-traceroute
class-map match-any copp-system-class-management
 match access-group name copp-system-acl-tacacs
 match access-group name copp-system-acl-ntp
 match access-group name copp-system-acl-ftp
 match access-group name copp-system-acl-tftp
 match access-group name copp-system-acl-snmp
 match access-group name copp-system-acl-ssh
 match access-group name copp-system-acl-telnet
class-map match-any copp-system-class-normal
 match protocol arp
!
control-plane
 service-policy input copp-system-policy
!
6509_ER_CORE2
!
mls qos
04-30-2019 12:13 AM
Hello,
Your 'copp-system-class-important', which includes the 'copp-system-acl-eigrp' access list, has a lot of potential traffic. I would suggest matching this ACL in the 'copp-system-class-critical' class instead.
Also, make sure that the EIGRP ACL looks like the following:
access-list 120 permit eigrp any <router receive block>
access-list 120 permit eigrp any host 224.0.0.10
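As an illustration of the suggestion above, here is a minimal sketch of a named ACL that could back the 'copp-system-acl-eigrp' match in the posted class map. The ACL name comes from the posted configuration; the entries are assumptions modeled on the access-list 120 lines above, and <router receive block> is left as the same placeholder. The show commands at the end are one way to check which class the EIGRP traffic is actually landing in.

```
! Sketch only: entries are assumed, adapt them to the real addressing plan
ip access-list extended copp-system-acl-eigrp
 permit eigrp any <router receive block>
 permit eigrp any host 224.0.0.10
!
! From exec mode: confirm the ACL is being hit and see per-class counters
show ip access-lists copp-system-acl-eigrp
show policy-map control-plane
```

If the counters for the class that carries EIGRP stay at zero while neighbors are up, the hellos are probably being classified somewhere else in the policy.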
05-08-2019 07:04 AM
Thank you very much for your prompt response. I will try to apply that and let you know. In the meantime, I am researching newer IOS releases. Do you think it would be appropriate to upgrade to a more recent version, and might that solve the problem? If you need more details, logs, or anything else to help me, please let me know.
Regards.
04-30-2019 01:04 AM - edited 04-30-2019 06:39 AM
Hello
An expired hold time suggests possible L2 issues. So, adding to what TAC has stated regarding your CPU utilization, you could try increasing the EIGRP hold time to ride out the CPU spikes as and when they occur.
If you ping with a small packet size and then a larger one, do you receive any errors?
ping <eigrp neighbors> size 100
ping <eigrp neighbors> size 1400
ping 224.0.0.10 size 100
ping 224.0.0.10 size 1400
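The hold-time change suggested above might look like the sketch below. The interface and AS number are taken from the posted log; the timer values are illustrative assumptions, not recommendations. Keep in mind that an EIGRP router advertises its hold time in its own hellos and the neighbor is the one that counts it down, so for CORE 1 to wait longer before declaring a neighbor down, the larger hold time would need to be configured on the neighbor's side of the link (and on CORE 1 for the reverse direction).

```
! Sketch only: timer values are assumed examples
interface GigabitEthernet2/24
 ip hello-interval eigrp 100 5
 ip hold-time eigrp 100 45
!
! From exec mode: the Hold column shows the seconds left for each neighbor
show ip eigrp neighbors
! Shows the hello interval and hold time configured per interface
show ip eigrp interfaces detail
```

A longer hold time only masks missed hellos; it slows failure detection and does not address why the hellos are being missed.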
05-08-2019 07:05 AM
Thank you very much, Paul, for your prompt response. I will try to apply that and let you know. In the meantime, I am researching newer IOS releases. Do you think it would be appropriate to upgrade to a more recent version, and might that solve the problem? If you need more details, logs, or anything else to help me, please let me know.
Regards.
05-08-2019 10:50 AM
Hello
@igzcollado wrote:
Thank you very much, Paul, for your prompt response. I will try to apply that and let you know. In the meantime, I am researching newer IOS releases. Do you think it would be appropriate to upgrade to a more recent version, and might that solve the problem?
That absolutely makes sense. Changing the hold timers was only a temporary suggestion; if you can upgrade and it is viable to do so, I would definitely go with TAC's suggestion.
05-08-2019 11:52 AM
Thank you very much for your quick reply. TAC did not actually recommend an upgrade, which is why I am asking you whether it is viable. If so, which software version would you suggest for the upgrade? This is the current version: Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9_WAN-M), Version 12.2(33)SXI13, RELEASE SOFTWARE (fc3).
Thanks in advance.