Cisco 6509-E 99% CPU Usage

Himanshu Malani
Level 1

Hi,

We are running a Cisco 6509-E under a load test. When traffic reaches 80 Mbps, the switch starts responding very slowly. I checked the CPU usage and it was at 100%, with about 80K connections through the switch from outside to inside. Once the connections drop, the CPU is released and the switch responds normally again.

CPU utilization for five seconds: 99%/17%; one minute: 84%; five minutes: 74%

PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process

IP NAT Ager using very high CPU.

I am running

sup-bootflash:s72033-ipservicesk9-mz.122-33.SXH8.bin

Regards,

HM


InayathUlla Sharieff
Cisco Employee

Hi Himanshu,

CPU will be higher on the Supervisor because some NAT functions require the CPU in order to build a flow. NAT is stored in the Netflow portion of the TCAM. In order to program NAT into hardware, the first packet of _every_ new flow has to be process-switched. Likewise, when a flow is aged out you will see some CPU utilization in the "IP NAT Ager" process, which is the process that ages out old NAT entries. So when timers are set to something low, like 5 minutes, you may run into a situation where there is a lot of process-switching and a lot of NAT aging.

We can raise the timers to a higher timeout value, and IF you aren't constantly building new entries the CPU usage should go down. I am not quite sure what timeouts you have configured on the device.

For example:

ip nat translation timeout 3600

ip nat translation tcp-timeout 3600

ip nat translation finrst-timeout 3600

ip nat translation syn-timeout 3600

ip nat translation icmp-timeout 3600

Also, kindly check the TCAM limits of the Sup installed on the device; if they are being exceeded, you will see the CPU go high.
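To weigh the timeout advice above against table size, here is a rough back-of-the-envelope sketch in Python using Little's law (entries held = arrival rate x time each entry lives). The flow rate and mean flow duration are hypothetical inputs, not figures from this thread, and the model assumes every entry lives for its flow duration plus the full idle timeout.

```python
def steady_state_translations(new_flows_per_second: float,
                              timeout_seconds: float,
                              mean_flow_seconds: float = 0.0) -> float:
    """Estimate NAT entries held at steady state via Little's law (L = rate * W).

    Assumption: each entry lives for the flow's duration plus the idle timeout.
    """
    return new_flows_per_second * (mean_flow_seconds + timeout_seconds)


# Hypothetical load: 50 new flows per second (not taken from the thread).
for timeout in (300, 3600):
    entries = steady_state_translations(50, timeout)
    print(f"timeout={timeout}s -> about {entries:.0f} entries")
```

The trade-off this illustrates: a longer timeout means fewer aging events per entry, but the table itself grows in proportion to the timeout when new flows keep arriving.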

HTH

Regards

Inayath

Hi Inayath,

Thank you so much for the information. We are getting approximately 80,000 connections, which creates a very long NAT list. How can we handle this huge IP NAT list? Can we bring the NAT translation timeout down to a lower value?

HM

Hi Inayath,

Please check the following TCAM limits.

CISCO#show tcam counts
           Used        Free        Percent Used       Reserved
           ----        ----        ------------       --------
Labels:(in)  4        4092            0
Labels:(eg)  3        4093            0

ACL_TCAM
--------
  Masks:    124        3972            3                    72
Entries:    128       32640            0                   576

QOS_TCAM
--------
  Masks:     22        4074            0                    18
Entries:     22       32746            0                   144

    LOU:      0         128            0
  ANDOR:      0          16            0
  ORAND:      0          16            0
    ADJ:      3        2045            0

Thank You,

HM

Hi Himanshu,

Let me try to explain this in detail. Please take some time to go through it, and you will understand how NAT is affected.

Feature Manager Troubleshooting

On the Sup720, when a feature is configured, it must be determined whether the feature is supported by the hardware. This is accomplished by Feature Manager (FM). Feature Manager is a software entity that keeps track of all features that have an impact on hardware forwarding; it resolves conflicts and shares hardware resources among multiple interfaces.

If a feature cannot be programmed, either because it conflicts with another feature or because it is not supported in hardware, FM displays this information and generates a syslog message.

ACL and Netflow TCAM:

Throughout this document, ACL TCAM and Netflow TCAM will be referenced. They are independent of each other and are stored in different locations on the PFC.

ACL TCAM is where traffic is permitted, denied, or redirected based on the features configured on the switch. It is stored on the PFC and has a finite number of entries available. You can read about ACL TCAM on the 6500 in more depth at the following link:

http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper09186a00800c9470.shtml

Netflow TCAM is where features such as NDE, NAT, and WCCP program their forwarding information. These features need to punt the first packet of every flow in order to program entries into Netflow TCAM. The entry holds the adjacency for the installed flow so that subsequent traffic is forwarded in hardware.

FM and ACL merge:

FM is needed because there is only one ACL lookup per direction on the 6500. If traffic needs to be processed by two different features that need to take different paths in hardware, the traffic has to be software switched.

An example of merging two ACL-based features, RACL (Routed ACL) and PBR (Policy-Based Routing). The logic is:

– If RACL ACE drops the packet then PBR is not concerned, i.e. merged action is DROP

– If RACL ACE permits the packet and PBR ACE doesn’t match this packet then merged action is PERMIT

– If RACL ACE permits the packet and PBR ACE does match this packet then merged action is REDIRECT (for PBR)

These three rules give the merge result whenever a RACL and PBR match the same traffic.
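The RACL/PBR merge rules listed above can be written out as a small truth table. This is only an illustrative Python sketch of the three rules as stated in this post; the function name is made up here, and it does not model how FM actually programs the TCAM.

```python
def merge_racl_pbr(racl_permits: bool, pbr_matches: bool) -> str:
    """Return the merged action for one packet, per the three rules above."""
    if not racl_permits:
        return "DROP"      # RACL drops the packet: PBR is not concerned
    if pbr_matches:
        return "REDIRECT"  # RACL permits and the PBR ACE matches
    return "PERMIT"        # RACL permits and the PBR ACE does not match


# Print every combination of the two inputs.
for racl_permits in (False, True):
    for pbr_matches in (False, True):
        action = merge_racl_pbr(racl_permits, pbr_matches)
        print(f"RACL permits={racl_permits!s:5}  PBR matches={pbr_matches!s:5}  ->  {action}")
```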

Common FM issues:

When troubleshooting FM, there are two common issues that are typically seen:

  • Feature conflicts: certain features impose conflicting requirements on the hardware and cannot coexist in hardware, for example NAT and PBR.
  • Merge failure: the size of the merged result of two ACLs grows with the size of each ACL, which may produce a merged ACL bigger than TCAM capacity. This is commonly referred to as a merge explosion.

Below I will go through how to look at the ACL TCAM programmed on a Layer 3 interface, as well as how to find and resolve a feature conflict and TCAM overutilization.

Looking at ACL TCAM when a Feature is configured:

When a feature is configured on an interface the sup720 may need to adjust the ACL TCAM on the interface in order to satisfy the requirement for that feature.

We can view how ACL TCAM is currently programmed with the “show tcam interface” command, as seen below:

6500-2#show run interface vlan 10

Building configuration...

Current configuration : 85 bytes

!

interface Vlan10

ip address 10.100.101.1 255.255.255.0

end

6500-2#show tcam interface vlan 10 acl in ip

* Global Defaults shared

Entries from Bank 0

Entries from Bank 1

    permit       ip any any

Above we can see that all traffic is being forwarded in hardware due to the "permit ip any any" entry. This works by the same logic as an ACL and permits all traffic passing through this interface. Since we have no features programmed on this interface on ingress, this is expected.

We can also see that since we have no features programmed on egress that nothing will be seen in ACL TCAM:

6500-2#show tcam interface vlan 10 acl out ip

Now if we applied the following access-list with a log keyword on input we will see the following:

6500-2#sh access-list 1

Standard IP access list 1

    10 permit 10.10.10.20 log

    20 permit any

6500-2#sh run int vlan 10

Building configuration...

Current configuration: 85 bytes

!

interface Vlan10

ip address 10.100.101.1 255.255.255.0

ip access-group 1 in

end

6500-2#sh tcam interface vlan 10 acl in ip

* Global Defaults shared

Entries from Bank 0

Entries from Bank 1

    punt         ip host 10.10.10.20 any

    permit       ip any any

Now we can see a punt installed for any traffic from host 10.10.10.20, due to the log keyword on input. The punt represents traffic that needs to be sent to the RP CPU. In this case, all traffic sourced from 10.10.10.20 will be sent to the RP CPU to be process switched.

Keep in mind that this is only configured on ingress.

Under normal operation we should not see any punt adjacencies installed in ACL TCAM on the interface.  A punt implies that the operation cannot be performed in hardware and must be punted to the RP CPU in order to be software switched.
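As a quick way to spot punt entries in a saved copy of this output, the following Python sketch tallies the action keyword at the start of each entry line. The set of action keywords is limited to those that appear in the outputs in this thread (an assumption; other releases may print others), and the function name is hypothetical.

```python
from collections import Counter

# Output copied from the "show tcam interface vlan 10 acl in ip" example above.
SAMPLE = """\
* Global Defaults shared
Entries from Bank 0
Entries from Bank 1
    punt         ip host 10.10.10.20 any
    permit       ip any any
"""

# Assumption: only the action keywords seen in this thread are recognised.
ACTIONS = {"permit", "deny", "punt", "redirect", "policy-route"}


def tally_actions(output: str) -> Counter:
    """Count TCAM entries per action keyword; header lines are ignored."""
    counts = Counter()
    for line in output.splitlines():
        words = line.split()
        if words and words[0] in ACTIONS:
            counts[words[0]] += 1
    return counts


counts = tally_actions(SAMPLE)
print(dict(counts))
if counts["punt"]:
    print("punt entries present: matching traffic is sent to the RP CPU")
```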


**Note that there are times when you will see a punt in ACL TCAM, but a Netflow entry has been installed overriding this punt. If traffic matches both an entry in ACL TCAM and a Netflow entry, the Netflow entry takes precedence. This may occur with Netflow-assisted features, which need to punt the first packet of every flow to create a Netflow entry.


If we configured a deny statement covering this IP in ACL 1, we would see a deny entry installed irrespective of the log keyword. This can be seen below:

6500-2#sh access-list 1

Standard IP access list 1

     5  deny   10.10.10.0 0.0.0.255

    10 permit 10.10.10.20 log

    20 permit any

6500-2#sh run int vlan 10

Building configuration...

Current configuration: 85 bytes

!

interface Vlan10

ip address 10.100.101.1 255.255.255.0

ip access-group 1 in

end

6500-2#sh tcam interface vlan 10 acl in ip

* Global Defaults shared

Entries from Bank 0

Entries from Bank 1

    deny         ip host 10.10.10.0 0.0.0.255 any

    permit       ip any any

Note that only a deny statement was installed, because there is no reason to install redundant entries in hardware. The deny for the network 10.10.10.0/24 also encompasses the permit statement for 10.10.10.20, so the entry with the log keyword is not installed in hardware, which saves space.
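The reason the permit for 10.10.10.20 is redundant can be checked with first-match logic: an entry whose source range is fully covered by an earlier entry can never be reached. The Python sketch below models only the source address of a standard ACL with contiguous wildcard masks; the helper names are invented for illustration and this is not how the switch itself computes the reduction.

```python
import ipaddress


def wildcard_to_network(address: str, wildcard: str) -> ipaddress.IPv4Network:
    """Convert an IOS address/wildcard pair to a network (contiguous wildcards only)."""
    wildcard_bits = bin(int(ipaddress.IPv4Address(wildcard))).count("1")
    return ipaddress.ip_network(f"{address}/{32 - wildcard_bits}", strict=False)


# ACL 1 from the example above, in sequence order: 5, 10, 20.
ACL = [
    ("deny", wildcard_to_network("10.10.10.0", "0.0.0.255")),
    ("permit", wildcard_to_network("10.10.10.20", "0.0.0.0")),
    ("permit", ipaddress.ip_network("0.0.0.0/0")),  # "permit any"
]


def shadowed_entries(entries):
    """Return indexes of entries fully covered by an earlier entry."""
    hidden = []
    for index, (_, network) in enumerate(entries):
        if any(network.subnet_of(earlier) for _, earlier in entries[:index]):
            hidden.append(index)
    return hidden


for index in shadowed_entries(ACL):
    action, network = ACL[index]
    print(f"entry {index} ({action} {network}) can never match")
```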

Looking at a feature configured on an interface:

Let's take a look at the same interface and see what FM (Feature Manager) says about the features configured on it:

6500-2#sh access-list 1

Standard IP access list 1

    10 permit 10.10.10.20 log

    20 permit any

6500-2#sh run int vlan 10

Building configuration...

Current configuration : 85 bytes

!

interface Vlan10

ip address 10.100.101.1 255.255.255.0

ip access-group 1 in

end

Let's take a look at FM via FIE (Feature Interaction Engine) with an ACL configured. FIE is the component of Feature Manager that implements the feature on an interface.

6500-2#sh fm fie interface vlan 10

Interface Vl10:

Feature interaction state created: Yes

Flowmask conflict status for protocol IP : FIE_FLOWMASK_STATUS_SUCCESS

Flowmask conflict status for protocol OTHER : FIE_FLOWMASK_STATUS_SUCCESS

Interface Vl10 [Ingress]:

Slot(s) using the protocol IP : 2

FIE Result for protocol IP : FIE_SUCCESS_NO_CONFLICT

Features Configured : RACL   - Protocol : IP

FM Label when FIE was invoked : 52

Current FM Label : 52

Last Merge is for slot: 0

Features in Bank2 = RACL

+-------------------------------------+

            Action Merge Table

+-------------------------------------+

   RACL               RSLT              R_RSLT            COL

+-------------------------------------+

   L2R              L2R               P                 0          

   SB               HB                P                 0          

   HB               HB                P                 0          

   L3D              L3D               L3D               0          

   P                P                 P                 0          

+-------------------------------------+

num# of strategies tried : 1

Description of merging strategy used:

  Serialized Banks: FALSE

  Bank1 Only Features: [empty]

  Bank2 Only Features: [empty]

  Banks Swappable: TRUE

Merge Algorithm: ODM

  num# of merged VMRs in bank 1 = 0

  num# of free TCAM entries in Bank1 = 32730

  num# of merged VMRs in bank 2 = 2

  num# of free TCAM entries in Bank2 = 32758

Slot(s) using the protocol OTHER : 2

FIE Result for protocol OTHER : FIE_SUCCESS_NO_CONFLICT

Features Configured : OTH_DEF   - Protocol : OTHER

FM Label when FIE was invoked : 52

Current FM Label : 52

Last Merge is for slot: 0

Features in Bank2 = OTH_DEF

+-------------------------------------+

            Action Merge Table

+-------------------------------------+

   OTH_DEF            RSLT              R_RSLT            COL

+-------------------------------------+

   SB           HB                P                 0          

   X                P                 P                 0          

+-------------------------------------+

num# of strategies tried : 1

Description of merging strategy used:

  Serialized Banks: FALSE

  Bank1 Only Features: [empty]

  Bank2 Only Features: [empty]

  Banks Swappable: TRUE

Merge Algorithm: ODM

  num# of merged VMRs in bank 1 = 0

  num# of free TCAM entries in Bank1 = 32730

  num# of merged VMRs in bank 2 = 1

  num# of free TCAM entries in Bank2 = 32757

Interface Vl10 [Egress]:

No Features Configured

No IP Guardian Feature Configured

No IPv6 Guardian Feature Configured

No QoS Feature Configured

Above we can see that the feature "RACL" (Routed ACL) is configured as an ingress feature, and that FM label 52 has been applied to this interface. The FM label is dynamically selected and used as a pointer, so that the same label can be applied to multiple interfaces. This allows the 6500 to determine the merge required for the same set of features only once.

We can also see that IP Guardian has not been triggered. IP Guardian is the portion of Feature Manager that determines which flowmask is needed on an interface for the features configured. It is used when Netflow assistance is needed for a feature, and it also protects against incompatible features being programmed into hardware.

Some features require specific flowmasks in order to be put into hardware. This can be seen at the following link:

http://www.cisco.com/en/US/docs/switches/lan/catalyst6500/ios/12.2SXF/native/configuration/guide/netflow.html#wp1132875

If no flowmask is configured, the 6500 will dynamically choose the appropriate flowmask via IP Guardian.  If a flow mask is configured using the "mls ip flow " command, the 6500 will only use this mask for features.

You can look at the specific label with command "show fm label <label #>", which will show us where this label is applied.


6500-2#sh fm label 52
Label 52:
  Hardware state is Not Reduced
  Force merge is FALSE
  Protocol number 0:
     Protocol switching is enabled
     Configured features:
        FM_GUARDIAN (ingress)
        IPV4 Default Result Feature (ingress)
  Protocol number 2:
     Protocol switching is enabled
     Configured features:
        OTHER Default Result Feature (ingress)
  Interfaces (I/E = Ingress/Egress; * = associate pending)
     I     Vlan10

In this case we can see that this label is only applied to interface Vlan 10.

Troubleshooting a Feature Conflict:

A common issue with FM is having a Netflow-based feature and an ACL-based feature matching the same traffic, configured on the same interface. Keep in mind that not all ACL/Netflow combinations cause an issue; a security ACL and NAT, for example, coexist fine. NAT and PBR do conflict, however, because PBR needs to redirect the traffic onto a different path.

The problem is that traffic cannot be sent on two different paths in hardware, as the Sup720 makes only one pass through ACL TCAM per direction (ingress/egress). We cannot send the traffic down one path for Netflow creation and another for the decision made by the ACL; otherwise duplicate traffic would occur. The following is an example of how this looks when NAT (Netflow assisted) and PBR (ACL based) are configured to match the same traffic on the same interface.

Below we can see that NAT and PBR are matching the same set of traffic. We know this will not work, because the traffic cannot both be punted to create a Netflow entry AND be redirected to a specific next hop via PBR.

ip nat pool test 10.10.101.3 10.10.101.4 prefix-length 24

ip nat inside source list 7 pool test

access-list 7 permit 10.10.10.0 0.0.0.255

route-map PBR permit 10

match ip address 1

set ip next-hop 10.10.101.1

!

access-list 1 permit 10.10.10.0 0.0.0.255

!

ip route 0.0.0.0 0.0.0.0 10.10.101.10

!

interface Vlan10

ip address 10.100.101.1 255.255.255.0

ip nat inside

ip policy route-map PBR

!

interface Vlan20

ip address 10.10.101.2 255.255.255.0

ip nat outside

end

Let's take a look at FM on SVI 10 to see how this is displayed when both PBR and NAT are applied:

6500-2#show fm fie interface vlan 10

Interface Vl10:

Feature interaction state created: Yes

Flowmask conflict status for protocol IP : FIE_FLOWMASK_STATUS_SUCCESS

Flowmask conflict status for protocol OTHER : FIE_FLOWMASK_STATUS_SUCCESS

Interface Vl10 [Ingress]:

Slot(s) using the protocol IP : 2

FIE Result for protocol IP : FIE_SUCCESS_NO_CONFLICT

Features Configured : PBR   - Protocol : IP

FM Label when FIE was invoked : 52

Current FM Label : 52

Last Merge is for slot: 0

Features in Bank1 = PBR

+-------------------------------------+

            Action Merge Table

+-------------------------------------+

   PBR                RSLT              R_RSLT            COL

+-------------------------------------+

   SB               HB                P                 0          

   AdR              AdR               P                 0          

   X                P                 P                 0          

+-------------------------------------+

num# of strategies tried : 1

Description of merging strategy used:

  Serialized Banks: TRUE

Merge Algorithm: ODM

  num# of merged VMRs in bank 1 = 12

  num# of free TCAM entries in Bank1 = Unknown

  num# of merged VMRs in bank 2 = 0

  num# of free TCAM entries in Bank2 = Unknown

Slot(s) using the protocol OTHER : 2

FIE Result for protocol OTHER : FIE_SUCCESS_NO_CONFLICT

Features Configured : OTH_DEF   - Protocol : OTHER

FM Label when FIE was invoked : 52

Current FM Label : 52

Last Merge is for slot: 0

Features in Bank2 = OTH_DEF

+-------------------------------------+

            Action Merge Table

+-------------------------------------+

   OTH_DEF            RSLT              R_RSLT            COL

+-------------------------------------+

   SB               HB                P                 0          

   X                P                 P                 0          

+-------------------------------------+

num# of strategies tried : 1

Description of merging strategy used:

  Serialized Banks: FALSE

  Bank1 Only Features: [empty]

  Bank2 Only Features: [empty]

  Banks Swappable: TRUE

Merge Algorithm: ODM

  num# of merged VMRs in bank 1 = 0

  num# of free TCAM entries in Bank1 = 32730

  num# of merged VMRs in bank 2 = 1

  num# of free TCAM entries in Bank2 = 32760

Interface Vl10 [Egress]:

No Features Configured

IP Guardian Feature Configured

Current IP Guardian flowmask : Intf Full Flow

FIE guardian interaction : Done

FIE guardian interaction result : SUCCESS

FIE guardian flowmask conflict status : FIE_FLOWMASK_NO_CONFLICT

num# of features that bump this interface : 1

features that bump this interface : NAT

No IPv6 Guardian Feature Configured

No QoS Feature Configured

Above we can see that FM label 52 was selected and that only PBR is listed as a feature programmed in hardware. If we look further down, we can see that IP Guardian has selected the "Intf Full Flow" (interface full flow) flowmask. This is a requirement for NAT to work correctly in hardware. We can also see that NAT is listed as a cause for the interface to be bumped. This signifies that traffic matching this feature needs to be sent to the RP CPU, either to be software switched OR for the creation of a Netflow entry in hardware.

For a Netflow entry to be created, the first packet of every flow needs to be punted, after which the flow will match on the Netflow entry.

***Note Netflow always takes precedence over whatever is configured in ACL TCAM. Typically a "redirect" in hardware indicates that the traffic is being pushed to the Netflow path. However, this could also be seen as a punt. If a "punt" is installed and the feature is Netflow assisted, please check for a Netflow entry for the traffic. An example of this is demonstrated below.

Let's also take a look at what the TCAM looks like with these features configured.

6500-2#sh tcam int vlan 10 acl in ip

* Global Defaults shared

Entries from Bank 0

Entries from Bank 1

    redirect     ip any any fragments

    redirect     ip any any fragments

    redirect     ip any any

    permit       ip any 224.0.0.0 15.255.255.255 fragments

    permit       ip any 224.0.0.0 15.255.255.255 fragments

    permit       ip any 224.0.0.0 15.255.255.255

    policy-route ip 10.10.10.0 0.0.0.255 any fragments

    policy-route ip 10.10.10.0 0.0.0.255 any fragments

    policy-route ip 10.10.10.0 0.0.0.255 any

    permit       ip any any fragments

    permit       ip any any fragments

    permit       ip any any

We can see that a redirect is installed for all traffic. This causes all traffic to be sent to the Netflow path, including the traffic matched for PBR. Though we see the "policy-route" statement installed for PBR, it will never be hit in hardware because the redirect statement matches all traffic first.

Now I am going to send a flow from an Ixia from 10.10.10.2 to 10.10.102.10.

The redirect installed for NAT will cause this traffic to be redirected to the RP CPU for a Netflow entry to be installed. If we look at Netflow, we will see the entry installed for this traffic:

6500-2#sh mls netflow ip

Displaying Netflow entries in Active Supervisor EARL in module 2

DstIP           SrcIP           Prot:SrcPort:DstPort  Src i/f      :AdjPtr

-----------------------------------------------------------------------------

Pkts         Bytes         Age   LastSeen  Attributes

---------------------------------------------------

0.0.0.0         0.0.0.0         0   : 0      : 0

       --           :0x0       

30           1380          72    13:31:25   L3 - Dynamic

10.10.102.10    10.10.10.2      tcp : 0      : 0     Vl10          :0x0       

0            0             82    13:31:25   L3 - Dynamic

We can see above that no packets/bytes have hit the entry for 10.10.10.2 to 10.10.102.10 (Pkts and Bytes are both 0). This is because the traffic is being software forwarded: PBR is configured for the same subnet (10.10.10.x), and the traffic cannot both be redirected to the next hop via PBR and go through the Netflow path in hardware. The traffic is pushed to the RP CPU in an attempt to perform both operations. We can also see that the CPU is high due to this traffic from the Ixia:

6500-2#sh processes cpu

CPU utilization for five seconds: 57%/57%; one minute: 13%; five minutes: 7%

**Note that all of this causes the CPU to be driven by interrupts, not by a process.
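In the "show processes cpu" header, the first five-second figure is total CPU and the figure after the slash is the portion spent at interrupt level, so the difference is process-level load. A small Python sketch (the function name is illustrative) makes the contrast between the two headers quoted in this thread explicit:

```python
import re

# Matches the "five seconds: TOTAL%/INTERRUPT%" part of the header line.
CPU_LINE = re.compile(r"five seconds: (\d+)%/(\d+)%")


def split_cpu(line: str) -> dict:
    """Split the five-second CPU figure into interrupt and process load."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


# Header from the NAT + PBR lab test above: all load is interrupt driven.
print(split_cpu("CPU utilization for five seconds: 57%/57%; one minute: 13%; five minutes: 7%"))

# Header from the original question: most load is in processes such as IP NAT Ager.
print(split_cpu("CPU utilization for five seconds: 99%/17%; one minute: 84%; five minutes: 74%"))
```

A high interrupt share points at traffic being switched by the CPU, while a high process share points at a specific process to look for in the per-process list.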

Notice that as soon as we remove PBR we can see that this entry is now being hit:

6500-2(config)#int vlan 10

6500-2(config-if)#no ip policy route-map PBR

6500-2(config-if)#do sh mls netflow ip

Displaying Netflow entries in Active Supervisor EARL in module 2

DstIP           SrcIP         Prot:SrcPort:DstPort  Src i/f      :AdjPtr

----------------------------------------------------------------------------

Pkts         Bytes         Age   LastSeen  Attributes

---------------------------------------------------

10.10.102.10    10.10.10.2      tcp : 0      : 0        Vl10       :0x80002   

77239        3552994       6     13:36:03   L3 - SwInstalled

0.0.0.0         0.0.0.0         0   : 0      : 0        --         :0x0       

126          5796          351   13:35:57   L3 – Dynamic

We can also see that the Netflow entry has moved from Dynamic to SwInstalled and that an AdjPtr has been created for this traffic, which will be used for any further traffic in this flow.

***Note - When NAT is configured, a different flowmask has to be driven for fragmented packets so that fragments are not hardware shortcut by the NAT Netflow entry, which is installed with L4 port information. Only the first fragment carries L4 port information, so all fragments (the first and the later ones) have to be translated in software. We will continue to see high CPU if the traffic is fragmented.


One option is supported on PFC3B and later: "mls ip nat netflow-frag-l4-zero". When it is configured, the Netflow lookup key has the L4 information zeroed out for fragmented packets (including first fragments), so that the first fragment (which has L4 information) does not match the Netflow shortcut installed for NAT.

We can also see that the CPU has dropped, as expected:

6500-2#sh proc cpu | ex 0.00

CPU utilization for five seconds: 0%/0%; one minute: 45%; five minutes: 20%

Now, let's take a look to see if FM has changed:

6500-2#sh fm fie int vlan 10

Interface Vl10:

Feature interaction state created: Yes

Flowmask conflict status for protocol IP : FIE_FLOWMASK_STATUS_SUCCESS

Flowmask conflict status for protocol OTHER : FIE_FLOWMASK_STATUS_SUCCESS

Interface Vl10 [Ingress]:

Slot(s) using the protocol IP : 2

FIE Result for protocol IP : FIE_SUCCESS_NO_CONFLICT

Features Configured : [empty] - Protocol : IP

FM Label when FIE was invoked : 52

Current FM Label : 52

Last Merge is for slot: 0

num# of strategies tried : 1

  num# of merged VMRs in bank 1 = 0

  num# of free TCAM entries in Bank1 = Unknown

  num# of merged VMRs in bank 2 = 6

  num# of free TCAM entries in Bank2 = Unknown

Slot(s) using the protocol OTHER : 2

FIE Result for protocol OTHER : FIE_SUCCESS_NO_CONFLICT

Features Configured : OTH_DEF   - Protocol : OTHER

FM Label when FIE was invoked : 52

Current FM Label : 52

Last Merge is for slot: 0

Features in Bank2 = OTH_DEF

+-------------------------------------+

            Action Merge Table

+-------------------------------------+

   OTH_DEF            RSLT              R_RSLT            COL

+-------------------------------------+

   SB               HB                P                 0          

   X                P                 P                 0          

+-------------------------------------+

num# of strategies tried : 1

Description of merging strategy used:

  Serialized Banks: FALSE

  Bank1 Only Features: [empty]

  Bank2 Only Features: [empty]

  Banks Swappable: TRUE

Merge Algorithm: ODM

  num# of merged VMRs in bank 1 = 0

  num# of free TCAM entries in Bank1 = 32730

  num# of merged VMRs in bank 2 = 1

  num# of free TCAM entries in Bank2 = 32746

Interface Vl10 [Egress]:

No Features Configured

IP Guardian Feature Configured

Current IP Guardian flowmask : Intf Full Flow

FIE guardian interaction : Done

FIE guardian interaction result : SUCCESS

FIE guardian flowmask conflict status : FIE_FLOWMASK_NO_CONFLICT

num# of features that bump this interface : 1

features that bump this interface : NAT

No IPv6 Guardian Feature Configured

No QoS Feature Configured

Notice that FM still shows NAT listed as a feature that is bumping the interface. This is because the first packet of every flow that needs to be NATed must be punted to the CPU for the creation of a Netflow entry. However, as we can see above, this Netflow entry is now being used in preference to what is configured in ACL TCAM.

Notice that the flowmask is still the same, because an interface-full flowmask is required for NAT to work in hardware.

We can also see that PBR is no longer listed as an installed feature; the feature list now shows "[empty]" since PBR was removed from this interface.

**Note that the FM label has not changed in this case. FM assigns labels dynamically, and no other feature had already reserved this label, since this is the only interface on this 6500 with features configured.

If we look at ACL TCAM we can see that only a redirect is now installed for NAT.


6500-2#sh tcam int vlan 10 acl in ip

* Global Defaults not shared

Entries from Bank 0


Entries from Bank 1

    redirect     ip any any fragments
    redirect     ip any any fragments
    redirect     ip any any
    permit       ip any any fragments
    permit       ip any any fragments
    permit       ip any any

Traffic that does not match the redirect will be forwarded normally.

Below shows what this would look like if  NAT were removed rather than PBR.

The TCAM would look like the following:

6500-2#sh tcam int vlan 10 acl in ip

* Global Defaults shared


Entries from Bank 0


Entries from Bank 1

    permit                   ip any 224.0.0.0 15.255.255.255
    policy-route          ip 10.10.10.0 0.0.0.255 any
    permit                   ip any any

If we look at FM, we can see that only PBR is listed in hardware and NAT is no longer listed as a configured feature:

6500-2#sh fm fie interface vlan 10
Interface Vl10:
Feature interaction state created: Yes
Flowmask conflict status for protocol IP : FIE_FLOWMASK_STATUS_SUCCESS
Flowmask conflict status for protocol OTHER : FIE_FLOWMASK_STATUS_SUCCESS
Interface Vl10 [Ingress]:
Slot(s) using the protocol IP : 6
FIE Result for protocol IP : FIE_SUCCESS_NO_CONFLICT
Features Configured : PBR   - Protocol : IP
FM Label when FIE was invoked : 52
Current FM Label : 52
Last Merge is for slot: 0
Features in Bank1 = PBR 
+-------------------------------------+
    Action Merge Table
+-------------------------------------+
   PBR        RSLT      R_RSLT    COL
+-------------------------------------+
   SB       HB        P         0   
   AdR      AdR       P         0   
   X        P         P         0   
+-------------------------------------+
num# of strategies tried : 1
Description of merging strategy used:
  Serialized Banks: TRUE
Merge Algorithm: ODM
  num# of merged VMRs in bank 1 = 3
  num# of free TCAM entries in Bank1 = Unknown
  num# of merged VMRs in bank 2 = 0
  num# of free TCAM entries in Bank2 = Unknown
Slot(s) using the protocol OTHER : 6
FIE Result for protocol OTHER : FIE_SUCCESS_NO_CONFLICT
Features Configured : OTH_DEF   - Protocol : OTHER
FM Label when FIE was invoked : 52
Current FM Label : 52
Last Merge is for slot: 0
Features in Bank2 = OTH_DEF 
+-------------------------------------+
    Action Merge Table
+-------------------------------------+
   OTH_DEF    RSLT      R_RSLT    COL
+-------------------------------------+
   SB       HB        P         0   
   X        P         P         0   
+-------------------------------------+
num# of strategies tried : 1
Description of merging strategy used:
  Serialized Banks: FALSE
  Bank1 Only Features: [empty]
  Bank2 Only Features: [empty]
  Banks Swappable: TRUE
Merge Algorithm: ODM
  num# of merged VMRs in bank 1 = 0
  num# of free TCAM entries in Bank1 = 32734
  num# of merged VMRs in bank 2 = 1
  num# of free TCAM entries in Bank2 = 32742
Interface Vl10 [Egress]:
No Features Configured
No IP Guardian Feature Configured
No IPv6 Guardian Feature Configured
No QoS Feature Configured

TCAM overutilization from a feature configuration:

In certain cases the Sup720 will not be able to fit all of the required parameters into the TCAM on the switch. This is most commonly an issue of overutilization of the ACL or Netflow TCAM. When the traffic cannot be fit into hardware, it is forwarded in software via the RP CPU.

If we attempt to install an ACL that is too large to fit into TCAM, the switch logs syslog messages about it. We can also see via the "show fm summary" command that FM is inactive on the interfaces where these ACLs were reduced, which means their traffic is being software switched.

If we see an issue like this, we need to confirm how much space is left in TCAM. This can be done via the "show tcam counts" command, which shows the ACL and QoS TCAM space available.

To look at this information for a DFC module (which holds TCAM independent from the supervisor), specify the module after the "show tcam counts" command. For example, "show tcam counts module <mod#>".


**Note that the TCAM counts may have been taken after the TCAM merge failed and may therefore not show that the TCAM was overutilized.

If you have run out of Netflow TCAM, you will see the following error message:

%EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [99%]

You can check your TCAM utilization with the "show mls netflow table-contention detailed" command.

6500#show mls netflow table-contention detailed
Earl in Module 6
Detailed Netflow CAM (TCAM and ICAM) Utilization
================================================
TCAM Utilization             :     0%
ICAM Utilization             :     0%
Netflow TCAM count           :     3
Netflow ICAM count           :     0
Netflow Creation Failures    :     0
Netflow CAM aliases          :     0

The above shows us the Netflow TCAM utilization as a percentage, as well as the number of entries that are currently in TCAM.

**Note the ICAM listed above is used when two TCAM entries resolve to the same hash; the ICAM resolves these colliding entries by adding an additional parameter to distinguish the flows.
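If this output is collected regularly, a short script can pull out the counters and flag trouble early. The Python sketch below parses the sample shown above; the 90% warning threshold and the function name are arbitrary choices for illustration, not Cisco recommendations.

```python
# Output copied from the "show mls netflow table-contention detailed" example above.
SAMPLE = """\
Earl in Module 6
Detailed Netflow CAM (TCAM and ICAM) Utilization
================================================
TCAM Utilization             :     0%
ICAM Utilization             :     0%
Netflow TCAM count           :     3
Netflow ICAM count           :     0
Netflow Creation Failures    :     0
Netflow CAM aliases          :     0
"""


def parse_table_contention(output: str) -> dict:
    """Return the 'name : value' counters as integers; other lines are skipped."""
    stats = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        value = value.strip().rstrip("%")
        if separator and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


stats = parse_table_contention(SAMPLE)
print(stats)

# Assumption: 90% is an arbitrary warning level chosen for this sketch.
if stats["TCAM Utilization"] >= 90 or stats["Netflow Creation Failures"] > 0:
    print("Netflow TCAM is under pressure; new flows may be switched in software")
```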

If you still need assistance troubleshooting FM, please open a Cisco TAC case to investigate your issue further.

HTH

Regards

Inayath

*Please rate the useful posts.

andresdavid
Level 1
Hello everyone, I have the same problem. I tried the values above, but the CPU stays at 99%. I also use a Cisco 6509, installed version 122-33.SXJ10.bin. Any opinions, please?

Captură de ecran din 2023.04.28 la 19.32.54.png

andresdavid
Level 1

Apr 28 19:38:09: %SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for ipnat node. No memory available -Process= "Chunk Manager", ipl= 2, pid= 1
-Traceback= 4169851C 41684214 41684200
Apr 28 19:38:19: %SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for ipnat node. No memory available -Process= "Chunk Manager", ipl= 2, pid= 1
-Traceback= 4169851C 41684214 41684200
Apr 28 19:38:29: %SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for FM FLow Info C. No memory available -Process= "Chunk Manager", ipl= 2, pid= 1
-Traceback= 4169851C 41684214 41684200
Apr 28 19:38:39: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x416C6C98, alignment 8
Pool: Processor Free: 3168116 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "IP Input", ipl= 0, pid= 278
-Traceback= 4169979C 4169FF54 416C6CA0 416987F0 416C83EC 416986A0 40D4A610 40D4B754 40D4C1FC 40D4E8A8 40D49EA0 40D2E25C 40B25954 40B14454 40B132E8 40B134C4
Apr 28 19:38:39: %SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for ipnat node. No memory available -Process= "Chunk Manager", ipl= 2, pid= 1
-Traceback= 4169851C 41684214 41684200
Apr 28 19:38:49: %SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for FM FLow Info C. No memory available -Process= "Chunk Manager", ipl= 2, pid= 1

Is this the same issue that @MHM Cisco World and I have been looking at in the Routing topic?

If so, these syslog messages are likely very relevant to your high CPU usage. The errors themselves are bad news.

Interesting, NAT related(?).

Assuming you don't have a memory leak (bug), and since Cisco platforms (still?) don't do memory garbage collection, the usual solutions are to configure your device to use less memory (if possible) and/or to add RAM to the device (if possible).
