markmayfield
Beginner

ASR9k - 9901 or 9904 A9K-4X100GE-SE Line Card ACL/ACE capabilities

I'm trying to determine real-world TCAM limits for (mainly IPv4) ACEs on the ASR 9901 and/or the 9904 with the A9K-4X100GE-SE line card, from people who have experience with them.

 

My Cisco team is having difficulty getting to hard specification numbers since there's some flexibility in how TCAM is allocated and used in these platforms apparently.  

 

Since I may also get folks willing to throw out other approaches than the 9k, I'll also add a bit of background:

 

My operating environment is unusual; something of a metro-area Internet/application/managed service provider to mid-sized suburban governments, mostly on private fiber to our regional datacenters.  As such, we host applications and services with fairly large and varied ACL's to segment these government units from each other but still access shared resources. 

 

I'm on a Nexus 7000 right now and have a total ACE count just under 50k; the largest single ACL is 18k. A firewall isn't terribly appealing, as my needs are more towards throughput and state tracking isn't critical, though my biggest ACL's are the least throughput demanding, so I could probably move some items and get down to 30k.   

 

Thanks for your thoughts!

 

Mark


7 REPLIES 7
smilstea
Cisco Employee

 

 

This command shows the total TCAM for the card and how many entries are used. IPv4 uses ODS2 and IPv6 uses ODS8 tables.

show prm server tcam summary all [ACL | AFMON | IFIB | LI | PBR | QOS | all] [all | np0 | npx] location {location}

RP/0/RSP0/CPU0:ASR9006-E#show prm server tcam summary all all np4 loc 0/1/cpu0
Wed Jun 25 16:07:18.618 UTC

                Node: 0/1/CPU0:
----------------------------------------------------------------

TCAM summary for NP4:

  TCAM Logical Table: TCAM_LT_L2 (1)
    Partition ID: 0, priority: 2, valid entries: 3, free entries: 317
    Partition ID: 1, priority: 2, valid entries: 0, free entries: 320
    Partition ID: 2, priority: 1, valid entries: 0, free entries: 320
    Partition ID: 3, priority: 1, valid entries: 0, free entries: 11840
    Partition ID: 4, priority: 0, valid entries: 76, free entries: 11700
  TCAM Logical Table: TCAM_LT_ODS2 (2), free entries: 16304, resvd 128
    ACL Common Region: 0 entries allocated. 0 entries free
    Application ID: NP_APP_ID_IFIB_IPV4 (0)
      Total: 1 vmr_ids, 8005 active entries, 8005 allocated entries.
    Application ID: NP_APP_ID_QOS (1)
      Total: 0 vmr_ids, 0 active entries, 0 allocated entries.
    Application ID: NP_APP_ID_IPV4_ACL (2)
      Total: 1 vmr_ids, 139 active entries, 139 allocated entries.
    Application ID: NP_APP_ID_AFMON (3)
      Total: 0 vmr_ids, 0 active entries, 0 allocated entries.
    Application ID: (null) (4)
      Total: 0 vmr_ids, 0 active entries, 0 allocated entries.
    Application ID: NP_APP_ID_PBR (5)
      Total: 0 vmr_ids, 0 active entries, 0 allocated entries.
  TCAM Logical Table: TCAM_LT_ODS8 (3), free entries: 1929, resvd 64
    ACL Common Region: 0 entries allocated. 0 entries free
    Application ID: NP_APP_ID_IFIB_IPV6 (0)
      Total: 1 vmr_ids, 2103 active entries, 2103 allocated entries.
    Application ID: NP_APP_ID_QOS_IPV6 (1)
      Total: 0 vmr_ids, 0 active entries, 0 allocated entries.
    Application ID: NP_APP_ID_ACL_IPV6 (2)
      Total: 0 vmr_ids, 0 active entries, 0 allocated entries.
    Application ID: (null) (3)
      Total: 0 vmr_ids, 0 active entries, 0 allocated entries.
    Application ID: (null) (4)
      Total: 0 vmr_ids, 0 active entries, 0 allocated entries.
    Application ID: NP_APP_ID_PBR_IPV6 (5)
      Total: 0 vmr_ids, 0 active entries, 0 allocated entries.
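If you want to track these numbers over time, the output is regular enough to scrape. Below is a minimal Python sketch, assuming the line formats stay exactly as in the capture above; the function name `parse_tcam_summary` and the returned dictionary layout are made up for illustration and are not part of any Cisco tooling.

```python
import re

# Patterns assume the line formats shown in the captured output above.
TABLE_RE = re.compile(r"TCAM Logical Table: (\S+) \(\d+\), free entries: (\d+)")
APP_RE = re.compile(r"Application ID: (.+?) \(\d+\)")
TOTAL_RE = re.compile(r"(\d+) active entries, (\d+) allocated entries")


def parse_tcam_summary(text):
    """Return {table: {"free": int, "apps": {app_name: allocated_entries}}}."""
    tables = {}
    table = app = None
    for line in text.splitlines():
        if m := TABLE_RE.search(line):
            table = tables.setdefault(m.group(1), {"free": int(m.group(2)), "apps": {}})
            app = None
        elif "TCAM Logical Table:" in line:
            # Table without a table-level free count (e.g. TCAM_LT_L2): skip it.
            table = app = None
        elif table is not None and (m := APP_RE.search(line)):
            app = m.group(1)
        elif table is not None and app and (m := TOTAL_RE.search(line)):
            # Sum allocations, since "(null)" can appear more than once.
            table["apps"][app] = table["apps"].get(app, 0) + int(m.group(2))
            app = None
    return tables


sample = """\
  TCAM Logical Table: TCAM_LT_ODS2 (2), free entries: 16304, resvd 128
    Application ID: NP_APP_ID_IFIB_IPV4 (0)
      Total: 1 vmr_ids, 8005 active entries, 8005 allocated entries.
    Application ID: NP_APP_ID_IPV4_ACL (2)
      Total: 1 vmr_ids, 139 active entries, 139 allocated entries.
"""
ods2 = parse_tcam_summary(sample)["TCAM_LT_ODS2"]
print(ods2["free"], ods2["apps"]["NP_APP_ID_IPV4_ACL"])  # 16304 139
```

The per-NP and per-location headers are ignored here, so feed it the output for one NP at a time.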

 

 

 

This command shows the sequence number and the number of TCAM entries for each ACE, plus the total ACE count and total TCAM usage:

show pfilter-ea fea ipv4-acl <ACL> loc <loc>

 

 

 

How many TCAM entries does an ACL/ACE use?

 

Depending on bit boundaries and which ACL options are used, a single ACE may consume multiple TCAM entries.

 

1 ACE without any ranges (TCP ports, UDP ports, TTL) will typically map to 1 TCAM entry.

ACEs with ranges occupy multiple TCAM entries.

Note: 1 ACL cannot have more than 64K TCAM entries

 

Using bit boundaries helps to minimize the number of TCAM entries needed:

For example, matching the port range 0-1023 takes a single TCAM entry.

However, if you needed to match 0-1024, that would take two entries:

  •  1 to match 0-1023
  •  1 to match only 1024
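To estimate how many entries a given port range costs, the textbook approach is to split the range into aligned power-of-two blocks, each of which can be matched by one value/mask pair. The Python sketch below does that; it is only an illustration of the bit-boundary idea, and the actual expansion done by the NP microcode may differ. The function name `range_to_prefixes` is made up for this example.

```python
def range_to_prefixes(lo: int, hi: int, bits: int = 16) -> list[tuple[int, int]]:
    """Split the inclusive range [lo, hi] into aligned power-of-two blocks.

    Each block is returned as (value, prefix_len): the top prefix_len bits of
    value are matched exactly and the remaining low bits are wildcards.
    """
    if not 0 <= lo <= hi < (1 << bits):
        raise ValueError("range out of bounds")
    blocks = []
    while lo <= hi:
        # Largest block size that keeps `lo` aligned (the whole space if lo == 0).
        size = lo & -lo if lo else 1 << bits
        # Halve the block until it no longer overshoots `hi`.
        while lo + size - 1 > hi:
            size >>= 1
        blocks.append((lo, bits - size.bit_length() + 1))
        lo += size
    return blocks


print(len(range_to_prefixes(0, 1023)))      # 1
print(len(range_to_prefixes(0, 1024)))      # 2
print(len(range_to_prefixes(1024, 65535)))  # 6
```

So a rule for "all unprivileged ports" (1024-65535) would need 6 blocks under this model, which is why ranges aligned on bit boundaries are so much cheaper.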

 

Also, interfaces on the same NP that have the same ACL applied in the same direction share TCAM entries, which helps minimize the TCAM space used.

 

 

-SE cards have more TCAM than -TR cards.

 

 

 

Options to improve ACL scale

 

Increase the physical size of the TCAM

    • not realistic -- hardware spin costs, very high NRE, very long TTM

 

Use software (either built in or external) to compress ACLs.  This is essentially the opposite of range expansion... if we have two separate ACL rules that can be combined into a single binary equivalent, we can represent those two entries by a single new entry.

  • Downsides:
    • development cost/time
    • stats are only maintained per-entry so we lose some statistics granularity
    • lots of dependencies on how rules are written and ordered, algorithmic approach may not do the best job optimizing

 

Re-carve TCAM memory to allow more ACL space

    • reduces scale for some other application (possible reduction in L2 interface scale, similar to L3/L3XL profiles)
    • requires larger memory cards to get any value
    • still limited by data structures to 2^16 entries in a single ACL

 

ACL Compression

 

Scale ACLs can be compressed to minimize TCAM usage:

  • Fewer TCAM entries are used
  • Additional lookups are needed, which impacts NP performance

Typhoon supports 3 compression levels:

 

Level 0: No compression.

  • Simply expands the object-groups and creates ACE’s. Performance same as legacy ACLs
  • Benefit: ACL definitions are very simple and can be grouped

Level 1: Source prefix compression.

  • Uses fewer TCAM entries
  • Leads to an NP performance hit

Level 3: Compresses all 4 parameters (source/dest IP and port numbers)

  • Further improves TCAM space usage
  • Uses larger TCAM entries (640-bit keys instead of 160-bit keys), and therefore consumes TCAM space meant for IPv6 ACLs
  • Higher NP performance impact

 

Configuration is done on the port level by adding the 'compress' keyword after the direction keyword.

 

 

Example TCAM Recarving

 

On a MOD80-TR LC, by default, there are 20K v4 TCAM entries and around 4K v6 TCAM entries.

Each v6 entry consumes 4 times the TCAM bits compared to v4 (640 bits vs 160 bits), so even if you assigned all the bits to v6 you can get at most 8-9K v6 entries.
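The 8-9K figure falls out of simple bit accounting. Here is a rough Python check, assuming "K" means 1000 and ignoring any reserved entries (both are simplifications on my part, not platform facts):

```python
V4_KEY_BITS = 160  # bits per IPv4 TCAM entry, as stated above
V6_KEY_BITS = 640  # bits per IPv6 TCAM entry, as stated above


def max_v6_entries(v4_entries: int, v6_entries: int) -> int:
    """Upper bound on v6 entries if every TCAM bit were given to v6."""
    total_bits = v4_entries * V4_KEY_BITS + v6_entries * V6_KEY_BITS
    return total_bits // V6_KEY_BITS


print(max_v6_entries(20_000, 4_000))  # 9000
```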

 

To re-carve the TCAM use the admin configuration 'hw-module profile tcam [default | tcam-part-30-70 | tcam-part-40-60 | tcam-part-50-50 | tcam-part-70-30]'.

The first percentage indicates the v4 TCAM entries, and the second indicates the v6 TCAM entries percentage. By default this is 60:40.

 

 

To verify the partitioning, use the following CLI to see the total ods2/ods8 (IPv4/IPv6) allocation and the amount used:

 

RP/1/RSP0/CPU0:ASR9001-A#show prm server tcam partition all loc 0/0/CPU0
Wed Nov  5 15:46:43.863 UTC

                Node: 0/0/CPU0:
----------------------------------------------------------------
TCAM partition information: 1 ods2 blk = 2048 entries, 1 ods8 blk = 512 entries
NP0 : tot-ods2-blks 47 [60% of ods2+ods8 blks], used-ods2-blks 17 [22% of ods2+ods8 blks]
NP0 : tot-ods8-blks 31 [40% of ods2+ods8 blks], used-ods8-blks  2 [ 3% of ods2+ods8 blks]
NP1 : tot-ods2-blks 47 [60% of ods2+ods8 blks], used-ods2-blks  6 [ 8% of ods2+ods8 blks]
NP1 : tot-ods8-blks 31 [40% of ods2+ods8 blks], used-ods8-blks  2 [ 3% of ods2+ods8 blks]
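To turn block counts into entry counts, multiply by the per-block sizes printed in the output. The Python sketch below does that and also estimates what a different split would give. It assumes the v4 block count is rounded up; that reproduces the 47/31 split shown here for 60:40, but whether other profiles round the same way is an assumption, and the function name `carve` is hypothetical.

```python
ODS2_BLOCK_ENTRIES = 2048  # IPv4 entries per block, from the output above
ODS8_BLOCK_ENTRIES = 512   # IPv6 entries per block, from the output above


def carve(total_blocks: int, v4_percent: int) -> tuple[int, int]:
    """Return (v4_entries, v6_entries) for a given v4 percentage.

    Assumption: the v4 block count is rounded up (ceiling division).
    """
    v4_blocks = -(-total_blocks * v4_percent // 100)
    v6_blocks = total_blocks - v4_blocks
    return v4_blocks * ODS2_BLOCK_ENTRIES, v6_blocks * ODS8_BLOCK_ENTRIES


print(carve(47 + 31, 60))  # (96256, 15872)
print(carve(47 + 31, 70))  # (112640, 11776)
```

Keep in mind these are raw totals per NP; LPTS and other applications take their share out of the ods2 number before ACLs do.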

 

 

Sam

Thank you @smilstea !

 

Could I follow up on a few points:

What card and what type of card is that output representative of? What I'm trying to get to is how close is this example output to either the 9901 or the 4x100GE-SE Line Card?

 

The first example would seem to suggest there are about 16k entries for the ODS2 tables there:

  TCAM Logical Table: TCAM_LT_ODS2 (2), free entries: 16304, resvd 128

 

However towards the bottom of your example about re-carving you show this:

TCAM partition information: 1 ods2 blk = 2048 entries, 1 ods8 blk = 512 entries
NP0 : tot-ods2-blks 47 [60% of ods2+ods8 blks], used-ods2-blks 17 [22% of ods2+ods8 blks]

If there are 47 blocks each supporting 2048 entries, that works out to about 96k ODS2 entries, which is a figure I've seen elsewhere for the Tomahawk NP.

 

Also, if this example were re-carved to 70/30, I presume there would then be more than 47 blocks?

 

Finally, I understand that the options applied affect how much TCAM a specific ACE will use.  For my purposes, I'm mainly trying to understand the reasonable supported count of the most basic ACE's.  Does this output indicate that the example device in its current tcam profile can support 96k TCAM entries?  And if so does this example apply to the 9901 and/or the line card I am investigating?

 

Thanks so much!

 

Mark

 

The first example is from a -TR Typhoon card, which has 24k entries; 8k were already allocated for LPTS, which is why you see about 16k free entries.

 

96k is for -SE, both typhoon and tomahawk share this value.

 

Correct on blocks. How it works is that IPv4 entries are 160 bits in size and IPv6 entries are 640 bits (4 times the size). So when we move blocks from IPv6 to IPv4 we gain a lot more IPv4 entries than the IPv6 entries we would gain by moving blocks the other way.

 

You will see LPTS take up 8k entries. But yes, 96k for -SE cards, which the 9901 is one of. I also checked the other LC in my lab and it's roughly the same number of entries.

 

This is from a 9901 in my lab:

TCAM Logical Table: TCAM_LT_ODS2 (2), free entries: 89709, resvd 128
ACL Common Region: 448 entries allocated. 448 entries free
Application ID: NP_APP_ID_IFIB (0)
Total: 1 vmr_ids, 8005 active entries, 8005 allocated entries.

 

Sam

@smilstea , thank you for the additional information.  That was very helpful.

 

One final inquiry if I may: will the 9901 be able to utilize the 'hw-module profile tcam tcam-part-70-30' configuration, and if so would that further increase the TCAM blocks available for IPv4 ACEs?

 

My math suggests that would then become 112,640 assuming one rounds the 70% calculation up, which appears to be the case in the slides for BRKSPG-2904.  Does that appear to be correct and possible on the 9901?

 

Thank you again for the info!

 

Mark

My apologies, I did some looking and that hw-module profile tcam command is only supported on Typhoon LCs, not on Tomahawk, which is what both of the platforms you asked about are.

 

Sam

@smilstea Thanks again.

 

No worries on that distinction regarding the "hw-module profile" command. 

 

Just so I'm 100% clear then, both the 9901 and the modular Tomahawk -SE cards are fixed at a max of 96k ODS2 TCAM entries, resulting in a real-world IPv4 ACE limit somewhere below that number depending on ACE complexity? 

 

Thanks!

 

Mark

With default LPTS taking up 8k entries you have around 88k entries available for ACLs in IPv4 on those cards, yes.

 

I should note that when you modify an ACL, the system keeps the original ACL in TCAM and programs a copy with the changes alongside it. When you issue commit, it verifies that the ACL and the programming are valid, then deletes the old ACL and points to the new one. So you will have (approximately) double the entries for that ACL while a change is in progress. You can of course work around this by removing the ACL completely and then re-adding the entire ACL from scratch. I thought I would share that since you are looking at larger ACLs.
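A quick way to sanity-check headroom is to add the size of the ACL being edited on top of the steady-state total. The Python sketch below uses the numbers from this thread (about 50k ACEs total, largest ACL 18k, roughly 88k entries available) and assumes one TCAM entry per ACE, which only holds for ACEs without ranges. The function name and the ACL names are made up for illustration.

```python
def peak_during_edit(acl_sizes: dict[str, int], edited_acl: str, capacity: int) -> tuple[int, bool]:
    """Return (peak TCAM entries while editing one ACL, whether it fits).

    Assumes the edited ACL is briefly held twice and that each ACE maps to
    one TCAM entry.
    """
    steady_state = sum(acl_sizes.values())
    peak = steady_state + acl_sizes[edited_acl]
    return peak, peak <= capacity


acls = {"largest": 18_000, "everything_else": 32_000}  # illustrative split of ~50k
print(peak_during_edit(acls, "largest", 88_000))  # (68000, True)
```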

Another option for large ACLs is object-groups. Those are pretty cool: you specify networks in one list and port ranges in another, and the ACL is then created dynamically from them. It's easier to read and understand, and it can result in fewer TCAM entries being used when multiple networks use the same ports.
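To get a feel for the sizes involved: without compression, an ACE that references a network group and a port group is expanded into every combination of the two. The Python sketch below just counts that cross product; it is a simplified model, the group contents are invented for illustration, and it says nothing about how much the compression levels would actually save.

```python
from itertools import product


def expand_object_groups(networks: list[str], ports: list[int]) -> list[tuple[str, int]]:
    """Expand one ACE that references two groups into (network, port) pairs."""
    return list(product(networks, ports))


# Hypothetical groups, not taken from any real configuration.
networks = ["10.1.0.0/24", "10.2.0.0/24", "10.3.0.0/24"]
ports = [80, 443, 8443, 3389]

expanded = expand_object_groups(networks, ports)
print(len(networks) + len(ports), "group members ->", len(expanded), "expanded ACEs")
# 7 group members -> 12 expanded ACEs
```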

 

Sam
