ASR9000/XR - BNG - L3 sub-interface limit for trunk (4096) error - what is the workaround?

Andy Erickson · 03-10-2015

We currently have 7,500 broadband subscribers that we will be terminating on our ASR 9001.

Each one of our customers will be terminating on a sub-interface on a bundle.

On the 9k, there will be a QoS policy applied to rate-limit their broadband connection (see example below).

The challenge we are running into right now is scaling beyond 4096 L3 sub-interfaces. When running through this in our lab, we receive the following failure message:

RP/0/RSP0/CPU0:BNG(config-subif)#show config failed
Tue Mar 10 18:32:07.552 UTC

!! SEMANTIC ERRORS: This configuration was rejected by
!! the system due to semantic errors. The individual
!! errors with each failed configuration command can be
!! found below.

interface Bundle-Ether10.6941171
!!% The L3 sub-interface limit for the trunk interface has been reached: Trunk limit for L3 subinterfaces on Bundle-Ether10 is 4096

We have added one of the following to each sub-interface to "fake out" the NPU, but even with SPD configured, we still receive the max-4096 message:

service-policy output <POLICY> subscriber-parent resource-id 0
service-policy output <POLICY> subscriber-parent resource-id 1
service-policy output <POLICY> subscriber-parent resource-id 2
service-policy output <POLICY> subscriber-parent resource-id 3

It is my understanding that we have a total of 4 resource IDs to use (0-3) and that the ASR 9001 supports up to 32,000 sub-interfaces system-wide (8,000 sub-interfaces per resource ID).
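The figures above can be checked with a little arithmetic. The sketch below takes the 8,000-per-resource-ID number as stated in this post (not from Cisco documentation) and the 4,096 figure from the error message. It shows that the two limits are independent: spreading 7,500 subscribers over four resource IDs stays well inside the per-resource-ID budget, while one L3 sub-interface per subscriber still overruns the per-trunk limit.

```python
# Rough capacity check using the numbers quoted in this thread.
# Assumptions (taken from the post and the error text, not verified):
#   - 4 subscriber-parent resource IDs (0-3), ~8,000 sub-interfaces each
#   - a separate limit of 4,096 L3 sub-interfaces per trunk
RESOURCE_IDS = 4
PER_RESOURCE_ID = 8_000
TRUNK_L3_LIMIT = 4_096
SUBSCRIBERS = 7_500


def capacity_report(subscribers: int) -> dict:
    """Compare one-L3-sub-interface-per-subscriber against both limits."""
    # Ceiling division: load on the busiest resource ID with round-robin spread.
    per_rid = -(-subscribers // RESOURCE_IDS)
    return {
        "system_capacity": RESOURCE_IDS * PER_RESOURCE_ID,
        "per_resource_id_load": per_rid,
        "fits_resource_ids": per_rid <= PER_RESOURCE_ID,
        "fits_trunk_limit": subscribers <= TRUNK_L3_LIMIT,
        "excess_over_trunk": max(0, subscribers - TRUNK_L3_LIMIT),
    }


if __name__ == "__main__":
    print(capacity_report(SUBSCRIBERS))
```

With 7,500 subscribers each resource ID carries 1,875 sub-interfaces, but 3,404 sub-interfaces are still over the trunk limit.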

See the attached image for reference on this design.

My main question to the community: what is the workaround to scale beyond 4096 L3 sub-interfaces?

In our case, it is not feasible to bring in additional bundles and spread the customers out.

I look forward to your responses.

Below is a sample configuration:

policy-map 10M_D
 class class-default
  shape average 10100000 bps
 !
 end-policy-map
!
policy-map 10M_U
 class class-default
  police rate 10300000 bps
   exceed-action drop
  !
 !
 end-policy-map
!

interface Bundle-Ether10.650102
 description ---INT: GigabitEthernet0/0/1.650102 NAME: TEST #1---
 service-policy input 10M_U
 service-policy output 10M_D subscriber-parent resource-id 0
 ipv4 point-to-point
 local-proxy-arp
 ipv4 unnumbered Loopback10
 encapsulation dot1q 650 second-dot1q 102
!
interface Bundle-Ether10.650103
 description ---GigabitEthernet0/0/1.650103 NAME: TEST #2---
 service-policy input 10M_U
 service-policy output 10M_D subscriber-parent resource-id 1
 ipv4 point-to-point
 local-proxy-arp
 ipv4 unnumbered Loopback10
 encapsulation dot1q 650 second-dot1q 103
!
interface Bundle-Ether10.650104
 description ---INT: GigabitEthernet0/0/1.650104 NAME: TEST #3---
 service-policy input 10M_U
 service-policy output 10M_D subscriber-parent resource-id 2
 ipv4 point-to-point
 local-proxy-arp
 ipv4 unnumbered Loopback10
 encapsulation dot1q 650 second-dot1q 104
!
interface Bundle-Ether10.650105
 description ---INT: GigabitEthernet0/0/1.650105 NAME: TEST #4---
 service-policy input 10M_U
 service-policy output 10M_D subscriber-parent resource-id 3
 ipv4 point-to-point
 local-proxy-arp
 ipv4 unnumbered Loopback10
 encapsulation dot1q 650 second-dot1q 105
!
interface Bundle-Ether10.650106
 description ---INT: GigabitEthernet0/0/1.650106 NAME: TEST #5---
 service-policy input 10M_U
 service-policy output 10M_D subscriber-parent resource-id 0
 ipv4 point-to-point
 local-proxy-arp
 ipv4 unnumbered Loopback10
 encapsulation dot1q 650 second-dot1q 106
!
interface Bundle-Ether10.650107
 description ---INT: GigabitEthernet0/0/1.650107 NAME: TEST #6---
 service-policy input 10M_U
 service-policy output 10M_D subscriber-parent resource-id 1
 ipv4 point-to-point
 local-proxy-arp
 ipv4 unnumbered Loopback10
 encapsulation dot1q 650 second-dot1q 107
!
interface Bundle-Ether10.650108
 description ---INT: GigabitEthernet0/0/1.650108 NAME: TEST #7---
 service-policy input 10M_U
 service-policy output 10M_D subscriber-parent resource-id 2
 ipv4 point-to-point
 local-proxy-arp
 ipv4 unnumbered Loopback10
 encapsulation dot1q 650 second-dot1q 108
!
interface Bundle-Ether10.650109
 description ---INT: GigabitEthernet0/0/1.650109 NAME: TEST #8---
 service-policy input 10M_U
 service-policy output 10M_D subscriber-parent resource-id 3
 ipv4 point-to-point
 local-proxy-arp
 ipv4 unnumbered Loopback10
 encapsulation dot1q 650 second-dot1q 109
!
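The sample follows a mechanical pattern: sub-interface number = outer VLAN followed by the inner VLAN, and resource-id cycling 0, 1, 2, 3. The hypothetical generator below (the function names and the 3-digit inner-VLAN padding are assumptions inferred from the sample, not a Cisco convention) reproduces that pattern and makes the scaling problem visible: every stanza it emits is one more L3 sub-interface on the trunk.

```python
# Hypothetical helper: emit sub-interface stanzas in the style of the sample
# config above. Naming (outer VLAN digits + 3-digit inner VLAN) and the
# round-robin resource-id assignment are inferred from the sample only.
def subif_stanza(bundle: int, outer: int, inner: int, index: int) -> str:
    subif = f"{outer}{inner:03d}"  # e.g. 650 + 102 -> "650102"
    rid = index % 4                # cycle resource-id 0, 1, 2, 3, 0, ...
    lines = [
        f"interface Bundle-Ether{bundle}.{subif}",
        f" description ---INT: GigabitEthernet0/0/1.{subif} NAME: TEST #{index + 1}---",
        " service-policy input 10M_U",
        f" service-policy output 10M_D subscriber-parent resource-id {rid}",
        " ipv4 point-to-point",
        " local-proxy-arp",
        " ipv4 unnumbered Loopback10",
        f" encapsulation dot1q {outer} second-dot1q {inner}",
        "!",
    ]
    return "\n".join(lines)


def build_config(bundle: int, outer: int, inners: range) -> list[str]:
    """One stanza (i.e. one L3 sub-interface) per customer inner VLAN."""
    return [subif_stanza(bundle, outer, inner, i) for i, inner in enumerate(inners)]


if __name__ == "__main__":
    stanzas = build_config(10, 650, range(102, 110))
    print("\n".join(stanzas))
    print(f"{len(stanzas)} L3 sub-interfaces")  # each counts toward the trunk limit
```

Scaling this template to 7,500 customers would mean 7,500 stanzas, which is exactly what runs into the 4096 error.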

xthuijs · 03-12-2015

hey andy,

Since the config for all your subifs is essentially the same, you could "save" subinterfaces by configuring it like this:

interface Bundle-Ether10.650102
 ...<all stuff here>
 encapsulation ambiguous dot1q 650 second-dot1q 102-109

You will still have a limit of 4096 configured L3 subinterfaces (they register with NETIO, hence the limit), but the subscriber interfaces don't use NETIO and therefore do not count against that number.

Also, the issue you are seeing here is not related to the queues, chunks, etc. It is really the number of L3 subinterfaces you have defined.
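The saving comes from one configured sub-interface matching a whole range of inner tags. The simplified model below is a conceptual illustration only (the class and function names are hypothetical, and it is not how the ASR 9000 classifies frames): eight customers on inner VLANs 102-109 need a single configured L3 sub-interface instead of eight.

```python
# Conceptual model: one ambiguous QinQ sub-interface covering a range of
# inner VLANs replaces many per-customer sub-interfaces.
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so instances can go in a set
class AmbiguousEncap:
    outer: int
    inner_lo: int
    inner_hi: int

    def matches(self, outer: int, inner: int) -> bool:
        return outer == self.outer and self.inner_lo <= inner <= self.inner_hi


# Models "encapsulation ambiguous dot1q 650 second-dot1q 102-109"
access_subif = AmbiguousEncap(outer=650, inner_lo=102, inner_hi=109)


def l3_subifs_needed(tags: list[tuple[int, int]], encaps: list[AmbiguousEncap]) -> int:
    """Count configured L3 sub-interfaces that carry the given customer tags.

    Raises ValueError if some (outer, inner) pair matches no sub-interface.
    """
    used = set()
    for outer, inner in tags:
        match = next((e for e in encaps if e.matches(outer, inner)), None)
        if match is None:
            raise ValueError(f"no sub-interface for {outer}.{inner}")
        used.add(match)
    return len(used)


if __name__ == "__main__":
    customers = [(650, i) for i in range(102, 110)]
    print(l3_subifs_needed(customers, [access_subif]))  # 8 customers, 1 sub-interface
```

Per-customer separation is then provided by the subscriber interfaces created on top of this access sub-interface, which per xander do not count toward the 4096 limit.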

xander

Andy Erickson · 03-12-2015

xander,

Thanks for the response on the challenge we’re having.

Please take a look at the attached high-level design diagram.

Note on the design requirements:

Each broadband customer needs to have the ability to be completely separate from all other customers.
1. Separate IP address(es) (tied back to separate Loopbacks)
  1. Static (e.g. Loopback 10)
  2. DHCP (e.g. Loopback 20)
2. Bandwidth package
  1. Each customer pays for a separate “speed” package
  2. Each needs to be rate-limited separately from the others (e.g. 10Mb, 25Mb, 100Mb, 1Gbps packages)

The challenge I see with the suggestion of <encapsulation ambiguous dot1q 650 second-dot1q 102-109> is that the customer separation requirement listed above will not be met, as the customers all share the same common characteristics. This design works around the 4096 sub-interface limit, but does not meet the overall design requirement.

Keep in mind that we have ~7,500 broadband customers.

Is there any way we can get around this 4096 sub-interface limit without adding separate “customer-facing” bundles?

Thanks in advance!

-ae

xthuijs · 03-12-2015

hey andy,

The access interface is just that: an access interface. It needs IP enabled to consume the DHCP discover, but once the subscriber interface is created, the "ingress interface" from a FIB perspective is Bundle-EtherZ.X.ipY, not Bundle-EtherZ.X.

So your subscribers already have natural separation, based on the DHCP binding and the unique interface assigned to each of them.

The best design practice is to aggregate as many VLANs as possible with the ambiguous option; that also simplifies your config.

Only when the subscriber interface is gone AND the user retains their lease will their traffic route against the access interface, but we can easily mitigate that with an ACL or packet-triggered subscriber creation.
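The forwarding behavior described above can be sketched as a lookup table. This is a conceptual model under stated assumptions: the class name, the key of (outer VLAN, inner VLAN, MAC), and the ".ipN" numbering are hypothetical, chosen only to mirror the Z.X.ipY naming. Each DHCP binding yields its own subscriber interface; without a binding, traffic is attributed to the shared access interface.

```python
# Hypothetical model: a DHCP binding maps a subscriber to its own interface
# (Bundle-EtherZ.X.ipY style); unbound traffic falls back to the access
# interface, where a mitigation such as an ACL should handle it.
import itertools


class SubscriberTable:
    def __init__(self, access_if: str) -> None:
        self.access_if = access_if
        self._ids = itertools.count(1)  # never reused, even after unbind
        self._bindings: dict[tuple[int, int, str], str] = {}

    def bind(self, outer: int, inner: int, mac: str) -> str:
        """DHCP binding created: allocate (or return) the subscriber interface."""
        key = (outer, inner, mac)
        if key not in self._bindings:
            self._bindings[key] = f"{self.access_if}.ip{next(self._ids)}"
        return self._bindings[key]

    def unbind(self, outer: int, inner: int, mac: str) -> None:
        """Subscriber interface removed (e.g. session torn down)."""
        self._bindings.pop((outer, inner, mac), None)

    def ingress_interface(self, outer: int, inner: int, mac: str) -> str:
        """Interface traffic is attributed to: subscriber if bound, else access."""
        return self._bindings.get((outer, inner, mac), self.access_if)


if __name__ == "__main__":
    table = SubscriberTable("Bundle-Ether10.650")
    a = table.bind(650, 102, "0000.1111.2222")
    b = table.bind(650, 103, "0000.3333.4444")
    print(a, b)  # separate interfaces per subscriber
    table.unbind(650, 102, "0000.1111.2222")
    print(table.ingress_interface(650, 102, "0000.1111.2222"))  # access interface
```

Per-subscriber policies (speed package, loopback) would then attach to the subscriber interface, not to the shared access sub-interface.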

cheers!

xander

Andy Erickson · 03-12-2015

xander,

do you have any config examples of how this would look?

Mostly curious about how we keep the per-subscriber separation requirements (static vs. DHCP, speed package differences, etc.) without having to configure a separate sub-interface per customer.

Thanks again xander!

-ae

xthuijs · 03-12-2015

That is the config example I gave you earlier in the thread:

encapsulation ambiguous ...

You can further enhance this with:

1) Adding uRPF on the access interface and giving its unnumbered a bogus address, so that all DHCP users fail the uRPF check

2) Using an ACL that only allows incoming UDP port 67 (DHCP), to block all other user traffic in case their subscriber context is lost while still allowing them to discover and renew

3) Using a packet-triggered subscriber initiator, like this:

interface Bundle-Ether100.2
 ipsubscriber ipv4 l2-connected
  initiator unclassified-source
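The intent of the ACL option can be shown as a small decision function. This models the logic only and is not IOS XR ACL syntax; the function name and return labels are hypothetical. It relies on one standard fact: DHCP clients send discover, request and renew messages to UDP port 67. Packets with no subscriber session may only reach DHCP; everything else is denied.

```python
# Sketch of the ACL mitigation's intent on the shared access sub-interface.
# Hypothetical names; models the decision, not the router's ACL engine.
DHCP_SERVER_PORT = 67  # DHCP discover/request/renew are sent to UDP port 67


def access_decision(has_session: bool, proto: str, dst_port: int) -> str:
    """Decide what happens to a packet arriving from a subscriber VLAN."""
    if has_session:
        return "subscriber"   # handled by the per-subscriber interface instead
    if proto == "udp" and dst_port == DHCP_SERVER_PORT:
        return "permit-dhcp"  # user can still discover/renew and rebuild the session
    return "deny"             # any other traffic without a session is dropped


if __name__ == "__main__":
    for pkt in [(True, "tcp", 80), (False, "udp", 67), (False, "tcp", 80)]:
        print(pkt, access_decision(*pkt))
```

The uRPF and unclassified-source options reach a similar end by different means (failing the source check, or creating a session from the first packet) and are not modeled here.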

cheers

xander

Andy Erickson · 03-12-2015

xander,

A few more questions:

1. Is there no way around the 4096 sub-interface limit?

a) Is it a hardware limitation?

2. When would I use additional sub-interfaces?

a) DHCP in one sub-interface (e.g. Bundle-Ether10.101) and statics in another (e.g. Bundle-Ether10.102)?

3. How am I going to rate-limit on a per-subscriber basis with what you are suggesting?

Just trying to put what you're saying all together. ;)

-ae

xthuijs · 03-12-2015

1) The limitation originates from NETIO. NETIO (the equivalent of "IP Input" in IOS: process-level switching that needs to know about all interfaces for ICMP and applications) was not originally designed for such a high number of interfaces per LC. That is why we designed the SINT (Subscriber INTerface): to serve applications with better scale support. Still, every L3 interface that is not subscriber-based registers with NETIO.

2) You really don't need that many subifs; that is why we designed the ambiguous-VLAN subifs, to eliminate them. The access interface merely consumes the initial discover; after that, everything is handled per subscriber, based on the DHCP binding. That is why DHCP proxy is needed and relay will not work with subscribers; this is what allows for such high scale.

3) That is already inherent in the implementation: we have LPTS for overall control-plane policing and protection, and THEN we have the subscriber manager in front of that to make sure that no single subscriber abuses any punt policer.

So you're pretty much taken care of. I do know where your "concern" comes from: we had that situation on earlier hardware platforms, and we took care of it on this one :)
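Point 1 amounts to an accounting rule: configured L3 sub-interfaces register with NETIO and count toward the per-trunk limit, while subscriber interfaces (SINTs) do not. The toy model below applies that rule as stated in this post; the interface names and the 4096 value (from the error message) are illustrative assumptions.

```python
# Toy accounting of which interfaces count toward the per-trunk L3 limit,
# following the rule above: non-subscriber L3 interfaces register with
# NETIO; subscriber interfaces do not. Names are hypothetical.
from dataclasses import dataclass

TRUNK_L3_LIMIT = 4096  # from the error message earlier in the thread


@dataclass
class Interface:
    name: str
    is_subscriber: bool


def netio_registered(interfaces: list[Interface]) -> int:
    return sum(1 for i in interfaces if not i.is_subscriber)


def within_limit(interfaces: list[Interface]) -> bool:
    return netio_registered(interfaces) <= TRUNK_L3_LIMIT


if __name__ == "__main__":
    # Two ambiguous access sub-interfaces plus 7,500 subscriber interfaces.
    design = [Interface("Bundle-Ether10.650", False), Interface("Bundle-Ether10.651", False)]
    design += [Interface(f"Bundle-Ether10.650.ip{n}", True) for n in range(1, 7501)]
    print(netio_registered(design), within_limit(design))  # 2 True
```

By contrast, 7,500 statically configured sub-interfaces would all register with NETIO and exceed the limit.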

cheers

xander

Andy Erickson · 03-12-2015

xander,

Getting closer, I promise ;)

#3 is where I'm getting hung up, I guess...

Where I'm not making the connection yet: if I have "X" number of customers coming in, each uniquely double-tagged, how is the 9k going to identify the customers from one another (no PPPoE or authentication) so it can attach each one to the proper loopback and apply the proper speed profile for rate-limiting?

Once I'm clear on this, I'll have it =)

I look forward to your response.

-ae

Andy Erickson · 03-12-2015

**Note: keep in mind that in my scenario there is no AAA/RADIUS server.

-ae

xthuijs · 03-12-2015

Doesn't change the scenario :)

AAA comes into the picture between receiving the discover and passing it on: the request goes to the AAA server for acceptance (based on MAC, remote-ID (RID), circuit-ID (CID), class, etc.). You can use AAA or not, but it doesn't affect the control-plane protection we discussed above.

It only controls whether or not the discover is relayed to the DHCP server, AND RADIUS can optionally return some basic subscriber interface config if you like.

But this can also come from the template, and you can always use CoA to modify the subscriber interface.
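The role of AAA described above can be sketched as an optional hook in discover handling. This is a conceptual model, not BNG's implementation; the function names, the `TEMPLATE` dictionary and its attribute keys are hypothetical (only Loopback10 and 10M_D come from this thread). Without an authorizer, the template alone defines the subscriber interface; with one, AAA decides whether the discover proceeds and may override template values.

```python
# Conceptual sketch: AAA (if present) only gates the discover and may return
# per-subscriber overrides; otherwise the template alone applies.
# All names and attribute keys are hypothetical.
from typing import Callable, Optional

Authorizer = Callable[[dict], Optional[dict]]  # attrs on accept, None on reject

TEMPLATE = {"ipv4_unnumbered": "Loopback10", "service_policy_out": "10M_D"}


def handle_discover(discover: dict, authorize: Optional[Authorizer] = None) -> Optional[dict]:
    """Return the subscriber interface config, or None if the discover is dropped."""
    config = dict(TEMPLATE)  # copy so the shared template is never mutated
    if authorize is not None:
        attrs = authorize(discover)
        if attrs is None:
            return None       # rejected: discover is not passed to the DHCP server
        config.update(attrs)  # optional per-subscriber overrides from RADIUS
    return config             # discover proceeds; interface built from config


if __name__ == "__main__":
    print(handle_discover({"mac": "0000.1111.2222"}))  # no AAA: template only
    print(handle_discover({}, lambda d: {"service_policy_out": "25M_D"}))
```

A later change (the CoA case) would correspond to updating an existing subscriber's config after the fact, which this sketch does not model.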

cheers!

xander

Andy Erickson · 03-13-2015

xander,

Really curious what the overall XR config would look like in this case. Do you have any full config examples?

Specifically, how each customer is provisioned with an IP address (DHCP or static) and how speed profiles are attached without an AAA server controlling each customer.

I'm still foggy on how this is accomplished without a sub-interface per customer; if I could see the config for multiple customers, it would sure help.

I look forward to your response.

CHEERS!

-ae

xthuijs · 03-13-2015

yeah sure!

check this one: https://supportforums.cisco.com/document/77646/asr9000-understanding-bng-configuration-walkthrough

And then, in the config portion, just skip the

"20 authorize aaa list default format LOGIN password ISG" portion of it.

That will omit the RADIUS interaction and go straight from the discover to the DHCP server and its reply.

cheers!

xander

Andy Erickson · 03-21-2015

xander,

Thanks again for all your feedback on this.

One additional question for you. In our design there is no PPPoE and we'll need to authenticate the subscribers based on their QinQ tags (which are unique).

What would the AAA format look like on the 9k to use these values to authenticate to RADIUS?

In your BNG post, I see that you're using the following:

aaa attribute format NAS-PORT-ID
 circuit-id plus remote-id
!
aaa radius attribute nas-port-id format NAS-PORT-ID

When I look at our 9k in the lab, I see the following options:

RP/0/RSP0/CPU0:BNG-LAB(config)#aaa attribute format BNG-TEST-Q-in-Q
RP/0/RSP0/CPU0:BNG-LAB(config-id-format)#?
apply-group     Apply configuration from a group
circuit-id      Circuit ID
clear           Clear the uncommitted configuration
commit          Commit the configuration changes to running
describe        Describe a command without taking real actions
do              Run an exec command
exit            Exit from this submode
format-string   extended format
mac-address     Mac-address
no              Negate a command or set its defaults
pwd             Commands used to reach current submode
remote-id       Remote ID
root            Exit to the global configuration mode
show            Show contents of configuration
username-strip  User name formatting
RP/0/RSP0/CPU0:BNG-LAB(config-id-format)#

Any feedback on how we can authenticate to RADIUS based on the QinQ tags would be much appreciated.

Thank you!

-ae

xthuijs · 03-21-2015

hi andy,

In order to use RADIUS, you need some sort of BNG interaction here, which means a control policy. This control policy can be activated by any of:

-PPPoE PADI (PPPoE access)

-DHCP discover (IP access)

-unclassified source IP address (IP access)

When any of these 3 events happens, you can select and compose a username for authentication.

In the control policy, under the session-start event, this would look like:

10 authorize aaa list default format MYFORMAT

Then you define your MYFORMAT like this:

aaa attribute format MYFORMAT
 format-string "vlan-%s:vlan-%s" inner-vlan-id outer-vlan-id

This will result in a RADIUS Access-Request with a username that looks like:

vlan-100:vlan-200 if the QinQ combination was 200:100 (outer:inner)
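The username composition can be mimicked in a few lines: each `%s` in the format string is filled, in order, by the identifiers listed after it. The sketch below reproduces the resulting string only. It is not the IOS XR implementation, and `compose_username` plus the session dictionary are hypothetical.

```python
# Mimic how the format-string above fills its %s placeholders, in order,
# from the listed identifiers. Only the two VLAN identifiers are modeled.
def compose_username(fmt: str, identifiers: list[str], session: dict[str, int]) -> str:
    values = tuple(str(session[name]) for name in identifiers)
    return fmt % values  # printf-style substitution, left to right


session = {"outer-vlan-id": 200, "inner-vlan-id": 100}  # QinQ 200:100 (outer:inner)
user = compose_username("vlan-%s:vlan-%s", ["inner-vlan-id", "outer-vlan-id"], session)
print(user)  # vlan-100:vlan-200
```

Because each QinQ pair is unique per customer in this design, the composed username is unique as well, which is presumably what lets a RADIUS server key per-subscriber attributes (such as the speed package) off it. Swapping the identifier order would produce `vlan-200:vlan-100` instead.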

cheers

xander