06-27-2017 03:53 AM - edited 03-01-2019 03:07 PM
Hello All,
we are trying to migrate our PPPoE user termination from ASR1k to ASR9k, but we have many questions about that and we have faced some problems.
Below is what I have.
2x A9K-RSP440-SE
2x A9K-24X10GE-SE
software version 5.3.4
BNG package and license are installed
Core Issue....
We set the router up and everything looked fine, BUT when we put some real user load and traffic on it we noticed that the line card CPU is very high relative to the number of users (the total was around 2k). With around 15k users and their traffic, the line card CPU reaches 100% and the router stops functioning. We face the same issue whether we terminate the users on the line card or on the RSP.
BNG Questions....
1- what
2- with LC-based subscribers there is a restriction on using PQoS; is that lifted in the new version?
3- what is the best and recommended software release for a BNG deployment?
below is a part of the router configuration
pool
address-range 10.5.0.0 10.5.0.254
!
pool
address-range 10.4.0.2 10.4.7.254
!
pool
address-range 10.4.16.2 10.4.23.254
!
pool
address-range 10.4.24.2 10.4.31.254
!
pool
address-range 10.4.8.2 10.4.15.254
!
pool
address-range 10.4.32.2 10.4.39.254
!
ipv4 source-route
dynamic-template
type ppp DTP
service-policy type
keepalive 10 1
service-policy input UPLOAD
accounting aaa list default type session periodic-interval 4
ipv4 unnumbered Loopback1
ipv4 access-group USERS ingress
!
!
ipv4 access-list 50
ipv4 access-list USERS
10 deny tcp 10.0.0.0 0.255.255.255 10.0.0.0 0.255.255.255 eq www
!
ipv4 access-list EXPIRE
10 permit tcp 10.5.0.0 0.0.0.255 any eq www
!
ipv4 access-list ANY_ACL
10 permit ipv4 any any
!
ipv4 access-list
10 permit
!
ipv4 access-list PRIVATE
10 permit ipv4 host 192.168.99.2 any
!
ipv4 access-list BLCK_EXP
5 permit ipv4 any host 192.168.30.25
!
ipv4 access-list SUBS_POOL
10 permit
!
ipv4 access-list CRITICAL_LIST
10 permit
!
ipv4 access-list LOCAL_SERVICE
10 permit ipv4 x.x.x.0 0.0.0.255 any
!
class-map match-any ANY
match access-group ipv4 ANY_ACL
end-class-map
!
class-map match-any CRITICAL
match access-group ipv4 CRITICAL_LIST
end-class-map
!
class-map match-any LOCAL_SERVICE_CLASS
match access-group ipv4 LOCAL_SERVICE
end-class-map
!
class-map type traffic match-any SUBS_HTTP
match access-group ipv4 SUBS_POOL
end-class-map
!
class-map type traffic match-any EXPIRE_CLASS
match access-group ipv4 EXPIRE
end-class-map
!
class-map type traffic match-any PORTIAL_CLASS
match access-group ipv4
end-class-map
!
policy-map Vip
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 40
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 12
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map Gold
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 24
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 6
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map Bronze
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 8
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 2
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map Silver
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 16
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 3
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map UPLOAD
class ANY
police rate 100
!
!
class class-default
!
end-policy-map
!
policy-map Platinum
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 32
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 8
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map type
class type traffic SUBS_HTTP
transmit
!
class type traffic PORTIAL_CLASS
transmit
!
class type traffic EXPIRE_CLASS
!
class type traffic class-default
!
end-policy-map
!
interface Loopback1
ipv4 address 10.0.0.1 255.0.0.0
interface TenGigE0/1/0/0.11
ipv4 address 192.168.11.2 255.255.255.252
encapsulation dot1q 11
ipv4 access-group BLCK_EXP egress
!
interface TenGigE0/1/0/0.12
description FTTx-HQ
service-policy type control subscriber PPP
encapsulation dot1q 12
!
interface TenGigE0/1/0/0.79
ipv4 address 192.168.30.27 255.255.255.248
encapsulation dot1q 79
!
interface TenGigE0/1/0/0.1199
ipv4 address 172.1.1.1 255.255.255.0
service-policy type control subscriber PPP
encapsulation dot1q 1199
!
interface TenGigE0/1/0/0.2050
service-policy type control subscriber PPP
encapsulation ambiguous dot1q 2050 second-dot1q 2-4094
!
interface TenGigE0/1/0/0.2051
service-policy type control subscriber PPP
encapsulation ambiguous dot1q 2051 second-dot1q 2-4094
!
interface TenGigE0/1/0/0.2060
service-policy type control subscriber PPP
encapsulation dot1q 2060
!
ssh server v2
ssh server
ssh server
aaa accounting service default group radius
aaa accounting subscriber default group radius
aaa authorization subscriber default group radius
aaa authentication subscriber default group radius
subscriber
!
service selection disable
sessions
sessions inner-
sessions outer-
sessions access-interface limit 65535
!
class-map type control subscriber match-any PPP
match protocol ppp
end-class-map
!
!
policy-map type control subscriber PPP
event session-start match-first
class type control subscriber PPP do-until-failure
10 activate dynamic-template DTP
!
!
event session-activate match-first
class type control subscriber PPP do-until-failure
10 authenticate aaa list default
20 authorize aaa list default identifier username password use-from-line
!
!
end-policy-map
!
end
thanks in advance
06-27-2017 03:27 PM
hi Rafal,
you have chosen a good IOS XR release for BNG.
To find the cause of the high CPU, check which process is consuming the most CPU cycles; also look into punt reasons in the NP counters (sh controllers np counters ...), netio (sh netio clients), and ipv4 traffic statistics (sh ipv4 traffic).
A typical cause of high CPU in BNG deployments is ICMP unreachables being enabled while lots of packets are denied by the access-list. You should easily find evidence of this in the commands I have mentioned.
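For example, assuming 0/1/CPU0 is the location of the LC terminating your subscribers (substitute your own slots):
show controllers np counters all location 0/1/CPU0
show netio clients location 0/1/CPU0
show ipv4 traffic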
BNG scale numbers for 5.3 are the same as 5.2:
https://supportforums.cisco.com/document/12529621/bng-deployment-scale-guidelines-asr9000
Actually, these scale numbers remain the same in the 6.x releases on 32-bit IOS XR as well. A further increase in scale comes with BNG support on 64-bit IOS XR.
hope this helps,
/Aleksandar
06-28-2017 04:17 AM
Hello Aleksandar,
I have now put around 8k users on the router, but the LC CPU is at 45% and the RP at 15%.
The other thing I notice is that the other LC has a CPU load of 25% even though nothing is connected to it.
Attached is the output of all the commands you asked for.
your help is really appreciated
Thanks
06-28-2017 04:48 AM
hi Rafal,
These are the counters you should do something about:
303   RSV_PUNT_IP_MTU_EXCEEDED   92318    23
832   PUNT_NO_MATCH              398154   77
1050  PPPOE_FRAG_NEEDED_PUNT     92318    23
Run the "sh controllers np descriptions location <location>" command to see what each of them means. The 1st and 3rd require an MTU adjustment. The 2nd is related to the ICMP unreachables I already mentioned.
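If you go for the MTU adjustment, here is a minimal sketch of the two usual knobs. The template name DTP is taken from your configuration, the values 1492 and 1400 are only illustrative (they must match your real encapsulation overhead), and I'm assuming the dynamic-template accepts the usual interface ipv4 mtu command:
dynamic-template
 type ppp DTP
  ! 1492 is illustrative; match your PPPoE overhead
  ipv4 mtu 1492
 !
!
subscriber
 ! clamps TCP MSS for PTA sessions; 1400 is illustrative
 pta tcp mss-adjust 1400
!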
Apart from this, I don't see any other concerns.
CPU utilisation is not a linear function of the tasks the CPU has to do; there are always internal things the CPU has to take care of. In the samples you have shared, eth_server at 8% means that control-plane updates are happening, possibly because of the subscriber interfaces that are being created, which require the associated structures (RIB, FIB, etc.) to be updated.
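To see only the busy processes, a filtered view like this helps (again assuming the LC location):
show processes cpu location 0/1/CPU0 | exclude 0%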
/Aleksandar
06-28-2017 05:44 AM
Hello Aleksandar,
many thanks for your prompt response,
can you please let me know what should
thanks
06-28-2017 06:03 AM
hi Rafal,
you can confirm that ipv4 unreachables are really disabled on subscriber interfaces:
RP/0/RSP0/CPU0:our9001#sh uidb data location 0/0/CPU0 BE1.103.ip1 ingress | i ICMP
IPV4 ICMP Punt 0x0
IPV6 ICMP Punt 0x0
If you see 0x1, it means unreachables are enabled.
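If they turn out to be enabled, a minimal sketch to disable them in the subscriber template, assuming the PPP-type template DTP from your configuration:
dynamic-template
 type ppp DTP
  ! stop punting packets just to generate ICMP unreachables
  ipv4 unreachables disable
 !
!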
All line cards in the chassis must have the same forwarding information. When you create a subscriber interface on one line card, a RIB and FIB entry are associated with it, and this FIB entry must be created on all other line cards. This is why processes like eth_server and prm_server_ty were showing single-digit CPU utilisation: the former is responsible for communication via the EOBC and the latter for HW programming. Ultimately, 25% CPU utilisation at steady state is not a concern. CPU utilisation is not a linear graph; when you see 50% CPU utilisation, it doesn't mean that the CPU is at 50% of its capacity.
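If you want to verify that a subscriber FIB entry indeed exists on both line cards, you can compare the CEF entries, e.g. (the prefix below is just an illustrative address from your pool):
show cef ipv4 10.5.0.1/32 location 0/0/CPU0
show cef ipv4 10.5.0.1/32 location 0/1/CPU0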
/Aleksandar
06-28-2017 06:25 AM
Hello Sir,
06-28-2017 06:41 AM
hi Rafal,
in IOS XR the standby RP is really in standby mode, regardless of the feature in question. That means the presence of the standby RP has no impact on the scale.
Please don't confuse this with the status of the switch fabric ASIC on the standby RSP. :)
On 32-bit IOS XR you can achieve 256k subscribers if you go for line-card-based subscribers. Again, refer to https://supportforums.cisco.com/document/12529621/bng-deployment-scale-guidelines-asr9000 that I mentioned before.
If you prefer to stick with RP-based subscribers, the limit on 32-bit XR remains 128k per chassis. In the future we will support 256k RP-based subscribers on 64-bit XR.
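To keep an eye on how close you are to these limits, you can check the session counts with, for example:
show subscriber session all summary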
/Aleksandar