06-27-2017 03:53 AM - edited 03-01-2019 03:07 PM
Hello All,
we are trying to migrate our PPPoE user termination from ASR1k to ASR9k, but we have many questions about that and we have faced some problems.
Below is what I have.
2x A9K-RSP440-SE
2x A9K-24X10GE-SE
software version 5.3.4
BNG package and license are installed
Core Issue....
We set the router up and everything looked fine, BUT when we put some real user load and traffic on it we noticed that the line card CPU is very high relative to the number of users (the total was around 2k). With around 15k users and their traffic, the line card CPU reaches 100% and the router stops functioning. We face the same issue whether we terminate the users on the line card or on the RSP.
BNG Questions....
1- what
2- with LC-based subscribers there is a restriction on using PQoS; is that lifted in the new version?
3- what is the best and recommended software release for a BNG deployment?
below is a part of the router configuration
pool
address-range 10.5.0.0 10.5.0.254
!
pool
address-range 10.4.0.2 10.4.7.254
!
pool
address-range 10.4.16.2 10.4.23.254
!
pool
address-range 10.4.24.2 10.4.31.254
!
pool
address-range 10.4.8.2 10.4.15.254
!
pool
address-range 10.4.32.2 10.4.39.254
!
ipv4 source-route
dynamic-template
type ppp DTP
service-policy type
keepalive 10 1
service-policy input UPLOAD
accounting aaa list default type session periodic-interval 4
ipv4 unnumbered Loopback1
ipv4 access-group USERS ingress
!
!
ipv4 access-list 50
ipv4 access-list USERS
10 deny tcp 10.0.0.0 0.255.255.255 10.0.0.0 0.255.255.255 eq www
!
ipv4 access-list EXPIRE
10 permit tcp 10.5.0.0 0.0.0.255 any eq www
!
ipv4 access-list ANY_ACL
10 permit ipv4 any any
!
ipv4 access-list
10 permit
!
ipv4 access-list PRIVATE
10 permit ipv4 host 192.168.99.2 any
!
ipv4 access-list BLCK_EXP
5 permit ipv4 any host 192.168.30.25
!
ipv4 access-list SUBS_POOL
10 permit
!
ipv4 access-list CRITICAL_LIST
10 permit
!
ipv4 access-list LOCAL_SERVICE
10 permit ipv4 x.x.x.0 0.0.0.255 any
!
class-map match-any ANY
match access-group ipv4 ANY_ACL
end-class-map
!
class-map match-any CRITICAL
match access-group ipv4 CRITICAL_LIST
end-class-map
!
class-map match-any LOCAL_SERVICE_CLASS
match access-group ipv4 LOCAL_SERVICE
end-class-map
!
class-map type traffic match-any SUBS_HTTP
match access-group ipv4 SUBS_POOL
end-class-map
!
class-map type traffic match-any EXPIRE_CLASS
match access-group ipv4 EXPIRE
end-class-map
!
class-map type traffic match-any PORTIAL_CLASS
match access-group ipv4
end-class-map
!
policy-map Vip
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 40
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 12
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map Gold
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 24
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 6
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map Bronze
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 8
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 2
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map Silver
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 16
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 3
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map UPLOAD
class ANY
police rate 100
!
!
class class-default
!
end-policy-map
!
policy-map Platinum
class CRITICAL
priority level 1
!
class LOCAL_SERVICE_CLASS
police rate 32
conform-action transmit
exceed-action drop
!
!
class ANY
police rate 8
conform-action transmit
exceed-action drop
!
!
class class-default
!
end-policy-map
!
policy-map type
class type traffic SUBS_HTTP
transmit
!
class type traffic PORTIAL_CLASS
transmit
!
class type traffic EXPIRE_CLASS
!
class type traffic class-default
!
end-policy-map
!
interface Loopback1
ipv4 address 10.0.0.1 255.0.0.0
interface TenGigE0/1/0/0.11
ipv4 address 192.168.11.2 255.255.255.252
encapsulation dot1q 11
ipv4 access-group BLCK_EXP egress
!
interface TenGigE0/1/0/0.12
description FTTx-HQ
service-policy type control subscriber PPP
encapsulation dot1q 12
!
interface TenGigE0/1/0/0.79
ipv4 address 192.168.30.27 255.255.255.248
encapsulation dot1q 79
!
interface TenGigE0/1/0/0.1199
ipv4 address 172.1.1.1 255.255.255.0
service-policy type control subscriber PPP
encapsulation dot1q 1199
!
interface TenGigE0/1/0/0.2050
service-policy type control subscriber PPP
encapsulation ambiguous dot1q 2050 second-dot1q 2-4094
!
interface TenGigE0/1/0/0.2051
service-policy type control subscriber PPP
encapsulation ambiguous dot1q 2051 second-dot1q 2-4094
!
interface TenGigE0/1/0/0.2060
service-policy type control subscriber PPP
encapsulation dot1q 2060
!
ssh server v2
ssh server
ssh server
aaa accounting service default group radius
aaa accounting subscriber default group radius
aaa authorization subscriber default group radius
aaa authentication subscriber default group radius
subscriber
!
service selection disable
sessions
sessions inner-
sessions outer-
sessions access-interface limit 65535
!
class-map type control subscriber match-any PPP
match protocol ppp
end-class-map
!
!
policy-map type control subscriber PPP
event session-start match-first
class type control subscriber PPP do-until-failure
10 activate dynamic-template DTP
!
!
event session-activate match-first
class type control subscriber PPP do-until-failure
10 authenticate aaa list default
20 authorize aaa list default identifier username password use-from-line
!
!
end-policy-map
!
end
thanks in advance
06-27-2017 03:27 PM
hi Rafal,
you have chosen a good IOS XR release for BNG.
To find the cause of the high CPU, check which process is consuming the most CPU cycles; also look into punt reasons in the NP counters (sh controllers np counters ...), netio (sh netio clients), and ipv4 traffic statistics (sh ipv4 traffic).
A typical cause of high CPU in BNG deployments is ICMP unreachables being enabled while lots of packets are denied by the access-list. You should easily find evidence of this in the commands I have mentioned.
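For example, assuming 0/1/CPU0 is the location of the LC terminating your subscribers (substitute your own slots):
show controllers np counters all location 0/1/CPU0
show netio clients location 0/1/CPU0
show ipv4 traffic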
BNG scale numbers for 5.3 are the same as 5.2:
https://supportforums.cisco.com/document/12529621/bng-deployment-scale-guidelines-asr9000
Actually, these scale numbers remain the same in the 6.x releases on 32-bit IOS XR as well. A further increase in scale comes with BNG support on 64-bit IOS XR.
hope this helps,
/Aleksandar
06-28-2017 04:17 AM
Hello Aleksandar,
I have now put around 8k users on the router, but the LC CPU is at 45% and the RP at 15%.
The other thing I notice is that the other LC has a CPU load of 25% even though nothing is connected to it.
Attached is the output of all the commands you asked for.
your help is really appreciated
Thanks
06-28-2017 04:48 AM
hi Rafal,
These are the counters you should do something about:
303   RSV_PUNT_IP_MTU_EXCEEDED   92318    23
832   PUNT_NO_MATCH              398154   77
1050  PPPOE_FRAG_NEEDED_PUNT     92318    23
Run the "sh controllers np descriptions location <location>" command to see what each of them means. The 1st and 3rd require an MTU adjustment. The 2nd is related to the ICMP unreachables I already mentioned.
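If you go for the MTU adjustment, here is a minimal sketch of the two usual knobs. The template name DTP is taken from your configuration, the values 1492 and 1400 are only illustrative (they must match your real encapsulation overhead), and I'm assuming the dynamic-template accepts the usual interface ipv4 mtu command:
dynamic-template
 type ppp DTP
  ! 1492 is illustrative; match your PPPoE overhead
  ipv4 mtu 1492
 !
!
subscriber
 ! clamps TCP MSS for PTA sessions; 1400 is illustrative
 pta tcp mss-adjust 1400
!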
Apart from this, I don't see any other concerns.
CPU utilisation is not a linear function of the tasks the CPU has to do; there are always internal things the CPU has to take care of. In the samples you have shared, eth_server at 8% means that control-plane updates are happening, possibly because of the subscriber interfaces that are being created, which require the associated structures (RIB, FIB, etc.) to be updated.
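To see only the busy processes, a filtered view like this helps (again assuming the LC location):
show processes cpu location 0/1/CPU0 | exclude 0%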
/Aleksandar
06-28-2017 05:44 AM
Hello Aleksandar,
many thanks for your prompt response,
can you please let me know what should
thanks
06-28-2017 06:03 AM
hi Rafal,
you can confirm that ipv4 unreachables are really disabled on subscriber interfaces:
RP/0/RSP0/CPU0:our9001#sh uidb data location 0/0/CPU0 BE1.103.ip1 ingress | i ICMP
IPV4 ICMP Punt 0x0
IPV6 ICMP Punt 0x0
If you see 0x1, it means unreachables are enabled.
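If they turn out to be enabled, a minimal sketch to disable them in the subscriber template, assuming the PPP-type template DTP from your configuration:
dynamic-template
 type ppp DTP
  ! stop punting packets just to generate ICMP unreachables
  ipv4 unreachables disable
 !
!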
All line cards in the chassis must have the same forwarding information. When you create a subscriber interface on one line card, a RIB and FIB entry are associated with it, and this FIB entry must be created on all other line cards. This is why processes like eth_server and prm_server_ty were showing single-digit CPU utilisation: the former is responsible for communication via the EOBC and the latter for HW programming. Ultimately, 25% CPU utilisation at steady state is not a concern. CPU utilisation is not a linear graph; when you see 50% CPU utilisation, it doesn't mean that the CPU is at 50% of its capacity.
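If you want to verify that a subscriber FIB entry indeed exists on both line cards, you can compare the CEF entries, e.g. (the prefix below is just an illustrative address from your pool):
show cef ipv4 10.5.0.1/32 location 0/0/CPU0
show cef ipv4 10.5.0.1/32 location 0/1/CPU0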
/Aleksandar
06-28-2017 06:25 AM
Hello Sir,
06-28-2017 06:41 AM
hi Rafal,
in IOS XR the standby RP is really in standby mode, regardless of the feature in question. That means the presence of the standby RP has no impact on the scale.
Please don't confuse this with the status of the switch fabric ASIC on the standby RSP. :)
On 32-bit IOS XR you can achieve 256k subscribers if you go for line-card-based subscribers. Again, refer to https://supportforums.cisco.com/document/12529621/bng-deployment-scale-guidelines-asr9000 that I mentioned before.
If you prefer to stick with RP-based subscribers, the limit on 32-bit XR remains 128k per chassis. In the future we will support 256k RP-based subscribers on 64-bit XR.
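To keep an eye on how close you are to these limits, you can check the session counts with, for example:
show subscriber session all summary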
/Aleksandar