04-27-2013 05:50 PM - edited 03-01-2019 02:39 PM
Hi folks,
I have an issue with a 10008 acting as a BRAS PPPoE server.
When there is an outage somewhere between the 10008 and the clients, all clients have to reconnect to the 10008. This has a major impact on the 10008's CPU, which hits 100% and then struggles to handle about 12,000 reconnection attempts from customers. After painfully reaching 8,000 connections, it loses and regains connections for a while, and only gets back to 12,000 connections after a few hours.
The SSS Manager process alone accounts for about 20% of the CPU:
ROUTER#sh proc cpu sorted
CPU utilization for five seconds: 99%/19%; one minute: 99%; five minutes: 99%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
170 966348 109491 8825 22.54% 21.08% 20.83% 0 SSS Manager
317 479292 238020 2013 9.59% 9.29% 9.41% 0 PPP Events
333 209572 55311 3788 8.95% 5.50% 4.70% 0 VTEMPLATE Backgr
310 336336 70939 4741 7.35% 7.60% 7.42% 0 PPPoE Discovery
242 158484 3727 42523 5.19% 4.69% 4.61% 0 c10k_periodic_st
181 232316 70878 3277 4.71% 4.94% 4.82% 0 SSM connection m
243 73280 668 109700 4.23% 2.22% 2.10% 0 STATS DMA Daemon
32 176984 40482 4371 3.43% 5.40% 6.14% 0 ARP Input
328 146300 73268 1996 3.35% 3.17% 3.50% 0 RADIUS
166 94528 71505 1321 2.15% 1.99% 2.04% 0 PPPoE Background
324 27116 82403 329 1.43% 1.21% 1.10% 0 SNMP ENGINE
150 57944 87227 664 0.95% 1.21% 1.30% 0 AAA Server
169 28804 10270 2804 0.87% 0.20% 0.40% 0 PPP IP Route
238 19464 3451 5640 0.79% 0.66% 0.60% 0 CEF: IPv4 proces
161 18168 102918 176 0.63% 0.44% 0.41% 0 IP Input
208 30860 70705 436 0.47% 0.74% 0.69% 0 IP Background
49 31924 43991 725 0.47% 0.37% 0.42% 0 Net Background
61 4304 42723 100 0.39% 0.17% 0.12% 0 IF-MGR control p
278 30340 44381 683 0.31% 0.60% 0.64% 0 AAA SEND STOP EV
172 8620 62002 139 0.31% 0.23% 0.20% 0 SSS Feature Mana
94 3988 3186 1251 0.23% 0.16% 0.11% 4 Virtual Exec
209 8048 10252 785 0.23% 0.08% 0.10% 0 IP RIB Update
322 3616 12275 294 0.15% 0.15% 0.12% 0 IP SNMP
330 1080 14005 77 0.07% 0.01% 0.00% 0 IPHC Admin
143 3360 40211 83 0.07% 0.05% 0.05% 0 CCM
77 400 8559 46 0.07% 0.00% 0.00% 0 C10K OIR process
279 3440 1483 2319 0.07% 0.36% 0.29% 3 Virtual Exec
152 928 82385 11 0.07% 0.01% 0.00% 0 ACCT Periodic Pr
214 2732 64087 42 0.07% 0.07% 0.07% 0 static
I think I need to use CoPP to rate-limit the PPPoE packets sent to the CPU, and I tried the following:
class-map match-all COPP
 match protocol pppoe
policy-map COPP
 class COPP
  police rate 500 pps burst 1000 packets peak-burst 0 packets conform-action transmit exceed-action drop violate-action drop
control-plane
 service-policy input COPP
However, this does not seem to work, as no packets are matched:
ROUTER#sh policy-map control-plane
Control Plane
Service-policy input: COPP
Class-map: COPP (match-all)
0 packets, 0 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Match: protocol pppoe
Police:
rate 500 pps, 1000 limit, 0 extended limit
conformed 0 packets, 0 bytes; action:
transmit
exceeded 0 packets, 0 bytes; action:
drop
violated 0 packets, 0 bytes; action:
drop
Class-map: class-default (match-any)
150989 packets, 17708701 bytes
5 minute offered rate 130000 bps, drop rate 0 bps
Match: any
Any idea what I'm doing wrong?
I would just like to rate-limit enough to avoid hitting 100% CPU and to get a quick and steady recovery. Are there any other commands that could help?
Thanks
04-30-2013 12:45 AM
Hi Olivier,
I'm not sure what may be happening with CoPP there. However, to protect the device from high CPU during periods when a large number of sessions are trying to establish, I would instead recommend using call admission control (CAC). With CAC, the device can reject new calls once certain thresholds are reached, so it can protect itself under stress. You can read more about CAC at the following link:
http://www.cisco.com/en/US/docs/routers/asr1000/configuration/guide/chassis/scaling.html#wp1119410
Please note that the link is for the ASR1k but, since both are aggregation devices, the CAC principle also applies to the C10K.
On the Cisco side, we always recommend implementing CAC on broadband aggregation devices as a best practice.
You may want to explore CAC as a way to achieve what you are after here.
I hope it helps.
Best regards
04-30-2013 03:03 AM
Thanks for the update Manuel, interesting topic. I will probably try this, but I have been looking for more detailed documentation about this feature (CAC) for PPPoE sessions, and I need more information before configuring it.
The calculation behind the call admission limit value is not clear to me. How exactly is it derived from the call admission pppoe setting on a C10K?
Let's say we have 12,000 users to connect in 30 minutes. What would the settings for the limit and the PPPoE charge be then?
Thanks
04-30-2013 05:03 AM
Hi Olivier,
Indeed, I was also looking for documentation on Cisco.com but couldn't find any document that describes this properly. I had some links in my bookmarks, but they no longer seem to be available on the website.
In any case, to explain the configuration of CAC, this is basically what you need to know (when using call admission new-model):
- Call rejection is based on either the cpu-limit or the charge limit, whichever is exceeded first.
- The “limit” command specifies the total session charge the system will accept before incoming calls are rejected.
- You need to define the charge for each session. For example, “call admission pppoe 10 1” specifies the charge for a single PPPoE session:
10 = session charge, 1 = session lifetime
- Approximate CPS = (Limit) / (Session Charge * (Session Lifetime + 1))
- A cpu-limit of X means CAC will drop incoming calls when the measured 5-second CPU utilization is X% or higher.
You mentioned that you would like 12,000 sessions to establish in 30 minutes. That is a rate of approximately 6.7 sessions per second (12,000 sessions / 1,800 seconds), so let's say 7. With a session charge of 10 and a session lifetime of 1, the formula above gives Limit = 7 * 10 * (1 + 1) = 140, so you would need to configure:
call admission new-model
call admission pppoe 10 1
call admission limit 140
call admission cpu-limit 60
The above will allow about 7 CPS (new calls above that rate will be dropped) and will also drop new calls if the CPU load is at 60% or higher.
I think 7 CPS is a reasonable rate for C10K (I do not have any official information to support that, though).
Perhaps you can start exploring this with a config similar to the above.
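If you want to try other targets, the sizing arithmetic can be checked with a few lines of Python. This is only a sketch based on the approximate CPS formula quoted in this post; the function names are made up for illustration and nothing here is an official Cisco tool or a guaranteed model of how the C10K behaves.

```python
import math


def approx_cps(limit, session_charge, session_lifetime):
    """Approximate calls per second admitted by CAC.

    Uses the rule of thumb from this thread:
    CPS = limit / (charge * (lifetime + 1)).
    """
    return limit / (session_charge * (session_lifetime + 1))


def limit_for_target(sessions, window_seconds, session_charge, session_lifetime):
    """Hypothetical helper: 'call admission limit' value for a target.

    The target rate is rounded up to a whole number of calls per second
    before the formula is inverted.
    """
    target_cps = math.ceil(sessions / window_seconds)
    return target_cps * session_charge * (session_lifetime + 1)


if __name__ == "__main__":
    # 12,000 sessions in 30 minutes, with "call admission pppoe 10 1"
    limit = limit_for_target(12000, 30 * 60, session_charge=10, session_lifetime=1)
    print("call admission limit", limit)
    print("approximate CPS:", approx_cps(limit, 10, 1))
```

Changing the session count, the time window or the charge shows how the limit would need to move, which may help when fine-tuning.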
Best regards.
04-30-2013 05:15 AM
Thanks Manuel, you rock!!
I will try this and then fine-tune.
I also see from the scarce documentation that there is a "call admission load" parameter, but that doc says not to change the default setting without advice from Cisco, so I'll leave it untouched.
Also, I believe our C10K is using a PRE4 (off the top of my head). Can this PRE sustain a connection rate higher than 7 per second?
have fun
olivier