Machine Congestion Level (MCL) Alarm on PGW2200

Muthurani Lavanya Paneerselvam · ‎02-09-2011

Introduction
PGW2200 Congestion Alarm

Introduction

This document provides information about the Machine Congestion Level (MCL) alarms observed on the PGW2200 Softswitch with SS7 PSTN and IP interconnects. It covers the MCL alarms observed on the Sun Netra 240 and Sun Netra 440 platforms running PGW2200 MGC version 9.7.

PGW2200 Congestion Alarm

What is MCL?

The Cisco PGW 2200 Softswitch maintains an internal measurement of its own current congestion level, which is referred to as the Machine Congestion Level (MCL). The system dynamically alters its behavior based on the current MCL in order to process the maximum number of calls under overload conditions.

The MCL takes a value of 0, 1, 2, or 3, with these associated meanings:

· MCL 0   No congestion
· MCL 1   Mild congestion
· MCL 2   Moderate congestion
· MCL 3   Severe congestion

MCL and Call Rejection on PGW2200

The MCL algorithm on the PGW 2200 takes into account factors such as CPU utilization, memory usage, and calls per second (CPS). If any of these factors reaches a predefined threshold, the PGW 2200 raises the MCL and limits the rate of new calls until the congestion drops to a pre-specified level. This in turn leads to call rejection on the PGW2200, so calls fail in live traffic. There are three MCL call reject levels implemented on the PGW2200.

The MCL is determined by comparing five measurements against threshold values, as listed below:

· Call rate (callrate): the number of incoming call attempts per second.
· CPU utilization (cpu): the percentage of CPU in use.
· Engine input queue length (queuelen): the number of messages waiting in the call engine input queue.
· Memory address utilization (memoryaddress): the percentage of physical memory address space in use.
· Virtual memory address utilization (virtualmemory): the percentage of virtual memory address space in use.
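
As a rough illustration of how five measurements could map to a single congestion level, the sketch below compares each measurement with three onset thresholds and reports the highest level reached. The threshold numbers, the function names, and the "highest level wins" rule are assumptions made for illustration only. They are not PGW 2200 defaults or the product's actual algorithm, and the sketch ignores the pre-specified level to which congestion must drop before the MCL clears.

```python
# Hypothetical onset thresholds for MCL 1, 2 and 3 per measurement.
# These numbers are illustrative assumptions, not PGW 2200 defaults.
ONSET_THRESHOLDS = {
    "callrate": (40, 45, 50),        # incoming call attempts per second
    "cpu": (70, 80, 90),             # percent CPU in use
    "queuelen": (100, 200, 400),     # messages waiting in the input queue
    "memoryaddress": (80, 90, 95),   # percent physical memory space in use
    "virtualmemory": (80, 90, 95),   # percent virtual memory space in use
}


def level_for(value, onsets):
    """Count how many onset thresholds the value has reached (0 to 3)."""
    return sum(1 for onset in onsets if value >= onset)


def machine_congestion_level(measurements):
    """Return (mcl, reason): the highest level across all five measurements."""
    mcl, reason = 0, "not applicable"
    for name, onsets in ONSET_THRESHOLDS.items():
        level = level_for(measurements[name], onsets)
        if level > mcl:
            mcl, reason = level, name
    return mcl, reason


quiet = {"callrate": 13, "cpu": 5, "queuelen": 0,
         "memoryaddress": 30, "virtualmemory": 40}
busy = {"callrate": 47, "cpu": 92, "queuelen": 150,
        "memoryaddress": 60, "virtualmemory": 70}

print(machine_congestion_level(quiet))
print(machine_congestion_level(busy))
```

With these made-up thresholds, the quiet sample stays at level 0, while the busy sample is driven to level 3 by the cpu measurement even though callrate alone would only reach level 2.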

In a network where the PGW 2200 handles SS7 and H.323 calls, if the CPS exceeds the predefined limits, the PGW enters a Machine Congestion Level (MCL) that depends on the load.

To capture MCL details during congestion, collect the platform.log and run the MML command rtrv-ne-health. The best approach is to capture SS7/H.323 snooper information in combination with rtrv-ne-health, which reports the calls currently in progress and the call attempts. The command also reports the CPU load.

See this sample output taken with MCL 0.

mgc-002 mml> rtrv-ne-health:

   MGC-02 - Media Gateway Controller 2010-10-30 16:25:56.104 CET
M COMPLD
   "Platform State:ACTIVE"
   "Machine Congestion Level = MCL 0 (No Congestion), Reason: not
applicable"
   "Current in progress calls = 2102, half calls = 0, full calls = 2102,
call attempts = 13 cps"
   "CPU 0 Utilization = 3 % CPU 1 Utilization = 5 %"
   "Memory (KB): 1261456 Free virtual, 8391896 Total virtual, 4194304
Total real"
   "Filesystem            kbytes    used   avail capacity Mounted on"
   "/dev/md/dsk/d3       1021735   55713 904718     6%    /"
   "/dev/md/dsk/d12      63326268 12364969 50328037    20%    /opt"

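If the rtrv-ne-health output is saved to a file during congestion, the key figures can be pulled out with a small script. The sketch below is one possible approach: it assumes the output has the same wording as the sample above, and the function name and dictionary keys are hypothetical. Wrapped lines are rejoined first, because fields such as "call attempts" can fall on a continuation line.

```python
import re

# Excerpt of the sample output shown above, including its wrapped lines.
SAMPLE = '''
   MGC-02 - Media Gateway Controller 2010-10-30 16:25:56.104 CET
M COMPLD
   "Platform State:ACTIVE"
   "Machine Congestion Level = MCL 0 (No Congestion), Reason: not
applicable"
   "Current in progress calls = 2102, half calls = 0, full calls = 2102,
call attempts = 13 cps"
   "CPU 0 Utilization = 3 % CPU 1 Utilization = 5 %"
'''


def parse_ne_health(text):
    """Extract MCL, calls in progress, call attempts and CPU load."""
    # Collapse all whitespace so fields split across lines still match.
    flat = " ".join(text.split())
    result = {}
    patterns = {
        "mcl": r"Machine Congestion Level = MCL (\d)",
        "calls_in_progress": r"Current in progress calls = (\d+)",
        "call_attempts_cps": r"call attempts = (\d+) cps",
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, flat)
        if match:
            result[key] = int(match.group(1))
    # One entry per CPU, in the order the command lists them.
    result["cpu_percent"] = [
        int(p) for p in re.findall(r"CPU \d+ Utilization = (\d+) %", flat)
    ]
    return result


print(parse_ne_health(SAMPLE))
```

Running the command at regular intervals and logging these values alongside the time gives a simple record of how the CPS and CPU load moved before the MCL changed.
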
MCL Alarms on PGW2200

Alarm Details:

> rtrv-alms::cont

"ENG-01: 2010-10-30 18:30:33.837 EST,ALM=\"OverloadMedium\",SEV=MJ"

This alarm is normally seen when the platform is in a congestion state. Here is the exact description from the alarm guide.

OverloadMedium
Description: Overload medium condition exists.
Cause: The engine has reached an overload condition because it has too many protocol messages. The condition will degrade performance and should be corrected as soon as possible.
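
When many alarm lines have to be reviewed, it can help to split each line into its fields. The sketch below parses the rtrv-alms line shown above with a regular expression. It assumes every line follows the same layout as that single example, and the function and field names are hypothetical.

```python
import re

# The alarm line exactly as shown above, including the escaped quotes.
SAMPLE_ALARM = r'"ENG-01: 2010-10-30 18:30:33.837 EST,ALM=\"OverloadMedium\",SEV=MJ"'

ALARM_PATTERN = re.compile(
    r'"?(?P<component>[^:"]+): '          # component, e.g. ENG-01
    r'(?P<timestamp>\S+ \S+) '            # date and time
    r'(?P<timezone>\w+),'                 # time zone label
    r'ALM=\\?"(?P<alarm>[^"\\]+)\\?",'    # alarm name, quotes may be escaped
    r'SEV=(?P<severity>\w+)'              # severity code
)


def parse_alarm(line):
    """Return the alarm fields as a dict, or None if the line does not match."""
    match = ALARM_PATTERN.search(line)
    return match.groupdict() if match else None


print(parse_alarm(SAMPLE_ALARM))
```

Filtering the parsed lines for alarm names that start with "Overload" gives the timestamps to look up in the platform log.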

Check the alarm log and the platform logs corresponding to the time of the alarm:

/opt/CiscoMGC/var/log/platform_<date>.log
/opt/CiscoMGC/var/spool/alm_<date>.csv

From the platform_<date>.log file you get the following details:

· PGW 2200 MCL congestion at the various levels, from MCL 0 to MCL 3
· Overload events with the CPS and active call details

Reasons for MCL Congestion

MCL congestion and call failures on the PGW2200 occur because of various factors:

· High SS7/H.323 call traffic above the PGW 2200 platform CPS (calls per second) capacity. If the platform CPS capacity is 45, keep the CPS below that entitled capacity during peak hours to avoid overload and congestion.
· Recent changes on the platform or the MGC application.
· A sudden rise in network traffic.
· Provisioning more SS7/H.323 interconnects in your own network, which leads to a higher message flow and more messages/events exchanged with the far-end network. This results in congestion (MCL).
· Congestion in the far-end operator's network, which in turn hits the PGW2200 with a high number of messages/events.
· Any system-related activity that increases CPU utilization and memory utilization.

How to Avoid Congestion?

· Monitor the percentage of SS7, H.323, and SIP calls in your network.
· Monitor the CPS on the network during peak hours.
· Plan capacity based on the CPS capacity when adding SS7, H.323, or SIP interconnects/trunks.
· Keep CPU utilization within limits.
· If the platform CPS exceeds the capacity, upgrade the platform.
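
The peak-hour monitoring described above comes down to simple arithmetic. The sketch below compares the highest observed CPS with the platform capacity, using the capacity of 45 CPS from the earlier example. The sample readings, the function name, and the 80% warning level are assumptions chosen for illustration, not Cisco recommendations.

```python
def cps_headroom(peak_cps_samples, capacity_cps, warn_fraction=0.8):
    """Compare the highest observed CPS with the platform CPS capacity."""
    peak = max(peak_cps_samples)
    utilization = peak / capacity_cps
    return {
        "peak_cps": peak,
        "utilization": utilization,
        "headroom_cps": capacity_cps - peak,
        # warn_fraction is an assumed planning margin, not a product setting.
        "warning": utilization >= warn_fraction,
        "over_capacity": peak > capacity_cps,
    }


# Hypothetical peak-hour CPS readings, for example from rtrv-ne-health.
report = cps_headroom([13, 28, 36, 41], capacity_cps=45)
print(report)
```

In this made-up case the peak of 41 CPS leaves 4 CPS of headroom on a 45 CPS platform, which is above the assumed warning level but still within capacity.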