06-23-2010 08:20 PM - edited 03-12-2019 09:28 AM
Disclaimer: This document is written with CUCM 7.X in mind, but will be applicable to all CUCM versions with small amounts of tweaking.
Follow this excellent document for turning on CUCM traces. Pay particular attention to the H.225 and H.245 check boxes.
Get the calling party number, called party number, and time of the call.
Follow these instructions to download CallManager traces from all nodes in the cluster during the time of the call.
I prefer Notepad++, but any text editor with highlighting or searching will work. If you want to run unix2dos on the trace files beforehand you could even use the standard Windows Notepad. Downloading a tool like TripleComboTool or TranslatorX is basically cheating, but can be useful.
This is where all of the fun comes in. We now have folders full of trace files. You can download the traces I used for this example in the attached file H323Trace.zip. Somewhere in the potentially Gigabytes of files we collected is our call. Here's how to find it then track it.
I like to use a tool like WinGrep, or even Linux grep to just get an idea of which trace files to look at.
My example call has the following details:
Calling Party - 7021004
Called Party - 8011000
Calling Time (Taken from Placed Calls phone menu) - 11:45
The seach queries would be things like
cn="7021004
dd="8011000
These two strings will find the Digit Analysis line in CUCM. cn stands for Calling Number. dd stands for Dialed Digits. Let's take a look at using grep to find the trace file we're interested in.
jasburns@jasburns-gentoo ~/trace/H323Trace $ grep -REi --include "ccm*.txt" "dd=\"8011000" *
cucm7-sub1/2010-06-24_11-47-16/cm/trace/ccm/sdi/ccm00000002.txt:06/24/2010 11:45:32.095
CCM|Digit analysis: match(pi="2", fqcn="7021004", cn="7021004",plv="5", pss="",
TodFilteredPss="", dd="8011000",dac="0")
The grep command used the flags
-R recursive
-E extended regular expressions (in case we want to get fancy with regex)
-i case insensitive search (in case we want to get lazy)
--include to search through only files that had names like ccm*.txt
If we only knew the calling number we could change our search string accordingly. If we received no resutls we could remove the cn or dd portion and just search for the number. If searching for the 7 digit number didn't work we could search for just the last 4 digits until we found our call.
Based on the search we see that our file is located in
cucm7-sub1/2010-06-24_11-47-16/cm/trace/ccm/sdi/ccm00000002.txt
We also see some interesting things about the digit analysis line:
pss="" and TodFilteredPss="" means that the calling phone has a Calling Search Space set to <None>. This value is usually the ordered list of partition inside the calling party's CSS.
Let's open the trace file ccm*02.txt in our text editor and look at this line above.
If we track back up just a few lines we can see the SCCP phone that made the call
06/24/2010 11:45:32.089 CCM|StationInit: (0000003) SoftKeyEvent softKeyEvent=1(Redial)
The TCP Handle of that particular IP Phone is 0000003. That indicates this phone was the 3rd one to register since starting the CCM process on this node. We could do a grep for that particular TCP handle to get all SCCP messages sent to and from the phone.
StationInit - The phone sent this message to CUCM
StationD - The CUCM sent this message to the phone
Let's use Notepad++ to highlight this in the trace. Highlight the handle, right click, select "Using 1st Style". Now this will be light blue anywhere in the trace file.
Each call leg has a CallID. This is a unique identifier for that leg of the call. It's commonly referred to as a CI.
Each call also has a cdcc process. This is primary call control process for the call.
Each called party has a process associated with it. This is where CUCM is going to send the call.
We can learn all of these in the few lines after the Digit Analysis Block
06/24/2010 11:45:32.095 CCM|Digit analysis: insert daResEntry to daResCache.
KeyCi=42514739 ,PID:Cdcc(2,174,4)
Here we learn the Ci for the call 42514739, as well as the cdcc(2,174,4). It's helpful to highlight these in the trace as well.
Through the dmpidreq and dmpidres (Request and Response) we can get the Process ID (pid) of the party we're going to extend the call to:
06/24/2010 11:45:32.096 CCM|Digit analysis: wait_DmPidRes- Partition=[] Pattern=[801XXXX]
Where=[], cmDeviceType=[AccessDevice], OutsideDialtone =[1], DeviceOverride=[0],
PID=RouteListControl(1,100,61,2)
We see that the Route Pattern I matched was 801XXXX, and that this pattern points to RouteListControl. The Process ID for this is (1,100,61,2).
The Route List Control process exists on Node 1 (the publisher) and we're currently on the subscriber. It exists inside the CUCM process (100). This means the subscriber will now have to send a message to Route List Control on the publisher.
I typically match Node IDs to server names by looking at the SDL trace files. For example, here we can see that cucm7-sub1 is Node 2 (SDL002_*.txt)
Since we know the signal was sent to Node 1, and Node 2 is the subscriber, we can search the SDL trace folders for Node 1. Node 1 is always the publisher server (but your publisher might not always be Node1 based on your CCM version and whether or not you've activated and deactivated servces).
Let's open the CCM trace on the publisher at the time in question, 11:45:32.096
cucm7-pub\2010-06-24_11-47-15\cm\trace\ccm\sdi\ccm00000002.txt
06/24/2010 11:45:32.100 CCM|RouteListControl::idle_CcSetupReq - RouteList(ICT_RL)
Here is the inbound request from the sub to the pub. We can see the call is going to a Route List named "ICT_RL".
The Route List parses through the Route Groups, and you can see we pick a member of the RG in the following trace
06/24/2010 11:45:32.101 CCM|SMDMSharedData::findLocalDevice - Name=ICTto801
Key=005cee5b-ef72-4919-4855-5983ba8b23f2 isActvie=1 Pid=(1,153,7) found
Lucky for us the PID of this device inside the Route Group is also on node 1. If we scroll down a bit in the traces we should see the outbound H.323 call over this InterClusterTrunk.
Here we see the process created for this H.225 outbound session. The CUCM is trying to establish an outbound TCP connection.
06/24/2010 11:45:32.188 CCM|H225D::restart0_TcpConnectionInfo: H225Cdpc(1,100,154,3)
Next we'll see the actal H.323 outbound setup:
06/24/2010 11:45:32.193 CCM|SPROCRas - {
h323-uu-pdu
{
h323-message-body setup :
{
protocolIdentifier { 0 0 8 2250 0 5 },
sourceAddress
{
dialedDigits : "7021004",
h323-ID : {"7021004", {0, 0, 0, 0}, ...}
},
sourceInfo
{
vendor
{
vendor
{
t35CountryCode 181,
t35Extension 0,
manufacturerCode 18
},
productId '436973636F43616C6C4D616E61676572'H,
versionId '31'H
},
terminal
{
},
mc FALSE,
undefinedNode FALSE
},
destinationAddress
{
dialedDigits : "8011000"
},
activeMC FALSE,
conferenceID '807B41849C7D31C2030003010E302CCF'H,
conferenceGoal create : NULL,
callType pointToPoint : NULL,
sourceCallSignalAddress ipAddress :
{
ip '0E302C15'H,
port 1720
},
|<CLID::StandAloneCluster><NID::CUCM7-PUB><LVL::State Transition><MASK::0100>
06/24/2010 11:45:32.193 CCM|callIdentifier
{
guid '807B41849C7D31C2030003010E302CCF'H
The most important part of this message for tracking the rest of the call is the guid '807B41849C7D31C2030003010E302CCF'H. This is an identifier unique to the call. We can use grep or wingrep now to search on this guid. We can find out how many traces this guid appears in and then open all of these traces in our editor of choice.
Along with the exploded H.225 message body there is also a compact printout of the H.225 message:
11:45:32.193 CCM|Out Message -- H225SetupMsg -- Protocol= H225Protocol
11:45:32.193 CCM|Ie - H225BearerCapabilityIe IEData= 04 03 80 90 A2
11:45:32.193 CCM|Ie - H225CallingPartyIe IEData= 6C 09 00 81 37 30 32 31 30 30 34
11:45:32.193 CCM|Ie - Q931CalledPartyIe IEData= 70 08 80 38 30 31 31 30 30 30
11:45:32.194 CCM|IsdnMsgData2= 08 02 00 03 05 04 03 80 90 A2 6C 09 00 81 37 30 32
11:45:32.212 CCM|In Message -- H225CallProceedingMsg -- Protocol= H225Protocol
11:45:32.212 CCM|IsdnMsgData1= 08 02 80 03 02 7E 00 55 05 21 80 06 00 08 91 4A 00
This gives us an extremely succinct way to track all of the messages in a call. We can see the first message is an Outbound Setup and it contains the ASCII values of the called and calling numbers.
Calling 37 30 32 31 30 30 34
Called 38 30 31 31 30 30 30
Since these are in ASCII and they're digits all you need to do to get he numbers is just remove the leading 3 from each group of numbers. This is very handy for double checking which number gets sent to the far end H.323 device.
The second message is a Inbound Proceeding message.
We tie these messages together based on the ISDN identifier, which starts at the third octet.
The identifier portion is 0 03. The first character indicates direction. 0 stands for the outbound direction in this case (Outbound SETUP was 00 03). The inbound direction will be outbound + 8 (hex), or 8 in this case (Inbound CallProceeding was 80 03).
Messages like Setup, Proceeding, Alerting, Connect, and Release Complete will be exchanged over the H.225 protocol. These messages are for call control. There is a completely different protocol called H.245 that is used to negotiate the IP addresses, UDP Port numbers, and codec that will be used for the media streams of the call.
In either the Alerting, or Connect message the called endpoint will put in a section called H.245 address. This port triggers the calling party to setup a new TCP session to the called party for the purposes of exchanging H.245 messages.
I used Notepad++ to search for the guid in all trace files, then browsed through all of the H.225 messages until I find the one with the port:
Here you can see that the H.245 port is 58820 and it comes in the Connect message at 11:45:34 (when the called party answered). I've highlighted this port as it is crucial to our next step.
Now that we have the H.245 Port we can look for the process identifier that will allow us to find all H.245 messages for this call.
Note
This procedure below only applies to "Slow Start" calls. I will document "Fast Start" at another point in time. |
If the H.245 port comes on an Inbound H.225 message, search down in the traces for the port number. We have to do post processing to create the H.245 process.
If the H.245 port is sent on an Outbound H.225 message, search up in the traces for the port number. We have already done the processing to make the H.245 process and THEN we send out the message with the port number.
This is an Inbound H.225 message in our example, so we will search down for the H.245 port number until we see a line that looks like this:
06/24/2010 11:45:34.167 CCM|H245Interface(3)::start_Transition, (H245Client session)
ip = (14.48.44.80), port = 58820, TA provided by Callee
In this instance we see that the H.245 interface created has a process ID of 3 H245Interface(3). All H.245 message for this call will be exchanged on that process. Search down until you see a message like the following to get the full process ID:
06/24/2010 11:45:34.181 CCM|H245ASN - TtPid=(1,100,16,3) -Outgoing -value
MultimediaSystemControlMessage ::= request : terminalCapabilitySet
This is an Outbound TCS. The identifier that we'll use as our future search string is TtPid=(1,100,16,3). Go ahead and make this some other interesting color.
Use Notepad++ "Find in all Open Documents" (or similar search in your text editor) to get the full H.245 session output from the start of the call to the end:
Each side will advertise the supported capabilities in the Terminal Capability Set (TCS) message. One side will initially advertise all capabilities supported. The responding side will respond with the matching supported capabilities.
Here the outbound TCS advertises support for:
{
capabilityTableEntryNumber 3,
capability receiveAudioCapability : g711Ulaw64k : 40
},
{
capabilityTableEntryNumber 4,
capability receiveAudioCapability : g711Alaw64k : 40
},
{
capabilityTableEntryNumber 5,
capability receiveAudioCapability : g729wAnnexB : 6
},
{
capabilityTableEntryNumber 6,
capability receiveAudioCapability : g729AnnexAwAnnexB : 6
},
{
capabilityTableEntryNumber 7,
capability receiveAudioCapability : g729 : 6
},
{
capabilityTableEntryNumber 8,
capability receiveAudioCapability : g729AnnexA : 6
},
{
capabilityTableEntryNumber 9,
capability receiveAndTransmitUserInputCapability : dtmf : NULL
G.711U/A @ up to 40 msec packetization (4 data samples per frame with each sample taking 10msec)
G.729/A/B @ 6 data samples per frame (60 msec packetization, since each sample takes 10msec)
Note:
G.711 uses the msec packetization interval between RTP packets in the TCS G.729 uses the number of 10msec data samples per RTP packet inside the TCS
If you understand that each sample is 10msec in length then the conversion between the two is pretty simple.
The most common packetization is 20msec, or 2 voice samples per RTP packet. |
If we look at the capabilities Inbound we see that all of the same capabilities are supported.
When we go back to the subscriber traces, where the calling phone is registered, we can see that Region configurations are setup for G.711 (64kbps as printed in the traces)
06/24/2010 11:45:34.194 CCM|RegionsServer::MatchCapabilities -- kbps=64,
capACount=6, capBCount=8
On the publisher we tell the far end H.323 node we're using G.711 @ 20 msec
06/24/2010 11:45:34.246 CCM|H245ASN - TtPid=(1,100,16,3) -Outgoing -value
MultimediaSystemControlMessage ::= request : openLogicalChannel :
{
forwardLogicalChannelNumber 1,
forwardLogicalChannelParameters
{
dataType audioData : g711Ulaw64k : 20,
We also get an inbound message for G.711 @ 20msec
On the subscriber we can go back to our TCP Handle and see that the SCCP calling phone is told to Open a G.711 audio channel. The phone responds (StationInit) saying it will listen on UDP port 24418.
06/24/2010 11:45:34.255 CCM|StationInit: (0000003) OpenReceiveChannelAck Status=0,
IpAddr=IpAddr.type:0 ipAddr:0x0e302ccf000000000000000000000000(14.48.44.207),
Port=24418, PartyID=33554435
If we go back to the node where the H.245 session is ongoing, we see the following outgoing OpenLogicalChannelAck. Notice that the UDP RTP port number we send out on the H.323 leg is the exact port that the phone responded with in the SCCP ORCAck, 24418.
06/24/2010 11:45:34.257 CCM|H245ASN - TtPid=(1,100,16,3) -Outgoing -value
MultimediaSystemControlMessage ::= response : openLogicalChannelAck :
{
forwardLogicalChannelNumber 1,
forwardMultiplexAckParameters h2250LogicalChannelAckParameters :
{
sessionID 1,
mediaChannel unicastAddress : iPAddress :
{
network '0E302CCF'H,
tsapIdentifier 24418
},
The incoming OpenLogicalChannelAck says the called party will be listening on 23362
06/24/2010 11:45:34.259 CCM|H245ASN - TtPid=(1,100,16,3) -Incoming -value
MultimediaSystemControlMessage ::= response : openLogicalChannelAck :
{
forwardLogicalChannelNumber 1,
forwardMultiplexAckParameters h2250LogicalChannelAckParameters :
{
sessionID 1,
mediaChannel unicastAddress : iPAddress :
{
network '0E302CCC'H,
tsapIdentifier 23362
},
Going back to the publisher server where the Calling SCCP phone is registered we see CUCM instructing to send RTP to this new IP and port using the previously established codec.
06/24/2010 11:45:34.260 CCM|StationD: (0000003) startMediaTransmission
conferenceID=42514739 passThruPartyID=33554435 remoteIpAddress=IpAddr.type:0
ipAddr:0x0e302ccc000000000000000000000000(14.48.44.204) remotePortNumber=23362
milliSecondPacketSize=20 compressType=4(Media_Payload_G711Ulaw64k)
Further on in the call the Called party presses Hold, Resume, and End Call. You can use all of the previously discussed techniques to track through the behavior of these steps in detail.
Thank you Jason for this nice document.
It is very nice and helpful.
I just suggest to give us some troubleshooting scenarios, So we can see some unmoral situations rather than ideal situations.
Again thank you for your efforts and I am looking forward to see more and more documents.
In some scenarios, the H245Inteface instance number will not match up with TtPid instance number.
Example :
11/25/2009 10:17:58.111 CCM|H245Interface(2272)::start_Transition, networkFlag=7
From the SDL traces on the same node at that time,
018876035| 09/11/25 10:17:58.111| 001| SdlSig | MXInterfacePathEstablished | fastStart | H245Interface(1,100,42,2272) | MediaExchange(1,100,87,22662) | (1,100,11,14030).1-(*:*) | [R:NP - HP: 0, NP: 7, LP: 0, VLP: 0, LZP: 0 DBP: 0]LCN=2 Media Encryption Alogiorithm=0
018876036| 09/11/25 10:17:58.111| 001| Created | | | H245SessionManager(1,100,137,2284)| H245Interface(1,100,42,2272) | | NumOfCurrentInstances: 2
018876037| 09/11/25 10:17:58.111| 001| SdlSig | Start | start | H245SessionManager(1,100,137,2284)| H245SessionManager(1,100,137,2284)| (1,100,137,2284).1-(*:*) | [R:HP - HP: 0, NP: 8, LP: 0, VLP: 0, LZP: 0 DBP: 0]
018876038| 09/11/25 10:17:58.111| 001| Created | | | TranslateAndTransport(1,100,144,2284)| H245SessionManager(1,100,137,2284)| | NumOfCurrentInstances: 2
11/25/2009 10:17:58.126 CCM|H245ASN - TtPid=(1,100,144,2284) -Outgoing -value MultimediaSystemControlMessage ::= request : terminalCapabilitySet :
{
sequenceNumber 1,
protocolIdentifier { 0 0 8 245 0 3 },
multiplexCapability h2250Capability :
{
maximumAudioDelayJitter 60,
receiveMultipointCapability
{
multicastCapability FALSE,
----
As seen above, the H245Interface(1,100,42,2272) created H245SessionManager(1,100,137,2284), which in turn creates TranslateAndTransport(1,100,144,2284). So, in this example,
Instance number of H245Inteface - 2272
Instance number of H245ASN - TtPid - 2284
- Sriram
Another instance in which the H245Inteface instance number is greater than the TtPid instance number :
### translateandtransport(217701) created by H245Interface(217706)
053014438| 2010/10/08 16:40:02.342| 003| Created | | | H245SessionManager(3,100,26,217701)| H245Interface(3,100,143,217706) | | NumOfCurrentInstances: 154
053014439| 2010/10/08 16:40:02.342| 003| SdlSig | Start | start | H245SessionManager(3,100,26,217701)| H245SessionManager(3,100,26,217701)| (3,100,26,217701).1-(*:*) | [R:HP - HP: 0, NP: 11, LP: 0, VLP: 0, LZP: 0 DBP: 0]
053014440| 2010/10/08 16:40:02.342| 003| Created | | | TranslateAndTransport(3,100,15,217701)| H245SessionManager(3,100,26,217701)| | NumOfCurrentInstances: 154
### H245 call leg - TtPid=(3,100,15,217701)
10/08/2010 16:40:02.356 CCM|H245ASN - TtPid=(3,100,15,217701) -Outgoing -value MultimediaSystemControlMessage ::= request : terminalCapabilitySet :
{
sequenceNumber 1,
protocolIdentifier { 0 0 8 245 0 10 },
multiplexCapability h2250Capability :
----
Instance number of H245Inteface - 217706
Instance number of H245ASN - TtPid - 217701
- Sriram
Since there was a request for more trace reading exercises, here I'll post a few others over time.
Let's start with an easy one. Very easy but good.
Fictive Customer Symptom:
==========================
Yesterday evening around 4:30h we received notice that calls to our main reception desk number 555-4911 were failing.
We did some tests and all of a sudden the problem disappeared by itself!
This morning we were able to make calls, and around 9:12am we got notice again that ALL INBOUND calls to that number were failing!!
Both internal and external calls to our Attendant Console number 4911 are failing!! We suspect a severe bug in the attendant console software!
This time it doesn't fix itself so we are currently not reachable!! Please help out and treat this with the highest priority!
I'm uploading call manager and attendant console traces.
Test call info: 1104 calls 4911.
Trace analysis:
===========================
Step 1: Stay Calm :-)
Step 2: Unpack all the traces and don't focus yet on attendant console. Could very well be that the call doesn't even arrive there. So we start with the beginning - ie: where the call starts - the IP Phone with extension 1104.
Step 3: Grep/Search for dd="4911" in the ccm sdi traces. Alternatively you could also search for cn="1104". (dialed digits versus calling number)
Step 4: We found that the call is in ccm00000022.txt. Open the file, and search again for string dd="4911"
10/18/2010 11:26:09.859 CCM|Digit analysis: match(pi="2", fqcn="1104", cn="1104",plv="5", pss="BusinessClosed_TestTOD:internal", TodFilteredPss="BusinessClosed_TestTOD:internal", dd="4911",dac="0")|<CLID::StandAloneCluster><NID::172.16.9.20><CT::1,100,39,1.173788><IP::172.16.4.26><DEV::SEP0025459347FC><LVL::State Transition><MASK::0800>
Things to know here from the digit analysis output:
* pss is the Partition Search Space in an ordered list.
* TodFilteredPss is the Partition Search Space leaving out the partitions that are currently not active based on their time schedule.
The names of the partitions already point out something interesting: BusinessClosed_TestTOD. Seems they have Time of Day routing implemented and were doing some tests with it before.
Then, we see a Digit Analysis print-out. This means that the call has matched something.
10/18/2010 11:26:09.859 CCM|Digit analysis: analysis results|<CLID::StandAloneCluster><NID::172.16.9.20><CT::1,100,39,1.173788><IP::172.16.4.26><DEV::SEP0025459347FC><LVL::State Transition><MASK::0800>
10/18/2010 11:26:09.859 CCM||PretransformCallingPartyNumber=1104
|CallingPartyNumber=1104
|DialingPartition=BusinessClosed_TestTOD
|DialingPattern=4911
|FullyQualifiedCalledPartyNumber=4911
|DialingPatternRegularExpression=(4911)
|DialingWhere=
|PatternType=Translation
|PotentialMatches=ForegoPotentialMatches
|DialingSdlProcessId=(0,0,0)
|PretransformDigitString=4911
|PretransformTagsList=SUBSCRIBER
|PretransformPositionalMatchList=4911
|CollectedDigits=1101
|UnconsumedDigits=
|TagsList=SUBSCRIBER
|PositionalMatchList=4911
|VoiceMailbox=
|VoiceMailCallingSearchSpace=
|VoiceMailPilotNumber=
|RouteBlockFlag=BlockThisPattern
|RouteBlockCause=21
Interesting fields here:
PatternType = Translation. So we are reaching a Translation Pattern in the partition BusinessClosed_TestTOD.
It also appears from this printout that it is configured to "BlockThisPattern" with disconnect cause 21.
|RouteBlockFlag=BlockThisPattern
|RouteBlockCause=21
Whatever block cause 21 means is not important. Most important is that it is configured to be blocked.
Step 5: We go to the customer and show what we see for clarification:
- 1104 is making a call to 4911
- call is hitting a translation pattern with dn 4911 which is in partition BusinessClosed_TestTOD.
- the translation pattern is configured to block the patterns.
- all the calls are not even hitting attendant console, so there's no issue there.
Step 6: Customer says: "oh no… wait a minute… yesterday someone of the IT department was doing some tests with automatic rerouting after business hours… could it be he made a mistake?"
Step 7: Inspection of the Time Periods and schedules reveals that the configuration was done incorrectly. The engineer mistakenly had configured closed hours between 9:00h and 17:00h while that was supposed to be opening hours!
Step 8: Revise the concept "Nothing Changed" ;-)
Homework:
==========================
If you have some time, test this out in your lab and see how traces would look like when ToD enabled partition was not active.
Life changing!!!
Find answers to your questions by entering keywords or phrases in the Search bar above. New here? Use these resources to familiarize yourself with the community: