07-24-2014 06:07 AM - edited 03-19-2019 08:25 AM
Hello,
We have IM&P pub and sub on 9.1.1-41900 and we have just moved the machines from physical hardware to a virtual environment.
Everything went fine apart from one issue: when users are assigned to the primary node, the "on a call" presence status does not work. When we move a user to the secondary server it works fine.
After a thorough analysis I can see that the SIP PUBLISH leaves CUCM for the primary node (SIP proxy), and the SIP proxy then tries to forward it to the Presence Engine on port 5070 on the same server (primary presence), but it cannot establish the TCP connection. It turns out we cannot telnet to port 5070 on that server, although we can on the secondary (which is why "on a call" presence works when a user is assigned to the secondary).
The server does not listen on that port even after a full server restart, or a restart of the service itself. The service appears as running in both the GUI and the CLI, and it does listen on port 6603 and the other ports in the 66XX range, which means it does have a connection to the presence datastore.
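For anyone wanting to repeat the telnet check without a telnet client, the same test is just a plain TCP connect. A minimal sketch (the IP and port are the ones from the SIP proxy logs below; everything else is generic Python):

```python
import socket

def is_port_open(host, port, timeout=3.0):
    """Attempt a plain TCP connect, like the telnet test.

    Returns True if the connection is accepted, False on refusal/timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical usage against the primary node seen in the logs:
#   is_port_open("10.204.65.9", 5070)  # False while the Presence Engine is broken
```

While the Presence Engine is in this state the call returns False for port 5070 on the primary but True on the secondary, matching the telnet behaviour described above.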
The SIP proxy logs show this:
17:55:46.207 |[Tue Jul 22 17:55:46 2014] PID(3306) sip_tcp.c(1053) Creating connection with 10.204.65.9:5070, connid 4795 sock_fd 31
17:55:46.207 |[Tue Jul 22 17:55:46 2014] PID(3306) sip_tcp.c(2562) setting timer for 10000 ms on connection connid: 4795, sock_fd 31
17:55:46.207 |[Tue Jul 22 17:55:46 2014] PID(3306) sip_tcp.c(3944) sip_tcp received auth state as: 0 for connid: 4795 sockfd 31 flags 0 from sip_sm
17:55:46.207 |[Tue Jul 22 17:55:46 2014] PID(3306) sip_tcp: epoll event error on connected socket with connid 4795, sock_fd 31 remote_addr 10.204.65.9:5070, State Connect pending flags 0, 2 No such file or directory
17:55:46.207 |[Tue Jul 22 17:55:46 2014] PID(3306) sip_tcp.c(1084) sip_tcp : Hard close/destroy of tcp connid 4795 sock_fd 31 flags 0
17:55:46.207 |[Tue Jul 22 17:55:46 2014] PID(3306) sip_tcp.c(964) Freeing connection with connid 4795, sock_fd 31 remote_addr 10.204.65.9:5070, State Connect pending flags 0
17:55:46.208 |[Tue Jul 22 17:55:46 2014] PID(3306) sip_tcp.c(928) sip_tcp : close() sock_fd 31:34
17:55:46.208 |[Tue Jul 22 17:55:46 2014] PID(3306) sip_tcp.c(778) sip_tcp is now sending failure pdu connid 4795, sock_fd 0 1 msgs
As I said, restarting the server or the service does not help. For now we have all the users on the secondary node, but we need to bring the primary back up.
The presence engine logs do not show any noticeable error.
Can somebody from Cisco tell me if this is a known bug and whether there is a recovery procedure?
Edit: I also see this in the logs, which might be relevant:
"PE is currently disabled, will not process CN's until it is re-enabled"
Regards
08-05-2014 01:07 AM
For the record, the issue was due to this error when the Presence Engine starts up:
--------------------------------------------------------------------------------------------
11:38:43.223 |system.tls.config 1075611 WARNING Error loading private key from file: error code: 185073780 in x509_cmp.c line 406.
11:38:43.223 |debug.oam.fault.faultservice 1075611 INFO faultnotification: constructing notification : PETlsConfigError
11:38:43.223 |system.oam.faults 1075611 DEBUG CCMFaultModule::notify: Alarm name = PETlsConfigError
11:38:43.223 |system.oam.faults 1075611 DEBUG CCMFaultModule::notify: param 1 = TlsErrorMessage : Error loading private key from file: error code: 185073780 in x509_cmp.c line 406.
11:38:43.223 |GenAlarm: AlarmName = UNKNOWN_ALARM:PETlsConfigError, subFac = KeyParam = , severity = 3, AlarmMsg = TlsErrorMessage : Error loading private key from file: error code: 185073780 in x509_cmp.c line 406.
AppID : Cisco Presence Engine
ClusterID : StandAloneCluster36f38
NodeID : dc10pvups01
11:38:43.223 |createFile: fileName = GEN_ALARM_MAPFILE.8001000 _ElemTableSize =500 totalsize = 698016
11:38:43.224 |GenAlarm: Push_back offset 1 seq 1
11:38:43.224 |UNKNOWN_ALARM:PETlsConfigError - TlsErrorMessage:Error loading private key from file: error code: 185073780 in x509_cmp.c line 406.
App ID:Cisco Presence Engine Cluster ID:StandAloneCluster36f38 Node ID:dc10pvups01
11:38:43.224 |system.oam.faults 1075611 DEBUG CCMFaultModule::notify: retval= 0
11:38:43.224 |debug.oam.fault.faultservice 1075611 INFO faultnotification: destructing notification : PETlsConfigError
----------------------------------------------------------------------------
I regenerated the cup and ipsec certificates, restarted the presence server, and the issue is now fixed.
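If someone wants to confirm they are hitting the same failure before regenerating certificates, the tell-tale lines can be filtered out of a collected Presence Engine trace. A quick sketch (the alarm and error strings are taken verbatim from the excerpt above; the file name is hypothetical, use whatever path RTMT saved the trace to):

```python
import re

# Strings taken from the PE log excerpt in this thread
PATTERNS = [
    re.compile(r"PETlsConfigError"),
    re.compile(r"Error loading private key from file"),
]

def find_tls_errors(lines):
    """Return the log lines that indicate the PE TLS/private-key failure."""
    return [line for line in lines if any(p.search(line) for p in PATTERNS)]

# Hypothetical usage on a collected trace file:
#   with open("pe_trace.log") as f:
#       for hit in find_tls_errors(f):
#           print(hit.rstrip())
```

If this returns any hits, the Presence Engine could not load its private key at startup, which matches the symptom of the service showing as running while port 5070 never opens.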
01-14-2015 01:14 PM
This resolved my issue. Thank you