Sourav Jyoti Das
Cisco Employee

Introduction:

Recently, a D9036 encoder (v1.10.30) raised a Software Error alarm caused by a crash of the AlarmCollector process. This article describes the alarm, its behavior, and how it was resolved.

 

Problem:

While investigating the alarm file, we saw a large number of “Software Error / mvi01-vbi:AlarmCollector” alarms in the alarm history. It looked as if the AlarmCollector process was repeatedly crashing and recovering: the alarm cleared each time the process appeared to come back, and was then raised again at the next apparent crash.

The customer confirmed that these frequent alarms caused no service degradation (in terms of input and output functions).

 

Problem Analysis:

We were concerned about the number of AlarmCollector crashes. Cisco investigated and reported the following observations:

 

-> There is a MON process that tracks the running processes and reports alarms.

-> Normally, a process that crashes should restart automatically. The AlarmCollector process, however, did not restart after the crash because it is not a standalone process: it has dependencies on other processes that would have to be handled if it were restarted.

-> The MON process has a bug: when one or more processes are down, one of its processing paths incorrectly determines whether the process exists. As a result, the alarm is cleared incorrectly and then asserted again on the next MON query.
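
The flapping described above can be illustrated with a small toy model. This is only a sketch and not Cisco's MON implementation: the class `ToyMonitor`, its two check paths, and the assumption that MON alternates between a correct and a faulty path on successive queries are all hypothetical, chosen just to show how a process that stays down can still produce repeated assert/clear entries in the alarm history.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyMonitor:
    """Toy process monitor (hypothetical; not the real MON process)."""
    running: Set[str]                      # names of processes that are up
    alarm_active: bool = False
    history: List[str] = field(default_factory=list)
    _queries: int = 0

    def _exists_correct(self, name: str) -> bool:
        # Correct path: the process exists only if it is really running.
        return name in self.running

    def _exists_faulty(self, name: str) -> bool:
        # Assumed faulty path: wrongly reports the process as present
        # even though it is down.
        return True

    def poll(self, name: str) -> None:
        self._queries += 1
        # Assumption for illustration: every second query takes the faulty path.
        if self._queries % 2 == 0:
            exists = self._exists_faulty(name)
        else:
            exists = self._exists_correct(name)

        if not exists and not self.alarm_active:
            self.alarm_active = True
            self.history.append("ASSERT")
        elif exists and self.alarm_active:
            self.alarm_active = False
            self.history.append("CLEAR")


# AlarmCollector is down for the whole run and never restarts.
mon = ToyMonitor(running={"OtherProcess"})
for _ in range(4):
    mon.poll("AlarmCollector")
print(mon.history)  # ['ASSERT', 'CLEAR', 'ASSERT', 'CLEAR']
```

In this model the process never comes back, yet the history looks like a series of crashes and recoveries, which matches what was seen in the alarm file.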

 

 

Solution/Workaround:

-> Cisco Engineering confirmed that they plan to correct the MON bug in release v2.3.

-> The cause of the AlarmCollector crash was not determined: the logs held no information about the crash (it had been overwritten by other alarms), and Engineering was unable to reproduce it. However, after a scheduled reboot of the encoder the alarm cleared and did not return.

-> The link below gives more detail about a memory leak that can cause various software errors. Customers who do not want to reboot the encoder frequently should be advised to upgrade the encoder software version.

http://www.cisco.com/c/dam/en/us/td/docs/video/headend/Digital_Encoders/D9036/Technical-Reference/Op...

 
