reswaran
Cisco Employee

We are pleased to release Cisco UCS Manager Plugin 0.9.4 for Nagios

Nagios is an open source computer system monitoring, network monitoring and infrastructure monitoring software application. Nagios offers monitoring and alerting services for servers, switches, applications, and services.


The solution provides end users with two components.

  • The first is the Nagios monitoring plugin script (cisco_ucs_nagios), which lets end users monitor components such as blade servers, rack servers, fabric interconnects, chassis, IO modules, and fabric extenders in one or more UCS domains.
  • The second is an add-on to Nagios that lets end users auto-discover UCS domains and create the host definitions in Nagios. It also creates the service definitions for the services defined in the configuration file. By default, the add-on ships with basic service definitions for each UCS component (a host in Nagios terms), which use the cisco_ucs_nagios script to monitor its health. Refer to section 5.1 of the user guide for the list of services defined by the add-on. For components/hosts that are not healthy, it reports the associated faults and their details. For components/hosts that are healthy, it reports the inventory details.


Supported Nagios Versions:

This plugin is supported on Nagios Core version 3.2 and higher.


Supported Cisco UCS Manager Releases:

This plugin is supported on Cisco UCS Manager Releases 2.1, 2.2 and 3.0.


Software Requirements for Release 0.9(4):

This release of the plugin requires Python SDK 0.8.3 to communicate with UCS Manager. Click here to download Python SDK Release 0.8(3) for UCS Manager.


New Features in Release 0.9(4):

Plugin Enhancements:

  • Capabilities to generate performance statistics of different UCS components.
  • Check and display faults based on power state of the UCS servers.
  • The "onlyFaults" flag has been renamed to "faultDetails"; it shows detailed fault information when used with the "inHierarchical" flag.
  • Short options have been added: "-R" for "--inHierarchical", "-S" for "--useSharedSession", and "-F" for "--faultDetails".
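As a rough illustration of how the renamed flag and the new short options fit together, the sketch below builds the argument list for a single check. The "-u", "-p", "-H", "-t" and "-q" options are the ones used in the example command in the comments on this post; the helper name `build_check_args` and its parameters are hypothetical and are not part of the plugin.

```python
def build_check_args(user, password, host, query_type, query,
                     hierarchical=False, shared_session=False,
                     fault_details=False):
    """Return an argv list for one cisco_ucs_nagios check (illustrative helper)."""
    args = ["cisco_ucs_nagios", "-u", user, "-p", password, "-H", host,
            "-t", query_type, "-q", query]
    if hierarchical:
        args.append("-R")  # short form of --inHierarchical
    if shared_session:
        args.append("-S")  # short form of --useSharedSession
    if fault_details:
        args.append("-F")  # short form of --faultDetails (formerly --onlyFaults)
    return args
```

A list like this can be passed to `subprocess.run` or joined into a Nagios command definition; refer to the user guide for the authoritative option list.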


Auto-Discovery Add-On Enhancements:

  • IP range can now be provided while discovering UCS domains.
  • The Auto-Discovery script now accepts CLI arguments for the UCS details.
  • Users can create custom services for any class or DN.
  • Option to keep previously discovered UCS domains when re-running Auto-Discovery.
  • UCS blades and rack servers are now discovered and named using the service profile attached to them.
  • Flag to discover only the blades and rack servers that have a service profile associated with them.
  • CLI flag to disable the default host-group creation.
  • CLI flag to disable multiple host creation and instead create a single domain host.
  • Short options for multiple CLI parameters.


Important Note regarding Backward Compatibility:
If you are using the add-on and upgrading from an older release of the plugin to release 0.9.4, you must re-discover all the domains using the new add-on to create new service definition files, because the 'onlyFaults' flag in the 'cisco_ucs_nagios' script has been renamed to 'faultDetails'.


Upgrading from Release 0.9.2 to Release 0.9.3 or above:

  • Run the attached migrate.py to migrate from 0.9.2 release structure to the new structure.
  • Follow the installation procedure mentioned in the user guide to upgrade to release 0.9.3 or above.


New Features in Release 0.9(3):

  • Support for UCS Mini and UCS Manager Release 3.0(1)
  • Installer script for easy installation and uninstallation of the solution (plugin and add-on).

New Features in Release 0.9(2):

  • Multiple services for the same UCS domain can share a single session by using --useSharedSession option in CLI.
  • Better scale numbers: up to 600 services per UCS domain.
  • Optimized queries for faster response from UCS domain.
  • Better error handling and reporting in debug mode.
  • Updated wildcard filter for filtering class based services.

Refer to the attached user guide for installation/upgrade and usage details.

For any queries/feedback on Cisco UCSM Plugin for Nagios 3.x, please add a discussion to the Cisco Developed Integrations sub-space on Cisco UCS Communities.

Comments
arusrini
Cisco Employee

Louis,

This appears to be a standalone rack server. There are two issues. First, the plugin here is for UCSM-managed servers; for standalone rack servers, you need the following:

Cisco IMC Plugin for Nagios (version 0.9.2)

Second, even then, the minimum firmware it works with is 1.5.x.

--Arun S

samducksworth1
Community Member

Is there any way to filter out alerts such as this?

WARNING - sys/chassis-1/blade-4/board/memarray-1/mem-1-DIMM A0 on server 1/4 has an invalid FRU

==== Fault # 1 ====
Dn : sys/chassis-1/blade-4/board/memarray-1/mem-1/fault-F0502
Descr : DIMM A0 on server 1/4 has an invalid FRU
severity : warning
Cause : identity-unestablishable
Type : equipment
Created : 2013-11-01T16:56:53.872

I need to do this temporarily until we can update the inventory catalog.

--samd

pratripa
Level 1

Hi Sam,

You need to update the variable SKIP_FAULT_LIST in the configuration file "cisco_ucs_nagios.cfg". You will find this file in the same location where your Cisco UCS plugin is installed.

In your case, you can add the 'Code' attribute to this list as Code:F0502, like this:

SKIP_FAULT_LIST=Lc:suppressed,Type:fsm,Severity:info,Severity:condition,Code:F0502

This will filter out all faults with fault code F0502.

This is also described in section 6.3 of the user guide, under the heading "Skipping Faults".
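To make the `<attribute>:<value>` format more concrete, here is a minimal sketch of how such a list could be parsed and applied to a fault. It is not the plugin's actual implementation. The function names are hypothetical, and both the "any matching pair skips the fault" rule and the case-insensitive comparison are assumptions made for illustration.

```python
def parse_skip_list(value):
    """Split 'Attr:Value,Attr:Value' into a list of lowercase (attr, value) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        attr, _, val = item.partition(":")
        pairs.append((attr.strip().lower(), val.strip().lower()))
    return pairs


def is_skipped(fault, skip_pairs):
    """Return True if any (attr, value) pair matches the fault's attributes.

    Assumption: one matching pair is enough, and matching ignores case.
    """
    lowered = {key.lower(): str(val).lower() for key, val in fault.items()}
    return any(lowered.get(attr) == val for attr, val in skip_pairs)


skip_pairs = parse_skip_list(
    "Lc:suppressed,Type:fsm,Severity:info,Severity:condition,Code:F0502")
# Attribute values taken from the fault quoted earlier in this thread.
fault = {"Code": "F0502", "Severity": "warning", "Type": "equipment"}
print(is_skipped(fault, skip_pairs))
```

With the list above, the fault is skipped only because of the Code:F0502 entry; its severity and type alone would not have matched.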

Regards

- Prateek

samducksworth1
Community Member

Thanks that fixed it! --samd

slowswitch
Level 1

Hi,

I get this error running the script.

It is with

python 2.7

Nagios core 3.2.3

Redhat 5.5

#######

Traceback (most recent call last):
  File "./installer.py", line 467, in <module>
    extracted_name = name_list[0] + "_" + name_list[1] + "_" + name_list[2]
IndexError: list index out of range
[root@opsviewprim10 dave]#


##########


thanks

prvashis
Cisco Employee

Hello Ian,

The issue seems to be with the directory from which you are running the installer script.

Can you please provide a listing of the directory from which you are executing the script?

The directory should look similar to the output below.

[root@nagios-centos cisco-ucs-nagios-0.9.3]# ls -lrt
total 92
-rwxr-xr-x 1 root root 32379 Jan 29 15:01 installer.py
-rw-r--r-- 1 root root  3456 Jan 29 15:01 INSTALL
-rw-r--r-- 1 root root 49473 Jan 29 15:01 cisco-ucs-nagios-0.9.3.tar.gz

The "installer.py" script must be executed from inside this directory, not from outside it. Also, the name of the tar.gz file in the directory must not be changed.
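A quick pre-flight check along these lines can confirm that the directory looks as expected before the installer is run. This is only a sketch: the file name pattern is inferred from the listing above, and `check_install_dir` is a hypothetical helper, not something shipped with the installer.

```python
import re

# Pattern inferred from the listing above, e.g. cisco-ucs-nagios-0.9.3.tar.gz
TARBALL_PATTERN = re.compile(r"^cisco-ucs-nagios-\d+(\.\d+)*\.tar\.gz$")


def check_install_dir(filenames):
    """Return a list of problems found in the given directory listing."""
    problems = []
    if "installer.py" not in filenames:
        problems.append("installer.py not found in this directory")
    if not any(TARBALL_PATTERN.match(name) for name in filenames):
        problems.append(
            "no cisco-ucs-nagios-<version>.tar.gz found (was it renamed?)")
    return problems
```

Calling `check_install_dir(os.listdir("."))` from the directory where you intend to run the installer returns an empty list when both files are present under their original names.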

thanks

FraserCampbell
Community Member

Thanks for this plugin, it is working quite well.  A couple of questions though ...


Is there a way to query multiple classes at a time?  I'd like to set up a service such as ucs_Servers - it doesn't matter to us if the server is Blade or Rack and we don't want to monitor every single server as an independent unit. I've tried many things but haven't come up with a way to query both in a single plugin call.  For example this would be nice:

    cisco_ucs_nagios -u user -p pass -H ucs-host -t class -q ComputeBlade,ComputeRackUnit --inHierarchical   --onlyFaults

Above doesn't work of course.  I have to set up 1 service for each type of server (ucs_Blades and ucs_Rack).

Also, in the plugin output there seems to almost always be many lines of inventory-type information, for example:

sys/rack-unit-14:OK - Model : UCSC-C240-M3L,Name : xxx


When there are over 30 servers in a UCS domain this becomes a bit verbose (especially when trying to pick out the single CRITICAL that might be mixed in the middle).  Is there any way to suppress output other than an overall OK, or the specific WARNINGs and CRITICALs?

Thanks,

Fraser

prvashis
Cisco Employee

Hello Fraser,

Thanks for using the plugin and for the valuable suggestions.

Query 1 : Using multiple classes at a time?

     The plugin was designed to work on one class at a time, so this is not possible in the current plugin, and different services must be created for different classes.

     We will look at use cases where querying multiple classes would be useful.
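Outside the plugin, one possible workaround is a small wrapper that runs the plugin once per class and reports the worst result as a single Nagios service. The sketch below assumes the plugin follows the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the ranking of UNKNOWN below WARNING are choices made for illustration, not plugin behavior.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Ranking used to pick the "worst" result (an assumption; adjust as needed).
_RANK = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def run_check(base_args, class_name):
    """Run the plugin for one class and return (exit_code, output)."""
    proc = subprocess.run(base_args + ["-t", "class", "-q", class_name],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def combine(results):
    """Merge (exit_code, output) pairs into a single worst-case result."""
    worst = OK
    lines = []
    for code, output in results:
        if code not in _RANK:
            code = UNKNOWN  # treat unexpected exit codes as UNKNOWN
        if _RANK[code] > _RANK[worst]:
            worst = code
        if output:
            lines.append(output)
    return worst, "\n".join(lines)
```

A wrapper script would call `run_check` for ComputeBlade and ComputeRackUnit with the usual "-u", "-p" and "-H" arguments in `base_args`, print the combined text, and exit with the combined code. Each class still costs one plugin call, so this only merges the results on the Nagios side.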


Query 2: Is there any way to suppress output other than an overall OK, or the specific WARNINGs and CRITICALs?

     The plugin prints inventory information if there are no faults on the object. This inventory information is controlled by the "Inv_<class_name>" entry in the plugin CFG file "cisco_ucs_nagios.cfg". If this entry is missing for a specific class, all attributes of that class are displayed.

     If you don't want lengthy inventory information, you can limit the attributes by adding an entry with a minimal set of class attributes to the configuration file.

To skip specific WARNING and CRITICAL faults, we have provided "SKIP_FAULT_LIST" in the plugin CFG file.

You need to specify an attribute of the "FaultInst" class in the format below:


          SKIP_FAULT_LIST=<Class_Attribute>:<Attribute_Value>

          SKIP_FAULT_LIST=Lc:suppressed,Type:fsm,Severity:info,Severity:condition

Regarding your point that WARNING and CRITICAL entries get mixed in among the "OK" entries when a class has many objects: we will certainly take it up and look at how a better view can be provided.
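In the meantime, one possible workaround is to post-process the plugin output in a wrapper so that OK lines are collapsed into a count. The sketch below assumes each per-object line contains a status word followed by " - ", as in the "sys/rack-unit-14:OK - Model : ..." example quoted above; the `summarize` function is hypothetical and the exact output format may differ between releases.

```python
import re

# Matches the status word in lines such as "sys/rack-unit-14:OK - Model : ..."
_STATUS = re.compile(r"\b(CRITICAL|WARNING|UNKNOWN|OK) - ")


def summarize(output):
    """Drop per-object OK lines and append a count of how many were dropped."""
    kept = []
    ok_count = 0
    for line in output.splitlines():
        match = _STATUS.search(line)
        if match is not None and match.group(1) == "OK":
            ok_count += 1
        elif line.strip():
            kept.append(line)  # non-OK status lines and fault details
    if ok_count:
        kept.append("%d object(s) OK" % ok_count)
    return "\n".join(kept)
```

The wrapper must still exit with the plugin's original exit code so that Nagios sees the correct state.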

Regards,

Prateek

FraserCampbell
Community Member

Thanks Prateek.

I think another option to accomplish some of what I want would be if the plugin accepted alternate config files.  I could then query everything but use CLASS_FILTER_LIST in a check-specific config file to control what specific hardware my check returns (i.e. ComputeBlade,ComputeRackUnit for "servers" check).


Regards,

Fraser


saffarah.mendez
Level 1

Hello,

I am hoping that someone can help me with this.

I am using the plugin in Icinga, but I get this output:

Error is of Type : URLError
Message >> <urlopen error [Errno 13] Permission denied>
Error while trying to run the UCS Nagios monitoring service. Check for Nagios logs as it may help finding error details.
Exception:
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/nagios/plugins/cisco_ucs_nagios", line 1221, in
    args_object.proxy)
  File "/usr/lib/python2.7/site-packages/UcsSdk/UcsHandle.py", line 362, in Login
    response = self.AaaLogin(username, password, dumpXml)
  File "/usr/lib/python2.7/site-packages/UcsSdk/UcsHandle.py", line 2373, in AaaLogin
    response = self.XmlQuery(method, WriteXmlOption.Dirty, dumpXml)
  File "/usr/lib/python2.7/site-packages/UcsSdk/UcsHandle.py", line 223, in XmlQuery
    f = opener.open(req)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
    raise URLError(err)
URLError:

 

I used v8.4 of the UCS Python SDK, hoping it would still work.

Thank you in advance!
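Errno 13 is EACCES, which the local operating system returns when it refuses the outgoing connection, so the request may never have reached UCS Manager. A small probe like the sketch below, run as the same user and in the same context as the monitoring daemon, can help confirm that. It is only a diagnostic aid: the function names are hypothetical, and port 80 is an assumption based on the plain HTTPConnection shown in the traceback.

```python
import errno
import socket


def describe(exc):
    """Turn a socket-level OSError into a short diagnostic message."""
    if exc.errno == errno.EACCES:
        return ("permission denied locally (errno 13): "
                "the OS blocked the outgoing connection")
    return "connection failed: %s" % exc


def probe(host, port=80, timeout=5.0):
    """Try a plain TCP connection to host:port and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return describe(exc)
```

If `probe("ucs-host")` connects from an interactive shell but reports errno 13 when run from the monitoring daemon, local policy on the monitoring host (for example a security module or firewall rule) is a likely place to look.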

 

 
