09-06-2011 04:31 AM - edited 08-30-2017 10:28 PM
Introduction
Setting up an LMS server to manage your network can be a daunting task. LMS has hundreds of features and options to choose from, and sooner or later things will go wrong. This guide covers the basic network management tasks on your LMS server: how to archive your device configurations, how to monitor network performance and faults, and how to troubleshoot some of the most common problems you may encounter along the way.
Before we get started, we need to check if the server is set up correctly.
The LMS services use the server hostname to talk to each other. If your name resolution is slow or broken, your LMS server will be slow or broken.
# hostname
# nslookup <server ip address>
# nslookup <server hostname>
Perform the same check for the network devices. If the LMS server cannot perform the forward and reverse name resolution of the devices, strange things will happen.
# nslookup <device ip address>
# nslookup <device hostname>
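If you have many devices, checking them one by one with nslookup gets tedious. Here is a minimal Python sketch of the same forward and reverse check. The function name and the comparison rule (only the host part of the name is compared) are our own assumptions, not something LMS itself provides.

```python
import socket


def check_name_resolution(name, forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ok, detail) for a forward-then-reverse lookup of `name`.

    `forward` and `reverse` default to the system resolver; they are
    parameters only so the check can be tried out without a DNS server.
    """
    try:
        ip = forward(name)               # forward lookup: name -> address
        resolved_name = reverse(ip)[0]   # reverse lookup: address -> primary name
    except OSError as exc:               # gaierror and herror are OSError subclasses
        return False, f"lookup failed: {exc}"
    # Compare only the host part, case-insensitively, so that "foo"
    # and "foo.example.com" count as the same device.
    same = resolved_name.split(".")[0].lower() == name.split(".")[0].lower()
    return same, f"{name} -> {ip} -> {resolved_name}"
```

Call `check_name_resolution("<device hostname>")` for each device. A result of `False` means that either a lookup failed or the PTR record points to a different name.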
LMS needs a swap file that is twice the amount of RAM.
# net stop crmdmgtd
# netstat -noab
LMS uses a certificate that has an expiration date, so we need to make sure that the server date is correct.
Check the date
# date /T
# time /T
Problem | Solution |
Users cannot log into LMS | If the date was incorrect during the LMS installation, the certificate may no longer be valid after you correct the date. To resolve this, recreate the certificate after correcting the date:
# net stop crmdmgtd
# cd CSCOpx\MDC\Apache\conf\ssl
# del server.*
# cd CSCOpx\MDC\Apache\
# perl ConfigSSL.pl -disable
# perl ConfigSSL.pl -enable
(Enter the certificate data)
# net start crmdmgtd
|
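To see why a wrong installation date can break the logins, here is a small Python sketch of a certificate validity check. It assumes that the validity period starts at whatever the server clock showed when the certificate was created. The dates and the five-year lifetime are made-up illustration values, not what ConfigSSL.pl actually uses.

```python
from datetime import datetime, timedelta


def certificate_is_valid(not_before, not_after, now):
    """A certificate is only valid while not_before <= now <= not_after."""
    return not_before <= now <= not_after


# Hypothetical case: the server clock was one year ahead during installation.
real_install_time = datetime(2011, 9, 6)
wrong_clock = real_install_time + timedelta(days=365)

not_before = wrong_clock                             # starts at the (wrong) install time
not_after = wrong_clock + timedelta(days=5 * 365)    # assumed lifetime

# While the clock is still wrong, the certificate looks fine.
print(certificate_is_valid(not_before, not_after, wrong_clock))        # True
# After the clock is corrected, "now" is before not_before.
print(certificate_is_valid(not_before, not_after, real_install_time))  # False
```

The same happens in the other direction: if the clock was far behind, the certificate may already have expired once the date is corrected.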
We first need to add the devices to the LMS device repository before LMS can manage the network. For large scale deployments (100+ devices), you can have the LMS server discover them automatically. If fewer devices need to be discovered, you can skip to “Add the devices manually” below.
1. Go to Admin > Network > Discovery Settings > Settings > Configure
2. Click Module Settings: Configure
3. The discovery modules that you need to select here depend on what works best for your network. Here are the advantages and disadvantages of each module.
Module | Advantage | Disadvantage |
Address Resolution Protocol (ARP) | -No device side configuration is required. | -Can use a lot of resources on the network devices if the arp tables are big. -Network devices that do not originate traffic (like switches) may be missing from the arp table. |
Border Gateway Protocol (BGP) | -Uses few network and device resources. | -Switches or routers that are not BGP neighbors will not get discovered |
Open Shortest Path First Protocol (OSPF) | -Uses few network and device resources. | -Switches or routers that are not OSPF neighbors will not get discovered |
Routing Table | -No device side configuration is required. | -Can use a lot of resources on the network devices if the routing tables are big. - Switches and routers that are not a next hop in the routing table will not get discovered. |
Cisco Discovery Protocol (CDP) | -Uses few network and device resources. | -CDP needs to be enabled on the interfaces. Tip: only enable CDP on interfaces that are directly connected to network devices that you own. Interfaces that connect to your users or your service provider should not have CDP enabled. |
Ping Sweep on IP Range | -No device side configuration is required. | -Uses a lot of LMS server and network resources if the IP ranges are big. - Can take a lot of time to complete (hours or days on large deployments) - IPS may see the ping sweeps as attacks and can deny the LMS server access to the network. |
Cluster Discovery Module | -Uses few network and device resources. | - Routers and switches that are not cluster members will not get discovered. |
Hot Standby Router Protocol (HSRP) | -Uses few network and device resources. | - Switches and routers that are not part of an HSRP group will not get discovered. |
Link Layer Discovery Protocol (LLDP) | -Uses few network and device resources. | -LLDP needs to be enabled on the devices. |
4. Click Next
5. Click on each discovery module and add at least one IP address of a device that can be used to start the discovery.
6. Check the “Use DCR as Seed List” and “Jump Router Boundaries” boxes
7. Click Next
8. Select SNMPv2 or SNMPv3
9. Click Add
10. Enter Target: *.*.*.*
11. For SNMPv2, you can find out the read only community string with:
# sh run | i community
12. For SNMPv3, you can find out the Auth and Privacy Algorithm with:
# sh snmp user
13. Click Finish
14. Go to Inventory > Device Administration > Discovery > Launch / Summary
15. Click Start Discovery
Refresh the Discovery Summary page. When the discovery status changes from running to finished, click on the “Reachable Devices:” link and check if all the devices have been discovered.
If any devices are still undiscovered after the discovery, you can add them manually.
# nslookup <device ip address>
# nslookup <device hostname>
LMS only works correctly when the forward and reverse name resolution of the devices is correct. Add the DNS records and PTR records to the DNS server if they do not match.
Problem | Solution |
LMS reports duplicate devices | -Go to Inventory > Device Administration > Add / Import / Manage Devices -Click on the Export button and export “All Devices” to csv file. -Check in the csv file if the device ip address, hostname or display name has already been assigned to another device. |
Devices are not discovered or are listed as unreachable | - Check and update the snmp credentials in Admin > Network > Discovery Settings > Settings > Configure - Rediscover - If that does not resolve the problem, try adding the devices manually (Step 2.2). |
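If the exported csv file is large, hunting for duplicates by eye is error-prone. The following Python sketch lists every value that appears on more than one row. The column names used in the example (ip_address, host_name, display_name) are placeholders; replace them with the column headers that your exported file actually contains.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(csv_text, columns):
    """Map each column to the values that occur on more than one row."""
    seen = {col: defaultdict(list) for col in columns}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):   # row 1 is the header
        for col in columns:
            value = (row.get(col) or "").strip().lower()
            if value:                                    # ignore empty cells
                seen[col][value].append(row_number)
    return {
        col: {value: rows for value, rows in values.items() if len(rows) > 1}
        for col, values in seen.items()
    }


# Made-up sample data, only to show the shape of the result.
sample = (
    "ip_address,host_name,display_name\n"
    "10.0.0.1,sw1,Switch1\n"
    "10.0.0.2,sw2,Switch2\n"
    "10.0.0.1,sw3,Switch3\n"
)
print(find_duplicates(sample, ["ip_address", "host_name", "display_name"]))
```

In the sample, the address 10.0.0.1 is reported for rows 2 and 4, so those are the two entries to compare in LMS.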
LMS needs to know the Telnet/SSH and SNMP credentials before it can manage the devices.
# telnet foo.cisco.com
Username:
Password:
foo> enable
foo#
# sh run | i community
For SNMPv3, you can find out the Auth and Privacy Algorithm with:
# sh snmp user
You will want to make a backup of your hard work at this point in case something goes wrong later on. To save your work:
To restore the device list and credentials when things go wrong, go to Inventory > Device Administration > Add / Import / Manage Devices and click the “Bulk Import” button.
Problem | Solution |
Device Credentials Verification shows failed devices. | -You can use the LMS Packet Capture tool to troubleshoot snmp or login problems.
|
We will now make sure that LMS collects the hardware, software, serial number and other data that are required for the Inventory reports.
If the LMS server does not accept the scheduled date, or if you would like to exclude certain devices from the Inventory Collection, you can also configure these jobs manually from the job browser:
We are now ready to run the Inventory reports.
You should be able to see the device and card hardware types, descriptions, serial numbers, etc.
Problem | Solution |
Inventory Collection job fails with “Transport session to device failed.” Error | -Run an Inventory > Job Browsers > Device Credentials Verification report. -Make sure LMS has the correct SNMP read only credentials. |
Inventory Collection job fails with generic error. |
ENTITY-MIB::entPhysicalDescr.1 = STRING: 3640 chassis, Hw Serial#: 1234567, Hw Revision: 0x00
ENTITY-MIB::entPhysicalDescr.2 = STRING: 3640 Chassis Slot
ENTITY-MIB::entPhysicalSerialNum.1 = STRING: 1234567
ENTITY-MIB::entPhysicalSerialNum.2 = STRING:
ENTITY-MIB::entPhysicalName.1 = STRING: 3640 chassis
ENTITY-MIB::entPhysicalName.2 = STRING: 3640 Chassis Slot 0
ENTITY-MIB::entPhysicalSoftwareRev.1 = STRING: 1.45
ENTITY-MIB::entPhysicalSoftwareRev.2 = STRING:
...
|
We will now configure LMS to periodically make a backup of the device configurations in case we need to replace a device in the network or someone messes up our device configs.
Note: TFTP requires SNMP write access to the devices, as the TFTP transfer is triggered by an snmpset request, so make sure LMS has the correct SNMP write credentials.
Note: The Periodic Polling polls the CISCO-CONFIG-MAN-MIB to find out if the device configuration changed since the last archive. Periodic Polling only archives the configuration if the device reports that a configuration change took place, so you can use a short polling interval.
Note: The Periodic Collection is a backup in case the Periodic Polling fails, so it can be scheduled at a longer interval.
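The difference between the two schedules can be pictured with a short Python sketch. This is only an illustration of the idea, not how LMS is implemented: `read_change_marker` and `fetch_config` are hypothetical callables, where the marker stands for a cheap "last changed" value read from the device (for example an object such as ccmHistoryRunningLastChanged in the CISCO-CONFIG-MAN-MIB), and the periodic collection is assumed to fetch regardless of that marker.

```python
def run_cycle(devices, last_seen, read_change_marker, fetch_config, force=False):
    """Archive the devices whose change marker moved since the last cycle.

    force=False behaves like a periodic poll (fetch only on change);
    force=True behaves like a periodic collection (fetch everything).
    """
    archived = []
    for device in devices:
        marker = read_change_marker(device)      # cheap: one small read
        if force or last_seen.get(device) != marker:
            fetch_config(device)                 # expensive: full config transfer
            last_seen[device] = marker
            archived.append(device)
    return archived


# Made-up markers for two devices, only to show the behaviour.
markers = {"r1": 100, "r2": 200}
fetched = []
seen = {}
print(run_cycle(["r1", "r2"], seen, markers.get, fetched.append))  # first run: both
print(run_cycle(["r1", "r2"], seen, markers.get, fetched.append))  # nothing changed
```

Because an unchanged device costs only one small read per poll, a short polling interval stays cheap, while the forced collection can run far less often.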
You should see the device configuration.
Problem | Solution |
Config Archive job shows “Partially Failed Devices” Error. | “Partially failed” means that LMS was able to archive either the startup, the running or the vlan configuration, but not all three.
Note: some configurations like the vlan configuration (vlan.dat) can only be archived using TELNET or SSH, so make sure LMS has the correct TELNET/SSH credentials. |
Config Archive job shows “config Fetch Operation failed for TFTP.” Error. |
# netstat -noab
Proto  Local Address  Foreign Address  State  PID
UDP    0.0.0.0:69     *:*                     1252
[crmtftp.exe]
The process that listens to 69/udp should be crmtftp.exe. If another process is listening to the tftp port, uninstall the other tftp application and restart the CWCS tftp service.
# cd CSCOpx\tftpboot
# echo > testconfig
# copy startup-config tftp
Address or name of remote host []? <enter the LMS server IP address>
Destination filename []? testconfig
!!!
6677 bytes copied in 0.148 secs (45115 bytes/sec)
|
Config Archive job shows “Failed to establish TELNET connection” error. |
# telnet foo.cisco.com
Username:
Password:
foo> enable
foo# show privilege
Current privilege level is 15
foo# terminal length 0
foo# terminal width 512
foo# show running-config
Building configuration...
|
Note: LMS allows you to monitor up to 100,000 objects (cpu, memory, interfaces, etc.). You probably won’t have 100,000 cpus in your network, but for the interfaces, this limit can easily be reached. Monitor the ifInOctets, ifOutOctets, ifInErrors and ifOutErrors counters on 250 devices with 100 interfaces each and you’ve already reached the limit.
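To estimate how close your own network is to this limit before you create the pollers, you can do the multiplication up front. A minimal Python sketch, using the numbers from the note above (the function name is ours):

```python
LIMIT = 100_000  # maximum number of monitored objects, as stated in the note


def monitored_objects(devices, interfaces_per_device, counters_per_interface):
    """Every counter on every interface of every device is one object."""
    return devices * interfaces_per_device * counters_per_interface


# 4 counters on 250 devices with 100 interfaces each.
total = monitored_objects(250, 100, 4)
print(total, total >= LIMIT)   # 100000 True
```

Dropping the two error counters, or polling only the uplink interfaces, brings the count down quickly.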
Problem | Solution |
The pollers are showing a “with errors” link. |
RFC1213-MIB::ifIndex.1 = INTEGER: 1
RFC1213-MIB::ifIndex.2 = INTEGER: 2
RFC1213-MIB::ifDescr.1 = STRING: "Ethernet0/0"
RFC1213-MIB::ifDescr.2 = STRING: "Port-channel1"
RFC1213-MIB::ifSpeed.1 = Gauge32: 10000000
RFC1213-MIB::ifSpeed.2 = Gauge32: 1544000
RFC1213-MIB::ifInOctets.1 = Counter32: 1082348318
RFC1213-MIB::ifInDiscards.1 = Counter32: 14928
RFC1213-MIB::ifInErrors.1 = Counter32: 12518
…
In this example, the device does not have the ifInOctets, ifOutOctets, ifInDiscards and ifInErrors counters for "Port-channel1", so LMS cannot monitor the Interface Utilization and Interface Errors.
Note: If the ifSpeed is greater than 20,000,000, also check the 1.3.6.1.2.1.31.1.1.1.6 (ifHCInOctets) and 1.3.6.1.2.1.31.1.1.1.10 (ifHCOutOctets) OIDs. LMS uses the 64-bit ifHCInOctets and ifHCOutOctets counters to calculate the utilization on high speed interfaces to make sure that it does not miss a counter wrap.
|
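To get a feeling for why counter wraps matter on fast interfaces, here is a small Python sketch. It computes how quickly a 32-bit octet counter can wrap at full line rate and how the difference between two polls is taken when a single wrap occurred. The function names are ours, and how LMS handles this internally is not shown here.

```python
def seconds_until_wrap(if_speed_bps, counter_bits=32):
    """Shortest time for an octet counter to wrap at full line rate."""
    max_octets = 2 ** counter_bits
    octets_per_second = if_speed_bps / 8     # ifSpeed is in bits per second
    return max_octets / octets_per_second


def octets_delta(previous, current, counter_bits=32):
    """Octets counted between two polls, assuming at most one wrap."""
    return (current - previous) % (2 ** counter_bits)


print(round(seconds_until_wrap(20_000_000), 1))      # 1718.0 seconds at 20 Mbit/s
print(round(seconds_until_wrap(1_000_000_000), 2))   # 34.36 seconds at 1 Gbit/s
print(octets_delta(4294967000, 704))                 # 1000, the counter wrapped once
```

If the counter can wrap more than once between two polls, the difference is no longer recoverable, which is why the 64-bit counters are needed on high speed interfaces.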
LMS frequently polls the cpu, memory, temperature, fan status, etc. MIB objects to find out if any faults have occurred in the network. As the default polling interval for most of these objects is 4 minutes, it may take a few minutes for an alarm to appear in LMS. To make the alarms immediate, we will now configure the network to notify the LMS server through SNMP traps that a fault has occurred.
# snmp-server host <ip address of the LMS server> <community string>
+ snmp-server host 1.1.1.1 public
LMS performs a separate device inventory collection to discover the objects that it needs to monitor. We will now check if all the devices have been discovered correctly.
Problem | Solution |
Fault Monitoring shows devices under “All Unknown Devices in Inventory Services” | LMS does not know the device type.
|
Fault Monitoring shows devices under “All Questioned Devices in Inventory Services” | Either LMS was not able to resolve the device hostname, the device was ICMP unreachable or SNMP unreachable.
# nslookup <device ip address>
# nslookup <device hostname>
|
Fault Monitoring shows devices under “All Learning Devices in Inventory Services” | LMS has started the discovery, but the discovery is not finished yet.
If the device is stuck in the “learning” state,
|
Problem | Solution |
I’m getting a “Cannot connect to ANI Server” error when I open Topology Services |
|
LMS keeps prompting me to “Please launch Topology Services again to work properly”. | Clear the java cache:
|
The “Layer 2 View” does not show my links. | Make sure that CDP is enabled on the devices. LMS uses CDP to discover the links.
# show cdp neighbors
(config)#cdp run
(config)#interface <interface that connects to the unconnected device>
(config-if)#cdp enable
|
The “Layer 2 View” does not show my devices. | Check if your devices are hiding in the “Unconnected Device View”. |
My devices are listed in the “Unconnected Device View” | “Unconnected Device” means that LMS did not discover any neighbors on the device that LMS manages.
# show cdp neighbors
(config)#cdp run
(config)#interface <interface that connects to the unconnected device>
(config-if)#cdp enable
|
My device icon is red | The device is unreachable or LMS does not have the correct SNMP read-only credentials.
|
My device icon has a green question mark | LMS was able to connect to the device, but the device type is not recognized. Check if the device is listed in the supported device list: http://www.cisco.com/en/US/docs/net_mgmt/ciscoworks_lan_management_solution/4.1/device_support/table/lms41sdt.html |
Problem | Solution |
The Host MAC addresses are not discovered. |
Note: LMS only supports Cisco access switches.
# show mac-address-table
Note: LMS ignores any host that is connected to a trunk as it assumes that the port is part of the backbone.
|
The Host IP Addresses are not discovered. |
Note: LMS only supports Cisco default gateways.
# show ip arp
Note: LMS does not support show arp vrf
|
The Host Names are not discovered. |
# nslookup <host ip address>
|
The User Names are not discovered. |
|
The hosts reports show false duplicates |
Note: Make sure that the LMS server can ping the hosts when using this option. The DHCP discovery relies on ICMP to learn which IP addresses are new and which IP addresses can be ignored.
# show mac-address-table
|
Now that we’ve configured our LMS server, we will want to make sure that it runs correctly for some time. Here are some steps that will make sure that the server does not reach its capacity limit and that we have a backup in case things go wrong.
Note: the backup requires twice the amount of space that is used in your CSCOpx directory (once the amount for the temporary tar file and once the amount for the backup itself). Reduce the number of generations if needed.
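Before you schedule the backup, you can check whether the backup volume has enough room. The Python sketch below applies the "twice the CSCOpx size" rule from the note. The function names are ours, and it assumes that the temporary tar file and the backup end up on the same volume.

```python
import os
import shutil


def directory_size(path):
    """Total size in bytes of all files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def backup_space_check(source_dir, backup_dir):
    """Return (bytes needed, bytes free, enough space?) for one backup."""
    needed = 2 * directory_size(source_dir)        # tar file + backup itself
    free = shutil.disk_usage(backup_dir).free
    return needed, free, free >= needed
```

Call it as `backup_space_check(r"<path to CSCOpx>", r"<backup directory>")`. If the last value is `False`, free up space or reduce the number of generations.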
[<date><time>] Backup completed: at [<date><time>]
Problem | Solution |
The backup does not start |
# at
Status ID   Day                     Time          Command Line
-------------------------------------------------------------------------------
        1   Each M T W Th F S Su    5:00 AM       C:\PROGRA~1\CSCOpx\objects\logrot\logrotsch.bat
        2   Each M T W Th F S Su    12:00 AM      C:\PROGRA~1\CSCOpx\conf\backupsch.bat
If the backupsch.bat is not listed, you can manually edit the backupsch.bat and add it to Administrative Tools > Task Scheduler
# dir C:\PROGRA~1\CSCOpx\backup.LOCK
The LMS backup creates a backup.LOCK in C:\PROGRA~1\CSCOpx to make sure that no two backups are run at the same time. If the previous backup did not remove this backup.LOCK file (for example because it was interrupted), then no new backup can be performed. Delete the backup.LOCK if no backup is currently running. |
The backup runs slowly | LMS schedules the backup in the Windows Scheduler with priority 7. On large scale deployments this can cause the backup to take more than 24 hours. You can increase the priority in the Windows Scheduler by changing:
<Priority>7</Priority>
into:
<Priority>4</Priority>
|
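If you prefer to script the priority change, the edit can be sketched in a few lines of Python. This assumes that you have already exported the scheduled task to an XML file; the function name is ours, and the sketch only rewrites the text, it does not import the task back into the scheduler.

```python
import re


def set_task_priority(task_xml, new_priority):
    """Return task_xml with the value of its <Priority> element replaced."""
    if not 0 <= new_priority <= 10:
        raise ValueError("Task Scheduler priorities range from 0 to 10")
    updated, count = re.subn(
        r"<Priority>\s*\d+\s*</Priority>",
        f"<Priority>{new_priority}</Priority>",
        task_xml,
    )
    if count != 1:
        # Refuse to guess if the element is missing or appears twice.
        raise ValueError(f"expected exactly one <Priority> element, found {count}")
    return updated


# Trimmed-down stand-in for an exported task definition.
example = "<Task><Settings><Priority>7</Priority></Settings></Task>"
print(set_task_priority(example, 4))
```

A lower number means a higher priority. Check the encoding of the exported file when you read and write it, and keep a copy of the original task definition before you replace it.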
LMS stores a new configuration file for every configuration change it detects in CSCOpx\files\rme\dcma\devfiles. If you have a lot of devices and a lot of configuration changes, these config files can quickly fill up your file system. To make sure that LMS does not fill up our file system, we will now configure the config archive purging.
Every Inventory Collection, Configuration Archive, Software Archive, etc. that LMS performs results in a new job. Over the years, this can easily add up to thousands of jobs. To make sure that the Job Browser won’t take too long to load, we will now configure the job purging.
LMS first adds each syslog to the CSCOpx\log\syslog.log file before it adds them to its CSCOpx\databases\rmeng\SyslogFirst.db, SyslogSecond.db and SyslogThird.db databases. As the managed devices can sometimes send hundreds of syslogs per second, the syslog.log and syslog databases can quickly reach their capacity limits. We will now make sure LMS purges the old syslogs.
Note: do not set the purge to more than 13 days. LMS rotates the syslog database updates between the SyslogFirst.db, SyslogSecond.db and SyslogThird.db databases every 7 days. If you use a purge that is greater than 13 days, all three databases will be used at once and you will not be able to reclaim the database space with the CSCOpx\MDC\tomcat\webapps\rme\WEB-INF\debugtools\dbcleanup\DBSpaceReclaimer.pl script.
Note: do not check the “Restart Daemon Manager” box. Only the stdout.log log rotation requires a restart. However, tomcat has its own log rotation so there should be no need for this.