
Basic Troubleshooting Steps Before Calling Support

mdsprague
Level 1

This is going to sound like a seriously noob question because it is.

We have two IronPort C160 devices configured in a cluster in our environment. Both seem to purr along nicely, but one unit has "crashed" a couple of times in the past few months, and I'm trying to figure out what basic diagnostics or troubleshooting I can do on the device(s) to find out what happened. I'd like to have some basic ideas before calling support for assistance. Any thoughts? Are there log files to check that might indicate a problem, or something else I can look at to see what is going on?

Any input would be greatly appreciated.

Accepted Solution

Greetings,

This is actually a great question. Performing some initial diagnosis (information gathering) before contacting support can go a long way toward speeding up determination of the root cause, and it can also improve the accuracy of the analysis. The first thing we have to do is determine what "crash" means. Did the system become totally unresponsive, or did it just stop responding to network requests? This matters because a system that is not responding to commands (locked up) may be encountering a different type of fault than a system that is only not responding to network requests. For example, a system that is totally locked up and not responding to any commands from the console may have a hardware problem with the RAID controller or memory, while a system that is accessible via the console but not via SSH or HTTPS may simply be overburdened with connections or experiencing DNS-related issues.
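The distinction above, together with the ping test described later in this post, can be captured as a small decision function. This is only a sketch: the three yes/no inputs and the category wording are paraphrased from this post, not anything AsyncOS itself reports, and the real diagnosis is still up to support.

```python
from enum import Enum


class Likely(Enum):
    """Rough fault categories, paraphrased from the reply above (not AsyncOS output)."""
    HARDWARE = "console unresponsive: possible RAID controller or memory fault"
    NETWORK = "console works but host unreachable: cabling, switch port, IP conflict"
    SERVICE = "host answers ping but SSH/HTTPS do not: overload or DNS issues"
    OK = "console, ping and services all respond"


def triage(console_ok: bool, ping_ok: bool, services_ok: bool) -> Likely:
    """Map three observations to the most likely fault category.

    console_ok  -- the serial console gives a login prompt and accepts commands
    ping_ok     -- the appliance answers ping on the interface under test
    services_ok -- SSH and the HTTPS GUI accept connections on that interface
    """
    if not console_ok:
        # Not responding even on the console: the "totally locked up" case.
        return Likely.HARDWARE
    if not ping_ok:
        # The OS is alive but unreachable over the network.
        return Likely.NETWORK
    if not services_ok:
        # Reachable at the ICMP level, but the services are not answering.
        return Likely.SERVICE
    return Likely.OK
```

The order of the checks mirrors the advice in the post: rule out a full lockup via the console first, then separate network reachability from service availability.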

Before powering down or rebooting the appliance, try connecting to the serial port first. In the majority of cases we see, the appliance is operational but has encountered a network-related issue that is preventing or limiting connections, including those to the GUI or via SSH. Console access will help you determine whether the issue goes beyond a networking problem. As you suggested, the system logs can be helpful in a situation like this. The status logs can also help you look for trends in a number of parameters, such as CPU, memory, inbound and outbound connections, and disk I/O.

Going further, if you do contact support, we prefer that you do so prior to rebooting the appliance. If you power cycle the appliance, you may lose logging details related to the event, so we would prefer to see the appliance in the fault state.

I am including some detailed information below that outlines basic diagnostic procedures we recommend for events like this. If you are seeing such events on a regular basis, it would be advisable to contact support so we can perform a more detailed analysis.

Environment: Cisco IronPort Email Security Appliance (ESA), Security Management Appliance (SMA), all versions of AsyncOS

Symptoms: You are unable to connect to your ESA or SMA appliance over the network. You have attempted to connect using the web interface and the CLI via SSH, and the appliance does not appear to be answering the requests.

In the majority of cases the appliance is not actually locked up. It may simply be in a state that prevents it from responding to network requests in the usual manner. Below are some guidelines that can help you diagnose the problem and possibly get your system back up and running, or at least into a state you can work with.

It is very important that you do not power cycle the system unless advised to do so by technical support. Power cycling the appliance can cause data corruption, which can result in lost messages, database corruption, and lost logging data, as well as damage to the file system. When you power cycle the appliance, it is not able to unmount the file systems cleanly. For this reason you should always use the 'shutdown' or 'reboot' command from the CLI, or the Shutdown/Reboot option under the System Administration tab in the GUI.

So what if you rebooted the appliance correctly and still cannot gain access via the network?

  • Check the indicator lights on the appliance. Are any lights on?
  • Are the lights for the hard drives on? Are they flashing?
  • Are there any status codes on the front of the appliance?
  • Did the appliance issue any audible codes (beeping) when it started up?

In many cases simply swapping out the network cable or moving to another port on the switch can resolve the connectivity issue.

  • Check the status of the indicator lights on the switch port, if they are available.
  • Check the status of the lights on the appliance. Are they on? Are they flashing?
  • Are you able to connect directly to the appliance using a network crossover cable?

A network crossover cable allows you to connect directly to the Ethernet ports on the appliance. You will, however, have to configure the connecting host to be on the same subnet as the interface you're connecting to. A crossover cable can be helpful in diagnosing problems on your LAN; one such problem is another host with the same IP address on the same subnet.
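Choosing an address for the laptop on the crossover cable is easy to get wrong. As a sketch, Python's standard ipaddress module can confirm that a candidate address is in the same subnet as the appliance interface without colliding with the appliance itself. The function name is hypothetical, and the interface is given in the address/prefix form that interfaceconfig prints (e.g. 192.168.1.33/24 in the example later in this post).

```python
import ipaddress


def valid_peer_address(laptop_ip: str, appliance_iface: str) -> bool:
    """Return True if laptop_ip is a usable host address on the appliance's subnet.

    appliance_iface is in address/prefix form, e.g. "192.168.1.33/24".
    """
    iface = ipaddress.ip_interface(appliance_iface)
    addr = ipaddress.ip_address(laptop_ip)
    net = iface.network
    # Must be on the same subnet, and must not duplicate the appliance's own address.
    if addr not in net or addr == iface.ip:
        return False
    # On ordinary subnets (shorter than /31) the network and broadcast
    # addresses cannot be assigned to a host.
    if net.prefixlen < net.max_prefixlen - 1:
        return addr not in (net.network_address, net.broadcast_address)
    return True
```

For example, with the appliance at 192.168.1.33/24, a laptop at 192.168.1.50 passes, while 192.168.2.50 (wrong subnet) and 192.168.1.33 (duplicate IP) do not.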

  • Is your appliance not responding to network requests at all, or is it simply not responding to service requests? You can determine this using ping. If you can ping the appliance but cannot SSH to it, then you know it is answering ICMP and that the SSH service is not responding or not accessible.
  • Have you tested all network interfaces? Check whether you can connect to one of the other interfaces on the appliance using the process described above.
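Ping only tests ICMP. To see which TCP services actually answer on a given interface, you can attempt a plain TCP connection to each port from another host. This is a sketch using Python's standard socket module; the function name is hypothetical, and it assumes the default ports (22 for SSH, 443 for the HTTPS GUI, 25 for SMTP), which you should adjust if your interfaces use other ports.

```python
import socket


def probe_tcp(host: str, ports: dict[str, int], timeout: float = 3.0) -> dict[str, bool]:
    """Try a TCP connect to each named port; True means the handshake completed."""
    results = {}
    for name, port in ports.items():
        try:
            # create_connection resolves the host and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            # Connection refused, timed out, or network unreachable.
            results[name] = False
    return results
```

For example, probe_tcp("192.168.1.33", {"ssh": 22, "https": 443, "smtp": 25}) returns True or False per service. If ping succeeds but "ssh" comes back False, that matches the case above where the appliance answers ICMP but the SSH service is not responding. Run it against each interface in turn to cover the second bullet.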

If your system is not responding to network requests and immediate access is needed, you can connect to the serial port on the rear of your appliance. This port is a standard DB9 connector and can be used with the serial cable that came with your appliance. If you do not have that cable, you will need one that is wired as a null modem cable; alternatively, you can use a standard serial cable with a null modem adapter. Once the cable is connected to the appliance, connect the other end to another system, such as a laptop. You will need a terminal program such as HyperTerminal or Procomm, configured for 9600 baud, 8N1 (8 data bits, no parity, 1 stop bit). Once you have started your terminal program, you should be able to connect and get a login prompt. If the serial port is not responding, verify that the cable is connected and the unit is powered on. If you still cannot get a login, contact customer support for further assistance.
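On a Unix-like laptop without a GUI terminal program, the 9600 baud 8N1 setting can be applied with Python's standard termios module. This is a sketch assuming a Unix-like OS; the function name and the device path in the example below are hypothetical, and the code leaves hardware flow control at whatever the driver defaults to.

```python
import termios


def set_9600_8n1(fd: int) -> None:
    """Put an open serial port into raw mode at 9600 baud, 8 data bits, no parity, 1 stop bit."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = termios.tcgetattr(fd)
    # Framing: clear the size, parity and two-stop-bit flags, then select 8 data bits.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    # Enable the receiver and ignore modem-control lines.
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    # Raw mode: no line editing, echo, signal keys, XON/XOFF, or CR/LF translation.
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG | termios.IEXTEN)
    iflag &= ~(termios.IXON | termios.IXOFF | termios.ICRNL | termios.INLCR)
    oflag &= ~termios.OPOST
    cc[termios.VMIN] = 1   # read() returns once at least one byte is available
    cc[termios.VTIME] = 0  # no inter-byte timeout
    termios.tcsetattr(
        fd, termios.TCSANOW,
        [iflag, oflag, cflag, lflag, termios.B9600, termios.B9600, cc],
    )
```

For example, open the port with fd = os.open("/dev/ttyUSB0", os.O_RDWR | os.O_NOCTTY) (substitute your own device path), call set_9600_8n1(fd), then read and write with os.read and os.write. If the third-party pyserial package is installed, serial.Serial("/dev/ttyUSB0", 9600, bytesize=8, parity="N", stopbits=1) configures the same framing.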

If you are able to obtain access via the serial port, issue the status command (shown here as status detail) and check whether the appliance is listed as "Online".

    mail.example.com > status detail

     Status as of:                  Mon Jan 04 12:48:31 2010 CST
     Up since:                      Tue Jul 14 16:50:50 2009 CDT (173d 20h 57m 41s)
     Last counter reset:            Never
     System status:                 Online
     Oldest Message:                24 weeks 16 hours 30 mins 48 secs
     Feature - Centralized Tracking: 833 days
     Feature - Centralized Reporting: 833 days
     Feature - IronPort Centralized Configuration Manager: 60 days
     Feature - Incoming Mail Handling: Perpetual
     Feature - Centralized Spam Quarantine: 833 days

If the status detail command does not respond or produces an error, contact customer support.

Use the "version" command to check the RAID status.

      mail.example.com > version

     Current Version
     ===============
     Model: M660
     Version: 6.5.2-101
     Build Date: 2009-05-28
     Install Date: 2009-07-14 17:04:32
     Serial #: 002C999999-J999999
     BIOS: 2.4.3I
     RAID: 1.21.02-0528, 2.01.00, 1.02-014B
     RAID Status: Optimal
     RAID Type: 10
     BMC: 1.77

If the RAID is degraded, it's possible the appliance is encountering other problems that may or may not be related to the apparent lockup. If the version command does not respond or does not return any data, contact customer support.
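If you log the console session to a text file, the two checks above ("System status" should read Online, "RAID Status" should read Optimal) can be scripted. This is a sketch assuming the "Key: value" layout shown in these transcripts; the function names are hypothetical, and other AsyncOS versions may format the output differently.

```python
def parse_fields(cli_output: str) -> dict[str, str]:
    """Turn 'Key:   value' lines from saved CLI output into a dict (split on the first colon)."""
    fields = {}
    for line in cli_output.splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon (prompts, '=====' rules) are skipped.
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def health_warnings(status_output: str, version_output: str) -> list[str]:
    """Apply the Online and Optimal checks to captured 'status detail' and 'version' text."""
    warnings = []
    status = parse_fields(status_output).get("System status")
    if status != "Online":
        warnings.append(f"System status is {status!r}, expected 'Online'")
    raid = parse_fields(version_output).get("RAID Status")
    if raid != "Optimal":
        warnings.append(f"RAID Status is {raid!r}, expected 'Optimal'")
    return warnings
```

An empty list means both checks passed; anything else is worth noting before you call support, per the advice above.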

Check your network configuration using the command etherconfig.

     mail.example.com > etherconfig

     Choose the operation you want to perform:
     - MEDIA - View and edit ethernet media settings.
     - VLAN - View and configure VLANs.
     - LOOPBACK - View and configure Loopback.
     - MTU - View and configure MTU.
     []> media

     Ethernet interfaces:
     1. Data 1 (Autoselect: )) 00:22:19:b0:03:c4
     2. Data 2 (Autoselect: )) 00:22:19:b0:03:c6
     3. Management (Autoselect: <1000baseTX full-duplex>) 00:10:18:4e:29:88

     Choose the operation you want to perform:
     - EDIT - Edit an ethernet interface.
     []>


     Choose the operation you want to perform:
     - MEDIA - View and edit ethernet media settings.
     - VLAN - View and configure VLANs.
     - LOOPBACK - View and configure Loopback.
     - MTU - View and configure MTU.
     []> MTU

     Ethernet interfaces:
     1. Data 1 default mtu 1500
     2. Data 2 default mtu 1500
     3. Management default mtu 1500

     Choose the operation you want to perform:
     - EDIT - Edit an ethernet interface.
     []>

Recent network changes can have an impact on connectivity to the appliance.

Use the command "interfaceconfig" to verify your interface settings.

   mail.example.com > interfaceconfig


     Currently configured interfaces:
     1. Management (192.168.1.33/24 on Management: downside.hometown.net)
     2. outbound_gloop_ISQ_notify (192.168.1.34/24 on Management: inside.hometown.net)

     Choose the operation you want to perform:
     - NEW - Create a new interface.
     - EDIT - Modify an interface.
     - GROUPS - Define interface groups.
     - DELETE - Remove an interface.

     []>

Try flushing all network-related caches.

     mail.example.com > diagnostic


     Choose the operation you want to perform:
     - RAID - Disk Verify Utility.
     - DISK_USAGE - Check Disk Usage.
     - NETWORK - Network Utilities.
     - REPORTING - Reporting Utilities.
     - TRACKING - Tracking Utilities.
    []> network


    Choose the operation you want to perform:
    - FLUSH - Flush all network related caches.
    - ARPSHOW - Show system ARP cache.
    - SMTPPING - Test a remote SMTP server.
    - TCPDUMP - Dump ethernet packets.
    []> flush

    Flushing LDAP cache.
    Flushing DNS cache.
    Flushing system ARP cache.
    10.92.152.1 (10.92.152.1) deleted
    10.92.152.18 (10.92.152.18) deleted

    Network reset complete.

    Choose the operation you want to perform:
    - FLUSH - Flush all network related caches.
    - ARPSHOW - Show system ARP cache.
    - SMTPPING - Test a remote SMTP server.
    - TCPDUMP - Dump ethernet packets.
    []>

If any of the network-related commands fail to respond, contact customer support.

If you are still unable to gain access via the network after performing these steps, contact customer support for further assistance.

Christopher C Smith
CSE

Cisco IronPort Customer Support 


5 Replies

Eduardo Aliaga
Level 4

In the GUI, you can go to "System Administration > Alerts" and configure your IronPort to send system alarms and hardware alarms via email.

You can also read the "system_logs" from the CLI. The following example uses "grep" to show only the "system_logs" entries containing the word "Sep":

ironport> grep -e "Sep" system_logs

Wed Sep  1 2010 Critical: Could not issue an SNMP trap: Cannot find module (SNMPv2-MIB): At line 0 in (none)
Wed Sep  1 2010 Warning: Received an invalid DNS Response: rcode=ServFail data="'\\xbd\\x8b\\x81\\x82\\x00\\
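The grep above matches "Sep" anywhere in a line, including inside message text. If you have saved system_logs entries to a text file, a short Python sketch can match the month field and the severity separately. It assumes the layout of the two example lines (weekday, month, day, an optional time, year, then "Severity: message"); that layout is inferred from this thread rather than from AsyncOS documentation, and the names are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumed line layout, inferred from the two example lines above:
# "<weekday> <month> <day> [time] <year> <Severity>: <message>"
LINE = re.compile(
    r"^(?P<weekday>\w{3}) (?P<month>\w{3}) +(?P<day>\d{1,2}) (?P<rest>.*?) "
    r"(?P<level>[A-Z][a-z]+): (?P<message>.*)$"
)


def filter_log(lines: Iterable[str], month: str, levels: set[str]) -> Iterator[tuple[str, str]]:
    """Yield (severity, message) for entries logged in `month` with a severity in `levels`."""
    for line in lines:
        m = LINE.match(line)
        if m and m["month"] == month and m["level"] in levels:
            yield m["level"], m["message"]
```

For example, filter_log(open("system_logs.txt"), "Sep", {"Critical"}) would yield only the SNMP trap error from the sample above, skipping the Warning line (the file name is hypothetical).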

Thanks for the feedback and suggestions, Eduardo.

Thanks Christopher!  This is exactly what I was looking for.

Chris, AWESOME post. Thank you for the time invested in writing it. I added it to my personal notes, not only for how to troubleshoot issues but also for how to write troubleshooting documentation. : )  Emoticons not working on my IE 9 browser. : (

Happy Holidays IronPort Nation!

Jason
