09-20-2010 10:20 PM
Hi All,
I have CSMARS configured for my enterprise network. During a major incident, one of the line cards in my 6509 went faulty with the following syslog messages:
09-19-2010 09:59:40 UTC Local0.Error 192.168.228.3 150: Sep 19 15:19:32 IST: %EARL-SP-3-RESET_LC: Resetting module in slot 1. (Errorcode 1)
09-19-2010 09:59:40 UTC Local0.Error 192.168.228.3 151: Sep 19 15:19:32 IST: %PF_ASIC-SPSTBY-3-ASIC_DUMP: [0:0x20C] ME_AR_P2MMU_FREE_TAIL = 0x28E
However, these syslog messages were not captured by CSMARS, or maybe I just can't find a way to locate this error in the Incidents tab.
Please help me understand whether CSMARS captures all events. Do I have to enable certain events to be forwarded to CSMARS? Or, if the log is registered, how can I find it in MARS?
09-20-2010 10:29 PM
You would need to configure the switch to send syslog messages to MARS. If the switch is not configured to do so, MARS will not receive them.
On MARS, you would also need to add the switch as a network device.
09-20-2010 10:39 PM
The switch is already added in MARS. Is there anything else I need to do?
09-20-2010 10:48 PM
The switch also needs to be configured to send syslog messages to MARS.
09-20-2010 10:57 PM
I have the syslog logging commands configured:
Noida-SF-CORE-SW-1#show runn | in logg
Noida-SF-CORE-SW-1#show runn | in logging
logging buffered 4096 debugging
logging enable
logging trap notifications
logging facility local0
logging 10.216.16.70
logging 10.216.16.106
ntp logging
Noida-SF-CORE-SW-1#
10.216.16.106 is the CSMARS box.
Attached is a screenshot of the device as configured in MARS.
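As a quick sanity check on the switch side: with `logging trap notifications`, IOS forwards messages of severity 5 (notifications) and anything more severe (lower numbers) to the `logging` hosts, and the severity is the digit inside the `%FACILITY-SEVERITY-MNEMONIC` tag. The Python sketch below assumes that standard IOS numbering and only inspects the tag text; the function names are made up for illustration. It suggests both messages from the question (severity 3) should pass the configured trap level.

```python
import re

# Standard Cisco IOS syslog severity keywords (0 = most severe).
SEVERITY_LEVELS = {
    "emergencies": 0, "alerts": 1, "critical": 2, "errors": 3,
    "warnings": 4, "notifications": 5, "informational": 6, "debugging": 7,
}

# Matches the %FACILITY-SEVERITY-MNEMONIC tag, e.g. %EARL-SP-3-RESET_LC:
TAG_RE = re.compile(r"%([A-Z0-9_-]+):")


def message_severity(line: str) -> int:
    """Return the numeric severity from the message tag (hypothetical helper)."""
    match = TAG_RE.search(line)
    if match is None:
        raise ValueError("no %FACILITY-SEVERITY-MNEMONIC tag found")
    # Facility names can contain hyphens (EARL-SP), so split from the right.
    _facility, severity, _mnemonic = match.group(1).rsplit("-", 2)
    return int(severity)


def forwarded_by_trap(line: str, trap_level: str) -> bool:
    """True if `logging trap <trap_level>` would send this message to syslog hosts."""
    return message_severity(line) <= SEVERITY_LEVELS[trap_level]


reset_lc = ("Sep 19 15:19:32 IST: %EARL-SP-3-RESET_LC: "
            "Resetting module in slot 1. (Errorcode 1)")
asic_dump = ("Sep 19 15:19:32 IST: %PF_ASIC-SPSTBY-3-ASIC_DUMP: "
             "[0:0x20C] ME_AR_P2MMU_FREE_TAIL = 0x28E")

for msg in (reset_lc, asic_dump):
    print(message_severity(msg), forwarded_by_trap(msg, "notifications"))
# 3 True
# 3 True
```

Since Kiwi (the other `logging` host) did receive these messages, this is consistent with the switch side being fine and the problem lying in how MARS handles them.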
09-20-2010 11:01 PM
Well, the line card itself went faulty, so it was not able to send syslog messages to MARS. That is why MARS did not receive them.
09-20-2010 11:48 PM
I have both Kiwi Syslog and CSMARS configured.
Kiwi captured the syslog messages sent by 192.168.228.3, but MARS didn't.
Also, I think the line card does not generate the syslog itself; instead, the supervisor detects the event and notifies the servers (Kiwi Syslog/MARS). Correct me if I am wrong.
I don't understand why the same syslog messages were captured by Kiwi Syslog but not by MARS.
09-20-2010 11:32 PM
EDIT:
I just noticed the attachment in your last message. It looks like you've misconfigured the device type in MARS.
If you are running Native IOS on your 6509 (such as 12.2SXH or SXI), the device type should be "Cisco Switch-IOS 12.2" to parse the logs correctly. The device type "Cisco IOS 12.2" is for routers running IOS 12.2.
----------------------------------------------------
----------------------------------------------------
I'm going to assume the faulty line card is not in the critical path between this switch and your MARS server (correct?). Otherwise, halijenn's comment applies.
Anyway, have you verified that you're receiving logs from that switch in MARS? Have you verified they are being parsed correctly? The easiest way is to run a query in MARS.
- Run a query for the last 7 or more days
- "Result Format" should be "All Matching Events" (or all matching sessions)
- Under "Reporting Device", select the switch in question
This returns all events from that switch and confirms that it is reporting (and being parsed) properly.
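Since Kiwi did receive these messages, the same "reporting device, last 7 days" filter can also be run against a plain-text export of the Kiwi log, for comparison with what MARS returns. This Python sketch assumes every line follows the layout of the two lines pasted in the question (date, time, zone, facility.priority, source IP, message); the function names and the third sample line are hypothetical.

```python
from datetime import datetime, timedelta


def parse_line(line: str):
    """Split an assumed Kiwi-style line into (received_time, source_ip, message)."""
    date, time, _zone, _priority, source, message = line.split(maxsplit=5)
    received = datetime.strptime(f"{date} {time}", "%m-%d-%Y %H:%M:%S")
    return received, source, message


def events_from_device(lines, device_ip: str, now: datetime, days: int = 7):
    """Messages from `device_ip` received in the `days` days up to `now`."""
    since = now - timedelta(days=days)
    hits = []
    for line in lines:
        received, source, message = parse_line(line)
        if source == device_ip and since <= received <= now:
            hits.append(message)
    return hits


log = [
    "09-19-2010 09:59:40 UTC Local0.Error 192.168.228.3 150: Sep 19 15:19:32 IST: "
    "%EARL-SP-3-RESET_LC: Resetting module in slot 1. (Errorcode 1)",
    "09-19-2010 09:59:40 UTC Local0.Error 192.168.228.3 151: Sep 19 15:19:32 IST: "
    "%PF_ASIC-SPSTBY-3-ASIC_DUMP: [0:0x20C] ME_AR_P2MMU_FREE_TAIL = 0x28E",
    # Hypothetical line from a different device, to show the filter at work.
    "09-19-2010 10:05:00 UTC Local0.Notice 192.168.228.9 12: Sep 19 15:35:00 IST: "
    "%SYS-5-CONFIG_I: Configured from console by vty0",
]

for message in events_from_device(log, "192.168.228.3", now=datetime(2010, 9, 20, 22, 20)):
    print(message)
```

If a device shows up here but returns nothing in the MARS query, the gap is on the MARS side (device definition or parsing) rather than on the switch.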
If that's successful, I would run a second query.
- Change the "Result Format" to "All Matching Event Raw Messages"
- Limit the time frame to an hour before and after the timestamp on the log you pasted above
- Under "Keyword", add "EARL\-SP\-3\-RESET\_LC" (without quotes), and set "Operation" to "OR"
- In the second field, enter "PF\_ASIC\-SPSTBY\-3\-ASIC\_DUMP" (no quotes)
This is a regular expression that should match the logs you're looking for. Apply the settings and run the query. This should tell you if MARS at least received the log. If it did, then more work will need to be done to figure out why it didn't report properly.
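The logic of the second query (match either keyword, but only within an hour either side of the event) can be sketched in Python. The two patterns are the escaped keywords from the steps above, joined with `|` to mirror the "OR" operation; Python's `re` reads `\-` and `\_` as a literal hyphen and underscore. The window is centered on the UTC receive time from the pasted log, and the line layout is assumed from the question. This only illustrates the matching; it is not how MARS executes the query.

```python
import re
from datetime import datetime, timedelta

# The two keywords from the query, combined with "|" to act like Operation = OR.
KEYWORDS = [r"EARL\-SP\-3\-RESET\_LC", r"PF\_ASIC\-SPSTBY\-3\-ASIC\_DUMP"]
PATTERN = re.compile("|".join(KEYWORDS))


def raw_matches(lines, center: datetime, hours: int = 1):
    """Lines received within +/- `hours` of `center` that match either keyword."""
    start, end = center - timedelta(hours=hours), center + timedelta(hours=hours)
    hits = []
    for line in lines:
        date, time, rest = line.split(maxsplit=2)
        received = datetime.strptime(f"{date} {time}", "%m-%d-%Y %H:%M:%S")
        if start <= received <= end and PATTERN.search(rest):
            hits.append(line)
    return hits


log = [
    "09-19-2010 09:59:40 UTC Local0.Error 192.168.228.3 150: Sep 19 15:19:32 IST: "
    "%EARL-SP-3-RESET_LC: Resetting module in slot 1. (Errorcode 1)",
    "09-19-2010 09:59:40 UTC Local0.Error 192.168.228.3 151: Sep 19 15:19:32 IST: "
    "%PF_ASIC-SPSTBY-3-ASIC_DUMP: [0:0x20C] ME_AR_P2MMU_FREE_TAIL = 0x28E",
    # Hypothetical: same keyword, but outside the one-hour window.
    "09-19-2010 13:00:00 UTC Local0.Error 192.168.228.3 160: Sep 19 18:30:00 IST: "
    "%EARL-SP-3-RESET_LC: Resetting module in slot 1. (Errorcode 1)",
]

for hit in raw_matches(log, center=datetime(2010, 9, 19, 9, 59, 40)):
    print(hit)
```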
------------------------------
------------------------------
Just FYI -- it's very possible that MARS could not completely parse that specific log, which happens with a lot of messages from the 6509s. It often reports them as "Generic IOS Syslog" or something similar.
09-21-2010 12:49 AM
Thanks Michael, I can see the logs in the query I ran as you described. Attached is a screenshot.
However, they are showing up as Generic IOS Syslog events. How can I change the event description?
Also, please tell me how to change the IOS version/device type in the MARS device list. My device version is:
Noida-SF-CORE-SW-1#show ver
Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXH7, RELEASE SOFTWARE (fc3)
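The device-type rule of thumb from the reply above (Native IOS 12.2 on the 6509, such as the SXH or SXI trains, maps to "Cisco Switch-IOS 12.2", while "Cisco IOS 12.2" is for routers) can be expressed as a small Python check against the `show version` line. This is a hypothetical helper, not a MARS feature; it only knows the two trains named in the thread and returns None for anything it cannot decide.

```python
import re
from typing import Optional

# Trains named in the thread as Native IOS on the 6509 (extend as needed).
SWITCH_TRAINS = {"SXH", "SXI"}

# e.g. "Version 12.2(33)SXH7" -> release "12.2", maintenance "33", train "SXH"
VERSION_RE = re.compile(r"Version (\d+\.\d+)\((\d+)\)([A-Z]+)")


def suggest_device_type(show_version_line: str) -> Optional[str]:
    """Hypothetical helper: map a 6509 `show version` line to a MARS device type."""
    match = VERSION_RE.search(show_version_line)
    if match is None:
        return None
    release, _maintenance, train = match.groups()
    if release == "12.2" and train in SWITCH_TRAINS:
        return "Cisco Switch-IOS 12.2"
    # Anything else (including 12.2 on routers) needs a manual decision.
    return None


line = ("Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), "
        "Version 12.2(33)SXH7, RELEASE SOFTWARE (fc3)")
print(suggest_device_type(line))  # Cisco Switch-IOS 12.2
```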
09-21-2010 06:48 PM
rashidsiddiqui wrote:
Also tell me the method of modifying the IOS version/Device type in Mars device list as my device version is
Cisco IOS Software, Version 12.2(33)SXH7, RELEASE SOFTWARE (fc3)
Glad you were able to find the logs - at least you know everything is (generally) working like it should. The only way to change the device type is to delete the device in MARS, and re-import it as the correct type. A device can be modified in MARS for a change in version, but not for device type.
rashidsiddiqui wrote:
However they are coming as Generic IOS Syslog error, How can i modify the error description?
First, I would try re-importing the device to see what kind of "Generic IOS Syslog" events are being reported. It's possible that MARS might have parsed the logs correctly, if it was expecting the correct device type.
However, in cases like this, the only way to modify the reported event is to utilize a custom parser to extend the types of logs processed by MARS. The procedure for doing this is outlined in Ch. 15 of the Local/Global Controller User Guide:
I haven't personally used this feature (yet), so I can't really comment on how it works or the difficulty of using it.
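To picture why an unrecognized message ends up as "Generic IOS Syslog", and what a custom parser changes, here is a toy Python model: a lookup from message tag to event type, with a generic fallback. The table contents and event-type names are hypothetical; this is not how MARS stores its parsers or what the Ch. 15 procedure looks like, only the idea of extending the set of recognized messages.

```python
import re

# Matches the %FACILITY-SEVERITY-MNEMONIC tag, e.g. %EARL-SP-3-RESET_LC:
TAG_RE = re.compile(r"%([A-Z0-9_-]+):")

# Hypothetical stand-in for the message types a parser already recognizes.
# The event-type names are illustrative only, not actual MARS event types.
known_event_types = {
    "SYS-5-CONFIG_I": "Configuration change",
}

FALLBACK = "Generic IOS Syslog"


def classify(line: str) -> str:
    """Return the event type for a known tag, else the generic fallback."""
    match = TAG_RE.search(line)
    if match is None:
        return FALLBACK
    return known_event_types.get(match.group(1), FALLBACK)


line = ("Sep 19 15:19:32 IST: %EARL-SP-3-RESET_LC: "
        "Resetting module in slot 1. (Errorcode 1)")
print(classify(line))  # Generic IOS Syslog

# In this toy model, a "custom parser" is just a new entry for the unrecognized tag.
known_event_types["EARL-SP-3-RESET_LC"] = "Line card reset (custom)"
print(classify(line))  # Line card reset (custom)
```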
09-21-2010 02:52 AM
09-21-2010 07:13 PM
In the attachment, you had screenshots (attached here) showing rules related to CPU utilization alerts. You also asked these questions:
- "Please let me know what type of Event these rules will trigger?"
- "Specifically I want to know the meaning behind “Count”. If I increase the count field to 2 or 3 what will it do?"
- "Also the time I configured is 10 minutes. Is it like it will check the condition for a time period of 10 minutes? What is the significance of time field here?"
I believe these alarms are related to the SNMP-based resource monitoring features available for some platforms (mostly Cisco). In the first rule, any time a CPU utilization event is received, the rule is triggered. It only takes one reported event (count) within any 10-minute window (time) to trigger the rule, so essentially every CPU utilization report triggers it.
The second rule won't be triggered until the reported usage is over 50%. Usage only has to be reported over 50% ONE time (count) within 10 minutes (time) to trigger it. If you changed the count to 3, it would take 3 reported events of CPU over 50% within a 10-minute window to trigger, so a single short CPU spike wouldn't trigger it.
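The count/time behavior described above amounts to a sliding-window counter. Here is a minimal Python sketch, assuming events arrive in time order and that each qualifying report simply counts toward the window; how MARS handles repeat firings after a rule triggers isn't covered in this thread, and the class name and threshold handling are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Optional


class CountInWindowRule:
    """Fires when `count` qualifying events fall within a sliding `window`."""

    def __init__(self, count: int, window: timedelta, threshold: Optional[float] = None):
        self.count = count
        self.window = window
        self.threshold = threshold  # None = every CPU report qualifies
        self.recent = deque()       # timestamps of qualifying events

    def observe(self, when: datetime, cpu_percent: float) -> bool:
        """Record one CPU report; return True if the rule triggers."""
        if self.threshold is not None and cpu_percent <= self.threshold:
            return False
        self.recent.append(when)
        # Forget qualifying events that have slid out of the window.
        while when - self.recent[0] > self.window:
            self.recent.popleft()
        return len(self.recent) >= self.count


start = datetime(2010, 9, 21, 12, 0)
ten_min = timedelta(minutes=10)

rule1 = CountInWindowRule(count=1, window=ten_min)                  # like the first rule
rule2 = CountInWindowRule(count=3, window=ten_min, threshold=50.0)  # second rule, count raised to 3

print(rule1.observe(start, 20.0))                         # True: any report fires it
print(rule2.observe(start, 85.0))                         # False: one spike is not enough
print(rule2.observe(start + timedelta(minutes=3), 60.0))  # False: two in the window
print(rule2.observe(start + timedelta(minutes=5), 70.0))  # True: three within 10 minutes
```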
As for rule 3, I can't say for sure, but it may compare the reported CPU against the established baseline for that device. I don't have much insight into that area of MARS, so I can't really comment on it.
Hope that clears it up a little bit.
09-23-2010 09:51 PM
Thanks Michael,
This was a great learning discussion. Let me try a little more with MARS, and if needed I will ping you again.
My major issues have been addressed, thanks again. Have a good day.