12-19-2010 06:06 AM - edited 03-11-2019 12:24 PM
Hi,
Two firewall modules (FWSMs) on two 6500s show a strange problem. The failover status on fsm-1 shows itself as active and the secondary as failed.
fsm-2 also shows itself as active and the secondary as failed, but fsm-1 is the one currently handling the active role.
The interfaces show as not-monitored, whereas as far as I know they should show the normal (waiting) status.
The output of various commands has been pasted here in the notepad file.
A strange error also shows up in the fsm-2 output: ERROR: np_logger_query request for FP Stats failed
I am not certain why both show as active.
Please help. Thanks.
12-19-2010 07:21 AM
Hello,
Does the output of 'show np pc' on FSM-2 show many non-zero threads, something that looks like this?:
------------------ show np pc ------------------
THREAD:PC(NP1/NP2/NP3)
0:0000/2c6d/0000 1:0000/2b65/0000 2:59c5/426e/598e 3:28c1/28c1/0000
4:0000/2c6d/0000 5:0000/2c6d/0000 6:0000/2c6d/0000 7:0000/2b65/0000
8:0000/2c6d/0000 9:0000/2da5/0000 10:0000/2c6d/0000 11:0000/2c6d/0000
12:0000/2c6d/0000 13:0000/2c6d/0000 14:0000/2c6d/0000 15:0000/2b65/0000
16:0000/2c6d/0000 17:0000/2c6d/0000 18:0000/2c6d/0000 19:0000/2c6d/0000
20:0000/2c6d/0000 21:0000/2c6d/0000 22:0000/2c6d/0000 23:0000/5ad3/0000
24:0000/2b65/0000 25:0000/2c6d/0000 26:0000/2c6d/0000 27:0000/2da5/0000
28:0000/2c6d/0000 29:0000/2b65/0000 30:0000/2c6d/0000 31:0000/2b65/0000
It sounds like one or more of the NPs on FSM-2 are locked up, and as a result it is unable to process failover messages. I would suggest opening a TAC case so that this can be investigated in detail, but one possible bug is this:
CSCtg35889 - NP 1/2 Lockup on standby FWSM
That bug is fixed in 4.0.11.1 and higher, so you may want to consider an upgrade to the latest 4.0.x image to get the fix for this bug and see if the issue persists.
Hope that helps.
-Mike
12-19-2010 07:46 AM
Yes, it shows similar output, but with different hex values. Even fsm-1 shows the same kind of output.
Does that mean both have the same problem, and is the not-monitored status related to this?
Thanks for your help.
12-19-2010 07:50 AM
Hello,
It can still be a bug even with different hex values, but the key is whether or not they are changing. Check the output several times back-to-back and see if the values are changing or stuck on the same hex values. If many threads are all stuck on the same value, my last message would still apply. Opening a TAC case will increase your chances of getting a true root cause for this issue, but an upgrade to the latest 4.0.x image will also probably solve the problem if you're not able to open a case.
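If you would rather not compare the hex values by eye, the back-to-back check can be sketched in a few lines of Python. This is only an illustration, not a Cisco tool: it assumes you have already saved several captures of 'show np pc' as text, and the names `parse_np_pc` and `stuck_threads` are made up for this example. It treats a thread as stuck when its program counter for the chosen NP is non-zero and identical in every capture.

```python
import re

# Matches entries such as "2:59c5/426e/598e" (thread:NP1/NP2/NP3 program counters).
ENTRY = re.compile(r"\b(\d+):([0-9a-fA-F]{4})/([0-9a-fA-F]{4})/([0-9a-fA-F]{4})")


def parse_np_pc(text):
    """Return {thread: (np1, np2, np3)} parsed from one 'show np pc' capture."""
    return {int(t): (a.lower(), b.lower(), c.lower())
            for t, a, b, c in ENTRY.findall(text)}


def stuck_threads(snapshots, np_index):
    """List threads whose PC is non-zero and unchanged across all snapshots.

    np_index: 0 for NP1, 1 for NP2, 2 for NP3.
    """
    parsed = [parse_np_pc(s) for s in snapshots]
    if len(parsed) < 2:
        raise ValueError("need at least two snapshots to compare")
    # Only compare threads that appear in every capture.
    common = set(parsed[0]).intersection(*parsed[1:])
    stuck = []
    for thread in sorted(common):
        values = {p[thread][np_index] for p in parsed}
        if len(values) == 1 and values != {"0000"}:
            stuck.append(thread)
    return stuck


if __name__ == "__main__":
    first = "0:0000/2c6d/0000 1:0000/2b65/0000 2:59c5/426e/598e"
    second = "0:0000/2c6d/0000 1:0000/2b66/0000 2:59c6/426e/598e"
    print("NP2 threads that did not move:", stuck_threads([first, second], 1))
```

A long list of stuck threads for NP1 or NP2 across several captures would match the lockup pattern described above; a short list on a busy unit is not conclusive either way.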
The "not monitored" status just means that you don't have interface monitoring configured with the 'monitor-interface' command.
Hope that helps.
-Mike
12-19-2010 07:57 AM
Thanks, I appreciate your help. Is there any link explaining these values or the bug? I tried to search, but it only leads to a caveat that just references it.
12-19-2010 07:59 AM
The BugToolkit is where you would find all of the published details, but it sounds like you've already found that:
NP1/2 Lockup on standby FWSM
Symptom: After a period of time the threads may lock up for either NP1 or NP2 on the FWSM. Because this affects the standby unit, the issue may not be immediately noticed. After a longer period of time the module will eventually crash and reload. The state of the NP threads can be verified with the command "show np pc". If the NPs are locked, all the threads will be occupied, as seen below for NP2.
------------------ show np pc ------------------
THREAD:PC(NP1/NP2/NP3)
0:0000/2c6d/0000 1:0000/2b65/0000 2:59c5/426e/598e 3:28c1/28c1/0000
4:0000/2c6d/0000 5:0000/2c6d/0000 6:0000/2c6d/0000 7:0000/2b65/0000
8:0000/2c6d/0000 9:0000/2da5/0000 10:0000/2c6d/0000 11:0000/2c6d/0000
12:0000/2c6d/0000 13:0000/2c6d/0000 14:0000/2c6d/0000 15:0000/2b65/0000
16:0000/2c6d/0000 17:0000/2c6d/0000 18:0000/2c6d/0000 19:0000/2c6d/0000
20:0000/2c6d/0000 21:0000/2c6d/0000 22:0000/2c6d/0000 23:0000/5ad3/0000
24:0000/2b65/0000 25:0000/2c6d/0000 26:0000/2c6d/0000 27:0000/2da5/0000
28:0000/2c6d/0000 29:0000/2b65/0000 30:0000/2c6d/0000 31:0000/2b65/0000
Conditions: The FWSM is running either 4.0(9) or 3.2(15) and later releases.
Workaround: Downgrade below 4.0(9) or 3.2(15).
-Mike
12-19-2010 08:16 AM
Thanks Mike, that was the toolkit I was referring to. I will have customer support open a TAC case so this is recorded, as you suggested.
A general question on bugs: since this one is fixed in other release trains, how long does it typically take for a bug to show up on an affected platform?
Going by the bug toolkit, the platform should have shown the symptoms long ago, as the client has been using it for over a year now.
But it never turned up earlier.
Thanks.
12-19-2010 08:25 AM
Unfortunately, the bug details don't mention any trigger, so it's unclear why you haven't seen this issue before. Perhaps something has changed in the network, such as traffic profile or load, that caused the FWSM's NPs to lock up.
In general, you should not experience the same bug in code versions that are higher than the one it was fixed in, since all past fixes are rolled into subsequent releases. In the case of CSCtg35889, this was actually a regression caused by the fix of a different bug, so it only affects 4.0(9) through 4.0(11).
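As a rough illustration of that range, here is a small Python sketch that checks whether a 4.0-train version string falls between 4.0(9) and the 4.0.11.1 fix mentioned earlier in the thread. The function names are made up for this example, and the parsing simply pulls the numbers out of strings such as "4.0(11)" or "4.0.11.1". It deliberately refuses other trains, since no fixed-in version for 3.2 is given here.

```python
import re


def parse_fwsm_version(version):
    """Turn '4.0(11)' or '4.0.11.1' into a comparable tuple such as (4, 0, 11, 1)."""
    parts = [int(x) for x in re.findall(r"\d+", version)]
    if len(parts) not in (3, 4):
        raise ValueError(f"unrecognized version string: {version!r}")
    # Pad a missing rebuild number with 0 so that 4.0(11) sorts before 4.0.11.1.
    return tuple(parts + [0] * (4 - len(parts)))


def in_affected_4_0_range(version):
    """True if a 4.0-train version is at least 4.0(9) and below 4.0.11.1."""
    v = parse_fwsm_version(version)
    if v[:2] != (4, 0):
        # Assumption of this sketch: only the 4.0 range is stated in the thread.
        raise ValueError("only the 4.0 train is covered by this check")
    return (4, 0, 9, 0) <= v < (4, 0, 11, 1)


if __name__ == "__main__":
    for candidate in ("4.0(8)", "4.0(9)", "4.0(11)", "4.0.11.1"):
        print(candidate, in_affected_4_0_range(candidate))
```

Treat the result as a first filter only; the bug details and TAC remain the authority on which images carry the fix.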
-Mike