
secondary as active

suthomas1
Level 6

Hi,

Two firewall modules in two 6500 chassis show a strange problem. On fsm-1, the status shows itself as active and the secondary as failed.

fsm-2 also shows itself as active and the secondary as failed, but fsm-1 is the one currently handling the active role.

The interfaces show as not-monitored, whereas as far as I know they should be in normal (waiting) status.

The output of various status commands is pasted in the attached notepad file.

A strange error also shows up in the fsm-2 output: ERROR: np_logger_query request for FP Stats failed

I am not certain why both show as active.

Please help. Thanks.


mirober2
Cisco Employee

Hello,

Does the output of 'show np pc' on FSM-2 show many non-zero threads, something like this?

------------------ show np pc ------------------

THREAD:PC(NP1/NP2/NP3)
0:0000/2c6d/0000  1:0000/2b65/0000  2:59c5/426e/598e  3:28c1/28c1/0000
4:0000/2c6d/0000  5:0000/2c6d/0000  6:0000/2c6d/0000  7:0000/2b65/0000
8:0000/2c6d/0000  9:0000/2da5/0000 10:0000/2c6d/0000 11:0000/2c6d/0000
12:0000/2c6d/0000 13:0000/2c6d/0000 14:0000/2c6d/0000 15:0000/2b65/0000
16:0000/2c6d/0000 17:0000/2c6d/0000 18:0000/2c6d/0000 19:0000/2c6d/0000
20:0000/2c6d/0000 21:0000/2c6d/0000 22:0000/2c6d/0000 23:0000/5ad3/0000
24:0000/2b65/0000 25:0000/2c6d/0000 26:0000/2c6d/0000 27:0000/2da5/0000
28:0000/2c6d/0000 29:0000/2b65/0000 30:0000/2c6d/0000 31:0000/2b65/0000

It sounds like one or more of the NPs on FSM-2 are locked up, and as a result it is unable to process failover messages. I would suggest opening a TAC case so that this can be investigated in detail, but one possible bug is this:

CSCtg35889 - NP 1/2 Lockup on standby FWSM

That bug is fixed in 4.0.11.1 and higher, so you may want to consider an upgrade to the latest 4.0.x image to get the fix for this bug and see if the issue persists.

Hope that helps.

-Mike

Yes, the output looks similar, but the hex values are different. fsm-1 shows the same kind of output as well.

Does this mean both have the same problem, and is the not-monitored status related to it?

Thanks for your help.

Hello,

It can still be a bug even with different hex values, but the key is whether or not they are changing. Check the output several times back-to-back and see if the values are changing or stuck on the same hex values. If many threads are all stuck on the same value, my last message would still apply. Opening a TAC case will increase your chances of getting a true root cause for this issue, but an upgrade to the latest 4.0.x image will also probably solve the problem if you're not able to open a case.
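
The back-to-back comparison described above can be roughly sketched in Python. This is only an illustration, not a Cisco tool: the function names `parse_np_pc` and `stuck_threads` are made up for this example, the snapshots are assumed to be 'show np pc' output pasted in as plain text, and treating a PC of 0000 as an idle thread is an assumption based on the "non-zero threads" wording earlier in the thread.

```python
import re

# One "thread:NP1/NP2/NP3" entry, e.g. "12:0000/2c6d/0000".
# The "THREAD:PC(NP1/NP2/NP3)" header does not match because it has no leading digits.
ENTRY = re.compile(r"\b(\d+):([0-9a-fA-F]{4})/([0-9a-fA-F]{4})/([0-9a-fA-F]{4})")


def parse_np_pc(text):
    """Return {thread_number: (np1_pc, np2_pc, np3_pc)} from pasted output."""
    return {
        int(m.group(1)): tuple(pc.lower() for pc in m.groups()[1:])
        for m in ENTRY.finditer(text)
    }


def stuck_threads(snapshots, np_index):
    """List threads whose PC for one NP is non-zero and identical in every snapshot.

    np_index is 0 for NP1, 1 for NP2, 2 for NP3.
    Assumption: a PC of "0000" means the thread is idle, so it is skipped.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots to compare")
    parsed = [parse_np_pc(s) for s in snapshots]
    stuck = []
    for thread, pcs in sorted(parsed[0].items()):
        pc = pcs[np_index]
        if pc == "0000":
            continue
        # A thread missing from a later snapshot is treated as "changed".
        if all(p.get(thread, (None, None, None))[np_index] == pc for p in parsed[1:]):
            stuck.append(thread)
    return stuck
```

Calling `stuck_threads([first_capture, second_capture, third_capture], 1)` would list the NP2 threads that never moved between captures. A long list is only a hint that matches the symptom described here; it does not prove the bug.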

The "not monitored" status just means that you don't have interface monitoring configured with the 'monitor-interface' command.

Hope that helps.

-Mike

Thanks, I appreciate your help. Is there a link that explains this output or the bug? I tried searching, but I only found a caveat listing that just references it.

The BugToolkit is where you would find all of the published details, but it sounds like you've already found that:

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtg35889

NP1/2 Lockup on standby FWSM

Symptom:
After a period of time the threads may lock up for either NP1 or NP2 on the FWSM.  Because this
affects the standby unit, the issue may not be immediately noticed.  After a longer period of time
the module will eventually crash and reload.

The state of the NP threads can be verified with the command "show np pc".  If the NPs are
locked, all the threads will be occupied, as seen below for NP2.

------------------ show np pc ------------------

THREAD:PC(NP1/NP2/NP3)
0:0000/2c6d/0000  1:0000/2b65/0000  2:59c5/426e/598e  3:28c1/28c1/0000
4:0000/2c6d/0000  5:0000/2c6d/0000  6:0000/2c6d/0000  7:0000/2b65/0000
8:0000/2c6d/0000  9:0000/2da5/0000 10:0000/2c6d/0000 11:0000/2c6d/0000
12:0000/2c6d/0000 13:0000/2c6d/0000 14:0000/2c6d/0000 15:0000/2b65/0000
16:0000/2c6d/0000 17:0000/2c6d/0000 18:0000/2c6d/0000 19:0000/2c6d/0000
20:0000/2c6d/0000 21:0000/2c6d/0000 22:0000/2c6d/0000 23:0000/5ad3/0000
24:0000/2b65/0000 25:0000/2c6d/0000 26:0000/2c6d/0000 27:0000/2da5/0000
28:0000/2c6d/0000 29:0000/2b65/0000 30:0000/2c6d/0000 31:0000/2b65/0000


Conditions:
The FWSM is running either 4.0(9) or 3.2(15) and later releases.

Workaround:
Downgrade below 4.0(9) or 3.2(15)

-Mike

Thanks, Mike. That was the toolkit I was referring to. I will have customer support open a TAC case so this is on record, as you suggested.

A general question on bugs: since this one is fixed in other release trains, over what time frame does a bug like this affect a platform?

Going by the Bug Toolkit entry, the platform should have shown the symptoms long ago, as the client has been using it for over a year now.

But the problem never turned up earlier.

Thanks.

Unfortunately, the bug details don't mention any trigger, so it's unclear why you haven't seen this issue before. Perhaps something has changed in the network, such as traffic profile or load, that caused the FWSM's NPs to lock up.

In general, you should not experience the same bug in code versions that are higher than the one it was fixed in, since all past fixes are rolled into subsequent releases. In the case of CSCtg35889, this was actually a regression caused by the fix of a different bug, so it only affects 4.0(9) through 4.0(11).
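
As a rough illustration of that version window, here is a small Python sketch that checks whether a 4.0.x version string falls between the first affected release, 4.0(9), and the first fixed release mentioned earlier in the thread, 4.0.11.1. The helper names are invented for this example, and reading a version string as a plain sequence of integers is an assumption about the format. The 3.2 train is deliberately not judged, because the thread does not give a fixed release for it.

```python
import re

# Boundaries taken from the thread: affected from 4.0(9), fixed in 4.0.11.1.
FIRST_AFFECTED_40 = (4, 0, 9)
FIRST_FIXED_40 = (4, 0, 11, 1)


def parse_version(text):
    """Turn '4.0(9)' or '4.0.11.1' into a tuple of ints (assumed format)."""
    return tuple(int(n) for n in re.findall(r"\d+", text))


def affected_by_csctg35889(version):
    """Return True/False for 4.0.x versions, or None for any other train.

    None means "not covered by this sketch", not "unaffected".
    """
    v = parse_version(version)
    if v[:2] != (4, 0):
        return None
    # Tuple comparison orders (4, 0, 11) before (4, 0, 11, 1).
    return FIRST_AFFECTED_40 <= v < FIRST_FIXED_40
```

For example, `affected_by_csctg35889("4.0(10)")` falls inside the window, while `affected_by_csctg35889("4.0.11.1")` does not.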

-Mike
