PI 1.4 Crashing

lbadman
Level 1

We have a nightly crash. We have PI 1.4 instances, just upgraded from 1.3, and each is crashing in the middle of the night. I suspect it may have to do with the server backup, but only because that's unique to the wee hours when it stops (NMS Server stops). Is anyone else seeing 1.4 crashing?

This is an amazingly maddening platform.

1 Accepted Solution

Accepted Solutions

Rob Johnson
Cisco Employee

We in the TAC have seen problems with Prime backups causing server crashes.  This was mostly due to backing up to an SFTP repo....

Obviously try disabling the backup for a couple of runs to see if the server stays up.

When the NMS server process crashes there is in fact a core file left on the localdisk that can help lead to the cause of the crash...
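
One way to test the backup theory before changing anything is to line up the crash times against the scheduled backup window. The following is only a minimal sketch: the window times and crash timestamps are hypothetical placeholders (nothing in this thread gives the actual schedule), and the helper names `in_window` and `crashes_during_backup` are made up for illustration.

```python
from datetime import datetime, time


def in_window(ts: datetime, start: time, end: time) -> bool:
    """Return True if ts falls inside the daily window [start, end).

    Also handles windows that wrap past midnight (e.g. 23:30 to 01:00).
    """
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def crashes_during_backup(crashes, start, end):
    """Return the crash timestamps that fall inside the backup window."""
    return [ts for ts in crashes if in_window(ts, start, end)]


# Hypothetical values for illustration only; substitute your own
# backup schedule and the times the NMS server was seen to stop.
BACKUP_START = time(3, 30)
BACKUP_END = time(5, 0)
crash_times = [
    datetime(2014, 1, 20, 3, 47),
    datetime(2014, 1, 21, 3, 52),
    datetime(2014, 1, 21, 14, 5),
]

hits = crashes_during_backup(crash_times, BACKUP_START, BACKUP_END)
print(f"{len(hits)} of {len(crash_times)} crashes fell inside the backup window")
```

If most or all of the stops land inside the window, that supports disabling the backup for a couple of runs as the next test; it does not prove the backup is the cause.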

12 Replies

Stephen Rodriguez
Cisco Employee

When you upgraded, did you do a full upgrade and import your DB?  Or did you spin up a new instance and just import your maps and relearn templates?

HTH,
Steve

------------------------------------------------------------------------------------------------
Please remember to rate useful posts, and mark questions as answered

Hi Steve-

As per Cisco upgrade docs, we did the in-place upgrade. Works great (as measured by PI standards) by day. Stopping at night.

-Lee

Meh....that in-place upgrade.....just not a good idea.

Try this....

1.) export the maps

2.) spin up a new VM and load 1.4 clean

3.) add all the WLC back

4.) import the maps

5.) discover templates

let that run and see if it still crashes at night.  if it doesn't then you can rehost the license(s).

HTH,
Steve

------------------------------------------------------------------------------------------------
Please remember to rate useful posts, and mark questions as answered

Steve- thanks, and I know that you didn't make the world and are just living in it so please don't take this personally. If the in place upgrade isn't a good idea, then WHY IN THE HELL does Cisco provide that as a course of action? Which is a cousin question to WHY IN THE HELL does Cisco put out crappy WLC code that end users only find out also isn't a good idea to use by coming to these forums?

It's just freakin' maddening, the whole culture of "tolerated public-facing suck" but to get the real story you need to find a backchannel voice of the company. This seems to be the hallmark of the wireless business unit these days. It's just bewildering.

Now that I have that out of my system- thank you again. I'll consider devoting yet another couple of man-days to what you recommend, and will make plans to buckle up for the absolutely bizarre ride that is "rehosting of licenses" with Cisco- which usually involves these sorts of messages: "we sent you something and assume you are now tickled pink. If we don't hear from you in the next 17 milliseconds, we're closing this issue so you'll likely have to start the whole exercise in futility over tomorrow because our SLA requires we bungle the licenses no less than three times before getting it right, thus wasting our time and yours as an added convenience to you."

Again- it's nothing against you. People can only get burned so many times by this product set and the odd climate at Cisco that goes with it before they start getting pissed.

-Lee

No worries man.  This is just what I've found works the best for me, from doing upgrades at customers, my lab and speaking with my colleagues.

HTH,
Steve

------------------------------------------------------------------------------------------------
Please remember to rate useful posts, and mark questions as answered

Thanks, again.

Lee,

It's good to see others express their concerns. Like Steve, we have to deal with multiple clients. Believe me... It's not good when we do our first few upgrades and they fail. We learn and pass our input on to the forums for others to seek out. I will not do another upgrade without a lot of risk added to the project. With any VM that I have to touch, a snapshot will save your behind!!!! So when an upgrade fails, revert back and learn.... BETTER OFF BRINGING UP A NEW VM:)

It's not just PI that we all have workarounds for. Doing this day in, day out helps us know what really works and what doesn't. There are many different variables out there and we don't have all the answers for everything.

I don't have any issues with licensing. I talk to them directly but I update the ticket with the vUDI of the new VM. I make sure the license works before I get off the phone with them. Of course I would have the VM spun up and at the GUI where you add the license.

Just say what I say... My VM died and I need to rehost my license:)

-Scott
*** Please rate helpful posts ***

Thanks, Scott- the information is appreciated. It feels like Cisco's developers/doc writers and TAC are frequently miles apart. If in-place upgrade is bound to be a likely problem by TAC's estimation, why not get with the developers and remove that as a recommended option? Same with iffy WLC code. Frequently engineers express that a specific version of code is problematic, yet it stays up on the site for download. Would be nice if: 1. this product set wasn't so buggy and laborious to keep up 2. the stuff that Cisco engineers seem to know that the developers and SEs don't could somehow make it out front faster, rather than having a system where customers have to get burned first before they make it to places like the forums to find "the real" way things should be done.

Again- thanks.

Lee

Thanks Rob- we suspected similar, had it confirmed with TAC. We still have other issues after the upgrade (no AVC/select report data showing on 2 of our 3 PI boxes, post upgrade and thousands of new MFP alerts from WLC showing despite no controller changes being made) but hopefully the crashing is done. But why is SFTP OK on 1.3, but not on 1.4?

So you are backing up to an SFTP repo? I would put money that's what's causing the crash...

As for why the SFTP issue was fixed in 1.3 and didn't carry over to 1.4?  Beats me.... and if it is still happening, TAC would certainly need to reopen or open a new bug for 1.4

As for the mgmt frame protection alerts.  I would probably identify a few controllers where the alerts are coming from, remove the WLCs then add them back in then see what happens w/ the alerts

Also, as the guys noted above:

1) export your maps

2) run ncs db reinitdb so as to wipe the old stuff out

3) add the WLCs

4) import the maps

This way you are starting fresh and don't have a bunch of patch crap and upgrade crap in the way so if you can do this, by all means then do it.

If you concentrate on just that task and not 57 other things one morning like we always have to do, you could have the system completely rebuilt in just a few hours and be starting anew----

Yes- we were backing up into an SFTP repo all through 1.3, and saw nothing about needing to change that when prepping for 1.4. It is definitely the cause of the crash. We changed to FTP and have been good for the last two nights. On the MFP alerts- seems to be a common issue https://supportforums.cisco.com/thread/2234717 and at least one other thread about it. With classes starting tomorrow again, I'm not really keen on doing much else to upset our apple cart in the name of experimentation, will wait for clear guidance from TAC.
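
For anyone running the same before/after comparison, a simple tally of nights per backup transport makes the pattern easy to see. This is just a sketch under stated assumptions: the nightly log below is a hypothetical stand-in (a few SFTP nights that crashed, then FTP nights that stayed up), and `crash_rate_by_protocol` is an illustrative helper, not anything PI provides.

```python
from collections import defaultdict


def crash_rate_by_protocol(nights):
    """Tally (crashes, total nights) for each backup transport.

    nights is an iterable of (protocol, crashed) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # protocol -> [crashes, nights]
    for protocol, crashed in nights:
        totals[protocol][1] += 1
        if crashed:
            totals[protocol][0] += 1
    return {p: (c, n) for p, (c, n) in totals.items()}


# Hypothetical nightly log for illustration only.
nights = [
    ("sftp", True),
    ("sftp", True),
    ("ftp", False),
    ("ftp", False),
]

summary = crash_rate_by_protocol(nights)
for protocol, (crashes, total) in summary.items():
    print(f"{protocol}: {crashes} crash(es) in {total} night(s)")
```

A couple of clean nights is a small sample, so it is worth keeping the tally going for a while before calling it settled.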

Here's the kicker- if Cisco would be clear about making the fresh install the preferred method to go to new versions, we'd have made time to do that. But instead, we dicked around following directions provided to get to 1.4.1 AFTER late release of WLC code needed for 3700 APs, then get to suffer the usual bugs, TAC cases, and mixed messages to get to the point where we settle for whatever's left because our maintenance window is blown. It just doesn't feel like what you should get from "the market leader". I don't even mean to be snarky or mean, it's just incredible what you have to contend with as a customer of Cisco Wireless network management. After years of dealing with this, there is zero trust to ever put switches in the same framework in the name of being unified.

At the same time, the support forums are appreciated. I truly don't know what the point of opening support tickets is anymore, as the answers come faster and more reliably here.

-Lee
