So, I have an EMC VNX array with a hung system task that is trying to pull inventory from it. The task appears to have been "running" for about a month now. Short of rebooting the appliance, what troubleshooting methods should I be looking at? This is the first time I've run into a system task that has "hung" like this.
I guess I can always open a TAC case on it, but I figure maybe someone has some good suggestions for things I should look at first, like killing a particular process on the appliance.
There's an open bug for this. I think it's CSCut90279. I had the same problem with the VMware Inventory Collector. It turned out that UCSD doesn't recover gracefully if there's a network problem during a running task.
My problem was due to UCSD being in one datacenter, and a vCenter in another. The advice from engineering was to go to a multi-node setup and put a service node in each datacenter to avoid networking hiccups.
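If the root cause is a link that drops between sites, one cheap sanity check is to confirm the remote endpoint answers on its TCP port before blaming the task itself. Here is a minimal sketch of that idea in Python, using only the standard library. The function name `can_reach` is made up for illustration, and the host and port in the usage note are placeholders; nothing here is part of UCSD or its API.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `can_reach("vcenter.example.com", 443)` (placeholder hostname) would tell you whether the remote vCenter answers on HTTPS from wherever the script runs. This only shows reachability at that moment; it won't catch a brief drop in the middle of a long inventory run.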
Lovely. Considering the fiasco I had trying to get a multi-node approach stable, I'm not sure if that's the direction I'm wanting to go without a major amount of testing.
That being said, swap the variables here, the VNX connector for the VMware connector, and it's likely I just recreated the same bug. Considering the VNX connector needs an SSH connection to a device running some EMC software in order to communicate with its arrays, I'm likely staring at the same problem.
To clear this, did you just cycle the services, or did you have to fully reboot the appliance?
I restarted the services using the shelladmin menu. It's entirely possible that only one of the services needs to be restarted, but since I'm pretty much the only user it's not a problem.
I squeezed in a service cycle at the last minute and the task ran without an issue this time. I'll need to keep an eye out for that, especially with remote sites to deal with. I'll also keep looking into the multi-node setup to see if it's gotten better since the last time I tried to get it to work.
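For keeping an eye out, the basic check is simple: flag anything still marked as running well past the time it normally takes. A rough sketch of that logic in Python follows. It assumes you already have the task names, statuses, and start times from somewhere (how you export them from the appliance is not covered here), and `find_stale_tasks`, the `"running"` status string, and the six-hour threshold are all illustrative choices, not anything UCSD defines.

```python
from datetime import datetime, timedelta


def find_stale_tasks(tasks, now, max_age=timedelta(hours=6)):
    """Return the names of tasks still marked "running" for longer than `max_age`.

    `tasks` maps a task name to a (status, start_time) tuple. This layout is
    hypothetical; adapt it to whatever your task export actually looks like.
    """
    return [
        name
        for name, (status, started) in tasks.items()
        if status == "running" and now - started > max_age
    ]


# Made-up sample data: one task stuck for about a month, one recent, one finished.
now = datetime(2015, 6, 1, 12, 0)
sample = {
    "vnx-inventory": ("running", datetime(2015, 5, 1, 12, 0)),
    "vmware-inventory": ("running", datetime(2015, 6, 1, 11, 30)),
    "ucs-inventory": ("completed", datetime(2015, 5, 1, 12, 0)),
}
stale = find_stale_tasks(sample, now)
```

With the sample above, only the month-old task gets flagged. Run on a schedule, something like this would surface a hung collector within hours rather than a month later.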