10-26-2014 08:31 AM
Hi All,
We have a perplexing problem with our EMC VNX storage, Cisco MDS 9148 switches, Emulex HBAs, and SPARC T5 Solaris 11 configuration, and we don't know where to look.
When we provision a storage group from the VNX to a two-node cluster -- without any clustering software -- we observe that "format" takes over 5 minutes to enumerate 1200 paths/140 LUNs. The problem bounces back and forth between the two servers: the same command takes 4 seconds on the other node.
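For anyone wanting to quantify the 5-minute vs. 4-second difference, a rough timing sketch follows. On the Solaris nodes the real enumeration would be timed (`format < /dev/null` prints the disk list and exits); `ls /` stands in here only so the snippet runs anywhere.

```shell
# Minimal timing helper (sketch).
# On the Solaris nodes: time_cmd sh -c 'format < /dev/null'
time_cmd() {
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  end=$(date +%s)
  echo "elapsed: $(( end - start ))s"
}
# Portable stand-in command so this snippet is runnable on any system:
time_cmd ls /
```

Running this on each node before and after the storage-group workaround gives a repeatable number to compare instead of a stopwatch.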
When we put Veritas SFRAC on, the problem gets worse. It causes all sorts of issues, from slow startup, booting, and shutdown of the servers to slow disk formatting.
It must be something simple, yet we've tested numerous scenarios trying to isolate the problem and have engaged every vendor we have contact with.
I've tried forcing all interfaces to 8Gb, but that didn't help either. The negotiated speed is 8Gb when the interfaces are left at the default auto setting.
Appreciate any suggestion you may have.
Chan
10-26-2014 01:24 PM
Hi Chan
You seem to have a serious performance problem, correct?
- VNX is active / standby per lun ?
- how is vnx connected to MDS
- how many paths does the server have to the lun
- what kind of FC multipathing software are you using
Cheers
Walter.
10-26-2014 05:26 PM
Hi Walter,
This is a new installation and the issue seems widespread. The slow "format" is only a symptom that something is wrong, even before we install any multipathing or clustering software; once those are installed, they work only marginally.
The VNX is in ALUA mode (failover mode 4). Each LUN has 8 paths, across 2 switches and 2 Emulex ports on two HBAs.
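On Solaris 11 the path count per LUN can be confirmed with `mpathadm` (assuming MPxIO is enabled). A sketch follows; the device name and the sample output are fabricated stand-ins so the extraction pipeline itself is runnable anywhere.

```shell
# On a live node: mpathadm show lu /dev/rdsk/cXtYdZs2 | grep 'Path Count'
# Fabricated sample output (hypothetical device) stands in here:
sample='Logical Unit:  /dev/rdsk/c0t600601601234d0s2
        Total Path Count:  8
        Operational Path Count:  8'
printf '%s\n' "$sample" | awk '/Total Path Count/ {print $NF}'
# prints 8
```

Comparing Total vs. Operational Path Count per LUN is a quick way to spot dead or flapping paths before blaming the switch layer.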
10-27-2014 12:53 AM
Can you please post a diagram of the setup !
Have you tested the performance with just one path? (Could it be path thrashing?)
Certain hosts such as Windows, Solaris and AIX will require the system to rediscover their disks in order for ALUA to be enabled. It is recommended that the system be rebooted once the change is made.
10-27-2014 05:09 AM
We rebooted many times trying to fix this. I reduced the paths to 4 and the problem remained. We uninstalled PowerPath and it didn't help. We also tried without SFRAC 6.1.1 and that didn't help either.
When either node has the problem with the 'format' command, the workaround is to take the node out of the storage group and add it back in. Then, at some point later, the problem comes back.
10-27-2014 05:23 AM
- is it correct, that you have a dual fabric, with one MDS connecting the host as well as the storage ?
- and the storage is dual homed to both MDS ? and if yes, why ?
- do you really believe, that MP with 8 links is necessary ? MP sometimes creates more problems than it solves.
- I repeat again: do baselining with one link, eliminating MP !
In summary: I am 100% convinced, that this problem has nothing to do with MDS; it is related to storage controller and/or host and/or MP (EMC Powerpath).
10-27-2014 01:11 PM
The diagram goes something like this:
SPA/SPB --- FE 0a,2a,0b,2b <----> 9148 (A) <----> HBA0
SPA/SPB --- FE 3a,5a,3b,5b <----> 9148 (B) <----> HBA1
Even-numbered FE ports go to switch A, odd ports go to switch B. This server has a lot of storage, so we want multiple FE ports for IOPS.
When I turn off all SAN ports, the servers do "format" and return quickly.
When format runs slow, the DTrace user stack trace shows a very low count for the various function calls:
CPU ID FUNCTION:NAME
326 83159 :tick-5sec
libc.so.1`ioctl+0x8
format`do_search+0x250
format`main+0xa8
format`_start+0x108
66
In this case, it takes about 5 minutes to show 140 LUNs with 1200 paths.
The other node took 4 seconds to return, and the count is not 66 but somewhere around 6000.
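As a quick sanity check on the numbers above: 140 LUNs with 8 paths each works out to 1120 device paths, in line with the roughly 1200 quoted earlier.

```shell
# 140 LUNs x 8 paths per LUN
echo $(( 140 * 8 ))
# prints 1120
```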
Thanks for the suggestion. Will try to strip down to just one path.
Update: Symantec gave us the following to try in /etc/system and the situation improves somewhat.
set ssd:ssd_retry_on_reservation_conflict=0x0
10-27-2014 01:27 PM
If I understand you correctly, it means, that controller A is dual homed to MDS fabric A resp. B; and the same applies for Controller B.
My understanding of ALUA is that you mix optimized and non-optimized paths to a LUN.
I would therefore try yet another setup:
SPA --- FE 0a,2a,3a,5a <----> 9148 (A) <----> HBA0
SPB --- FE 0b,2b,3b,5b <----> 9148 (B) <----> HBA1
10-28-2014 04:27 AM
No, SPA's FEs shouldn't be all connected to switchA because it wouldn't provide redundancy. There are 4 paths from each SP to two switches.
Even numbered ports go to switchA and odd ports, switch B.
The LUNs are spread half-and-half between the two SPs.
I too am convinced that something is amiss in the setup before we load any clustering or MP software. We got rid of PowerPath because ALUA/ASL/DMP can spread I/O across both optimized and non-optimized paths.
The set ssd:ssd_retry_on_reservation_conflict=0x0 setting that Symantec suggested has improved the situation somewhat and my colleague is happy. That takes some wind out of my sails, but I'll keep looking because I want to get to the root of the matter.
10-28-2014 04:49 AM
I was aware of the redundant design; by the way, have a look at the Cisco Validated Design for UCS and NetApp, which in Fig 2 shows exactly your setup (it's N5k instead of MDS):
http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/Virtualization/flexpod_deploy.html#wp569152
10-30-2014 07:30 AM
Hi Walter,
I think we're still having problems with storage because the RAC/CRS installation is not going well. The nodes can't control the voting disks.
I ran the command found on Joerg's blog (http://www.c0t0d0s0.org) and got some errors:
kstat -p | grep -i ",err" | grep "sd" | grep "Hard" | cut -f 2 | awk '{sum+=$1} END {print sum}'
4866
kstat -p | grep -i ",err" | grep "sd" | grep "Hard" | cut -f 2 | awk '{sum+=$1} END {print sum}'
3653
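The pipeline above just sums the "Hard Errors" counters that `kstat -p` reports for the sd devices. A runnable sketch with fabricated sample lines standing in for live kstat output (kstat -p emits `module:instance:name:statistic<TAB>value`):

```shell
# Fabricated sample in place of live `kstat -p` output:
sample_kstat() {
  printf 'sderr:0:sd0,err:Hard Errors\t12\n'
  printf 'sderr:1:sd1,err:Hard Errors\t30\n'
  printf 'sderr:1:sd1,err:Soft Errors\t5\n'
}
# Same filter chain as the command quoted above:
sample_kstat | grep -i ",err" | grep "sd" | grep "Hard" \
  | cut -f 2 | awk '{sum+=$1} END {print sum}'
# prints 42 (12 + 30; the Soft Errors line is filtered out by grep "Hard")
```

Note that this sums across all sd instances, so a single flaky path or LUN can dominate the total; `kstat -p | grep 'Hard Errors'` without the sum shows which instance is accumulating them.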
There are also Invalid Tx Word Count errors on the HBA.
The latter has something to do with link synchronization words. The LUN errors may possibly be a cabling issue or loose connections.
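On Solaris, the per-port link error counters (including Invalid Tx Word Count) come from `fcinfo hba-port -l`. A sketch, with a fabricated "Link Error Statistics" section standing in so the extraction runs anywhere:

```shell
# On a live node: fcinfo hba-port -l | grep 'Invalid Tx Word Count'
# Fabricated sample output (values are placeholders):
sample='Link Error Statistics:
        Link Failure Count: 0
        Loss of Sync Count: 3
        Loss of Signal Count: 0
        Invalid Tx Word Count: 4866
        Invalid CRC Count: 0'
printf '%s\n' "$sample" | awk -F': *' '/Invalid Tx Word Count/ {print $2}'
# prints 4866
```

Counters that keep climbing between two readings (rather than a static count left over from boot or cable reseating) are the ones worth chasing.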
Chan
10-30-2014 03:20 PM
Hi Chan
Can you please post the exact error message for
There are also Invalid Tx Word Count errors on the HBA.
The latter has something to do with link synchronization words. The LUN errors may possibly be a cabling issue or loose connections.
Walter.