
SAN Resiliency and Stability

aijazbeigh
Level 1

We have HDS disk arrays and a Cisco MDS environment to cater to the SAN requirements of our UNIX production servers. We have two 9506s, two 9509s and two 9216s, and HDS 9585 and NSC55 disk arrays for production. We dedicate the HDS 9585 to our main business-critical application (BCA) and use the NSC55 for the rest of the production environment.

BCA:-

We have 2 DB servers, each with 2 single-port Emulex HBAs. The HDS 9585 has 2 controllers with 4 FC ports each. Each 9509 and 9506 has 2 blades with 16 FC ports each.

Two 9509s, one server and one HDS 9585 are in one building; two 9506s, one server and one HDS 9585 are in the other building.

The HDS configuration is identical in both buildings.

BCA configuration:

1st fabric / 1st VSAN consists of:

9509 (1) <------> 9506 (1)

9585s (C0a, b, c, d)

Servers (HBA A)

2nd fabric / 2nd VSAN consists of:

9509 (2) <------> 9506 (2)

9585s (C1a, b, c, d)

Servers (HBA B)

_________________________________________

The 9216 switches were used for the rest of the production servers, but for some reason we had to... and we connected both 9216s and all the 9506 and 9509 SAN switches via ISLs, although the production VSAN was different from the BCA VSANs.

_________________________________________

Now it's a single fabric, but with multiple VSANs. Can anybody tell us whether this could in any way affect the SAN resiliency and stability of our production fabric?

Any suggestions would be highly appreciated.

Or does anybody think that dual fabrics would have been a much better choice than a single fabric with multiple VSANs? And why?


stephen2615
Level 3

We went through the same discussions when I put our switches in. I wanted a single fabric purely for management reasons but our "old school provisioning" people were very against this concept. They wanted multiple fabrics because that was what they had used since before time began.

Cisco were consulted but they did not really have any reasons not to use a single fabric in a management environment. My Cisco training even promoted the concept of single fabrics.

Even though I have a single fabric, no VSANs (except VSAN 0001) go across our ISLs. E.g., VSAN 100 is on one switch and VSAN 101 is on another switch. The only traffic across the ISL is management.

This is going to change very shortly when I start a new project. I will be VSANing across a 9509 and a 9216A (two of each), but I will not be using the VSAN across two 9509s (yet). So in theory, even though Fabric Manager reports one fabric, both paths are completely separate.

I would love to get a definitive answer to the question you posed (which is very similar to my layout), but so far no one is completely willing to say whether it is a good or bad idea. I put it down to this: 99.95 percent of the time it will be OK, but we (the vendor) won't accept responsibility for anything that goes wrong in the other 0.05% of the time if you choose to do it your own way.

That may seem a bit cynical, but I have a Sun 25K that was purchased because of its dynamic reconfiguration abilities, yet Sun (in a consultancy role) kept saying that I should bring the domains down to configure in new storage. If I wanted to do that, why spend millions of dollars on high availability? I completely ignored them and configured in a fully configured HDS USP 1100. Nothing went wrong, but I know what I am doing.

I believe the only problem you may have if something goes horribly wrong is related to the FCNS but it will automatically restart.

Stephen

Hi Stephen! Thanks for your reply. We are also using VSAN 0001 for ISL connectivity across the whole single fabric; all ISL connections belong to this VSAN. But we have to keep the rest of the production fabric, which uses 6 SAN switches, separate from BCA and VSAN 0001. What is the best way to do that? I guess you are left with no option but to connect all switches through ISLs and create a single fabric, or to leave them as different fabrics and create production VSANs in each fabric. I just can't understand why and how a single fabric with multiple VSANs would be a threat to SAN stability and resiliency; after all, a VSAN is by definition a virtual SAN, or virtual fabric...

Can anybody from Cisco also reply to our query or throw some light on this situation? It would be much appreciated.

Thanks...

G'day,

Having separate VSANs means you have separate fabric services anyway! (virtually :)

The key thing to ensure is that you have redundancy between your switches.

You should have multiple E ports bundled into a port channel, so that losing a single physical path won't cause your fabric(s) to reconverge.
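A minimal Python sketch of why the bundle helps, assuming a simplified model in which a port channel stays up as long as at least one member link is up. The `PortChannel` class and the interface names `fc1/1` and `fc2/1` are hypothetical, chosen to show members spread across two linecards; this is not how MDS software is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PortChannel:
    """Simplified ISL port channel: interface name -> is the link up?"""
    members: Dict[str, bool] = field(default_factory=dict)

    def fail(self, interface: str) -> None:
        # One physical link in the bundle goes down.
        self.members[interface] = False

    def is_up(self) -> bool:
        # The logical ISL survives while any member link is still up.
        return any(self.members.values())


# Hypothetical bundle with members on two different linecards.
pc = PortChannel({"fc1/1": True, "fc2/1": True})
pc.fail("fc1/1")
print(pc.is_up())  # True: fc2/1 still carries traffic, no fabric reconvergence
pc.fail("fc2/1")
print(pc.is_up())  # False: the whole ISL is lost and the fabric must reconverge
```

Spreading members across linecards is the design point here: a single linecard failure then takes out only part of the bundle.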

One key thing to keep in mind is that having multiple fabrics protects you when you want to upgrade firmware on switches.

There could also be corner cases where faulty equipment may cause CPU to go through the roof on a fabric due to flooding etc.

My preference is to create two physically separate fabrics, then create VSANs inside the two, and then create a special ISL between the two fabrics on an exclusive VSAN for management.

This has saved me in the past when the provisioning team blew away zonesets... :\

Cheers

Andrew

MATE!!!

I would love to see a best practice publication on this very topic. Surely someone will have enough experience to comment on this comprehensively and perhaps make a few dollars on the side by putting out a book.

When you say multiple fabrics protect you when you want to upgrade the switches, what exactly do you mean? Surely the redundant supervisor cards on the 9500 series would overcome any problems with that?? The edge switches with only one sup card are obviously an issue there.

All our systems have alternate paths to our storage through separate switches. So in the off chance we lose a switch in one VSAN, there is an alternate route through the other switches in the other VSAN.

However, this is where it becomes a mystery to me. The fabric has the same name as one of our switches. If that switch dies, what (if any) impact would it have on the fabric if each of the switches uses its own VSANs, which are in the switch's config?

I would be interested in knowing how you have two physically separate fabrics but still have the management VSAN. Each switch had its own fabric, and when I joined them together with the ISL through VSAN 0001, Fabric Manager created one fabric.

Sometimes I feel as though I don't exactly know what I am doing with connecting the core switches together. This is quite different to our older Qlogic switches when life was simple.

Cheers

Stephen (in sunny Canberra)

"Surely the redundant supervisor cards on the 9500 series would overcome any problems with that??"

> Well not always. There have been some cases where due to a s/w bug all linecards in a 9500 have rebooted. So best practice is to at least have two switches with multipathing on the hosts.

"The fabric has the same name as one of our switches"

> The fabric name is derived from the principal switch. There will be one PS per VSAN, and it's quite likely that the PS for all your VSANs is on one particular switch. If that switch dies it is no drama: the other switch will become the PS through a Build Fabric (BF) process. This is non-disruptive for the other switch, i.e. data frames are switched even while a BF is in progress.
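A minimal Python sketch of the "one PS per VSAN" point, assuming election is simplified to "lowest switch WWN wins" (no priorities set). The switch WWNs and VSAN layout are hypothetical, and `rebuild_after_failure` is an illustrative helper, not an MDS function. It shows that only VSANs whose principal was the failed switch elect a new one.

```python
from typing import Dict, Optional, Set


def wwn_value(wwn: str) -> int:
    """Turn a colon-separated WWN into an integer so WWNs compare numerically."""
    return int(wwn.replace(":", ""), 16)


def elect_principal(switches: Set[str]) -> Optional[str]:
    """Simplified election with no priorities set: the lowest switch WWN wins."""
    return min(switches, key=wwn_value) if switches else None


def rebuild_after_failure(vsan_members: Dict[int, Set[str]],
                          failed: str) -> Dict[int, Optional[str]]:
    """Return the new principal for each VSAN whose principal was `failed`.

    VSANs with a different principal are not touched, mirroring each VSAN
    running its own fabric services.
    """
    new_principals = {}
    for vsan, members in vsan_members.items():
        if elect_principal(members) == failed:
            new_principals[vsan] = elect_principal(members - {failed})
    return new_principals


# Hypothetical switch WWNs and VSAN layout.
SW_A = "20:00:00:0d:ec:00:00:01"
SW_B = "20:00:00:0d:ec:00:00:02"
SW_C = "20:00:00:0d:ec:00:00:03"
vsan_members = {1: {SW_A, SW_B, SW_C}, 100: {SW_A, SW_B}, 200: {SW_B, SW_C}}

print(rebuild_after_failure(vsan_members, SW_A))
# VSANs 1 and 100 elect SW_B; VSAN 200 (principal SW_B) is not listed.
```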

"I would be interested in knowing how you have two physically separate fabrics but still have the management VSAN."

> Any VSAN that is common to both switches and allowed across the ISL trunk could be regarded as a 'management VSAN' if there are no hosts/storage in that VSAN. Typically VSAN 1 is the common VSAN because it is the one that ships on MDS from the factory. Hosts/storage are usually put in other, user-created VSANs, so people often use VSAN 1 as a management-only VSAN (for Fabric Manager to see both switches on the same map). With VSANs, you can be assured that any disruptive error in VSAN 1 will not propagate into your host/storage VSANs. But as I said, VSANs do not solve everything and will not protect against an entire switch failing or linecards going down. Two switches are better than one!
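The rule of thumb above can be written as two set operations. This is a minimal Python sketch under the stated assumptions: a VSAN crosses the ISL only if both switches have it and the trunk allows it, and a crossing VSAN with no end devices counts as management-only. The switch layouts and device names are hypothetical, modelled on the single-fabric setup described earlier in this thread.

```python
from typing import Dict, List, Set


def vsans_across_isl(vsans_a: Set[int], vsans_b: Set[int],
                     trunk_allowed: Set[int]) -> Set[int]:
    """A VSAN spans the ISL only if both switches have it and the trunk allows it."""
    return vsans_a & vsans_b & trunk_allowed


def management_only(vsans: Set[int], devices: Dict[int, List[str]]) -> Set[int]:
    """Shared VSANs with no hosts or storage in them act as 'management VSANs'."""
    return {v for v in vsans if not devices.get(v)}


# Hypothetical layout: VSAN 100 on switch A, VSAN 101 on switch B,
# and only VSAN 1 allowed across the ISL trunk.
switch_a = {1, 100}
switch_b = {1, 101}
trunk_allowed = {1}
devices = {100: ["host1-hba0", "array-c0a"], 101: ["host2-hba0", "array-c1a"]}

shared = vsans_across_isl(switch_a, switch_b, trunk_allowed)
print(shared)                            # {1}
print(management_only(shared, devices))  # {1}
```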

Dallas

Sunny Sydney TAC

Howdy,

Or worst case... VSANs won't save you from storage admins with sticky fingers! :)

cheers

Andrew (from a not so sunny Geelong :)

Hi, great day! Here are a few queries about Cisco:

1) If your PS is down, then BF will choose another PS, but the fabric will still be segmented at that switch. Can we be assured that traffic will not be affected in other VSANs that do not involve that PS?

2) I know this has nothing to do with Cisco, but we are using an HDS 9585 and we didn't set the failover and clustering parameters in the host group options. Will that in any way affect host-side failover, given that we have mapped the same LUNs on both controllers? We are using VxDMP multipathing on the host side.

3) Also, can a port/host/storage device be a member of multiple VSANs? I don't think so, but is there any Cisco tech note about that?


1) Traffic will not be affected in other VSANs.

3) Any pWWN can belong to only one VSAN. If you have a host with two HBAs (two pWWNs), you could put one in VSAN X and the other in VSAN Y. The same rule applies to storage.
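The one-VSAN-per-pWWN rule maps naturally onto a dictionary keyed by pWWN: a key can hold only one value, so membership is exclusive by construction. A minimal Python sketch, assuming VSAN 10 and VSAN 20 stand in for "VSAN X" and "VSAN Y"; all pWWNs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# pWWN -> VSAN: each pWWN is in exactly one VSAN. All pWWNs are hypothetical.
vsan_of: Dict[str, int] = {
    "10:00:00:00:c9:00:00:0a": 10,  # host HBA A -> VSAN X (10)
    "10:00:00:00:c9:00:00:0b": 20,  # host HBA B -> VSAN Y (20)
    "50:06:0e:80:00:00:00:01": 10,  # storage port on controller 0
    "50:06:0e:80:00:00:00:02": 20,  # storage port on controller 1
}


def members_by_vsan(mapping: Dict[str, int]) -> Dict[int, List[str]]:
    """Invert the mapping to list the pWWNs in each VSAN."""
    members: Dict[int, List[str]] = defaultdict(list)
    for pwwn, vsan in mapping.items():
        members[vsan].append(pwwn)
    return dict(members)


for vsan, pwwns in sorted(members_by_vsan(vsan_of).items()):
    print(vsan, pwwns)
```

The host still reaches its storage through both HBAs, one per VSAN, which is the dual-path layout described at the top of this thread.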

G'day,

Point 1)

If your principal switch dies and you do not have switch priority set, the switch with the lowest WWN will become the principal.

When the principal switch has been selected, an RSCN (Registered State Change Notification) is sent. This is not really a problem these days except for tape :(... Tape, being sequential, doesn't like to have any interruptions :-(
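A minimal Python sketch of the selection order, assuming the usual Fibre Channel rule that the numerically lowest priority value is preferred and that ties are broken by the lowest switch WWN. The default value of 128, the switch names and the WWNs are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Switch:
    name: str
    wwn: str
    priority: int  # lower value is preferred; unset switches share one default


def wwn_value(wwn: str) -> int:
    """Turn a colon-separated WWN into an integer so WWNs compare numerically."""
    return int(wwn.replace(":", ""), 16)


def elect_principal(switches: List[Switch]) -> Switch:
    """Lowest priority value wins; ties are broken by the lowest switch WWN."""
    return min(switches, key=lambda s: (s.priority, wwn_value(s.wwn)))


DEFAULT = 128  # assumed default priority, identical on every switch

# Hypothetical switches, all left at the default priority.
fabric = [
    Switch("mds-9509-1", "20:00:00:0d:ec:00:00:03", DEFAULT),
    Switch("mds-9506-1", "20:00:00:0d:ec:00:00:01", DEFAULT),
    Switch("mds-9216-1", "20:00:00:0d:ec:00:00:02", DEFAULT),
]
print(elect_principal(fabric).name)  # mds-9506-1: lowest WWN, priorities equal

# Pinning the principal by giving the core switch a preferred (lower) value.
fabric[0] = Switch("mds-9509-1", "20:00:00:0d:ec:00:00:03", 2)
print(elect_principal(fabric).name)  # mds-9509-1
```

Setting a priority explicitly makes the choice of principal predictable instead of leaving it to whichever WWN happens to be lowest.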

Point 2) No idea on HDS... sorry :)

Point 3)

A port can only be in one VSAN at any one time. If you have the Enterprise license, you can set up IVR (Inter-VSAN Routing), which will allow targets/initiators in one VSAN to talk to initiators/targets in other VSANs.
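A minimal Python sketch of the reachability idea behind IVR, assuming a simplified model: devices in the same VSAN need a shared zone in that VSAN, while devices in different VSANs can only talk if an IVR zone lists both (pWWN, VSAN) pairs. The `can_talk` function, the zone structures and the pWWNs are hypothetical, not the MDS implementation.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Device(NamedTuple):
    pwwn: str
    vsan: int


def can_talk(a: Device, b: Device,
             zones: Dict[int, List[FrozenSet[str]]],
             ivr_zones: List[FrozenSet[Device]]) -> bool:
    """Simplified reachability check.

    Same VSAN: the two pWWNs must share a zone in that VSAN's active zoneset.
    Different VSANs: only an IVR zone listing both (pWWN, VSAN) pairs helps.
    """
    if a.vsan == b.vsan:
        return any({a.pwwn, b.pwwn} <= zone for zone in zones.get(a.vsan, []))
    return any({a, b} <= zone for zone in ivr_zones)


# Hypothetical devices: a host in VSAN 10 and a tape drive in VSAN 30.
host = Device("10:00:00:00:c9:00:00:0a", 10)
tape = Device("50:01:10:a0:00:00:00:01", 30)
zones = {10: [frozenset({host.pwwn, "50:06:0e:80:00:00:00:01"})]}

print(can_talk(host, tape, zones, ivr_zones=[]))                         # False
print(can_talk(host, tape, zones, ivr_zones=[frozenset({host, tape})]))  # True
```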

1) In a single-fabric environment with multiple VSANs, is there any component or configuration that can bring all VSANs down? I'm not saying there is; I'm asking just for the sake of information.

2) Also, in our earlier Brocade setup, rezoning or reconfiguring made a few servers lose disk visibility for a few seconds, but that doesn't appear to happen with the Cisco SAN switches. Is there any specific reason? Can anybody throw some light on this?

Thanks,

Sandeep

I think that there may be some way to cause a failure in a single fabric, but it probably relates to user incompetence more than anything else. As I mentioned earlier, there is a known bug relating to FCNS and the CPU going ballistic, but it is apparently quite rare. I seriously doubt that Cisco would be prepared to say yes or no on this.

Any reconfiguration of a fabric/zoneset/zone will more than likely cause some degree of disruption, but it all depends on how heavily utilised the fibre is. I have made these changes on our systems, but with multipathing our systems are not prone to problems. More often than not, changing a zoneset has not caused any disruption at the host level, and I am talking about 16 fibre pairs that run at about 80 MB/s constantly on a single host. I expect something to happen, but so far nothing has. Any changes on the previous Qlogic switches caused disruption measured in microseconds, but our systems were designed for it.

Single-pathed instances are obviously at risk of disruption for a short time.

If you don't trust a single fabric, don't use it. I have a single fabric because it suits my management style and the expansion of our (very mission-critical) systems. I don't really need a single fabric, but it makes my life easier and it is my responsibility.

Stephen

Hi,

Thanks for that information, but what do you think about the following situation:

You have two 9509s in one building and two 9506s in a second building, and you connect them in pairs (i.e. one 9509 to one 9506 with 2 ISLs) and bundle the ISLs into a port channel. You create a dual-fabric environment, but with a single VSAN on each fabric that includes the initiators, targets and ISLs.

How would you rate this setup, and what are the repercussions, taking into consideration that all servers have dual HBAs connected to different fabrics?
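One way to reason about this layout is to check that losing any single fabric still leaves every server a path to the array. A minimal Python sketch, assuming a path exists through a fabric when both the server and the array have a port in it. The fabric and server names are hypothetical, and `app1` is a made-up single-HBA host included only for comparison.

```python
from typing import Dict, Set


def survives_single_fabric_loss(servers: Dict[str, Set[str]],
                                storage_fabrics: Set[str],
                                fabrics: Set[str]) -> Dict[str, bool]:
    """For each server, check that losing any one fabric still leaves a path.

    A path exists through a fabric when both the server and the array have
    a port in it.
    """
    report = {}
    for server, server_fabrics in servers.items():
        shared = server_fabrics & storage_fabrics
        report[server] = all(shared - {lost} for lost in fabrics)
    return report


# Hypothetical layout: two fabrics, array ports in both, dual-HBA DB servers.
fabrics = {"fabric-1", "fabric-2"}
storage = {"fabric-1", "fabric-2"}
servers = {
    "db1": {"fabric-1", "fabric-2"},
    "db2": {"fabric-1", "fabric-2"},
    "app1": {"fabric-1"},  # single-HBA host for comparison
}
print(survives_single_fabric_loss(servers, storage, fabrics))
# {'db1': True, 'db2': True, 'app1': False}
```

This only covers fabric-level failures; host-side multipathing still has to fail over for the surviving path to be used.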

Thank you for all your help ....

Regarding single points of failure in the single fabric topology - if you are using IVR, you have *one* IVR zoneset per fabric. Even if you have multiple service groups, I believe it is still one zoneset (can someone confirm that?) If you get a problem with an IVR zoneset activation for whatever reason, you are affecting the whole fabric. The zoneset activation only sends RSCNs to affected end devices, so this is fine, assuming there are no problems. It may seem old-fashioned, but if you have daylight between your two fabrics you *know* nothing within the switches can bring both sides down.
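A minimal Python sketch of "RSCNs only to affected end devices", assuming a simplified model in which a device counts as affected when the set of devices it shares a zone with changes between the old and new zonesets. This is an illustration of the idea, not the switch's actual algorithm; the zonesets and device names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def zone_peers(zoneset: Iterable[Set[str]]) -> Dict[str, Set[str]]:
    """Map each zoned device to the set of devices it shares a zone with."""
    peers: Dict[str, Set[str]] = defaultdict(set)
    for zone in zoneset:
        for member in zone:
            peers[member] |= set(zone) - {member}
    return peers


def rscn_targets(old: Iterable[Set[str]], new: Iterable[Set[str]]) -> Set[str]:
    """Devices whose zone peers change when `new` replaces `old`."""
    old_peers, new_peers = zone_peers(old), zone_peers(new)
    devices = set(old_peers) | set(new_peers)
    return {d for d in devices if old_peers.get(d, set()) != new_peers.get(d, set())}


# Hypothetical zonesets: h2 is re-zoned from array port s1 to s2.
old = [{"h1", "s1"}, {"h2", "s1"}]
new = [{"h1", "s1"}, {"h2", "s2"}]
print(sorted(rscn_targets(old, new)))  # ['h2', 's1', 's2']; h1 is untouched
```

In this model a clean activation touches only the re-zoned devices; the concern raised above is that a single IVR zoneset means a faulty activation has the whole fabric in scope.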
