Re: Nexus 5548 Optimal Fail Over / Convergence design

kstamandk · ‎02-27-2011

New to Nexus in the Datacenter and looking for real life experiences and / or suggestions.

Environment;

- Dual SUP720 (8 port 10G blade in each) 6500 Distribution Layer - no VSS

- 2 N5K 5548

- 4 N2K 2232

Design;

- 2 Uplink from each N5K, 1 to each 6500 configured as vPC - 6500 links to N5K are Port-Channels

- 2 Uplink from N2K into single N5K, basically forming an upside down "U"

- 2 Uplink from server, 1 to each N2K (AIX servers, no Teaming / no LACP - design is based on "Paired VIO Servers", 1 Primary and 1 Backup, hosting LPARs ("Clients" in IBM speak). Client traffic fails over from the Primary to the Backup VIO server if the physical network link to the Primary VIO Server fails, and fails back once it recovers)

- Client Servers (LPAR) have multiple Virtual NICs - 1 has a default gateway, other 2 do not have default gateway configurations (IBM Recommendation)

Tests;

- Continuous Pings from PC connected outside of Distribution Layer to N2K connected Servers

Observations and Questions;

- Observation 1 - Link Failures from N2K to Primary VIO server (Server Cable Pull, SFP+ Pull, FEX Reload, Port-Channel to FEX Shut - No Shut)

>> Fail Over from Primary VIO to Secondary VIO connection - Ping tests show only 1 Ping Loss to all 3 addresses assigned to Client Servers

>> Recovery back to Primary VIO from Secondary connection - 1 Ping Loss to the Client Server address configured with a Default Gateway, but the other 2 lose communication for up to 5 minutes before they recover.

###> In these failure scenarios, we have found (from "show mac address-table" for the N2K interface to the Secondary VIO Server) that the MAC Addresses of the failing Client Servers are not being released. Clear ARP or any kind of timeout configuration on the Nexus does not change things

- Question 1 - is this a Nexus configuration / design issue or Server configuration / design issue?

>> Based on the MAC Address staying with the Secondary VIO Server even after the Primary interface is up and running again, our belief is that this is a Server issue (Server team feels differently)

<><><><>

- Observation 2 - N5K Failures (Reload, Software Upgrade Reset, Power Pull)

>> Single Ping Loss when the N5K goes down, but intermittent Ping losses (1 Ping Loss, 2 - 3 Ping Loss) until the N5K has fully recovered.

- Question 2 - would we be better off having the uplinks from each N5K go to a single 6500? Faster recovery times?

>> Goal is to have a design where taking an N5K out for maintenance (software upgrades, hardware failure / replacements, ...) does not disrupt application communication to downstream servers.
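
To compare designs against this goal, it may help to turn the continuous ping results into outage durations. The following is a minimal sketch, assuming one ping per second; the function name `loss_bursts` and the sample data are hypothetical and are not taken from the actual tests.

```python
from itertools import groupby


def loss_bursts(replies, interval_s=1.0):
    """Return the duration in seconds of each run of consecutive lost pings.

    `replies` is a sequence of booleans: True = reply received, False = lost.
    `interval_s` is the assumed spacing between pings.
    """
    return [
        sum(1 for _ in run) * interval_s
        for ok, run in groupby(replies)
        if not ok
    ]


# Hypothetical sample: a single loss, then a 2-ping burst, then a 3-ping burst.
sample = [True, False, True, True, False, False, True, False, False, False, True]
bursts = loss_bursts(sample)
print(bursts)
```

The longest burst is the figure to compare between uplink designs, since it is the one applications are most likely to notice.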

Attached is the drawing and associated configs.

Advice / Feedback appreciated.

rmeans · ‎05-30-2012

Did you find a solution?

I am having somewhat similar issues with IBM VIO Server.

deshongs · ‎05-30-2012

Do you have multiple VLANs extended to the IBM physical boxes? If so, I would think you would have to use a default gateway for each VIO that is on a VLAN different from that of the physical adapter.

Surya ARBY · ‎05-30-2012

Can you ask for preemption to be disabled on the VIOs?

It's not strange to see the MAC address persist on the N2K, as the port always remains "up". The MAC address will be flushed on the second N2K when the first VIO starts communicating with a machine attached to the second N2K in the same VLAN, as that will trigger a MAC move: the MAC address is then learnt through the uplink to one of the 6500s.
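
The behaviour described above can be illustrated with a toy model of source-MAC learning. This is only a sketch under stated assumptions, not how NX-OS is implemented: the `MacTable` class, the port names and the client names are hypothetical, and the 300-second aging value is an assumption chosen to match the "up to 5 minutes" reported earlier in the thread.

```python
from dataclasses import dataclass, field

AGING_SECONDS = 300  # assumption: chosen to match the "up to 5 minutes" observation


@dataclass
class MacTable:
    """Toy model of one switch's source-MAC learning table."""
    aging: int = AGING_SECONDS
    entries: dict = field(default_factory=dict)  # mac -> (port, last_seen)

    def learn(self, mac, port, now):
        # A frame sourced from `mac` arrived on `port`: learn it, or move the entry.
        self.entries[mac] = (port, now)

    def lookup(self, mac, now):
        # Return the egress port, or None (flood) once the entry has aged out.
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen >= self.aging:
            del self.entries[mac]
            return None
        return port


table = MacTable()

# t=0: primary VIO link is down, both client MACs are learned on the backup VIO port.
table.learn("client-gw-nic", "to-backup-vio", now=0)
table.learn("client-no-gw-nic", "to-backup-vio", now=0)

# t=10: primary recovers. Only the NIC with a default gateway sends traffic,
# which this switch now sees arriving on its uplink, so that entry moves.
table.learn("client-gw-nic", "uplink", now=10)

gw_port = table.lookup("client-gw-nic", now=20)        # moved to the uplink
stale_port = table.lookup("client-no-gw-nic", now=20)  # still the backup VIO port
aged_port = table.lookup("client-no-gw-nic", now=300)  # aged out, frames are flooded
print(gw_port, stale_port, aged_port)
```

In this model the silent address stays pinned to the backup port because the link never goes down and no frame from that MAC arrives anywhere else, so nothing moves or flushes the entry before it ages out.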

ppflaum12 · ‎06-01-2012

If VIO supports LACP, I would prefer to see a vPC from the N2Ks to the VIO, to eliminate ARP changes caused by a failover.
