
vMotion issue when using vPC

watkins.david
Level 1

I have an issue when running a 40 Gb port-channel using vPC between a pair of 5596s and ESX 5.5 hosts. LACP comes up properly: a sh port-chan summary shows all 4 connections (2 on one switch, 2 on the other) bundled in the 'P' state, and a sh vpc brief shows no misconfiguration.

 

 

The problem is that when we vMotion, the guest VM drops offline, sometimes for seconds and sometimes for several minutes. I've read that the destination ESX host will send out a RARP broadcast to advertise that the VM has moved, but it doesn't seem to be doing that, even though we have the option enabled. Sometimes one 5K learns the new MAC location and sometimes neither does. It also doesn't appear to depend on the MAC aging timer.

 

Has anyone seen this before? I really hate the idea of removing vPC and going with 2x20 Gb LACP port-channels, one for vMotion/admin and one for guest traffic. I want a nice big fat port-channel for everything!


Walter Dey
VIP Alumni

Which virtual switch are you using: vSwitch, DVS, ...?

Which load-balancing algorithm do you use?
 

A dvSwitch with a LAG group.

Source-dest-IP is the load-balancing algorithm set on both sides.

Keny Perez
Level 8

I saw something like this a few days ago, and it turned out to be FWM (Forwarding Manager) struggling with another issue unrelated to the ports we were working on.

In my case, there was a port-channel configured to use LACP on the Nexus side while the other end didn't support LACP, causing a loop that FWM could not handle.

Do one more test, then run "show log | i STM" and see whether a loop is being detected, or whether something else could be keeping FWM busy enough that it cannot handle the MAC move.

 

-Kenny

While I did see STM messages, I discounted them because they didn't involve the ports I was interested in. I thought the TAC engineer was on glue when he suggested something similar. I'll see if I can remove the LACP configuration from ports that aren't running it and see what I get.

If they don't support LACP, the ports end up in the "I" (Individual) state, which may cause loops. Those packets are processed by FWM, and that is when you may see these issues.

 

-Kenny

Ugh, you just ruined my day :). We have mountains of servers that don't have LACP enabled.

This usually applies to ports on the FI or the upstream Nexus switch, not the blades... just wanted to be sure we are on the same page...

 

-Kenny

All links between our Cisco gear have LACP enabled properly :)

The assumption was that anything plugged downstream into the 5Ks would be capable of running LACP, but that isn't always the case, nor did we always have time to implement it.

Although this does raise the question of why we haven't seen this issue earlier. Perhaps something in ESX 5.5 is a bit different?

 

We're going to plug our test servers into a 5K pair that has nothing else on it, so there aren't any loop pauses, and we'll see what we get.

Cool, let us know and come back with feedback for other users with the same issue. Meanwhile, if you found any of the answers here useful, rate them. That helps other users determine what is helpful and what is not, based on your experience with the problem.

 

I found everything here very valuable :)

 

-Kenny

Plugging the 2 ESX servers with one big fat vPC with LACP into a test pair of 5Ks is working....

 

So now the problem is how to get rid of the STM messages on our production 5Ks. I could not get these messages to occur 'on the fly' to validate whether removing stale port-channel configuration toward servers that don't support LACP fixes it.

 

Then I saw this:

http://www.cisco.com/c/en/us/support/docs/switches/nexus-5000-series-switches/116200-qanda-nexus5000-00.html

 

Apparently, these messages are based on switch-wide threshold behavior, which means I'll have a really hard time replicating it in the test environment. It also means that I can't just fiddle with one ESX server in prod. It sounds like, IF removing the port-channel config is the answer, it's an all-or-nothing exercise.
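To illustrate why a switch-wide threshold is hard to reproduce with a single host, here is a rough Python sketch. It is only a toy model, not the actual FWM logic: the class name `MoveMonitor` and the threshold, window, and hold-down values are all made up for illustration.

```python
from collections import deque


class MoveMonitor:
    """Toy model of switch-wide MAC-move throttling (all values hypothetical)."""

    def __init__(self, threshold: int, window_s: float, hold_s: float):
        self.threshold = threshold      # moves within the window that trip the hold-down
        self.window_s = window_s        # sliding window length in seconds
        self.hold_s = hold_s            # how long learning stays disabled
        self._moves = deque()           # timestamps of recent moves, from ALL hosts
        self._disabled_until = 0.0

    def record_move(self, now: float) -> None:
        """Register one MAC move seen anywhere on the switch at time `now`."""
        self._moves.append(now)
        # Drop moves that have fallen out of the sliding window.
        while self._moves and now - self._moves[0] > self.window_s:
            self._moves.popleft()
        if len(self._moves) >= self.threshold:
            # Threshold reached: suspend learning for every port, not just the offender.
            self._disabled_until = now + self.hold_s
            self._moves.clear()

    def learning_enabled(self, now: float) -> bool:
        return now >= self._disabled_until


monitor = MoveMonitor(threshold=5, window_s=10, hold_s=180)
for t in (0, 1, 2, 3):        # a few moves from one misconfigured host
    monitor.record_move(t)
print(monitor.learning_enabled(3))    # still learning
monitor.record_move(4)                # one more move, from any other host
print(monitor.learning_enabled(100))  # learning suspended switch-wide
```

In this model a single flapping host stays under the limit on its own, but the combined moves from many hosts trip it, and then a well-behaved host (such as a VM that just vMotioned) is not learned either.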

show mac addr notification mac-move

 

This shows the number of MAC moves on a switch. The issue here was that all the ESX hosts were set to IP hash, and that algorithm dictates that some IP conversations go down one link while others go down another. If the port-channel isn't up (and it won't be, because the switch side expects LACP and the host isn't running it), the switch sees this as a MAC move between its local physical interface and the vPC peer-link.
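A small Python sketch of that effect, for anyone who wants to see it in numbers. It assumes a simplified IP-hash model (XOR of source and destination address, modulo the number of uplinks); the function names and IP addresses are made up for illustration, and the real ESX implementation may differ in detail.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Simplified IP hash: XOR the two addresses, then modulo the uplink count."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % n_uplinks


def count_mac_moves(flows, n_uplinks: int) -> int:
    """Count how often the VM's MAC would be relearned on a different port
    when the uplinks are NOT bundled into a working port-channel."""
    moves = 0
    last_port = None
    for src, dst in flows:
        port = ip_hash_uplink(src, dst, n_uplinks)
        if last_port is not None and port != last_port:
            moves += 1  # same source MAC now seen on another uplink
        last_port = port
    return moves


vm_ip = "10.0.0.10"  # hypothetical VM address
peers = ["10.0.0.20", "10.0.0.21", "10.0.0.22", "10.0.0.23"]  # hypothetical peers
flows = [(vm_ip, peer) for peer in peers]

print([ip_hash_uplink(s, d, 2) for s, d in flows])  # uplink chosen per conversation
print(count_mac_moves(flows, 2))                    # MAC moves seen by the switches
```

With a working port-channel the switch treats both uplinks as one logical port, so none of these would count as moves. With the bundle down, every change of uplink looks like the VM's MAC jumping between ports.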

Without vPC this isn't really an issue, in the sense that both switches should never be in the STM message state at the same time. But when we do use port-channeling, with or without LACP, the 5Ks share forwarding information. As soon as we moved a 4.1 ESX host into a raw port-channel, it also had a vMotion issue.

 

The resolution was to change 'channel-group xxxx mode passive' to 'channel-group xxxx mode on' for the hosts that weren't running LACP, across the entire switch pair.

 

As soon as the last host was done, the counters in show mac addr notification mac-move dropped significantly, the STM messages stopped appearing, and our 40 Gb LACP port-channel to the ESX 5.5 environment works!

Thanks for the feedback... that's how a simple misconfig can give you misleading symptoms that take you all over the place :)

 

 

-Kenny
