10-28-2011 08:27 AM - edited 03-07-2019 03:06 AM
Hi everyone,
I'm having trouble finding any information on what these logs could mean:
2011 Oct 27 16:17:48 XALBCVMNX01 %FWM-2-STM_LOOP_DETECT: Loops detected in the network among ports Po3 and Po4 vlan 100 - Disabling dynamic learn notifications for 180 seconds
2011 Oct 27 16:18:11 XALBCVMNX01 %KERN-3-SYSTEM_MSG: SSE call for cmd = 3 failed. rc = -1076428946[bfd6ff6eH] - kernel
2011 Oct 27 16:20:48 XALBCVMNX01 last message repeated 5 times
2011 Oct 27 16:20:48 XALBCVMNX01 %FWM-2-STM_LEARNING_RE_ENABLE: Re enabling dynamic learning on all interfaces
2011 Oct 27 16:28:11 XALBCVMNX01 %KERN-3-SYSTEM_MSG: SSE call for cmd = 3 failed. rc = -1076428946[bfd6ff6eH] - kernel
Would this cause any issues, such as downtime or degraded performance?
I have 2 Nexus 5000s configured in vPC, and Po3 and Po4 connect to an ESX host (VMware). These logs are showing up on pretty much all my switches that have the same configuration. Po3 and Po4 are configured as edge trunk ports... see the config below. Po4 is configured the same way.
Ethernet1/3 on Switch 1 is configured as a port channel (VPC) with E1/3 on Switch 2. Same thing for E1/4 on Switch 1 in a VPC with E1/4 on Switch 2.
All these ports go to 1 ESX host with 4 CNA ports.
interface port-channel3
switchport mode trunk
vpc 3
spanning-tree port type edge trunk
interface Ethernet1/3
switchport mode trunk
channel-group 3
XALBCVMNX01# sh int po 3
port-channel3 is up
vPC Status: Up, vPC number: 3
Hardware: Port-Channel, address: 0005.9b73.41ca (bia 0005.9b73.41ca)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is trunk
full-duplex, 10 Gb/s
Beacon is turned off
Input flow-control is off, output flow-control is off
Switchport monitor is off
Members in this channel: Eth1/3
Last clearing of "show interface" counters never
30 seconds input rate 8725464 bits/sec, 792 packets/sec
30 seconds output rate 8766648 bits/sec, 1089 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 6.97 Mbps, 594 pps; output rate 3.38 Mbps, 594 pps
RX
17265037196 unicast packets 826703 multicast packets 7425410 broadcast packets
17273289309 input packets 22486323485032 bytes
11186880177 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
16836762780 unicast packets 181757250 multicast packets 12933897 broadcast packets
17031453927 output packets 10978157509678 bytes
4582723828 jumbo packets
0 output errors 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble
97689001 Tx pause
2 interface resets
XALBCVMNX01# sh vpc 3
vPC status
----------------------------------------------------------------------------
id Port Status Consistency Reason Active vlans
------ ----------- ------ ----------- -------------------------- -----------
3      Po3         up     success     success                    1,99-100,500
XALBCVMNX01# sh int vfc 3
vfc3 is up
Bound interface is port-channel3
FCF priority is 128
Hardware is Virtual Fibre Channel
Port WWN is 20:02:00:05:9b:73:41:ff
Admin port mode is F, trunk mode is on
snmp link state traps are enabled
Port mode is F, FCID is 0xbd0004
Port vsan is 500
5 minute input rate 3479256 bits/sec, 434907 bytes/sec, 99 frames/sec
5 minute output rate 65736 bits/sec, 8217 bytes/sec, 29 frames/sec
3275825238 frames input, 5289641626032 bytes
0 discards, 0 errors
5554313000 frames output, 9557291820036 bytes
0 discards, 0 errors
Interface last changed at Sun Feb 6 08:08:43 2011
Any ideas what could be causing these logs?
Thanks for the help.
10-28-2011 09:51 AM
I had a similar issue on my N5ks... This does in fact impact performance. After you get the all-clear message "Re enabling dynamic learning on all interfaces", according to some docs I read on Cisco's site, the switch does a MAC table flush, meaning it has to relearn all MAC addresses. While it's doing this, unknown-unicast traffic is flooded.
Check the config on the ESX host and make 100% certain you have the correct ports in the EtherChannels. If they are correct, check your load-balancing scheme.
I had this problem with an IBM AIX virtualization box trying to do load balancing. The hypervisor was configured for redundancy, but the VM was configured for active/active EtherChannel... that setup caused all sorts of problems on my network, and I saw the same behavior until I disabled one of the switchports going to the hypervisor.
HTH
11-18-2011 08:10 AM
Hi,
We have much the same problem here, except that in my case the MACs are flip-flopping between one vPC member and the vPC peer-link, which is strange because the 5000 should not complain about seeing the same MAC on both sides of the vPC...
Have you had any luck solving your issue? In your case it sounds like a mismatch between the ESX physical ports and their mappings to the virtual switch.
Cheers,
Vincent.
01-10-2012 11:12 AM
Hi,
Sorry for taking forever to reply, but in the end it turns out this looks like normal behavior.
I had TAC on a WebEx and they couldn't see why this was showing up in the logs. Combined with the fact that I had no issues whatsoever in this environment, it doesn't seem to be affecting anything, so I left it at that for now.
Thanks again for replying.
06-14-2012 03:49 AM
Hi,
Same here, after a Peer Keepalive Link failure of one N5k for 5 seconds (peer timeout).
The impact was really huge.
The loop came and went until we reloaded the N5k on the other end of the Peer Link.
It's still unclear whether it was a bug or a hardware issue.
According to Cisco, the Peer Keepalive Link shouldn't affect the Peer Link in that way!
system: version 5.1(3)N1(1)
If you have gathered any new information, please share it.
Update:
Had a totally different problem, sorry.
One N5k lost connection to every peer for a few seconds (including the Peer Keepalive Link, Peer Link, vPC member ports, non-vPC member ports...).
The Peer Link and Peer Keepalive Link never came up again properly!
So the N5k assumed it was the only active peer and produced a nice loop!
Message edited by Manuel Muetsch
06-14-2012 05:02 AM
Hi,
FYI, this problem disappeared for me after upgrading to 5.0(3)N1(1c).
More details at this post : https://supportforums.cisco.com/message/3659814#3659814
You're not in the same release train as I am, but you may very well be affected by the same bug, CSCto34674, although it doesn't state whether your version is affected.
Hope this helps,
Vincent.
10-25-2013 01:39 AM
Hi Mahbvh,
I am on the version you upgraded to but the problem persists. It is currently not service affecting.
06-17-2012 12:02 AM
Hi
These log messages mean that some MAC address is flapping between ports Po3 and Po4 in VLAN 100 very quickly, so the switch considers this a network loop and stops learning addresses for some time to protect its control plane.
There can be many reasons for that.
First, try to reconfigure both your port-channels from static to LACP mode.
Then check your ESX load-balancing algorithm - if both vPC port-channels are connected to the same ESX host and ESX sends traffic for different destinations through different links, the same source MAC may appear on both port-channels.
Check the following command to see how many MAC moves occur between interfaces:
sh mac address-table notification mac-move
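As a rough sketch, the static-to-LACP change suggested above would look something like this on the N5k side (interface and channel-group numbers taken from the original post; "mode active" is one possible LACP mode, and this only applies if the ESX side can actually run LACP - standard vSwitches historically cannot, only distributed switches on newer ESX releases):

```
! Hypothetical sketch - verify LACP support on the ESX side first.
feature lacp
interface Ethernet1/3
  switchport mode trunk
  channel-group 3 mode active
! Then re-check MAC moves:
! sh mac address-table notification mac-move
```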
If none of the above helps, you may also need to open a service request with the TAC.
HTH,
Alex
04-26-2016 10:31 AM
Thanks Alex! ESX load balancing was the solution for us.
Environment:
Multiple Dell M1000e blade chassis with multiple Dell MIO aggregator modules in each chassis, each Dell MIO aggregator module having two 10 Gb interfaces in a port-channel running to a pair of Nexus 3Ks (6.0.2.A6.5).
Issue:
Repeatedly logging the two errors below every few minutes.
%FWM-2-STM_LOOP_DETECT:
Disabling dynamic learning notifications for a period between 120 and 240 seconds on vlan
%FWM-2-STM_LEARNING_RE_ENABLE_VLAN
Solution:
Only one of the Dell chassis had the issue:
the Nexus 3Ks only logged errors on two specific port-channels connected to the same specific Dell blade chassis. After looking at the VMware vSwitch configurations, the chassis where the issue originated had a VMware host with a different hashing policy configured on its vSwitch than the other VMware hosts.
After changing that host's VMware vSwitch load balancing from "Route based on IP hash" to "Route based on originating virtual port ID", the issue has not returned. No more logs or disabled learning.
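For reference, the same policy change can also be made from the ESXi shell on a standard vSwitch. A hedged sketch (the vSwitch name vSwitch0 is an assumption; the vSphere Client UI path in the post above is the usual way to do this):

```
# Hypothetical example: change a standard vSwitch from IP-hash to
# port-ID load balancing (vSwitch name is an assumption).
esxcli network vswitch standard policy failover set \
    --vswitch-name=vSwitch0 --load-balancing=portid

# Confirm the active policy:
esxcli network vswitch standard policy failover get \
    --vswitch-name=vSwitch0
```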
01-14-2013 05:14 AM
ESX VLAN beacon probing can cause uplink-port flooding behavior if the vSwitch loses beacons. This is called 'shotgunning' in VMware's terminology.
When we hooked up our HP blade centers to Nexus, we had occasional events where DRS would vMotion a VM: it would land on a new blade, cause a Nexus LOOP_DETECT, and the VM would go off-net for 180 seconds.
Disabling beacon probing on our vSwitch and vDS uplink ports seems to have resolved the problem.
While this is passing through HP Virtual Connect, the real issue seems to be an interoperability problem between Nexus loop detection and ESX beacon probing.
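Disabling beacon probing on a standard vSwitch amounts to switching failure detection back to link status. A hedged sketch (vSwitch0 is an assumption; on a vDS this is done per port group in the vSphere Client instead):

```
# Hypothetical example: use link-status failure detection instead of
# beacon probing (vSwitch name is an assumption).
esxcli network vswitch standard policy failover set \
    --vswitch-name=vSwitch0 --failure-detection=link
```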
01-16-2013 04:22 AM
Dean is right.
This is recent behavior of ESX.
You can get more info on the link below:
HTH,
Alex
01-16-2013 07:04 AM
Your mileage with this problem will vary depending on your network topology. If you're connecting an ESX server to Nexus on a single port-channel, you probably won't ever see a problem even with beacon probes enabled. If you have dual port-channels like we do from the top of the Virtual Connect switch, then beacon probing is likely to cause LOOP_DETECT events (see: http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c02656171/c02656171.pdf Scenario 3)
Looking back, we now believe we had been getting this periodic flooding behavior on our old switch plant; we do not think this is new to ESX. We would see sudden jumps in discard events, and we now suspect beaconing had been briefly flooding all along. Hooking our blade centers to Nexus introduced new loop-prevention logic and made the flooding more noticeable.
We have a lot of VLANs in our ESX infrastructure (~80). Originally we used beacon probing in our old switch plant to make sure higher-level switches were functioning on all VLANs all the way to the router. Nexus changes the nature of that problem, and the probes are no longer as valuable.
08-20-2020 08:20 AM
What type of loop-prevention config can we apply on the switch to prevent this kind of issue with the VM switch? We are having the same issue on the same VM.