04-10-2013 01:13 AM - edited 03-01-2019 10:58 AM
Morning All,
I'm hoping someone can help. I have a strange problem with my new UCS installation.
Configuration is:
UCS Chassis connected to a pair of 6248UP Fabric Interconnects, using 2208XP IO Modules.
I connect from this to a distribution 3750 stack. I have configured two EtherChannels on the 3750 stack and on UCS Manager, and these are configured to pass all the relevant VLANs. I have three blades (B200 M3) installed in the chassis, each with VMware ESXi 5 installed.
Once they have all been set up, they can communicate both ways with the network without problems, on both the VMware networks and the LAN.
Overnight (without any changes), two of the blades (slots 2 and 3) stop communicating with the network. I can get them working again by making a few changes to the network settings, and all is OK until the next day.
I am at a loss as to what can be causing this.
Any help would be great.
Thanks
Chris
04-10-2013 05:05 AM
Please connect to the UCSM CLI and collect the following output:
connect nxos
show cdp neighbor
show port-c sum
show int trunk
Paste here.
Regards,
Robert
04-10-2013 05:18 AM
Hi, thanks for your reply. Please see the information requested:
trl-secure-A(nxos)# show cdp neighbors
Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
S - Switch, H - Host, I - IGMP, r - Repeater,
V - VoIP-Phone, D - Remotely-Managed-Device,
s - Supports-STP-Dispute
Device ID Local Intrfce Hldtme Capability Platform Port ID
trl-secure-A(nxos)# show port
port port-channel port-profile port-security
trl-secure-A(nxos)# show port-channel summary
Flags: D - Down P - Up in port-channel (members)
I - Individual H - Hot-standby (LACP only)
s - Suspended r - Module-removed
S - Switched R - Routed
U - Up (port-channel)
--------------------------------------------------------------------------------
Group Port- Type Protocol Member Ports
Channel
--------------------------------------------------------------------------------
7 Po7(SU) Eth LACP Eth1/27(P) Eth1/28(P)
1284 Po1284(SU) Eth NONE Eth1/1/5(P) Eth1/1/7(P)
1285 Po1285(SU) Eth NONE Eth1/1/6(P) Eth1/1/8(P)
1286 Po1286(SU) Eth NONE Eth1/1/1(P) Eth1/1/3(P)
1287 Po1287(SU) Eth NONE Eth1/1/2(P) Eth1/1/4(P)
1290 Po1290(SU) Eth NONE Eth1/1/9(P) Eth1/1/11(P)
1291 Po1291(SU) Eth NONE Eth1/1/10(P) Eth1/1/12(P)
trl-secure-A(nxos)# show in
in-order-guarantee incompatibility install interface inventory
trl-secure-A(nxos)# show interface tr
transceiver trunk
trl-secure-A(nxos)# show interface trunk
--------------------------------------------------------------------------------
Port Native Status Port
Vlan Channel
--------------------------------------------------------------------------------
Eth1/15 1 trunking --
Eth1/27 1 trnk-bndl Po7
Eth1/28 1 trnk-bndl Po7
Po7 1 trunking --
Veth693 2 trunking --
Veth698 1 trunking --
Veth699 1 trunking --
Veth701 2 trunking --
Veth703 1 trunking --
Veth705 1 trunking --
Veth707 2 trunking --
Veth709 1 trunking --
Veth711 1 trunking --
Eth1/1/33 1 trunking --
--------------------------------------------------------------------------------
Port Vlans Allowed on Trunk
--------------------------------------------------------------------------------
Eth1/15 1-2,5,200
Eth1/27 1-2,5,200
Eth1/28 1-2,5,200
Po7 1-2,5,200
Veth693 2
Veth698 1
Veth699 2,200
Veth701 2
Veth703 1
Veth705 2,200
Veth707 2
Veth709 1
Veth711 2,200
Eth1/1/33 1-2,5,200,4044,4047-4049
--------------------------------------------------------------------------------
Port Vlans Err-disabled on Trunk
--------------------------------------------------------------------------------
Eth1/15 none
Eth1/27 none
Eth1/28 none
Po7 none
Veth693 none
Veth698 none
Veth699 none
Veth701 none
Veth703 none
Veth705 none
Veth707 none
Veth709 none
Veth711 none
Eth1/1/33 none
--------------------------------------------------------------------------------
Port STP Forwarding
--------------------------------------------------------------------------------
Eth1/15 1-2,5,200
Eth1/27 none
Eth1/28 none
Po7 1-2,5,200
Veth693 2
Veth698 1
Veth699 2,200
Veth701 2
Veth703 1
Veth705 2,200
Veth707 2
Veth709 1
Veth711 2,200
Eth1/1/33 1-2,5,200,4044,4047-4049
--------------------------------------------------------------------------------
Port Vlans in spanning tree forwarding state and not pruned
--------------------------------------------------------------------------------
Eth1/15 --
Eth1/27 --
Eth1/28 --
Po7 --
Veth693 --
Veth698 --
Veth699 --
Veth701 --
Veth703 --
Veth705 --
Veth707 --
Veth709 --
Veth711 --
Eth1/1/33 --
trl-secure-A(nxos)#
04-10-2013 05:22 AM
Do you have CDP enabled on the 3750 stack interfaces? There is no output in "show cdp neighbors".
I'd also like to see the interface config for the 3750 ports (both Port Channel and member interfaces).
show run int x/y
show run int po x
Robert
04-10-2013 05:55 AM
Thanks again. No, CDP isn't enabled by default on this one; I have enabled it to help find a fix.
Here's the info.
First, the neighbors report:
trl-secure-A(nxos)# show cdp neighbors
Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
S - Switch, H - Host, I - IGMP, r - Repeater,
V - VoIP-Phone, D - Remotely-Managed-Device,
s - Supports-STP-Dispute
Device ID Local Intrfce Hldtme Capability Platform Port ID
U12CoreRed.trlsecure.l Eth1/27 124 R S I WS-C3750G-24T Gig1/0/15
U12CoreRed.trlsecure.l Eth1/28 125 R S I WS-C3750G-24T Gig2/0/18
trl-secure-A(nxos)#
--------------------------------------------------------------------------------------------------------------------------
Port-channel 7 Info:
U12CoreRed#show int port-channel 7
Port-channel7 is up, line protocol is up (connected)
Hardware is EtherChannel, address is 0022.916c.7492 (bia 0022.916c.7492)
Description: UBS FabA
MTU 1500 bytes, BW 2000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, link type is auto, media type is unknown
input flow-control is off, output flow-control is unsupported
Members in this channel: Gi1/0/15 Gi2/0/18
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 3000 bits/sec, 4 packets/sec
3850680 packets input, 3822146418 bytes, 0 no buffer
Received 254378 broadcasts (172854 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 172854 multicast, 0 pause input
0 input packets with dribble condition detected
14766733 packets output, 7313073697 bytes, 0 underruns
0 output errors, 0 collisions, 4 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
U12CoreRed#show int g1/0/15
GigabitEthernet1/0/15 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is 0022.916c.730f (bia 0022.916c.730f)
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:26, output 00:00:04, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 2000 bits/sec, 4 packets/sec
234954 packets input, 99988405 bytes, 0 no buffer
Received 166587 broadcasts (86599 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 86599 multicast, 0 pause input
0 input packets with dribble condition detected
6851946 packets output, 600462501 bytes, 0 underruns
0 output errors, 0 collisions, 4 interface resets
54387 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
U12CoreRed#show int g2/0/18
GigabitEthernet2/0/18 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is 0022.916c.7492 (bia 0022.916c.7492)
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:12, output 00:00:33, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
3615893 packets input, 3722198373 bytes, 0 no buffer
Received 87958 broadcasts (86411 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 86411 multicast, 0 pause input
0 input packets with dribble condition detected
7924984 packets output, 6714119929 bytes, 0 underruns
0 output errors, 0 collisions, 5 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
-------------------------------------------------------------------------------------------------------------------------
Port-channel 8 info
U12CoreRed#show int port-channel 8
Port-channel8 is up, line protocol is up (connected)
Hardware is EtherChannel, address is c8f9.f9e3.118f (bia c8f9.f9e3.118f)
Description: UBS FabB
MTU 1500 bytes, BW 2000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, link type is auto, media type is unknown
input flow-control is off, output flow-control is unsupported
Members in this channel: Gi1/0/16 Gi3/0/15
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 4000 bits/sec, 6 packets/sec
212497 packets input, 65688627 bytes, 0 no buffer
Received 182191 broadcasts (173635 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 173635 multicast, 0 pause input
0 input packets with dribble condition detected
9493059 packets output, 930235990 bytes, 0 underruns
0 output errors, 0 collisions, 4 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
U12CoreRed#show int g1/0/16
GigabitEthernet1/0/16 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is 0022.916c.7310 (bia 0022.916c.7310)
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:13, output 00:00:01, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 2000 bits/sec, 3 packets/sec
117044 packets input, 32002148 bytes, 0 no buffer
Received 95561 broadcasts (87266 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 87266 multicast, 0 pause input
0 input packets with dribble condition detected
5234624 packets output, 480601569 bytes, 0 underruns
0 output errors, 0 collisions, 4 interface resets
54421 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
U12CoreRed#show int g3/0/15
GigabitEthernet3/0/15 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is c8f9.f9e3.118f (bia c8f9.f9e3.118f)
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:20, output 00:00:21, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 1000 bits/sec, 1 packets/sec
95525 packets input, 33704654 bytes, 0 no buffer
Received 86702 broadcasts (86441 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 86441 multicast, 0 pause input
0 input packets with dribble condition detected
4262744 packets output, 450231499 bytes, 0 underruns
0 output errors, 0 collisions, 4 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
04-10-2013 05:58 AM
I need the "show run int x/y" for the interfaces, not the "show int x/y"
Robert
04-10-2013 06:32 AM
Oops.
Please find the running config for the interfaces below. Thanks again.
U12CoreRed#show run interface g1/0/15
Building configuration...
Current configuration : 191 bytes
!
interface GigabitEthernet1/0/15
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2,5,200
switchport mode trunk
channel-protocol lacp
channel-group 7 mode active
end
U12CoreRed#show run interface g2/0/18
Building configuration...
Current configuration : 191 bytes
!
interface GigabitEthernet2/0/18
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2,5,200
switchport mode trunk
channel-protocol lacp
channel-group 7 mode active
end
U12CoreRed#show run interface g1/0/16
Building configuration...
Current configuration : 191 bytes
!
interface GigabitEthernet1/0/16
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2,5,200
switchport mode trunk
channel-protocol lacp
channel-group 8 mode active
end
U12CoreRed#show run interface g3/0/15
Building configuration...
Current configuration : 191 bytes
!
interface GigabitEthernet3/0/15
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2,5,200
switchport mode trunk
channel-protocol lacp
channel-group 8 mode active
end
04-10-2013 06:39 AM
Enable PortFast on the 3750 interfaces. It's not the main cause of your issue, but it's recommended when connected to UCS. It should be something like "spanning-tree portfast edge trunk" (older IOS releases use "spanning-tree portfast trunk", without the "edge" keyword).
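As a rough sketch of what that might look like on the 3750: this assumes the setting is applied on the two port-channel interfaces that appear elsewhere in this thread (Po7 and Po8) and that the stack runs an IOS release using the older keyword form. Neither assumption is confirmed by the thread.

```
! Sketch only. Newer IOS releases spell this "spanning-tree portfast edge trunk".
configure terminal
 interface Port-channel7
  spanning-tree portfast trunk
 interface Port-channel8
  spanning-tree portfast trunk
 end
! Check whether PortFast is now reported as enabled on the trunk
show spanning-tree interface Port-channel7 portfast
```

PortFast on a trunk should only be used toward end hosts such as the UCS fabric interconnects in end-host mode, never toward another bridging switch.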
As for the connectivity issues, the only thing I can think of is that the blades' MAC addresses are aging out of the 3750 MAC table.
What is the OS of the blades having the issue?
How are the blades' OS interfaces configured? (Teamed together, or separate interfaces, vSwitch, etc.?)
When you say they lose connectivity, does that mean from the blade you can't ping outwards or from the network you can't reach your blades?
Robert
04-10-2013 06:59 AM
I have enabled the portfast as recommended.
The MAC addresses aging out might be the answer: the entry in the ARP table has cleared, and now that I try to ping the devices it shows as incomplete. Is there a way of stopping it from aging out?
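One way to confirm whether an entry really has aged out is to look up the host's MAC and IP directly, while the host is unreachable. The commands below are a sketch; the MAC address and IP address are placeholders rather than values from this thread, and "show ip arp" is only useful if the 3750 has a Layer 3 interface in the host's VLAN.

```
! On the 3750 (placeholder MAC and IP, substitute the ESXi management vmk values)
show mac address-table address 0025.b5aa.0001
show mac address-table aging-time
show ip arp 192.0.2.10

! On the fabric interconnect, after "connect nxos"
show mac address-table address 0025.b5aa.0001
```

If the MAC is missing from the table while the host is unreachable and reappears once the host sends traffic, that points to aging rather than a trunk or VLAN problem.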
The end OS is ESXi. When the MAC address drops from the 3750, neither end can ping the other (i.e. LAN to blade and blade to LAN).
The OS config is pairs of virtual NICs (one from each fabric) going to a vSwitch, with VLAN tagging.
It's one of those annoying issues that just frustrates.
04-10-2013 07:05 AM
There's a config problem somewhere. The VM should always be able to ping outward. Even if the "quiet VM" effect occurs and the MAC ages out, as soon as the VM initiates communication it will ARP for the gateway, which should rebuild the ARP table upstream.
Are the vSwitch uplinks configured with the default teaming options (Route based on virtual Port ID)? Hopefully you're not trying to use IP hash on the vSwitch uplinks, which will not work with UCS.
Robert
04-10-2013 07:21 AM
There are no VMs on the servers yet. All three are configured with the default image, using the custom Cisco build for UCS servers (i.e. route based on virtual port, second NIC as "standby adapter").
A simple restart of the management network is "usually" enough to kick the link back to life. The odd thing is that server 1 (slot 1) almost NEVER has this problem (only once, and I believe it was misconfigured at the time). The servers in slots 2 and 3 always get this issue overnight.
I totally agree that the symptoms point to a misconfiguration, but given that server 1 is OK and the others have profiles cloned from it, it's a real mystery.
04-10-2013 07:25 AM
Next time you get a host in this state, can you leave it like this? That would be the best time to troubleshoot this. From what I gather, the UCS side sounds fine; I'd be sniffing around ESX/vSwitch for the config issue. If simply restarting the management service on the host gets things back online, that means UCS pinning and VLANs are all fine. This is a host-side issue.
Any reason why you're using one NIC as standby rather than both as active/active?
Robert
04-10-2013 07:31 AM
Hi Robert,
It was in this state the whole time during this. I think you might have hit on the answer: on the third server, rather than restarting the interface, I simply ran the test from the management interface, and this seemed to sort the problem immediately. I wonder if, as these are default installs, the servers are just too quiet and they simply age out.
I am installing a base Windows server on all boxes, as frankly they chat non-stop. That might solve the issue, and in a live environment the problem simply won't be there (i.e. heartbeats etc.).
I'll update tomorrow, once the servers have had a quiet night, and see whether this cures the problem.
Thanks a lot for all your help so far.
Chris
04-10-2013 08:00 AM
That should fix your issue. UCS will not forward unknown unicast, so if a UCS blade/VM MAC address ages out on your 3750s, the outside world will not be able to reach it. In normal production operation, servers are usually chatty enough to keep the aging timers from expiring, so you'll likely only see this during the install, when there are few or no VMs sending or receiving. Another option is to increase the aging timers on the 3750.
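A minimal sketch of raising the MAC aging timer on the 3750 follows. The value of 14400 seconds is an assumption, chosen only because it matches the 4-hour ARP timeout shown in the "show interface" output earlier in the thread; the Catalyst default is 300 seconds. Without a "vlan" keyword the setting applies to all VLANs.

```
! On the 3750. 14400 is an illustrative value, not one recommended in this thread.
configure terminal
 mac address-table aging-time 14400
 end
! Confirm the new timer
show mac address-table aging-time
```

A longer timer keeps quiet hosts in the table, at the cost of stale entries lingering longer after a host moves or goes away.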
Let me know how it goes tomorrow.
Regards,
Robert
04-11-2013 06:44 AM
Well, this didn't entirely fix the problem. The management MACs for the same two servers dropped from the MAC table again. However, the virtual servers I added remained pingable, so I take that as a minor victory.
I have now added the ESX hosts to the cluster; I believe the HA services will be more than enough to keep the MACs from aging out.
I will close this down tomorrow, assuming this has resolved the issue.
Chris