IngGerardo013
Beginner

Virtual servers going down: network problem?

Hello guys,

We are having trouble with some virtual servers (virtual machines): they shut down without any warning while they are running. We are checking the network. A Cisco TAC engineer once told me about "show processes cpu history". Can somebody help me understand the output better? This is the output:

 

SW-WSC4506E#show processes cpu history

    2222222222223333322222222222222222222222223333322222222222
    6611111555557777733333777773333355555333330000044444333335
100
 90
 80
 70
 60
 50
 40             *****
 30 **     **********     *****     *****     *****          *
 20 **********************************************************
 10 **********************************************************
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5
               CPU% per second (last 60 seconds)

    3334333333255644544334342423654555232344332584244423252323
    0934581878993873091848279998588123888951289411837099869997
100
 90
 80                                             *
 70              *              *               *
 60            * *              **              *        *
 50            *##* **   * * *  ******    *    **   *    *
 40  * *** *** ###****** * * * **##*#* * *** * *** *** * * * *
 30 ***#****#**########**#*#*#**######****#**#*###*###*#*#*#*#
 20 ##########################################################
 10 ##########################################################
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5
               CPU% per minute (last 60 minutes)
              * = maximum CPU%   # = average CPU%

    8888788888778887798998878768777867788778788888888788789687888978877668
    1075925845590321972411819192072078986766546255006015551777753384189687
100                  *
 90   **  ** *       * ** * *          **  *  * **  *  * ** * ** *       *
 80 *************** ******* *  * * * **************** ***** ***********  *
 70 **********************************************************************
 60 **********************************************************************
 50 **********************************************************************
 40 **********************************************************************
 30 ##*#***************###**#*##******************************************
 20 ######################################################################
 10 ######################################################################
   0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.
             0    5    0    5    0    5    0    5    0    5    0    5    0
                   CPU% per hour (last 72 hours)
                  * = maximum CPU%   # = average CPU%

SW-WSC4506E#
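A quick way to read the graphs above: the rows of digits at the top of each graph are CPU% values written vertically (tens digit on top, units digit below), one column per sample, and as far as I know the leftmost column is the most recent sample. In the per-minute and per-hour graphs those digits are the maximum for each interval (they line up with the * marks), while # shows the average. The Python sketch below is a hypothetical helper (decode_history_rows is not a Cisco tool) that turns the digit rows into numbers, assuming you paste them with the 4-character left margin removed.

```python
from typing import List


def decode_history_rows(rows: List[str]) -> List[int]:
    """Decode the digit rows printed above a 'show processes cpu history' graph.

    rows: the digit rows of ONE graph, top row first (e.g. tens row, then units
    row), with the 4-character left margin already removed. Each column is one
    sample; a blank in an upper row just means a smaller number (' ' over '9' = 9).
    """
    width = max(len(row) for row in rows)
    padded = [row.ljust(width) for row in rows]  # tolerate trimmed trailing blanks
    values = []
    for column in zip(*padded):
        digits = "".join(column).strip()  # drop blanks from missing leading digits
        values.append(int(digits) if digits else 0)
    return values


if __name__ == "__main__":
    # Digit rows of the "CPU% per second" graph posted above (margin removed).
    tens = "2222222222223333322222222222222222222222223333322222222222"
    units = "6611111555557777733333777773333355555333330000044444333335"
    per_second = decode_history_rows([tens, units])
    # The leftmost value is assumed to be the most recent second.
    print("min", min(per_second), "max", max(per_second))
```

Read this way, the per-second samples above run from 21% to 37%. In the per-hour graph the averages (#) stay roughly around 20-30%, while the hourly peaks (*) mostly reach 70-90%.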

 

 

 

william.riley
Beginner

Are they shutting down, or do they just become unreachable (for example via ping to the ESX host)? If they are becoming unreachable, are you using HP servers with Broadcom network cards? If so, there is a patch, or you can log in to vCenter and turn off NetQueue (netq) on the affected hosts. This generally affects Gen8 servers with Broadcom NICs. We have NetQueue disabled on all our servers. Only do this if you are using 1 Gb NICs, not 10 Gb NICs; if you are using 10 Gb NICs, apply the patch instead. I don't have enough information to do more than guess at what you might be experiencing at this point.

The virtual machine servers go down and the IT staff have to power the machines back on. They have 2 or 3 physical servers. They think this may be a network problem, but the core switch never goes down; it only showed these logs:

 

17w2d: %C4K_EBM-4-HOSTFLAPPING: Host E0:63:E5:0D:69:D5 in vlan 104 is flapping between port Gi2/1 and port Gi2/4
17w2d: %C4K_EBM-4-HOSTFLAPPING: Host E0:63:E5:0D:69:D5 in vlan 104 is flapping between port Gi2/4 and port Gi2/1
17w2d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/4 and port Gi2/3
17w2d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/3 and port Gi2/4
17w2d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/4 and port Gi2/3
17w3d: %C4K_EBM-4-HOSTFLAPPING: Host 28:BA:B5:48:BF:03 in vlan 104 is flapping between port Gi2/3 and port Gi2/4
17w3d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/4 and port Gi2/3
17w4d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/4 and port Gi2/3
17w4d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/3 and port Gi2/4
17w4d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/3 and port Gi2/4
17w4d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/4 and port Gi2/3

...
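The HOSTFLAPPING messages above say the switch keeps re-learning the same MAC address on different ports in VLAN 104 (mostly between Gi2/3 and Gi2/4). When there are many such lines, a tally per MAC makes it easier to see which hosts flap and between which ports. Below is a minimal Python sketch, assuming the log lines look exactly like the ones posted; summarize_flaps and FLAP_RE are hypothetical names, not a Cisco tool.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Matches the %C4K_EBM-4-HOSTFLAPPING lines in the format posted above (assumed format).
FLAP_RE = re.compile(
    r"%C4K_EBM-4-HOSTFLAPPING: Host (?P<mac>[0-9A-Fa-f:]{17}) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)


def summarize_flaps(lines: Iterable[str]) -> Dict[Tuple[str, str], dict]:
    """Count flap messages per (MAC, VLAN) and collect every port involved."""
    summary: Dict[Tuple[str, str], dict] = defaultdict(lambda: {"count": 0, "ports": set()})
    for line in lines:
        m = FLAP_RE.search(line)
        if not m:
            continue  # skip lines that are not HOSTFLAPPING messages
        entry = summary[(m["mac"], m["vlan"])]
        entry["count"] += 1
        entry["ports"].update((m["a"], m["b"]))
    return dict(summary)


if __name__ == "__main__":
    log = [
        "17w2d: %C4K_EBM-4-HOSTFLAPPING: Host E0:63:E5:0D:69:D5 in vlan 104 is flapping between port Gi2/1 and port Gi2/4",
        "17w2d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/4 and port Gi2/3",
        "17w2d: %C4K_EBM-4-HOSTFLAPPING: Host 84:00:D2:E2:1E:30 in vlan 104 is flapping between port Gi2/3 and port Gi2/4",
    ]
    for (mac, vlan), info in sorted(summarize_flaps(log).items(), key=lambda kv: -kv[1]["count"]):
        print(mac, "vlan", vlan, info["count"], "flaps between", sorted(info["ports"]))
```

A MAC address that keeps moving between two ports is a common sign of a loop, or of the same host being reachable through two paths.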

That's strange; I have never heard of a network issue causing a virtual machine to power off. I would have them look through the logs in vCenter. I have heard of network issues causing VMs to become unresponsive to pings etc., but not to power off. It sounds more like a virtual environment issue.

 

 

Do you think that things like:

               High CPU load on the core switch

               Loops in the network

               Security attacks

could shut down some (but not all) of the virtual machines?

 

These logs seem pretty old relative to your event.

Can you correlate the exact time a virtual machine shut down with any of the switch's log entries?

I suggest using the same NTP server on all devices so we can know exactly what happened when.

Also post the configuration of both Gi2/3 and Gi2/4.
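With uptime-style timestamps such as 17w2d (17 weeks, 2 days since boot), the switch log only has day resolution, so it cannot be lined up with a VM shutdown. Once the switch and the ESXi hosts use the same NTP server, and the switch logs wall-clock time (on IOS, for example, with "service timestamps log datetime msec"), matching events becomes a simple window search. The sketch below is hypothetical: correlate, the two-minute window and the sample events are assumptions, and the events must already be parsed into datetime objects.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]  # (timestamp, message)


def correlate(vm_events: List[Event], switch_events: List[Event],
              window: timedelta = timedelta(minutes=2)) -> List[Tuple[str, str]]:
    """Pair each VM event with every switch log entry within +/- window of it.

    Assumes both sides are synced to the same NTP server and use the same time zone.
    """
    pairs = []
    for vm_time, vm_msg in vm_events:
        for sw_time, sw_msg in switch_events:
            if abs(sw_time - vm_time) <= window:
                pairs.append((vm_msg, sw_msg))
    return pairs


if __name__ == "__main__":
    # Hypothetical events, for illustration only.
    t0 = datetime(2024, 1, 1, 10, 0, 0)
    vm = [(t0, "VM powered off")]
    sw = [(t0 + timedelta(seconds=30), "HOSTFLAPPING Gi2/3 <-> Gi2/4"),
          (t0 + timedelta(hours=3), "unrelated message")]
    for vm_msg, sw_msg in correlate(vm, sw):
        print(vm_msg, "<->", sw_msg)
```

A nested loop is enough for a handful of shutdowns; for large logs, sorting both lists and walking them together would scale better.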

The configuration on these interfaces is as follows:

 

interface GigabitEthernet2/3
 switchport trunk encapsulation dot1q
 switchport mode trunk
end

SWICORE#sh  int gi 2/3
GigabitEthernet2/3 is up, line protocol is up (connected)
  Hardware is Gigabit Ethernet Port, address is 0016.c76f.9d6a (bia 0016.c76f.9d6a)
  Description:
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 3/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, link type is auto, media type is 1000BaseSX
  input flow-control is off, output flow-control is off
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:02, output never, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 1552000 bits/sec, 1151 packets/sec
  5 minute output rate 12962000 bits/sec, 1505 packets/sec
     7786792460 packets input, 1452385158566 bytes, 0 no buffer
     Received 45223763 broadcasts (17736448 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 input packets with dribble condition detected
     11400561168 packets output, 11872955123001 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier
     0 output buffer failures, 0 output buffers swapped out
SWCORE#

 

SWCORE#sh  int gi 2/4
GigabitEthernet2/4 is up, line protocol is up (connected)
  Hardware is Gigabit Ethernet Port, address is 0016.c76f.9d6b (bia 0016.c76f.9d6b)
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, link type is auto, media type is 1000BaseSX
  input flow-control is off, output flow-control is off
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:01, output never, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 309
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 390000 bits/sec, 360 packets/sec
  5 minute output rate 3234000 bits/sec, 457 packets/sec
     8986605362 packets input, 1502670511048 bytes, 0 no buffer
     Received 163219493 broadcasts (149605658 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 input packets with dribble condition detected
     8978472091 packets output, 4465607504694 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier
     0 output buffer failures, 0 output buffers swapped out
SWCORE#sh run int gi 2/4
Building configuration...

Current configuration : 150 bytes
!
interface GigabitEthernet2/4
 switchport trunk encapsulation dot1q
 switchport mode trunk
end

 

Switches are connected to these interfaces.

Gi2/3 and Gi2/4 are both used for connecting the ESXi hosts to your infrastructure, right?

I believe this is more a matter of troubleshooting how you configured your vSwitches; I would focus on that now.

Did you experience those issues while these messages were being logged?

 

No, these interfaces are connected to switches. The servers are connected to trunk ports with the minimal required configuration, i.e. "switchport mode trunk" and "switchport trunk encapsulation dot1q". I mentioned these (flapping) logs because they were the only logs that appeared when the virtual machines went down... In fact, I don't know whether these logs were there before the problem or started appearing during it...

The IT staff have an Excel file that shows no loops; they have documented this part very well.

Hi,

apparently you have one or more virtual machines with two vNICs in bridging mode, so you have a bridging loop in your network...
 

1977bjorn
Beginner

Start with "show proc cpu sort" instead if you think the switch is having problems; it shows which processes are using the most CPU. But as others commented, start with the VMs: make sure you have set up the network NICs correctly, and check the logs in the VMs. What are they saying?

 

//Björn
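Following the "show proc cpu sort" suggestion, the useful columns are the per-process 5Sec, 1Min and 5Min CPU percentages. The Python sketch below picks the top processes from pasted output. It assumes the classic IOS row layout (PID, Runtime(ms), Invoked, uSecs, 5Sec, 1Min, 5Min, TTY, Process); top_processes and ROW_RE are hypothetical names, and the layout may differ on other platforms or releases, so check it against your own output first.

```python
import re
import sys
from typing import Iterable, List, NamedTuple

# Assumed row layout: PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
ROW_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+\d+\s+\d+\s+\d+\s+"
    r"(?P<sec5>[\d.]+)%\s+(?P<min1>[\d.]+)%\s+(?P<min5>[\d.]+)%\s+\d+\s+(?P<name>.+?)\s*$"
)


class ProcLoad(NamedTuple):
    pid: int
    sec5: float
    min1: float
    min5: float
    name: str


def top_processes(lines: Iterable[str], n: int = 5) -> List[ProcLoad]:
    """Return the n processes with the highest 5-second CPU%.

    Header and summary lines simply do not match the row pattern and are skipped.
    """
    procs = []
    for line in lines:
        m = ROW_RE.match(line)
        if m:
            procs.append(ProcLoad(int(m["pid"]), float(m["sec5"]), float(m["min1"]),
                                  float(m["min5"]), m["name"]))
    return sorted(procs, key=lambda p: p.sec5, reverse=True)[:n]


if __name__ == "__main__":
    # Feed the saved command output on stdin.
    for p in top_processes(sys.stdin):
        print(f"{p.sec5:6.2f}% {p.min1:6.2f}% {p.min5:6.2f}%  {p.name}")
```

For example, save the command output to a text file and run the script with that file on standard input (both file names are up to you).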