Re: Is there any function built in Nexus 5010 to detect intermittent link down?

DennisLee1 · ‎11-24-2011

Hi experts,

I have found intermittent link-down events (20 to 40 seconds on average) occurring about 1 to 10 times every month. SAP reported that many active connections were disconnected, and when I ran a batch script to ping, I saw "Request timed out" for about 30 seconds.
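To measure each outage window more precisely than by reading a batch ping log by eye, the ping results could be grouped into runs of consecutive failures. This is a minimal sketch under stated assumptions: it does not run ping itself, and it assumes a hypothetical list of `(timestamp_seconds, reply_received)` pairs collected by whatever ping loop is already in use. The names `Outage` and `find_outages` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Outage:
    start: float   # timestamp of the first failed ping in the run
    end: float     # timestamp of the last failed ping in the run
    failures: int  # number of consecutive failed pings

    @property
    def duration(self) -> float:
        # Time between the first and last failed ping (add one ping interval for an upper bound).
        return self.end - self.start


def find_outages(samples: Sequence[Tuple[float, bool]],
                 min_failures: int = 3) -> List[Outage]:
    """Group consecutive failed pings into outage windows.

    samples: hypothetical (timestamp_seconds, reply_received) pairs in time order.
    Runs shorter than min_failures are treated as isolated packet loss.
    """
    outages: List[Outage] = []
    run_start: Optional[float] = None
    run_end: Optional[float] = None
    run_len = 0
    for ts, ok in samples:
        if not ok:
            if run_len == 0:
                run_start = ts
            run_end = ts
            run_len += 1
            continue
        if run_len >= min_failures:
            outages.append(Outage(run_start, run_end, run_len))
        run_len = 0
    # Close a run that is still open at the end of the log.
    if run_len >= min_failures:
        outages.append(Outage(run_start, run_end, run_len))
    return outages


# One sample per second: a 30-ping outage plus one isolated lost ping at t=50.
samples = [(float(t), not (10 <= t < 40) and t != 50) for t in range(60)]
for o in find_outages(samples):
    print(o.start, o.end, o.failures)  # 10.0 39.0 30
```

Logging the start time of each window this way makes it easier to line the outages up against switch logs or captures taken at the same moment.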

Windows, SQL Server, and the Nexus 5010 do not show any errors. We run a cluster, and the cluster does not fail over.

I don't know which cables or NICs cause this issue. When it happens, almost all servers are unreachable, for example SQL Server 1 -> SQL Server 2 and IBM HS22-1 -> SQL Server 1. However, some connections are sometimes not dropped; it varies each time.

PS: I ran this topology last year without any problems, but the intermittent link-down events started on 2011/1/7. Because there are no errors on the Nexus 5010, it is difficult to troubleshoot. Yesterday Cisco TAC recommended that we implement virtual port channel (vPC).

Could I use "errdisable detect cause" to find out what caused the intermittent link down? Are there any error logs, switch parameters, or status outputs I can use to troubleshoot?

Alexander Maroukian · ‎12-15-2011

Hi Dennis,

The connections between the servers (192.168.28.11, 192.168.28.12, 192.168.28.110, 192.168.28.115) look OK in the second capture.

Best regards,

Alex

DennisLee1 · ‎12-15-2011

Hi Alex,

Yes, I have discussed this issue with Microsoft twice. I look forward to your opinion.

--- first mail --

Hi Marty,

What did you find? I think the intermittent link down happened between dl980-1 <=> nexus-1 or nexus-2.

If you look at tshark.cap from tccap40, traffic from tccap40 to Nortel(95), Nortel(96), nexus(251), and nexus(252) is OK. Does that mean we should focus on the Nexus or on the DL980 teaming driver? Any opinions?

dl980-2 => dl980-1 (Wireshark: missing and out-of-order packets)

tccap40 => dl980-1

tccap40 => nexus(251): OK
tccap40 => nexus(252): OK
tccap40 => Nortel(95): OK
tccap40 => Nortel(96): OK

--- second mail --

Hi Ellis,

Please confirm with Ted how to set up the EtherChannel with PortFast or edge port correctly.

We will schedule downtime for enabling PortFast tomorrow.

Question: Why does my team lose connectivity for the first 30 to 50 seconds after the primary adapter is restored (fallback)?

Answer: Because the Spanning Tree Protocol is moving the port from the blocking state to the forwarding state. You must enable PortFast or Edge Port on the switch ports connected to the team.
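The 30 to 50 second window in that answer lines up with the default IEEE 802.1D timers, forward delay 15 s and max age 20 s. A non-edge port spends one forward-delay interval listening and one learning (2 x 15 = 30 s), and if it first has to wait for stale BPDU information to age out, max age is added (30 + 20 = 50 s). The sketch below only does that arithmetic; the timer values are assumed defaults, and the switch's actual configuration and spanning-tree mode may differ. An edge (PortFast) port skips these states, which is why the fix above is to enable it.

```python
# Assumed IEEE 802.1D default timer values in seconds; check the switch's actual config.
FORWARD_DELAY = 15
MAX_AGE = 20


def stp_forwarding_delay(forward_delay: int = FORWARD_DELAY,
                         max_age: int = MAX_AGE,
                         wait_for_max_age: bool = False) -> int:
    """Seconds before a non-edge port starts forwarding.

    listening (forward_delay) + learning (forward_delay),
    plus max_age if stale BPDU information must expire first.
    """
    delay = 2 * forward_delay
    if wait_for_max_age:
        delay += max_age
    return delay


print(stp_forwarding_delay())                       # 30
print(stp_forwarding_delay(wait_for_max_age=True))  # 50
```

The lower bound (about 30 s) is in the same range as the roughly 30 seconds of ping timeouts reported at the top of the thread.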

The following page lists four steps; please make sure you have completed all of them.

http://www.cisco.com/en/US/tech/tk389/tk213/technologies_configuration_example09186a008089a821.shtml

Here is the PortFast command for the Nexus 5010: the Ethernet interface must be configured as PortFast (use the spanning-tree port type edge trunk command).

DennisLee1 · ‎12-15-2011

Hi Alex & all,

I have uploaded the latest config of the Nexus 5010. Would you please check whether we have followed the best practice (http://www.cisco.com/en/US/tech/tk389/tk213/technologies_configuration_example09186a008089a821.shtml) for setting up the EtherChannel for the HP Network Configuration Utility (teaming)? I really appreciate your help. I uploaded "show running-config" and "show tech-support"; the file name is config.zip, FYI.

ftp://ftp01.quantatw.com/

user: sapftp password: wju123

DennisLee1 · ‎11-28-2011

Hi experts,

Additional info: we are running Broadcom Smart Load Balancing teaming on the IBM HS22. Every support engineer keeps telling me it is impossible to run active/active on one blade. However, please check this link: http://support.dell.com/support/edocs/network/P29352/English/teamsvcs.htm

----

Smart Load Balancing (SLB)

Smart Load Balancing™ provides both load balancing and failover when configured for Load Balancing, and only failover when configured for fault tolerance. It works with any Ethernet switch and requires no trunking configuration on the switch. The team advertises multiple MAC addresses and one or more IP addresses (when using secondary IP addresses). The team MAC address is selected from the list of load balancing members. When the server receives an ARP Request, the software-networking stack will always send an ARP Reply with the team MAC address. To begin the load balancing process, the teaming driver will modify this ARP Reply by changing the source MAC address to match one of the physical adapters.
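The ARP-reply rewrite described above can be modeled as a toy sketch. Everything here is hypothetical: `ArpReply` and `SlbTeamSketch` are made-up names, and choosing the member with the fewest clients is an assumed policy, since the quoted text only says the driver picks "one of the physical adapters". It shows the key point: the stack answers with the team MAC, and the driver swaps in a member MAC before the reply leaves the server.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class ArpReply:
    sender_mac: str  # MAC the requester will cache for sender_ip
    sender_ip: str
    target_mac: str
    target_ip: str


class SlbTeamSketch:
    """Toy model of an SLB team rewriting outgoing ARP replies (hypothetical)."""

    def __init__(self, member_macs: List[str]):
        self.member_macs = member_macs
        self.team_mac = member_macs[0]  # team MAC is selected from the members
        self.clients: Dict[str, int] = {mac: 0 for mac in member_macs}

    def rewrite_arp_reply(self, reply: ArpReply) -> ArpReply:
        # Assumed policy: pick the member currently serving the fewest clients
        # (ties go to the first member in the list).
        chosen = min(self.member_macs, key=lambda m: self.clients[m])
        self.clients[chosen] += 1
        return replace(reply, sender_mac=chosen)


team = SlbTeamSketch(["00:10:18:00:00:01", "00:10:18:00:00:02"])
# The protocol stack always answers with the team MAC...
stack_reply = ArpReply(team.team_mac, "192.168.28.11", "00:aa:00:00:00:10", "192.168.28.110")
# ...and the driver rewrites the sender MAC before the reply is sent.
print(team.rewrite_arp_reply(stack_reply).sender_mac)  # 00:10:18:00:00:01
```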

Smart Load Balancing enables both transmit and receive load balancing based on the Layer 3/Layer 4 IP address and TCP/UDP port number. In other words, the load balancing is not done at a byte or frame level but on a TCP/UDP session basis. This methodology is required to maintain in-order delivery of frames that belong to the same socket conversation. Load balancing is supported on 2-8 ports. These ports can include any combination of add-in adapters and LAN-on-Motherboard (LOM) devices. Transmit load balancing is achieved by creating a hashing table using the source and destination IP addresses and TCP/UDP port numbers. The same combination of source and destination IP addresses and TCP/UDP port numbers will generally yield the same hash index and therefore point to the same port in the team. When a port is selected to carry all the frames of a given socket, the unique MAC address of the physical adapter is included in the frame, and not the team MAC address. This is required to comply with the IEEE 802.3 standard. If two adapters transmit using the same MAC address, then a duplicate MAC address situation would occur that the switch could not handle.
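The per-session transmit hashing can be sketched as below. This is an illustration, not BASP's actual algorithm: the SHA-256 hash and the name `select_tx_port` are assumptions. What it does show is the property the text relies on: the same source/destination IP and TCP/UDP port combination always maps to the same team member, so frames of one socket conversation stay in order, while different sessions spread across members.

```python
import hashlib
from typing import List


def select_tx_port(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                   member_macs: List[str]) -> str:
    """Pick the team member that carries a TCP/UDP session (hypothetical hash).

    hashlib is used instead of Python's built-in hash(), which is randomized
    per process for strings and would not give a stable mapping.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(member_macs)
    # Frames for this session are sent with the chosen member's own MAC.
    return member_macs[index]


members = ["00:10:18:00:00:01", "00:10:18:00:00:02"]
first = select_tx_port("192.168.28.11", "192.168.28.110", 50000, 1433, members)
again = select_tx_port("192.168.28.11", "192.168.28.110", 50000, 1433, members)
print(first == again)  # True: one session, one member
```

Because the key uses only IP addresses and ports, which a router leaves unchanged, the same selection applies to traffic sent through a router, as the quoted text notes further down.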

Receive Load Balancing is achieved through an intermediate driver by sending Gratuitous ARPs on a client-by-client basis using the unicast address of each client as the destination address of the ARP Request (also known as a Directed ARP). This is considered client load balancing and not traffic load balancing. When the intermediate driver detects a significant load imbalance between the physical adapters in an SLB team, it will generate G-ARPs in an effort to redistribute incoming frames. The intermediate driver (BASP) does not answer ARP Requests; only the software protocol stack provides the required ARP Reply. It is important to understand that receive load balancing is a function of the number of clients that are connecting to the server via the team interface.

SLB Receive Load Balancing attempts to load balance incoming traffic for client machines across physical ports in the team. It uses a modified Gratuitous ARP to advertise a different MAC address for the team IP Address in the sender physical and protocol address. This G-ARP is unicast with the MAC and IP Address of a client machine in the target physical and protocol address respectively. This causes the target client to update its ARP cache with a new MAC address mapped to the team IP address. G-ARPs are not broadcast because this would cause all clients to send their traffic to the same port. As a result, the benefits achieved through client load balancing would be eliminated, and it could cause out-of-order frame delivery. This receive load balancing scheme works as long as all clients and the teamed server are on the same subnet or broadcast domain.
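The directed G-ARP scheme can be sketched as a planning step that assigns each on-subnet client to a team member and builds one unicast G-ARP per client. This is a hypothetical model: `DirectedGarp` and `plan_receive_balancing` are made-up names, and round-robin assignment is an assumption (the quoted text says the driver reacts to a measured load imbalance). Clients outside the team's subnet are skipped, reflecting the same-subnet limitation described above: they reach the server through a router, so no ARP can be sent to them.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DirectedGarp:
    sender_mac: str  # member MAC advertised for the team IP
    sender_ip: str   # team IP address
    target_mac: str  # client MAC (unicast, not broadcast)
    target_ip: str   # client IP


def plan_receive_balancing(team_ip: str, prefix_len: int, member_macs: List[str],
                           clients: Dict[str, str]) -> List[DirectedGarp]:
    """Spread on-subnet clients across team members with unicast G-ARPs (sketch).

    clients maps client IP -> client MAC. Round-robin assignment is an assumed
    policy. Off-subnet clients are skipped because ARP never crosses a router.
    """
    subnet = ipaddress.ip_interface(f"{team_ip}/{prefix_len}").network
    on_subnet = [ip for ip in sorted(clients, key=ipaddress.ip_address)
                 if ipaddress.ip_address(ip) in subnet]
    garps = []
    for i, client_ip in enumerate(on_subnet):
        garps.append(DirectedGarp(
            sender_mac=member_macs[i % len(member_macs)],
            sender_ip=team_ip,
            target_mac=clients[client_ip],
            target_ip=client_ip,
        ))
    return garps


clients = {"192.168.28.11": "c1", "192.168.28.12": "c2",
           "192.168.28.110": "c3", "10.0.0.5": "c4"}
for g in plan_receive_balancing("192.168.28.20", 24, ["m1", "m2"], clients):
    print(g.target_ip, "->", g.sender_mac)
```

Each G-ARP tells only its target client which member MAC to use for the team IP, which is why the text stresses that they are unicast rather than broadcast.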

When the clients and the server are on different subnets, and incoming traffic has to traverse a router, the received traffic destined for the server is not load balanced. The physical adapter that the intermediate driver has selected to carry the IP flow will carry all of the traffic. When the router needs to send a frame to the team IP address, it will broadcast an ARP Request (if not in the ARP cache). The server software stack will generate an ARP Reply with the team MAC address, but the intermediate driver will modify the ARP Reply and send it over a particular physical adapter, establishing the flow for that session.

The reason is that ARP is not a routable protocol. It does not have an IP header and therefore is not sent to the router or default gateway. ARP is only a local subnet protocol. In addition, since the G-ARP is not a broadcast packet, the router will not process it and will not update its own ARP cache.

The only way that the router would process an ARP that is intended for another network device is if it has Proxy ARP enabled and the host has no default gateway. This is very rare and not recommended for most applications.

Transmit traffic through a router will be load balanced as transmit load balancing is based on the source and destination IP address and TCP/UDP port number. Since routers do not alter the source and destination IP address, the load balancing algorithm works as intended.

Configuring routers for Hot Standby Routing Protocol (HSRP) does not allow for receive load balancing to occur in the adapter team. In general, HSRP allows for two routers to act as one router, advertising a virtual IP and virtual MAC address. One physical router is the active interface while the other is standby. Although HSRP can also load share nodes (using different default gateways on the host nodes) across multiple routers in HSRP groups, it always points to the primary MAC address of the team.
