Catalyst 4006, Microsoft Network Load Balancing, Compaq Network Teaming

sgercas · ‎10-18-2001

Hello,

I hope somebody can help me.

There are three components that work together, but with unwanted side effects.

From Cisco - two Catalyst 4006 switches (with WS-X4232-L3 modules).

From Microsoft - Windows 2000 Advanced Server with Network Load Balancing (a clustering solution).

From Compaq - two servers with dual-port NICs and Compaq Network Teaming configured (an adapter fault tolerance solution).

Each server attaches to both Catalysts (in the same VLAN).

NLB can operate in unicast or multicast mode.

NLB in unicast mode without Compaq Network Teaming - OK (without teaming, because teaming requires multicast mode).

NLB in multicast + Network Teaming + Catalyst (with cluster virtual IP and MAC address entered in WS-X4232-L3 ARP table) - works, but...

As the sniffer shows, four ICMP Echo Requests from a client (client and servers are in different VLANs) to the virtual cluster IP address generate more than 70 Echo Replies from the cluster! A ping to the dedicated cluster address behaves as expected: four requests, four replies.

I also tried a TCP connection. The sniffer showed many zero-length TCP packets with the same sequence number (?!) coming back to the PC from the cluster IP.
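For reference, the cluster MAC entered in the ARP table is not arbitrary. As far as the NLB documentation describes it, NLB derives the address from the virtual IP: 03-BF followed by the four IP octets in multicast mode, and 02-BF followed by the same octets in unicast mode. A minimal Python sketch of that derivation, where the function name and the 10.1.1.100 address are hypothetical and only for illustration:

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, multicast: bool = True) -> str:
    """Return the NLB cluster MAC derived from an IPv4 virtual IP.

    Assumption: the 03-BF (multicast) / 02-BF (unicast) prefix followed
    by the four IP octets, as described in the NLB documentation.
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes
    prefix = 0x03 if multicast else 0x02
    return "-".join(f"{b:02x}" for b in bytes([prefix, 0xBF]) + octets)


# Hypothetical virtual IP, not taken from the thread.
print(nlb_cluster_mac("10.1.1.100"))                   # 03-bf-0a-01-01-64
print(nlb_cluster_mac("10.1.1.100", multicast=False))  # 02-bf-0a-01-01-64
```

Comparing the computed value with what the servers actually answer in ARP is a quick sanity check before making a static entry.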

Any ideas what to check?

How can I be sure whether this is a Cisco misconfiguration or, let's say, NLB behavior?

Any suggestions?

Saulius.

sgercas · ‎10-18-2001

Some more details about this issue.

Things go crazy when the cluster's virtual IP and MAC address are entered in both 4232-L3 modules.

Let me try to explain:

1. Compaq Network Teaming keeps one NIC port active.

2. Both servers' active NIC ports are connected to the same switch (let's say "A").

3. If the cluster's virtual IP and MAC address are entered in the 4232-L3 of switch "A" - everything is OK.

As soon as I add the IP and MAC to the second 4232-L3, packets get duplicated and start bouncing.

I looked at what W2K itself sees of the pings.

I started "ping" with the "-t" switch:

1. Pinging a NIC's IP - the W2K Performance Monitor shows 1 Echo packet per second;

2. Pinging the cluster IP - it jumps from 1 to 127 Echo packets per second.

This is definitely related to TTL=127.

With one more "ping -t" running, Echo packets per second jump to nearly 255.

It looks like the packets are looping.
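The arithmetic behind that suspicion can be sketched in a few lines of Python. This is only a model under stated assumptions: each Echo Request arrives in the server VLAN with TTL=127 (a Windows client's default of 128 minus one routed hop), and every pass through an L3 module decrements the TTL and puts the packet back into the VLAN until the TTL runs out. The function name is hypothetical.

```python
def copies_seen_by_cluster(ttl_on_arrival: int, pings_per_second: int = 1) -> int:
    """Count Echo Requests per second the servers would see if packets loop.

    Assumption: each loop iteration delivers one copy to the servers and
    costs exactly one TTL decrement at an L3 module.
    """
    copies = 0
    ttl = ttl_on_arrival
    while ttl > 0:
        copies += 1  # servers receive this copy
        ttl -= 1     # L3 module decrements TTL before re-sending
    return copies * pings_per_second


print(copies_seen_by_cluster(127))     # 127, one "ping -t" running
print(copies_seen_by_cluster(127, 2))  # 254, two "ping -t" running
```

Under this model one ping per second yields 127 copies and two yield 254, which is close to the counters Performance Monitor showed.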

But where and why?

What am I missing?

Best regards,

Saulius.

ggehle · ‎10-23-2001

Saulius,

Can you provide details on how you have configured your switches and routers for multicast?

Thanks,

-Greg

sgercas · ‎10-26-2001

Hello Greg,

The multicast configuration was left at the defaults.

I opened a case with TAC and got the right answer.

Excerpt from the answer:

"..what seems to be happening is that the l2 switches seem to be treating the dest packets for the multicast mac address as broadcast packets and seem to be flooding them across the network. since the adjoining switch receives it, it sends it to the l3 blade which then resends it to the switch and now this switch l2 side broadcasts it across to the first switch and the cycle continues.

the way to fix this i believe would be to define static or permanent multicast mac address entries and tie them to the cluster ports and the etherchannel ports going across to the other switch.."

So, with "set cam permanent .." I tied the multicast MAC address to the cluster ports and to the EtherChannel ports going across to the other switch, and traffic dropped to normal.
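To make the TAC explanation concrete, here is a small Python model of the L2 forwarding decision. It is a sketch, not a description of the real Catalyst internals: a destination MAC with no CAM entry is flooded to every port except the ingress port, while a permanent entry limits the frame to the listed ports. The port names ("2/1", "L3", "channel") and the MAC value are hypothetical.

```python
from typing import Dict, Set


def egress_ports(cam: Dict[str, Set[str]], all_ports: Set[str],
                 dst_mac: str, ingress: str) -> Set[str]:
    """Ports a frame leaves on: the CAM entry if present, otherwise flood."""
    ports = cam.get(dst_mac, all_ports)
    return set(ports) - {ingress}  # never send back out the ingress port


# Hypothetical layout of the neighbouring switch.
ALL_PORTS = {"2/1", "2/2", "2/3", "2/4", "L3", "channel"}
CLUSTER_MAC = "03-bf-0a-01-01-64"

# Frame arrives over the inter-switch EtherChannel.
flooded = egress_ports({}, ALL_PORTS, CLUSTER_MAC, ingress="channel")
pinned = egress_ports({CLUSTER_MAC: {"2/1", "2/2", "channel"}},
                      ALL_PORTS, CLUSTER_MAC, ingress="channel")

print(sorted(flooded))  # ['2/1', '2/2', '2/3', '2/4', 'L3']
print(sorted(pinned))   # ['2/1', '2/2']
```

Without the entry the frame reaches the "L3" port, which is where the second L3 module picks it up and re-sends it. With the entry it goes only to the cluster ports. Keeping the channel in the entry is what still lets cluster frames cross between the switches in the other direction.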

But I have another issue.

Cluster integrity breaks if the servers connect to different switches.

I have four servers that form the cluster.

If the servers' active connections all go to the same switch, they can form the cluster.

If one server's active connection goes to a different switch than the other three, it can't join the cluster.

It looks like "set cam" blocks the cluster heartbeat from switch to switch.

I am still researching what's going on.

Saulius.

jhumeston · ‎11-07-2001

I am running into a similar issue. Our dilemma here is that we would like to build fault tolerance into each server of the cluster. I would like to be able to put one NIC into one switch and the other NIC into the other switch. This rules out teaming and EtherChannel, and the cluster software won't work with both NICs having their own IP addresses.

Now, we are using VI in a SAN for the heartbeat, which eliminates that problem.

When using teaming, if you need to take down a switch for maintenance then you have to break the cluster. Does someone know how to avoid this?

sgercas · ‎11-08-2001

Hello,

I can take down one of the two switches for maintenance without breaking the cluster, because teaming moves the active links to the switch that stays online.

Regarding my previous posts - now everything is OK.

The trick is to tie the multicast MAC address to the cluster ports and to the EtherChannel ports between the switches.

Saulius.

sstrack · ‎11-15-2001

Hi Saulius!

Most of the phenomena you described, and the 'so-called' solutions, can be found in the NLB white paper on the Microsoft site.

http://www.microsoft.com/windows2000/docs/NLBtech2.doc