
Cisco 4500 and MS NLB fail-over issues.

towry_support
Level 1

Hi,

I'm running a couple of MS TMG firewalls on my Internet edge, connected on the inside to a pair of 4506s, and am having issues failing outbound traffic over to the secondary device. I'm running NLB in multicast mode with the static ARP entry defined on both switches. It seems to work OK, in that the TMG console shows sessions being shared across both servers, until I try to fail over to the other device.

If I perform a drainstop or stop on the primary node, traffic does not pass through the backup device. Likewise, if I make the secondary 4506 HSRP active and shut the link to the primary server, traffic stops.

The config is very simple: VLAN 140 is configured for outbound Internet traffic, with a single port assigned on each switch. The primary switch is HSRP active, and the TMG cluster routes all traffic back to the virtual IP on the switches. Actual configs can be provided if necessary.

Has anybody experienced this issue before? If so, can you share your experiences?

Thanks in advance.

Michael


6 Replies

James D Hensley
Cisco Employee

Hi Michael -

If you haven't already done so, you'll probably want to open a TAC case on this.

NLB in multicast mode can run in one of two ways: multicast or multicast with IGMP. From the switch's perspective, the difference is that if the NLB is running in multicast mode with IGMP, we should dynamically learn the NLB multicast MAC address on the uplinks to the switch. This configuration requires an IGMP snooping querier to be defined on the NLB VLAN (which can be one of the 4500s if they are running newer code). If the NLB is not running IGMP, then the NLB multicast MACs need to be statically defined on the uplinks to the NLB and on the link between the HSRP primary and secondary switch.

I'm not exactly clear on what you're failing over, so I can't tell you where the failure lies or in which direction it is occurring. If the NLBs are on VLAN 140 and connected to the TMG firewalls, then you will want to make sure the NLB multicast MAC is still learned on the port to the NLB/TMG firewall after failover.
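
As a quick check after a failover, something like the following should confirm whether the cluster MAC is still programmed (the MAC and VLAN below are placeholder values; substitute your own cluster MAC and NLB VLAN):

#show mac address-table address 0100.5e7f.8c0a vlan 140
(lists the ports where the cluster MAC is currently programmed)
#show ip igmp snooping groups vlan 140
(in multicast mode with IGMP, shows the groups and ports learned via snooping; on some Catalyst platforms the command is show ip igmp snooping membership)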

Thanks James.

Haven't opened a TAC case yet, but may do now.

Just to clarify, I'm currently running NLB in multicast mode without IGMP. If I try to fail over the internal network (the one connected to the 4500 on VLAN 140), Internet access from the internal network just stops working.

Are you saying that the best way to configure this would be to remove the static mapping on the 4500 for the NLB virtual MAC and enable multicast mode with IGMP, so the 4500 can learn it dynamically?

Thanks in advance.

Michael

You will always need the static ARP entries because we will never dynamically program a unicast IP to a multicast MAC (as this breaks RFC 1812). With NLB in multicast mode with IGMP, you will not need to program any static MACs on the NLB uplinks and access ports.
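
As a sketch, a static ARP entry on each routing switch would look like the following (the VIP 10.1.140.10 and cluster MAC 0100.5e7f.8c0a are made-up example values; use your cluster IP and the multicast MAC shown in the NLB properties):

(config)#arp 10.1.140.10 0100.5e7f.8c0a ARPA
(maps the cluster's unicast VIP to its multicast cluster MAC, which the switch will never learn on its own)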

Here's a link to a basic Catalyst switch configuration for NLB in multicast mode and multicast mode with IGMP:

https://supportforums.cisco.com/docs/DOC-16620

Hi James,

I am unable to get to the doc you sent a link for. Could you please provide an alternative path to the file? No need to do this if it is a link to this document http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml as I have already read this.

Also, it would seem that the NLB configuration is correct and that the failover only fails for proxy/Internet access. A 4-hour call going through our configurations helped prove this.

Once I have reviewed the document you sent a link for (access permitting), I'll mark the post as answered.

Thanks

Michael

Not sure why that link did not work.  It's a support forum doc I wrote.  Here's the body:

Catalyst Switch Configuration Options for Microsoft NLB Clustering in Multicast Mode

Switch config for Microsoft NLB 2003/2007 for Multicast Mode with IGMP

NOTE: If you are running an IOS version that does not contain the fix for CSCsw72680, you cannot use PIM on the NLB VLAN SVI. Use only the IGMP snooping querier.

CSCsw72680 Release Notes:

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCsw72680

NLB in Multicast mode with IGMP:
1) Create an IGMP snooping querier (either globally or on the SVI; this is platform dependent).
On the 4500, the snooping querier is first supported in 12.2(50)SG
(documented in release note http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/release/note/OL_5184.html#wp1480270).
2) Add static ARP entries to your switches that route into the NLB VLAN (this is still necessary, as clients will be accessing the NLB cluster at its unicast IP, which is tied to the multicast MAC).

If you create a querier on the VLAN SVIs that are connected to the NLB clusters and have your NLB cluster running in multicast mode with IGMP, then you do not need static MAC entries: IGMP joins will be sent by the NLB into the switch, which will program the NLB multicast MAC addresses dynamically. All downstream IGMP-enabled switches on the NLB subnet will learn this address. Make sure your NLB multicast MAC address is RFC compliant, meaning it starts with 0100.5eXX.XXXX. This should be configured on the NLB.
Sample IGMP Config on Core:
IGMP Snooping Querier:
(config)#interface vlan <vlan-id>
(config-if)#ip igmp snooping querier
(this is done globally on some DSBU switches)
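
To confirm the querier is actually running on the NLB VLAN, a show command along these lines can be used (a sketch; the exact output varies by platform and release):

#show ip igmp snooping querier
(displays the querier address and version per VLAN)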

Good articles about NLB configuration from MSFT
http://support.microsoft.com/kb/193602
http://support.microsoft.com/kb/197862/
http://technet.microsoft.com/en-us/library/cc783135.aspx

IGMP Support for NLB
http://support.microsoft.com/kb/283028

NLB in Multicast mode without IGMP:

1) Create static ARP entries on your switches that route into the NLB VLAN.

2) Create static MAC entries on all Layer 2 uplinks and NLB access ports. This is necessary because we will not dynamically learn the multicast MAC of the NLB, since the NLB does not send IGMP joins. A sample of both steps is shown below.
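
A minimal sketch of both steps, assuming a cluster VIP of 10.1.140.10 on VLAN 140, the NLB-generated non-IGMP cluster MAC 03bf.0a01.8c0a, and uplink/access ports Gi1/1 and Gi1/2 (all placeholder values):

(config)#arp 10.1.140.10 03bf.0a01.8c0a ARPA
(config)#mac address-table static 03bf.0a01.8c0a vlan 140 interface GigabitEthernet1/1 GigabitEthernet1/2
(constrains forwarding of frames to the cluster MAC to the listed ports; repeat on every switch in the Layer 2 path - on older releases the command is mac-address-table static)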

NOTE: There have been some issues recently (post April 2010) where customers could not get NLB to work even with the above configs. Both problems were Microsoft issues.
Problem 1: Clients could not connect to NLB VIP
Resolution: Apply NLB hotfix http://support.microsoft.com/kb/960916

Problem 2: The NLB server could not ping outside of the subnet. When sourcing a ping from the VIP, the customer would get "ping transmit failed. general failure" on the Windows CMD screen.

Resolution: Run the following commands on the NLB server from the Windows command line (here "nlb" is the name of the NLB-facing network interface):
netsh interface ipv4 set interface nlb weakhostreceive=enable
netsh interface ipv4 set interface nlb weakhostsend=enable
(The customer ran these from the c:\windows\system32 directory - not sure if that's required or not.)

Hi James,

Looks like we have some stability in our TMG environment at last.

The problems detailed at the end of your last post were useful, as the symptoms were relevant. A whole host of updates have been implemented, but the one that seems to have addressed the issue is disabling TCP checksum offload on the internal interface used for the proxy NLB.
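
For reference, one global way to disable task offload on Windows Server 2008/R2 is the netsh command below (a sketch only; per-adapter checksum offload, which is what applies to a single internal interface, is normally disabled in the NIC driver's advanced properties):

netsh int ip set global taskoffload=disabled
(disables IP task offload globally; a NIC reset or reboot may be needed for it to take effect)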

The FW service on TMG needed to be restarted every 7 days before that change, but we have seen 2 weeks of uptime since the change was made.

Fingers crossed that should be the end of it.

Thanks for the feedback and support.

Michael.
