
Cisco 4500 and MS NLB fail-over issues.

towry_support
Level 1

Hi,

I'm running a couple of MS TMG firewalls on my Internet edge, connected on the inside to a pair of 4506s, and am having issues failing outbound traffic over to the secondary device. I'm running NLB in multicast mode with the static ARP entry defined on both switches. It seems to work OK, in that the TMG console shows sessions being shared across both servers, until I try to fail over to the other device.

If I perform a drainstop or stop on the primary node, traffic does not pass through the backup device. Likewise, if I make the secondary 4506 HSRP active and shut the link to the primary server, traffic stops.

The config is very simple: VLAN 140 is configured for outbound Internet traffic, with a single port assigned on each switch. The primary switch is HSRP active, and the TMG cluster routes all traffic back to the virtual IP on the switches. Actual configs can be provided if necessary.

Has anybody experienced this issue before? If so, can you share your experiences?

Thanks in advance.

Michael


6 Replies

James D Hensley
Cisco Employee

Hi Michael -

If you haven't already done so, you'll probably want to open a TAC case on this.

NLB in multicast mode can run in one of two ways: multicast or multicast with IGMP. From the switch's perspective, the difference is that if the NLB is running in multicast mode with IGMP, we should dynamically learn the NLB multicast MAC address on the uplinks to the switch. This configuration requires an IGMP snooping querier to be defined on the NLB VLAN (which can be one of the 4500s if they are running newer code). If the NLB is not running IGMP, then the NLB multicast MACs need to be statically defined on the uplinks to the NLB and on the link between the HSRP primary and secondary switch.

I'm not exactly clear on what you're failing over, so I can't tell you where the failure lies or in which direction it is occurring. If the NLBs are on VLAN 140 and connected to the TMG firewalls, then you will want to make sure the NLB multicast MAC is still learned on the port to the NLB/TMG firewall after failover.
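
As a quick check after a failover, something like the following should confirm whether the cluster MAC is still programmed (the MAC and VLAN below are placeholder values; substitute your own cluster MAC and NLB VLAN):

#show mac address-table address 0100.5e7f.8c0a vlan 140
(lists the ports where the cluster MAC is currently programmed)
#show ip igmp snooping groups vlan 140
(in multicast mode with IGMP, shows the groups and ports learned via snooping; on some Catalyst platforms the command is show ip igmp snooping membership)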

Thanks James.

Haven't opened a TAC case yet, but may do now.

Just to clarify, I'm currently running NLB in multicast mode without IGMP. If I try to fail over the internal network (the one connected to the 4500 on VLAN 140), Internet access from the internal network just stops working.

Are you saying that the best way to configure this would be to remove the static mapping on the 4500 for the NLB virtual MAC and enable multicast mode with IGMP, so the 4500 can learn it dynamically?

Thanks in advance.

Michael

You will always need the static ARP entries because we will never dynamically program a unicast IP to a multicast MAC (as this breaks RFC 1812). With NLB in multicast mode with IGMP, you will not need to program any static MACs on the NLB uplinks and access ports.
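
As a sketch, a static ARP entry on each routing switch would look like the following (the VIP 10.1.140.10 and cluster MAC 0100.5e7f.8c0a are made-up example values; use your cluster IP and the multicast MAC shown in the NLB properties):

(config)#arp 10.1.140.10 0100.5e7f.8c0a ARPA
(maps the cluster's unicast VIP to its multicast cluster MAC, which the switch will never learn on its own)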

Here's a link to a basic Catalyst switch configuration for NLB in multicast mode and multicast mode with IGMP:

https://supportforums.cisco.com/docs/DOC-16620

Hi James,

I am unable to get to the doc you sent a link for. Could you please provide an alternative path to the file? No need to do this if it is a link to this document http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml as I have already read this.

Also, it would seem that the NLB configuration is correct and that the failover only fails for proxy/Internet access. A 4-hour call going through our configurations helped prove this.

Once I have reviewed the document you sent a link for (access permitting), I'll mark the post as answered.

Thanks

Michael

Not sure why that link did not work.  It's a support forum doc I wrote.  Here's the body:

Catalyst Switch Configuration Options for Microsoft NLB Clustering in Multicast Mode

Switch config for Microsoft NLB 2003/2007 for Multicast Mode with IGMP

NOTE: If you are running an IOS version that does not contain the fix for CSCsw72680, you cannot use PIM on the NLB VLAN SVI. Use only the IGMP snooping querier.

CSCsw72680 Release Notes:

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCsw72680

NLB in Multicast mode with IGMP:
1) Create an IGMP snooping querier (either globally or on the SVI; this is platform dependent).
On the 4500, the snooping querier is first supported in 12.2(50)SG
(documented in release note http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/release/note/OL_5184.html#wp1480270).
2) Add static ARP entries to your switches that route into the NLB VLAN (this is still necessary, as clients will be accessing the NLB cluster at its unicast IP, which is tied to the multicast MAC).

If you create a querier on the VLAN SVIs that are connected to the NLB clusters and have your NLB cluster running in multicast mode with IGMP, then you do not need static MAC entries: IGMP joins will be sent by the NLB into the switch, which will program the NLB multicast MAC addresses dynamically. All downstream IGMP-enabled switches on the NLB subnet will learn this address. Make sure your NLB multicast MAC address is RFC compliant, meaning it starts with 0100.5eXX.XXXX. This should be configured on the NLB.
Sample IGMP Config on Core:
IGMP Snooping Querier:
(config)#interface vlan <vlan-id>
(config-if)#ip igmp snooping querier
(this is done globally on some DSBU switches)
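
To confirm the querier is actually running on the NLB VLAN, a show command along these lines can be used (a sketch; the exact output varies by platform and release):

#show ip igmp snooping querier
(displays the querier address and version per VLAN)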

Good articles about NLB configuration from MSFT
http://support.microsoft.com/kb/193602
http://support.microsoft.com/kb/197862/
http://technet.microsoft.com/en-us/library/cc783135.aspx

IGMP Support for NLB
http://support.microsoft.com/kb/283028

NLB in Multicast mode without IGMP:

1) Create static ARP entries on your switches that route into the NLB VLAN.

2) Create static MAC entries on all Layer 2 uplinks and NLB access ports. This is necessary because we will not dynamically learn the multicast MAC of the NLB, since the NLB does not send IGMP joins. A sample of both steps is shown below.
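
A minimal sketch of both steps, assuming a cluster VIP of 10.1.140.10 on VLAN 140, the NLB-generated non-IGMP cluster MAC 03bf.0a01.8c0a, and uplink/access ports Gi1/1 and Gi1/2 (all placeholder values):

(config)#arp 10.1.140.10 03bf.0a01.8c0a ARPA
(config)#mac address-table static 03bf.0a01.8c0a vlan 140 interface GigabitEthernet1/1 GigabitEthernet1/2
(constrains forwarding of frames to the cluster MAC to the listed ports; repeat on every switch in the Layer 2 path - on older releases the command is mac-address-table static)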

NOTE: There have been some issues recently (post April 2010) where customers could not get NLB to work even with the above configs. Both problems were Microsoft issues.
Problem 1: Clients could not connect to NLB VIP
Resolution: Apply NLB hotfix http://support.microsoft.com/kb/960916

Problem 2: The NLB server could not ping outside of the subnet. When sourcing a ping from the VIP, the customer would get "ping transmit failed. general failure" on the Windows CMD screen.

Resolution: Run the following commands on the NLB server from the Windows command line (here "nlb" is the name of the NLB-facing network interface):
netsh interface ipv4 set interface nlb weakhostreceive=enable
netsh interface ipv4 set interface nlb weakhostsend=enable
(The customer ran these from the c:\windows\system32 directory - not sure if that's required or not.)

Hi James,

Looks like we have some stability in our TMG environment at last.

The problems detailed at the end of your last post were useful, as the symptoms were relevant. A whole host of updates have been implemented, but the one that seems to have addressed the issue is disabling TCP checksum offload on the internal interface used for the proxy NLB.
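
For reference, one global way to disable task offload on Windows Server 2008/R2 is the netsh command below (a sketch only; per-adapter checksum offload, which is what applies to a single internal interface, is normally disabled in the NIC driver's advanced properties):

netsh int ip set global taskoffload=disabled
(disables IP task offload globally; a NIC reset or reboot may be needed for it to take effect)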

The FW service on TMG needed to be restarted every 7 days before that change, but we have seen 2 weeks of uptime since the change was made.

Fingers crossed that should be the end of it.

Thanks for the feedback and support.

Michael.
