Strange Behaviour on SLB

Hi all

I was called out to a customer site yesterday as they were complaining that access to their Intranet homepage was intermittent. Every 5th to 7th attempt to open the browser (IE v6.x) resulted in the browser just 'hanging', that is, the page doesn't load, just a white screen with the progress bar stuck at 3 green tabs (Windows XP).

Sure enough, I found that this was indeed the case. After eliminating several possibilities, including the ADSL line and the local infrastructure, I ran a Wireshark session on the ADSL router (877), monitoring a port with a workstation connected.

There are two data centres in this customer's network, and in each one there are two WWW servers serving their Intranet needs. These servers sit behind an ACE module doing SLB.

What I found is that any attempt to reach the WWW servers at DC A was successful and any attempt to reach the WWW servers at DC B was unsuccessful. I asked my colleague to check the ACE at DC B. He found that (for DC B) the number of active sessions going to WWW server A was approximately 500 and the number of sessions going to WWW server B was approximately 100. Not very load-balanced! However, beyond that there didn't appear to be anything obvious jumping out at him. The problem appeared to be a bit more subtle.

When he took server A out of the SLB group, it seemed to fix everything. Opening up the browser worked flawlessly, regardless of which DC the Intranet session was established to. The remaining server B in DC B seemed to handle the load OK. The problem appeared to be with server A in DC B. Sure enough, when he re-introduced server A into the SLB group, we encountered the same problems.

Now, here's where the plot thickens....

If we enter the DNS name of server A in DC B (the suspect one) directly into the browser URL, it successfully brings up the Intranet homepage. In other words, bypassing the SLB seems to sort it all out.

This then makes us believe that there is something suspicious going on with the ACE SLB in DC B.

The server logs were checked for anything that might explain this strange behaviour, but everything there seemed fine.

My question really is: has anyone seen this type of behaviour before? If so, any ideas as to how we can fix it? Anything in particular we should look out for in the ACE configuration?

I'm afraid I can't post any configs now as I don't have access to the machine.

Thanks in advance for your advice.

rajsures
Cisco Employee

Hi Devlin,

Possibly the return traffic is not coming back to the ACE, i.e. it takes a different route from the suspect server? Taking packet captures on the ACE will confirm whether the traffic is returning to the ACE or not. Compare the routes on both servers and check if that makes a difference.

Thanks,

Rajesh

Hi Rajesh

Thank you for the reply.

We have someone going out to site next week to do just that. We are going to take a packet capture in front of and behind the ACE, line up the output and see if we can spot where it is failing.

I will update this post for future reference for others.

Thank you.
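
For anyone following along, here is a rough sketch of how the server-side capture could be checked for the missing return traffic Rajesh describes. It assumes the capture has already been exported to simple per-packet records; the `Packet` fields, the flag strings "S" and "SA", and all the IP addresses are made up for illustration and are not taken from the customer's network. It only lists connections where a SYN went to a real server but no SYN-ACK was seen coming back on the same capture point.

```python
from collections import namedtuple

# Hypothetical packet record, e.g. built from a CSV export of the capture.
# flags is "S" for a SYN and "SA" for a SYN-ACK (an assumed encoding).
Packet = namedtuple("Packet", "src dst sport dport flags")


def flow_key(pkt):
    """Direction-independent key for the TCP connection a packet belongs to."""
    return tuple(sorted([(pkt.src, pkt.sport), (pkt.dst, pkt.dport)]))


def flows_missing_reply(packets, server_ips):
    """Return flows where a SYN reached a server but no SYN-ACK was captured."""
    syn_seen = set()
    synack_seen = set()
    for pkt in packets:
        key = flow_key(pkt)
        if pkt.flags == "S" and pkt.dst in server_ips:
            syn_seen.add(key)
        elif pkt.flags == "SA" and pkt.src in server_ips:
            synack_seen.add(key)
    return sorted(syn_seen - synack_seen)


if __name__ == "__main__":
    # Made-up sample: the second server never answers via this capture point.
    sample = [
        Packet("192.0.2.10", "10.1.1.11", 40001, 80, "S"),
        Packet("10.1.1.11", "192.0.2.10", 80, 40001, "SA"),
        Packet("192.0.2.10", "10.1.1.12", 40002, 80, "S"),
    ]
    for flow in flows_missing_reply(sample, {"10.1.1.11", "10.1.1.12"}):
        print("no SYN-ACK seen for", flow)
```

If one server shows up in that list far more often than the other, that would point towards its replies either not being sent or leaving by a path that bypasses the ACE, which is worth comparing against the routes on both servers.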
