09-11-2012 05:35 AM
A company I work for has a number of CSM modules (WS-X6066-SLB-APC) installed in 6513 chassis switches. The CSM modules are running version 4.2(14).
These CSM modules are configured to load-balance a number of vservers via serverfarms, each serverfarm containing multiple real servers.
Here is some example configuration:
vserver SITE
 virtual 10.1.2.3 tcp www
 serverfarm SERVERFARM
 persistent rebalance
 inservice
!
serverfarm SERVERFARM
 nat server
 no nat client
 predictor leastconns
 failaction reassign
 retcode-map RETCODE-MAP
 real 10.2.3.4
  inservice
 real 10.2.3.5
  inservice
!
map RETCODE-MAP retcode
 match protocol http retcode 503 503 action remove 5 reset 300
The company is facing a problem that seems to be related to return code checking. Every once in a while, a server suddenly receives no traffic for 5 minutes. This always happens right after the server has sent an HTTP 503 return code. However, the CSM logs show no sign that the CSM module actually disabled the real server. For other serverfarms, which run regular HTTP and/or ICMP health checks against their real servers, the CSM logs clearly show when a real server has been temporarily disabled due to health check failures.
The return code checking is set to disable a real server for 300 seconds after the CSM has received five HTTP 503 responses from that real server. However, if we check the real server's log, we cannot find more than that single 503 return code right before the server stops seeing incoming traffic, unless we go back at least several hours.
I have tried to figure out within what time frame those 5 return codes must be received for them to count towards the maximum allowed number of return codes, but I cannot find any information about this time frame in any documentation.
For all I know, the CSM could keep counting every incoming 503 indefinitely until the maximum of five 503s is reached, and then disable the server for 300 seconds.
Does anyone have any information about the time frame within which those return codes must be received by the CSM to count toward the maximum configured number of return codes before the configured action is taken?
09-13-2012 10:48 AM
Hello Petter-
The "remove 5" is a static counter, not a counter over a timed interval unfortunately.
Regards,
Chris Higgins
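To make the "static counter" answer concrete, here is a minimal Python sketch of that behavior. It is a simplified model, not CSM code: the function name removal_times_static and its parameters are hypothetical. It assumes that every 503 increments the counter with no time window, that reaching the threshold removes the server for reset_seconds, and that the counter is zeroed when the server is re-enabled (Petter's understanding, as described later in this thread).

```python
def removal_times_static(times_503, threshold=5, reset_seconds=300):
    """Hypothetical model of 'action remove <threshold> reset <seconds>'
    as a static counter: 503s accumulate with no time window.

    times_503: sorted timestamps (in seconds) of 503 responses from one
    real server. Returns the timestamps at which the server is removed.
    """
    removals = []
    count = 0
    removed_until = None
    for t in times_503:
        if removed_until is not None:
            if t < removed_until:
                continue  # server is out of service and receives no traffic
            # Assumption: the reset timer re-enables the server and zeroes the counter.
            removed_until = None
            count = 0
        count += 1  # the count never decays, no matter how old earlier 503s are
        if count >= threshold:
            removals.append(t)
            removed_until = t + reset_seconds
    return removals


if __name__ == "__main__":
    DAY = 86400
    # Five 503s spaced 30 days apart: the fifth one still triggers removal.
    print(removal_times_static([0, 30 * DAY, 60 * DAY, 90 * DAY, 120 * DAY]))
    # prints [10368000], i.e. day 120
```

Under this model, a single recent 503 can be the one that tips an old, slowly accumulated count over the threshold, which would match the symptom in the original question.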
09-13-2012 11:42 AM
Hi Christopher
EDIT: It might actually be me who misunderstood your reply. You probably already gave me the correct answer: there is no time limit at all on the counter, and the counter will keep increasing until it reaches 5, even if those 503s are spread out over several months. Could you please confirm that this is what you meant? I'll leave my original reply below for reference.
ORIGINAL MESSAGE:
Thanks for your reply, but I think you might have misunderstood my question, and I don't exactly blame you because I had some difficulty explaining what I meant.
I know what each keyword in the following line means, and I understand the command.
match protocol http retcode 503 503 action remove 5 reset 300
We match on the HTTP protocol and look for the server sending a 503 return code back to a client. Each time the real server returns a 503, a counter is incremented by one. When the counter reaches 5, the action is taken to remove (disable) the server. 300 seconds after the server was disabled, it is enabled again, and the counter is also reset to zero at that point.
But I'm asking about the time frame for the counter, or rather whether there is a default timer that resets the counter to zero after a certain amount of time. Let me give you an example.
What I'm basically wondering is whether there is a limited time frame (like a sliding window) after which the return code check counter is reset to zero. I cannot find any information about this, and as far as I can see there is no command that displays the current value of the return code counter either, so I cannot verify this manually.
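For comparison, the sliding-window behavior asked about here could be modeled as below. This is a hypothetical sketch only; according to the reply in this thread, the CSM does not work this way. The names removal_times_window and window_seconds are illustrative, not CSM commands or parameters.

```python
from collections import deque


def removal_times_window(times_503, window_seconds, threshold=5, reset_seconds=300):
    """Hypothetical sliding-window variant: only 503s received within the
    last window_seconds count toward the removal threshold.

    times_503: sorted timestamps (in seconds) of 503 responses from one
    real server. Returns the timestamps at which the server is removed.
    """
    removals = []
    recent = deque()  # timestamps of 503s still inside the window
    removed_until = None
    for t in times_503:
        if removed_until is not None:
            if t < removed_until:
                continue  # server is out of service and receives no traffic
            removed_until = None
            recent.clear()  # start fresh after the server is re-enabled
        recent.append(t)
        # Drop 503s that have fallen out of the sliding window.
        while recent and recent[0] <= t - window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            removals.append(t)
            removed_until = t + reset_seconds
    return removals


if __name__ == "__main__":
    DAY = 86400
    # 503s months apart never accumulate inside a 60-second window.
    print(removal_times_window([0, 30 * DAY, 60 * DAY, 90 * DAY, 120 * DAY], 60))
    # A burst of five 503s within a few seconds still removes the server.
    print(removal_times_window([0, 1, 2, 3, 4], 60))
```

With a window, isolated 503s age out instead of accumulating, so only a genuine burst of errors takes a server out of rotation.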
It would have made sense if the feature worked like this instead, and I'm still hoping that someone can provide documentation saying that this is how it is supposed to work (because if the above example is true, the feature is broken and useless).
I hope that the above examples make it a bit clearer what I meant by my original question.
09-13-2012 12:38 PM
Petter-
There is no timer at all; it just counts up to whatever threshold you set for removing the server. You said you are looking at a counter that gets reset back to zero. What counter are you looking at?
Chris
09-13-2012 12:43 PM
Thanks for your confirmation. I marked your first reply as the correct answer to my question.
As for your last question: I was not looking at a counter; I was looking for a way to display one. But since I now have the answer to my question, I no longer need to view that counter, and I'm sure it isn't possible to display it anyway.