Solved: CSM ret-code time-frame

petter.osterlund · 09-11-2012

A company I work for has a number of CSM modules (WS-X6066-SLB-APC) installed in 6513 chassis switches. The CSM modules are running version 4.2(14).

These CSM modules are configured to load-balance a number of vservers via serverfarms, each serverfarm containing multiple real servers.

Here is some example configuration:

vserver SITE
 virtual 10.1.2.3 tcp www
 serverfarm SERVERFARM
 persistent rebalance
 inservice
!
serverfarm SERVERFARM
 nat server
 no nat client
 predictor leastconns
 failaction reassign
 retcode-map RETCODE-MAP
 real 10.2.3.4
  inservice
 real 10.2.3.5
  inservice
!
map RETCODE-MAP retcode
 match protocol http retcode 503 503 action remove 5 reset 300

The company is facing a problem that seems to be related to return code checking. Every once in a while a server will suddenly stop receiving traffic for 5 minutes. This always occurs right after the server has sent an HTTP 503 return code. However, we cannot see in the CSM logs that the CSM module has actually disabled the real server. For other serverfarms, which run regular HTTP and/or ICMP health checks against their real servers, we can clearly see in the CSM logs when a real server has been temporarily disabled due to health check failures.

The return code checking is set to disable a real server for 300 seconds after the CSM has received five HTTP 503 responses from the real server. If we check the real server log, however, we cannot find more than that single 503 return code right before the server stops seeing incoming traffic, unless we go back at least hours in time.

I have tried to figure out what time frame those 5 return codes must be received within for them to count towards the maximum allowed return codes, but I cannot find any information about this time frame anywhere in the documentation.

For all I know the CSM could keep track of every incoming 503 forever, until the maximum of five 503s is reached, and then the server is disabled for 300 seconds.

Does anyone have any information about the time frame within which those return codes must be received by the CSM to count toward the maximum configured number of return codes before the configured action is taken?

chrhiggi · 09-13-2012

Hello Petter-

Unfortunately, the "remove 5" is a static counter, not a counter over a timed interval.

Regards,

Chris Higgins

petter.osterlund · 09-13-2012

Hi Christopher

EDIT: It might actually be me who misunderstood your reply. You probably gave me the correct answer already: there is no time limit at all on the counter, and the counter will increase until it reaches 5 even if those 503s are spread out over several months. Could you please confirm that this is what you meant? I'll leave my original reply below for you to look at.

ORIGINAL MESSAGE:

Thanks for your reply, but I think you might have misunderstood my question, and I don't exactly blame you because I had some difficulty explaining what I meant.

I know what each and every keyword in the following line means, and I understand the command.

match protocol http retcode 503 503 action remove 5 reset 300

We match the protocol http, and look for when/if the server sends a return code 503 back to a client. Each time a 503 return code is returned from the real server a counter is increased by one. When the counter reaches 5 we take the action to remove (disable) the server. 300 seconds after the server was disabled it will be enabled again, and the counter will also be reset to zero at this point.

But I'm asking about the time frame for the counter, or rather if there is a default timer that resets the counter back to zero after a certain amount of time. Let me give you an example.

I enable return code checking for HTTP 503 and configure it to disable a server after five HTTP 503s have been seen by the CSM.
1 minute later the server sends three HTTP 503 messages to a client. Now the return code check counter is at 3. This is where the "time frame" that I speak of begins.
Another 6 minutes later the server sends one HTTP 503 message to a client. The time frame within which the CSM has seen HTTP 503s is now 6 minutes, so the CSM has seen four different HTTP 503s within 6 minutes. The counter would now be at 4.
5 months later the server sends a fifth HTTP 503 message to a client. The time frame is now five months and six minutes. Will this fifth HTTP 503 message increase the return code check counter to 5? If it does, then the server will be disabled for 300 seconds, even though it was 5 months since the previous four 503s were seen by the CSM.
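The first scenario above can be sketched as a small Python simulation. This is only a toy model under stated assumptions, not CSM code: the class name StaticRetcodeCounter and its fields are hypothetical, "5 months" is approximated as 150 days, and the counter is assumed to restart from zero once the server has been removed (as described earlier in this post).

```python
from dataclasses import dataclass
from typing import Optional

MINUTE = 60
DAY = 24 * 60 * MINUTE


@dataclass
class StaticRetcodeCounter:
    """Toy model (not CSM code): counts matching return codes with no time window."""
    threshold: int = 5          # the "remove 5" part of the match line
    reset_seconds: int = 300    # the "reset 300" part of the match line
    count: int = 0
    removed_until: Optional[float] = None

    def in_service(self, now: float) -> bool:
        return self.removed_until is None or now >= self.removed_until

    def observe_503(self, now: float) -> bool:
        """Record one 503 seen at `now` (seconds). Return True if it triggers removal."""
        if not self.in_service(now):
            return False        # a removed server gets no traffic, so nothing is counted
        self.count += 1         # hits never expire, however old they are
        if self.count < self.threshold:
            return False
        self.removed_until = now + self.reset_seconds
        self.count = 0          # assumption: the count starts over after a removal
        return True


counter = StaticRetcodeCounter()
# Three 503s after 1 minute, one more 6 minutes later, a fifth roughly 5 months later.
events = [1 * MINUTE] * 3 + [7 * MINUTE, 7 * MINUTE + 150 * DAY]
results = [counter.observe_503(t) for t in events]
print(results)  # [False, False, False, False, True]
```

In this model the fifth 503 removes the server for 300 seconds even though the previous four are months old, which would match a server that goes quiet right after a single 503 in its own log.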

What I'm wondering is basically if there is a limited time frame (like a sliding window) after which the return code check counter is reset back to zero? I cannot find any information about this, and as far as I can see there is no command I can use to see what the return code counter is currently at either, so I cannot manually verify this.

It would have made sense if the feature worked like this instead, and I'm still hoping that someone can provide documentation that says this is the way it is supposed to work (because if the above example is true, then the feature is broken and useless).

I enable return code checking for HTTP 503 and configure it to disable a server after five HTTP 503s have been seen by the CSM.
1 minute later the server sends three HTTP 503 messages to a client. Now the return code check counter is at 3. This is where the "time frame" that I speak of begins.
5 minutes later the counter is reset to zero, because no 503s have been seen for over five minutes.
2 months later the server sends an HTTP 503 message to a client. This increases the counter to one. The server is never disabled. After another 5 minutes the counter is reset back to zero again.
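The second scenario can be sketched the same way. This is a hypothetical behaviour only: the replies in this thread say the CSM does not work like this. The class name IdleResetCounter, the 300-second idle window and the 60-day approximation of "2 months" are all assumptions made for illustration.

```python
MINUTE = 60
DAY = 24 * 60 * MINUTE


class IdleResetCounter:
    """Hypothetical counter that is cleared after a quiet period with no 503s."""

    def __init__(self, threshold: int = 5, idle_seconds: int = 300) -> None:
        self.threshold = threshold
        self.idle_seconds = idle_seconds
        self.count = 0
        self.last_seen = None   # time of the most recent 503, in seconds

    def observe_503(self, now: float) -> bool:
        """Record one 503 seen at `now`. Return True if the threshold is reached."""
        quiet = self.last_seen is not None and now - self.last_seen > self.idle_seconds
        if quiet:
            self.count = 0      # no 503 for over five minutes: forget the old hits
        self.last_seen = now
        self.count += 1
        return self.count >= self.threshold


counter = IdleResetCounter()
# Three 503s after 1 minute, then a single 503 roughly 2 months later.
events = [1 * MINUTE] * 3 + [1 * MINUTE + 60 * DAY]
counts = []
removed = False
for t in events:
    removed = counter.observe_503(t) or removed
    counts.append(counter.count)
print(counts, removed)  # [1, 2, 3, 1] False
```

With an idle reset, isolated 503s never add up to the threshold, so the server would only be removed after a burst of five within the window.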

I hope that the above examples make it a bit clearer what I meant with my original question.

chrhiggi · 09-13-2012

Petter-

There is no time limit at all; it just counts up to whatever threshold you set for removing the server. You said you are looking at a counter that is reset back to zero. What counter are you looking at?

Chris

petter.osterlund · 09-13-2012

Thanks for your confirmation. I marked your first reply as the correct answer to my question.

About your last question: I was not looking at a counter, I was looking for a way to display a counter. But since I have the answer to my question now, I no longer need to view the counter I was looking for, and I am sure it isn't possible to display the counter anyway.
