Solved: Service status down & recovery to alive

paul.matthews · ‎08-09-2004

I have a strange one. I have a pair of 11503s in redundant interface/vip mode.

Occasionally a subset of servers will go down on BOTH CSSs. I would normally suspect that meant the servers had all gone (though all at the same time would be odd).

Suspending and activating the servers brings them back alive.

The config is:

service server1-80
  ip address <Address>
  port 80
  keepalive type http
  keepalive method get
  active

What actually causes a down state? I would expect a 400-series or 500-series response, or no response at all, to take it down, but the server responds when I kick the CSS.

What is the method used to recover a service? I would expect the CSS to have been retrying all along, without waiting for me to suspend/activate to force the keepalive.

Paul.

Gilles Dufour · ‎08-09-2004

Paul,

Since you are using the method get under the keepalive, the CSS will verify that the web page is always the same by computing a hash.

So I would suspect that when everything goes down, it's because somebody modified the content of the web page.

By doing a suspend/activate you force the CSS to recompute the hash, so everything is OK until the page changes again.

If you don't want the CSS to verify the content of the page, use a method head.

Regards,

Gilles.

d.parks · ‎08-09-2004

I suspect that the content on your html page has been changing.

With the keepalive method set to "get", the CSS does a hash on the page at the time of the first keepalive and stores that value. On the subsequent checks, the current content of the page is compared against that first hash. If they are different, the service is placed in a down state. Keepalive attempts continue, but the service will not be seen as alive unless the original content is restored.

Suspending and activating the service refreshes the hash value to the current contents of the page, which is why it goes "alive" when you do this.
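To make the behavior above concrete, here is a rough Python sketch of the logic as described in this thread. It is only a toy model, not the CSS implementation: the class and method names are made up for illustration, SHA-256 is just a stand-in for whatever hash the CSS actually uses, and it ignores retry counts and timers.

```python
import hashlib
from typing import Optional


class KeepaliveSim:
    """Toy model of the keepalive behavior described in this thread (hypothetical names)."""

    def __init__(self, method: str = "get") -> None:
        self.method = method                   # "get" or "head"
        self.reference: Optional[str] = None   # hash stored at the first successful check
        self.state = "down"

    def probe(self, status: int, body: bytes = b"") -> str:
        """Run one keepalive check and return the resulting service state."""
        if status != 200:
            self.state = "down"
            return self.state
        if self.method == "head":
            # "head" only looks at the status code; the content is never hashed.
            self.state = "alive"
            return self.state
        # Stand-in hash; the algorithm used by the CSS is not stated in this thread.
        digest = hashlib.sha256(body).hexdigest()
        if self.reference is None:
            self.reference = digest            # first check: remember the page as it is now
        self.state = "alive" if digest == self.reference else "down"
        return self.state

    def suspend_and_activate(self) -> None:
        """Forget the stored hash so the next check takes a fresh reference."""
        self.reference = None
        self.state = "down"


svc = KeepaliveSim("get")
svc.probe(200, b"<html>v1</html>")   # first check stores the reference hash
svc.probe(200, b"<html>v2</html>")   # content changed, so the service goes down
svc.suspend_and_activate()
svc.probe(200, b"<html>v2</html>")   # new reference taken, service is alive again
```

With method "get" the service stays down for as long as the page differs from the stored reference, which matches the symptom of several servers going down together when shared content is updated.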

If this is not the behavior you want, try switching to a keepalive method of "head", which just checks for a "200 OK" response and does not hash the content.

d.parks · ‎08-09-2004

Oops, sorry for the "twin" response. You must be quicker on the keys than I this morning.

Gilles Dufour · ‎08-09-2004

no problem - I'm giving you a 5 anyway for the good answer :-)

Gilles.

paul.matthews · ‎08-09-2004

Two of you giving me the same answer is not a problem - thanks for the help guys.

Paul.
