cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Announcements

Community Helping Community

Datacenter troubleshooting guide - day 5

3568
Views
0
Helpful
1
Comments
Cisco Employee

“Datacenter troubleshooting guide” – a blog by  Gilles Dufour.

Day 4 - Stickyness (2nd part)

Troubleshooting stickyness is not an easy task.

There is no debugging option to know why ACE decided to send a connection to a specific server.

There are a few common errors that we can spot by looking at stats.

Here is the most well known.

1. Whatever the sticky type you selected, when reason to not stick to a server is if the sticky entry timed out or was removed due to lack of sticky resource. So, check your sticky resource allocation with the following command :
AceC6k2 Admin# show np 1 me-stats "-slb -v"

LB Perf stats at address 0x82e05000

LB Perf stats at address 0x82e05000
LB Statistics
--------------
(Context ALL Statistics)
...
Num Stolen For Reuse:                             0             0
Num Active Sticky Entry:                          0             0
Num Active Reverse Sticky Entry:                  0             0
Active Conn Count:                                0             0
Free Sticky Entry Count:                    1794111             0
Num Grp or Timeout Nodes:                         4             0
Static Entry List Count:                          0             0
Num Entry Configured:                       1794111             0
Prev Resources Req:                         1794111             0
Drop Max Remote Stky:                             0             0

The "Num Stolen For Reuse" indicates that you have not enough sticky resource and old sticky entries got deleted in order to save new ones. This is a possible reason for sticky failures. To solve this problem, you can either increment the sticky resource or for sticky source ip, you can change the mask so that one entry can be used for all clients belonging to a same subnet.

"Drop Max Remote Stky" counts the number of sticky lookup that failed because the communication between internal components is too slow to process all the sticky lookup.

2. If you have configured any form of HTTP stickyness (cookie, header, ...), one reason to see sticky failures is http parsing errors. ACE will need to parse the HTTP header to find the information needed to perform stickyness (ie: cookie).  If the HTTP request is malformed, ACE will fail to properly parse the data and will not even try to do a sticky lookup.  Check if there was no any parsing errors with the command :
AceC6k2/Admin# show np 1 me-stats "-shttp -v" | i pars
Second pass parsing:                              0             0
Static parse errors:                              0             0
Max parselen errors:                              0             0
AceC6k2/Admin#

If you see "Static parse errors" that could explain your sticky failure.  Unfortunately, there is no solution here.  With ddts
CSCtj42969 we added a new command to continue parsing http data even when an error was detected.  See previous Blog post for more information.
"Max parselen errors" is the result of http headers being too big and the data we need located beyond our buffer.  You can adjust the size of the parsing buffer with a parameter-map
AceC6k2/Admin(config)# parameter-map type http ParseLen
AceC6k2/Admin(config-parammap-http)# set ?
  content-maxparse-length      Configure content maxparse length
  header-maxparse-length       Configure header maxparse length
  secondary-cookie-delimiters  Configure URL/COOKIE (cookie-in-url-query) delimiter

Clearly, the stats above can't explain all possible sticky issues.  We often have to capture sniffer traces and dump the sticky table regularly in oder to try to understand what is going on.
This is why I have introduced  a new command in version 4.1.x to list connections linked to a particular sticky entry.

CSCti03084: Enh: show conn sticky - display connections linked to a sticky

Here is how to use this new command.

First, list the sticky entries in "detail" mode  (the command is hidden) and exract the internal entry id.

Scimitar1/Admin# show sticky data detail
processor (0/0):             0
results index:               1 of 1
sticky group:                RESERVATIONS_STICKY
sticky type:                 IP
rserver:                     vmware-46
realPort:                    80
timeout (secs):              1200
sticky-entry:                192.168.20.0
internal entry-id:           0x200004
time-to-expire (secs):       1198
stcky-hit-count:             63402
active-conn-count:           1
in-use reference count:      0
static entry:                FALSE
reverse entry:               FALSE
active entry:                TRUE
timeout-active-conns:        FALSE
created-from-HA-peer:        FALSE
HA-replicated-at-least-once: TRUE

With the internal entry-id you can list all connections using that entry.

Scimitar1/Admin# show conn sticky 0x200004

conn-id    np dir proto vlan source                destination           state
----------+--+---+-----+----+---------------------+---------------------+------+
653622     1  in  TCP   20   192.168.20.45:47377   192.168.20.15:80      SYNSEEN
658234     1  out TCP   40   192.168.40.46:80      192.168.20.45:47377   INIT
Scimitar1/Admin#

You can also retrieve the internal entry-id used by a connection.

For that, you need to know the conn-id and the NP to which the connection is associated.

switch/Admin# show np 1 me-stats "-c 242"
Connection ID:seq: 242[0xf2].0
  Other ConnID    : 243[0xf3].0
  Proxy ConnID    : 0[0x0].0
  Next Q    : 0[0x0]

192.168.20.45:44425 -> 192.168.20.15:80 [RX-NextHop: TX] [TX-NextHop: TX]
  Flags:  PAT: No  DynNAT: No  Implicit PAT: No On_Reuse: No
  L3 Protocol     : IPv4                L4 Protocol    : 6
  Inbound Flag    : 1
  Interface Match : Yes
    Interface MatchID: 3
  EncapsID:ver    : 45:0                TCP ACK delta  : 0xc8228d7e
  MSS             : 1460                TOS Stamp       : 0
  Repeat mode     : No          Punt Flag      : No
  TOS Stamp       : No          TCP Window Check: No
  ACE ID          : 98          NAT Policy ID       : 0
  Post NAT hop    : 0
  Packet Count    : 5           Byte Count          : 309
  TCP Information: (State = 3)
    Window size   : 6432                Window scale    : 0
    FIN seen      : No          FIN/ACK seen    : No
    FIN/ACK exp   : No          Close initiator : No
    FIN/ACK expval: 5b40000             Last seq        : 9f50cfc0
   timestamp_delta: 0           Last ack        : 947b3980
    Data/ACK Trigger : 947b3980         Trigger Status   : 1
    Timestamp : 39cd8
  TCP options negotiated:
    Sack:Clear          TS:Clear        Windowscale:  Clear
    Reserved: Allow     Exceed MSS:  Deny       Window var: Allow

 
Sticky Internal Entry-id : 0x200006 

Raw Connection Entry
0000  0x00000000  0xc0a8142d  0xc0a8140f  0x06090003
0010  0xad890050  0xc8228d7e  0x002d0000  0x05b40000
0020  0x000000f3  0x00000000  0x080a2480  0x24450000
0030  0x00000005  0x00000135  0x19200030  0x05b40000
0040  0x9f50cfc0  0x947b3980  0x00039cd8  0x00000000
0050  0x00000062  0x947b3980  0x00039cb1  0x00000000
0060  0x00000000  0x00000000  0x00000000  0x00000000
0070  0x0000bc90  0x00000000  0x00000000  0x00000000
switch/Admin#

Another new feature that I introduced in version 4.1.x is CSCth64697: Enh: New show command to display http url/cookie hash result

Here is an example on how to use this command.

A sticky function very often hashes the input data into a 64bits value.  This is the value that we keep in our sticky entry and that we use to match new input data to existing entries.

So, when you display a sticky database, you usually a sticky-entry which is a huge number.

It's impossible to retrieve the original data from the hash value.

ctw1prod001/maxcon# show sticky  database  group group1
sticky group : group1
type         : HTTP-COOKIE
timeout      : 5             timeout-activeconns : TRUE
  sticky-entry          rserver-instance                 time-to-expire flags  
  ---------------------+--------------------------------+--------------+-------+
  18119923635054054134  rser3:0                          293            -
  5834233610907968705   rser2:0                          6              -

Until version 4.1.x it was also impossible to know the hash value from the original data.
This is why the command
show sticky hash is useful.  This command will hash whatever text you provide and return the hash value in hexadecimal and decimal. 

ctw1prod001/maxcon# show sticky hash cisco123
Hash: 0x50f75c8edad064c1 - 5834233610907968705

ctw1prod001/maxcon# show sticky hash testing123
Hash: 0xfb76e68ae89a0af6 - 18119923635054054134
ctw1prod001/maxcon#

In the example above, the text cisco123 is hashed into value 5834233610907968705
This value correspond to the sticky entry pointing to rser2:0

Below is another example with cookie insert.

ctw1prod001/maxcon# sh sticky  cookie-insert group group1
     Cookie   |        HashKey       |           rserver-instance  
  ------------+----------------------+----------------------------------------+
  R876364218 | 8016668853981260554  | s1/rser1:0
  R876365307 | 1555693097661300743  | s1/rser2:0
  R876366396 | 9600904050871858939  | s1/rser3:0
  R876367485 | 3122196043780649991  | s1/rser4:0
  R2830679934 | 17805359266603035579 | s2/rser4:0
  R2830681023 | 81717080676150073    | s2/rser5:0

ctw1prod001/maxcon# sh sticky hash  R876364218
Hash: 0x6f40edd93b7dab0a - 8016668853981260554

ctw1prod001/maxcon# sh sticky hash R2830681023
Hash: 0x12251407dc8ab39 - 81717080676150073
ctw1prod001/maxcon#

And this will end our stickyness review.

Do not hesitate to send me questions or topics you would like me to address.

Thanks,

Gilles Dufour

1 Comment
Beginner

hi Gilles,

This ACE troubleshooting post is excellent.

Could you please also explain the following commands with sample example.

Capture (how to capture and analyse)

resource usage (to check the usage)

show conn address

regards,

PM

CreatePlease to create content
Content for Community-Ad
FusionCharts will render here