“Datacenter troubleshooting guide” – a blog by Gilles Dufour.
Day 4 - Stickyness (2nd part)
Troubleshooting stickyness is not an easy task.
There is no debugging option to know why ACE decided to send a connection to a specific server.
There are a few common errors that we can spot by looking at stats.
Here is the most well known.
1. Whatever the sticky type you selected, when reason to not stick to a server is if the sticky entry timed out or was removed due to lack of sticky resource. So, check your sticky resource allocation with the following command :
AceC6k2 Admin# show np 1 me-stats "-slb -v"
LB Perf stats at address 0x82e05000
LB Perf stats at address 0x82e05000
LB Statistics
--------------
(Context ALL Statistics)
...
Num Stolen For Reuse: 0 0
Num Active Sticky Entry: 0 0
Num Active Reverse Sticky Entry: 0 0
Active Conn Count: 0 0
Free Sticky Entry Count: 1794111 0
Num Grp or Timeout Nodes: 4 0
Static Entry List Count: 0 0
Num Entry Configured: 1794111 0
Prev Resources Req: 1794111 0
Drop Max Remote Stky: 0 0
The "Num Stolen For Reuse" indicates that you have not enough sticky resource and old sticky entries got deleted in order to save new ones. This is a possible reason for sticky failures. To solve this problem, you can either increment the sticky resource or for sticky source ip, you can change the mask so that one entry can be used for all clients belonging to a same subnet.
"Drop Max Remote Stky" counts the number of sticky lookup that failed because the communication between internal components is too slow to process all the sticky lookup.
2. If you have configured any form of HTTP stickyness (cookie, header, ...), one reason to see sticky failures is http parsing errors. ACE will need to parse the HTTP header to find the information needed to perform stickyness (ie: cookie). If the HTTP request is malformed, ACE will fail to properly parse the data and will not even try to do a sticky lookup. Check if there was no any parsing errors with the command :
AceC6k2/Admin# show np 1 me-stats "-shttp -v" | i pars
Second pass parsing: 0 0
Static parse errors: 0 0
Max parselen errors: 0 0
AceC6k2/Admin#
If you see "Static parse errors" that could explain your sticky failure. Unfortunately, there is no solution here. With ddts CSCtj42969 we added a new command to continue parsing http data even when an error was detected. See previous Blog post for more information.
"Max parselen errors" is the result of http headers being too big and the data we need located beyond our buffer. You can adjust the size of the parsing buffer with a parameter-map
AceC6k2/Admin(config)# parameter-map type http ParseLen
AceC6k2/Admin(config-parammap-http)# set ?
content-maxparse-length Configure content maxparse length
header-maxparse-length Configure header maxparse length
secondary-cookie-delimiters Configure URL/COOKIE (cookie-in-url-query) delimiter
Clearly, the stats above can't explain all possible sticky issues. We often have to capture sniffer traces and dump the sticky table regularly in oder to try to understand what is going on.
This is why I have introduced a new command in version 4.1.x to list connections linked to a particular sticky entry.
CSCti03084: Enh: show conn sticky - display connections linked to a sticky
Here is how to use this new command.
First, list the sticky entries in "detail" mode (the command is hidden) and exract the internal entry id.
Scimitar1/Admin# show sticky data detail
processor (0/0): 0
results index: 1 of 1
sticky group: RESERVATIONS_STICKY
sticky type: IP
rserver: vmware-46
realPort: 80
timeout (secs): 1200
sticky-entry: 192.168.20.0
internal entry-id: 0x200004
time-to-expire (secs): 1198
stcky-hit-count: 63402
active-conn-count: 1
in-use reference count: 0
static entry: FALSE
reverse entry: FALSE
active entry: TRUE
timeout-active-conns: FALSE
created-from-HA-peer: FALSE
HA-replicated-at-least-once: TRUE
With the internal entry-id you can list all connections using that entry.
Scimitar1/Admin# show conn sticky 0x200004
conn-id np dir proto vlan source destination state
----------+--+---+-----+----+---------------------+---------------------+------+
653622 1 in TCP 20 192.168.20.45:47377 192.168.20.15:80 SYNSEEN
658234 1 out TCP 40 192.168.40.46:80 192.168.20.45:47377 INIT
Scimitar1/Admin#
You can also retrieve the internal entry-id used by a connection.
For that, you need to know the conn-id and the NP to which the connection is associated.
switch/Admin# show np 1 me-stats "-c 242"
Connection ID:seq: 242[0xf2].0
Other ConnID : 243[0xf3].0
Proxy ConnID : 0[0x0].0
Next Q : 0[0x0]
192.168.20.45:44425 -> 192.168.20.15:80 [RX-NextHop: TX] [TX-NextHop: TX]
Flags: PAT: No DynNAT: No Implicit PAT: No On_Reuse: No
L3 Protocol : IPv4 L4 Protocol : 6
Inbound Flag : 1
Interface Match : Yes
Interface MatchID: 3
EncapsID:ver : 45:0 TCP ACK delta : 0xc8228d7e
MSS : 1460 TOS Stamp : 0
Repeat mode : No Punt Flag : No
TOS Stamp : No TCP Window Check: No
ACE ID : 98 NAT Policy ID : 0
Post NAT hop : 0
Packet Count : 5 Byte Count : 309
TCP Information: (State = 3)
Window size : 6432 Window scale : 0
FIN seen : No FIN/ACK seen : No
FIN/ACK exp : No Close initiator : No
FIN/ACK expval: 5b40000 Last seq : 9f50cfc0
timestamp_delta: 0 Last ack : 947b3980
Data/ACK Trigger : 947b3980 Trigger Status : 1
Timestamp : 39cd8
TCP options negotiated:
Sack:Clear TS:Clear Windowscale: Clear
Reserved: Allow Exceed MSS: Deny Window var: Allow
Sticky Internal Entry-id : 0x200006
Raw Connection Entry
0000 0x00000000 0xc0a8142d 0xc0a8140f 0x06090003
0010 0xad890050 0xc8228d7e 0x002d0000 0x05b40000
0020 0x000000f3 0x00000000 0x080a2480 0x24450000
0030 0x00000005 0x00000135 0x19200030 0x05b40000
0040 0x9f50cfc0 0x947b3980 0x00039cd8 0x00000000
0050 0x00000062 0x947b3980 0x00039cb1 0x00000000
0060 0x00000000 0x00000000 0x00000000 0x00000000
0070 0x0000bc90 0x00000000 0x00000000 0x00000000
switch/Admin#
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://cdetsweb-prd.cisco.com/apps/dumpcr_att?identifier=CSCti03084&title=Diffs-commit-converge_dev&ext=&type=FILE
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://cdetsweb-prd.cisco.com/apps/dumpcr_att?identifier=CSCti03084&title=Diffs-commit-converge_dev.p2&ext=&type=FILE
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://cdetsweb-prd.cisco.com/apps/dumpcr_att?identifier=CSCti03084&title=Scimitar-code-review&ext=&type=FILE
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://
Another new feature that I introduced in version 4.1.x is CSCth64697: Enh: New show command to display http url/cookie hash result
What | How Bad | Who | Versions | ||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
History |
---|
Histogram of CSCti03084: Covering 85 days and terminal, first Resolved in 3 days, currently in V-Verified state. N -> A [ 0] | <-----------------------------------+-----------------------------------> |
cdetsadm | 11/30/2010 15:15:24 | Integrated-releases: | --> 3.0(0)A90(7.0.12) |
cdetsadm | 11/01/2010 13:42:20 | Integrated-releases: | --> 3.0(0)A4(1.0) |
gdufour | 10/19/2010 00:22:06 | RNE-Approval-Flg: | N --> |
gdufour | 10/19/2010 00:22:06 | Note Title: | --> V-comments |
gdufour | 10/19/2010 00:22:05 | Verified-on: | --> 10/19/2010 00:22:05 |
gdufour | 10/19/2010 00:22:05 | Verifier: | --> gdufour |
gdufour | 10/19/2010 00:21:59 | Verified-confidence: | --> tested |
gdufour | 10/19/2010 00:20:32 | Status: | R --> V |
gdufour | 10/19/2010 00:22:05 | Verified-release: | --> 3.0(0)A4(1.0.5) |
cms | 10/04/2010 07:12:07 | File Name: | --> Scimitar-code-review |
cdetsadm | 09/23/2010 10:30:36 | Integrated-releases: | --> 3.0(0)A90(4.0.38) |
cdetsadm | 09/07/2010 11:35:29 | Integrated-releases: | --> 3.0(0)A4(0.99.214) |
gdufour | 09/06/2010 00:56:13 | RNE-Approval-Flg: | N --> |
gdufour | 09/06/2010 00:56:13 | Note Title: | --> R-comments |
gdufour | 09/06/2010 00:56:13 | Resolved-on: | 07/29/2010 04:09:36 --> 09/06/2010 00:56:12 |
gdufour | 09/06/2010 00:55:57 | Status: | A --> R |
divjain | 09/06/2010 00:51:22 | RNE-Approval-Flg: | N --> |
divjain | 09/06/2010 00:51:22 | Note Title: | --> A-comments |
divjain | 09/06/2010 00:51:22 | Assigner: | gdufour --> divjain |
divjain | 09/06/2010 00:51:22 | Assigned Date: | 07/26/2010 06:19:33 --> 09/06/2010 00:51:22 |
divjain | 09/06/2010 00:51:07 | Status: | R --> A |
gdufour | 09/06/2010 00:42:03 | File Name: | --> Diffs-commit-converge_dev.p2 |
cdetsadm | 08/25/2010 13:05:01 | Integrated-releases: | --> 3.0(0)A90(4.0.27) |
rahul | 08/01/2010 23:52:20 | Found: | de-test --> development |
cdetsadm | 07/31/2010 07:09:49 | Integrated-releases: | --> 3.0(0)A4(0.99.180) |
kavenkat | 07/29/2010 14:19:33 | Attribute2: | Fusion --> FUSION |
gdufour | 07/29/2010 04:09:36 | Resolved-on: | --> 07/29/2010 04:09:36 |
gdufour | 07/29/2010 04:09:28 | Documents-changed: | --> none |
gdufour | 07/29/2010 04:09:28 | Status: | O --> R |
gdufour | 07/29/2010 04:05:46 | File Name: | --> Diffs-commit-converge_dev |
gdufour | 07/28/2010 02:03:37 | Regression: | --> N |
gdufour | 07/28/2010 02:03:47 | Code-reviewer: | --> bblanco |
gdufour | 07/28/2010 02:02:52 | Note: | Static-analysis-runinfo --> Static-analysis-runinfo |
gdufour | 07/28/2010 01:59:11 | RNE-Approval-Flg: | N --> |
gdufour | 07/28/2010 01:59:11 | Note Title: | --> Static-analysis-runinfo |
gdufour | 07/26/2010 08:04:54 | RNE-Approval-Flg: | N --> |
gdufour | 07/26/2010 08:04:54 | Note Title: | --> AUT |
gdufour | 07/26/2010 07:21:27 | Status: | A --> O |
cms | 07/26/2010 07:17:50 | File Name: | --> Scimitar-code-review |
gdufour | 07/26/2010 06:24:00 | RNE-Approval-Flg: | N --> |
gdufour | 07/26/2010 06:24:00 | Note Title: | --> Unit-test |
gdufour | 07/26/2010 06:19:54 | Note Title: | --> Release-note |
gdufour | 07/26/2010 06:19:33 | Assigner: | --> gdufour |
gdufour | 07/26/2010 06:19:33 | Assigned Date: | --> 07/26/2010 06:19:33 |
gdufour | 07/26/2010 06:19:33 | Engineer: | --> gdufour |
gdufour | 07/26/2010 06:19:28 | Status: | N --> A |
gdufour | 07/26/2010 06:19:18 | Defect Created: | --> |
Here is an example on how to use this command.
A sticky function very often hashes the input data into a 64bits value. This is the value that we keep in our sticky entry and that we use to match new input data to existing entries.
So, when you display a sticky database, you usually a sticky-entry which is a huge number.
It's impossible to retrieve the original data from the hash value.
ctw1prod001/maxcon# show sticky database group group1
sticky group : group1
type : HTTP-COOKIE
timeout : 5 timeout-activeconns : TRUE
sticky-entry rserver-instance time-to-expire flags
---------------------+--------------------------------+--------------+-------+
18119923635054054134 rser3:0 293 -
5834233610907968705 rser2:0 6 -
Until version 4.1.x it was also impossible to know the hash value from the original data.
This is why the command show sticky hash is useful. This command will hash whatever text you provide and return the hash value in hexadecimal and decimal.
ctw1prod001/maxcon# show sticky hash cisco123
Hash: 0x50f75c8edad064c1 - 5834233610907968705
ctw1prod001/maxcon# show sticky hash testing123
Hash: 0xfb76e68ae89a0af6 - 18119923635054054134
ctw1prod001/maxcon#
In the example above, the text cisco123 is hashed into value 5834233610907968705
This value correspond to the sticky entry pointing to rser2:0
Below is another example with cookie insert.
ctw1prod001/maxcon# sh sticky cookie-insert group group1
Cookie | HashKey | rserver-instance
------------+----------------------+----------------------------------------+
R876364218 | 8016668853981260554 | s1/rser1:0
R876365307 | 1555693097661300743 | s1/rser2:0
R876366396 | 9600904050871858939 | s1/rser3:0
R876367485 | 3122196043780649991 | s1/rser4:0
R2830679934 | 17805359266603035579 | s2/rser4:0
R2830681023 | 81717080676150073 | s2/rser5:0
ctw1prod001/maxcon# sh sticky hash R876364218
Hash: 0x6f40edd93b7dab0a - 8016668853981260554
ctw1prod001/maxcon# sh sticky hash R2830681023
Hash: 0x12251407dc8ab39 - 81717080676150073
ctw1prod001/maxcon#
And this will end our stickyness review.
Do not hesitate to send me questions or topics you would like me to address.
Thanks,
Gilles Dufour
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.