Intermittent TCP probe failure on ACE module

r.shummoogum · ‎01-25-2012

Hi:

Desperately looking for some help here and thanks in advance for reading.

I have successfully migrated a lot of serverfarms from the CSM to the ACE so far. I am now at the last step: migrating one serverfarm from the CSM environment to a dedicated context on the ACE.

The real servers RSERVER1 and RSERVER2 are behind the routers R1 and R2 respectively.

During the migration we move Fa1/0 on both R1 and R2 to the VSS, as shown by the dotted lines in the diagram.

We killed server VLAN 32 and client VLAN 33 on both the CSM and SW1/SW2 (the redundant CSM and ACE are not shown on the diagram).

We then activated VLANs 32 and 33 on the ACE and SW3, etc.

The "show serverfarm detail" output shows the rservers as operational, and then they change to probe-failed intermittently. Ping from the ACE towards the rservers works fine.

I changed the probe from telnet to ICMP with the same results (operational, then probe-failed, then operational, etc.).
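
For reference, the telnet-to-ICMP probe swap described above could look roughly like the sketch below. The probe name ICMP_CHECK is hypothetical, and the timers are simply copied from the telnet probe in the posted config; this is an illustration of the change, not the exact commands that were entered.

```
probe icmp ICMP_CHECK
  interval 10
  passdetect interval 30

serverfarm host TN3270_3RDPARTY
  no probe TN3270
  probe ICMP_CHECK
```

If an ICMP probe flaps in the same way as the telnet probe, the problem is unlikely to be specific to how the telnet probe handles the server banner.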

The ARP cache from R1 and R2 point to the ACE.

Note that there is also PBR on R1 and R2 to ensure that traffic flows back to the ACE.

The probe disconnect error is

"Server reply timeout"

But how come it works fine on the CSM? Is there something that needs to be added to the ACE config?

Here is an edited config and drawing

access-list ACL1 line 10 extended permit ip any any
access-list ACL1 line 15 extended permit icmp any any

probe telnet TN3270
  interval 10
  passdetect interval 30

parameter-map type http REBALANCE
  persistence-rebalance
parameter-map type connection TCP_IDLE_8H
  set timeout inactivity 28800

rserver host TN3270_3RDPARTY-SERVER1
  ip address 10.10.20.11
  inservice
rserver host TN3270_3RDPARTY-SERVER2
  ip address 10.10.24.11
  inservice

serverfarm host TN3270_3RDPARTY
  failaction purge
  predictor leastconns
  probe TN3270
  rserver TN3270_3RDPARTY-SERVER1
    inservice
  rserver TN3270_3RDPARTY-SERVER2
    inservice

class-map type management match-any L4_REMOTE-MGT_CLASS
  2 match protocol telnet any
  3 match protocol ssh any
  4 match protocol icmp any
  5 match protocol http any
  7 match protocol snmp any
  8 match protocol https any
class-map match-all TN3270_3RDPARTY
  2 match virtual-address 10.20.128.111 tcp any

policy-map type management first-match L4_REMOTE-MGT_POLICY
  class L4_REMOTE-MGT_CLASS
    permit

policy-map type loadbalance first-match TN3270_3RDPARTY-POLICY
  class class-default
    serverfarm TN3270_3RDPARTY

policy-map multi-match TN3270-INTERFACE-POLICY
  class TN3270_3RDPARTY
    loadbalance vip inservice
    loadbalance policy TN3270_3RDPARTY-POLICY
    loadbalance vip icmp-reply active
    appl-parameter http advanced-options REBALANCE
    connection advanced-options TCP_IDLE_8H

interface vlan 32
  description TN3270 SERVER VLAN
  ip address 10.30.2.2 255.255.255.0
  alias 10.30.2.1 255.255.255.0
  peer ip address 10.30.2.3 255.255.255.0
  no icmp-guard
  access-group input ACL1
  service-policy input L4_REMOTE-MGT_POLICY
  no shutdown
interface vlan 33
  description TN3270 CLIENT VLAN
  ip address 10.20.128.11 255.255.255.0
  alias 10.20.128.10 255.255.255.0
  peer ip address 10.20.128.12 255.255.255.0
  no icmp-guard
  access-group input ACL1
  service-policy input L4_REMOTE-MGT_POLICY
  service-policy input TN3270-INTERFACE-POLICY
  no shutdown

ip route 10.10.20.0 255.255.252.0 10.30.2.12
ip route 10.10.24.0 255.255.252.0 10.30.2.13
ip route 0.0.0.0 0.0.0.0 10.20.128.1

mwinnett · ‎01-30-2012

It's possible that the telnet probe operates slightly differently on the ACE and the CSM in terms of how it checks the welcome message. However, if that were the issue, I would expect it never to work on the ACE. You are really going to have to SPAN VLAN 32 on SW3 or SW4 and see what happens when it fails.

Matthew

r.shummoogum · ‎01-30-2012

I did SPAN SW3 and SW4, and the trace shows a timeout on the SYN.

I do not see any reason why the SYN would time out. When we move things back to the CSM, everything runs smoothly.

Note: the PBR points towards the alias on the active ACE.

I also see in the trace that both the primary and secondary addresses of the ACE send probes.

We tried to move them to the ACE on 2 different occasions with the same result.

Note: that ACE has a few other contexts that work just fine.

We will be verifying the cables to see if they are OK.

thanks

jsirstin · ‎01-30-2012

This may be a long shot, but do you have these VLANs configured in any other contexts of the ACE? If so, can you run the command "show np 1 interface iflookup" on both the active and standby in the Admin context?

Pay attention to the "Hostid: X" value. If both ACEs show the same value for X, then this is the classic shared VLAN problem where both ACEs are using the same MAC for the physical interface. Keep in mind that this is only an issue if you have the same VLAN in more than one context.

If this is the case, you can look at the link below for more details. You would then need to hard set the MAC addresses with the commands "shared-vlan-hostid x" and "peer shared-vlan-hostid y", with values between 1 and 16.

http://www.cisco.com/en/US/docs/interfaces_modules/services_modules/ace/vA2_3_0/configuration/rtg_brdg/guide/vlansif.html#wp1025243
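
As a rough sketch of what that would look like, entered in the Admin context of one ACE: the prompt hostname host1 and the values 2 and 3 are placeholders only. Any two different values in the 1-16 range should serve the same purpose, and the linked guide is the authority on the exact behavior.

```
host1/Admin# configure
host1/Admin(config)# shared-vlan-hostid 2
host1/Admin(config)# peer shared-vlan-hostid 3
```

After the change, running "show np 1 interface iflookup" on both modules again should show different Hostid values.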

Output from my lab with this command. In this case it is Hostid:0.

MR0317-6500-2-ACE-8/Admin# show np 1 interface iflookup
First burnt-in MAC: 00:30:f2:75:79:fb
Last burnt-in MAC: 00:30:f2:75:79:ff
No of burnt-in MACs: 7
Hostid: 0
Shared vlan macs currently in use (offset from 0): 0-15
Vlan-vmac indexes currently in use: 0-4
Flags: Valid shared bridged ftstatus ssl-test normalization icmp-guard switch-mode ftvlan remove-eth-pad no-of-lifs

r.shummoogum · ‎01-30-2012

Hostid is 8 on the primary and 4 on the secondary. VLANs 32 and 33 have been shut down on the ACE, though, as everything has been moved back to the CSM.

I also noticed that interface vlan 32 exists in the Admin context with no IP address and is admin down (this is probably something someone forgot to remove). Another context also has VLAN 32 allocated but not defined in the context (that is, no interface vlan 32, no IP address, etc.).

mwinnett · ‎02-01-2012

Looking at the diagram and based on the traces, my guess is that it has to be related to the switching infrastructure. When the probe fails, does the SYN get to the rserver?

Matthew

r.shummoogum · ‎02-01-2012

Matthew:

Since I see the SYN on the SPAN ports of SW3 and SW4, I assume it makes it to routers R1 and R2, as those are directly connected cables.

Nothing has changed beyond that. Also, when the cables (using different cables) are moved back to SW1 and SW2, everything works fine, with the 3-way handshake completing as per the trace.