Intermittent TCP probe failure on ACE module

r.shummoogum
Level 1

Hi:

Desperately looking for some help here and thanks in advance for reading.

I have been migrating a lot of serverfarms from the CSM to an ACE environment successfully so far. I am now at the last step, migrating a serverfarm from the CSM environment to a dedicated context on the ACE.

The real servers RSERVER1 and RSERVER2 are behind the routers R1 and R2 respectively.

During the migration we move Fa1/0 from both R1 and R2 to the VSS, as shown by the dotted lines in the diagram.

We killed server VLAN 32 and client VLAN 33 on both the CSM and SW1/SW2 (the redundant CSM and ACE are not shown on the diagram).

We then activated VLANs 32 and 33 on the ACE, SW3, etc.

"show serverfarm detail" shows the rservers as operational, and then they change to probe-failed intermittently. Pings to the rservers from the ACE work fine.

I changed the probe from telnet to ICMP and got the same results (operational, then probe-failed, then operational, and so on).

The ARP caches on R1 and R2 point to the ACE.

Note that there is also PBR on R1 and R2 to ensure that traffic flows back to the ACE.

The probe disconnect error is:

"Server reply timeout"

But how come it works fine on the CSM? Is there something that needs to be added to the ACE config?

Here is an edited config and drawing

access-list ACL1 line 10 extended permit ip any any
access-list ACL1 line 15 extended permit icmp any any

probe telnet TN3270
  interval 10
  passdetect interval 30

parameter-map type http REBALANCE
  persistence-rebalance
parameter-map type connection TCP_IDLE_8H
  set timeout inactivity 28800

rserver host TN3270_3RDPARTY-SERVER1
  ip address 10.10.20.11
  inservice
rserver host TN3270_3RDPARTY-SERVER2
  ip address 10.10.24.11
  inservice

serverfarm host TN3270_3RDPARTY
  failaction purge
  predictor leastconns
  probe TN3270
  rserver TN3270_3RDPARTY-SERVER1
    inservice
  rserver TN3270_3RDPARTY-SERVER2
    inservice

class-map type management match-any L4_REMOTE-MGT_CLASS
  2 match protocol telnet any
  3 match protocol ssh any
  4 match protocol icmp any
  5 match protocol http any
  7 match protocol snmp any
  8 match protocol https any
class-map match-all TN3270_3RDPARTY
  2 match virtual-address 10.20.128.111 tcp any

policy-map type management first-match L4_REMOTE-MGT_POLICY
  class L4_REMOTE-MGT_CLASS
    permit
policy-map type loadbalance first-match TN3270_3RDPARTY-POLICY
  class class-default
    serverfarm TN3270_3RDPARTY
policy-map multi-match TN3270-INTERFACE-POLICY
  class TN3270_3RDPARTY
    loadbalance vip inservice
    loadbalance policy TN3270_3RDPARTY-POLICY
    loadbalance vip icmp-reply active
    appl-parameter http advanced-options REBALANCE
    connection advanced-options TCP_IDLE_8H

interface vlan 32
  description TN3270 SERVER VLAN
  ip address 10.30.2.2 255.255.255.0
  alias 10.30.2.1 255.255.255.0
  peer ip address 10.30.2.3 255.255.255.0
  no icmp-guard
  access-group input ACL1
  service-policy input L4_REMOTE-MGT_POLICY
  no shutdown
interface vlan 33
  description TN3270 CLIENT VLAN
  ip address 10.20.128.11 255.255.255.0
  alias 10.20.128.10 255.255.255.0
  peer ip address 10.20.128.12 255.255.255.0
  no icmp-guard
  access-group input ACL1
  service-policy input L4_REMOTE-MGT_POLICY
  service-policy input TN3270-INTERFACE-POLICY
  no shutdown

ip route 10.10.20.0 255.255.252.0 10.30.2.12
ip route 10.10.24.0 255.255.252.0 10.30.2.13
ip route 0.0.0.0 0.0.0.0 10.20.128.1

ACE_3270_for_cisco.jpg

6 Replies

mwinnett
Level 3

It's possible that the telnet probe operates slightly differently on the ACE and the CSM in terms of how it checks the welcome message. However, if that were the issue, I would expect it never to work on the ACE. You are really going to have to span VLAN 32 on SW3 or SW4 and see what happens when it fails.
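
As a rough sketch of how a timing difference could be tested, the existing probe could be given explicit open and receive timeouts and an explicit retry count, so that a single slow reply does not mark the rserver down. The timer values below are illustrative assumptions only; they are not recommendations and are not taken from the working CSM config.

probe telnet TN3270
  interval 10
  faildetect 3
  passdetect interval 30
  open 5
  receive 10

Each time the state flips, "show probe TN3270 detail" and "show serverfarm TN3270_3RDPARTY detail" can then be compared to see which rserver failed and with which disconnect error.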

Matthew

I did span SW3 and SW4, and the trace shows a timeout on the SYN.

I do not see any reason why the SYN would time out. As soon as we move things back to the CSM, everything runs smoothly.

Note: the PBR points to the alias on the active ACE.

I also see in the trace that both the primary and secondary addresses of the ACE send probes.

We tried to move them to the ACE on two different occasions, with the same result.

Note: that ACE has a few other contexts that work just fine.

We will be verifying the cables to see if they are OK.

thanks

This may be a long shot, but do you have these VLANs configured in any other contexts of the ACE? If so, can you run the command "show np 1 interface iflookup" in the Admin context on both the active and the standby?

Pay attention to the "Hostid: X" value. If both ACEs show the same value for X, then this is the classic shared VLAN problem, where both ACEs are using the same MAC for the physical interface. Keep in mind that this is only an issue if you have the same VLAN in more than one context.

If this is the case, you can look at the link below for more details. You would then need to hard set the MAC addresses with the commands "shared-vlan-hostid x" and "peer shared-vlan-hostid y", with values between 1 and 16.

http://www.cisco.com/en/US/docs/interfaces_modules/services_modules/ace/vA2_3_0/configuration/rtg_brdg/guide/vlansif.html#wp1025243
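
For illustration only, here is a minimal sketch of what that could look like in the Admin context of the active ACE, assuming the two modules did report the same Hostid. The values 3 and 4 are arbitrary examples and are not taken from this setup; any two different values in the 1-16 range should serve the same purpose.

shared-vlan-hostid 3
peer shared-vlan-hostid 4

Re-running "show np 1 interface iflookup" on both modules afterwards should then show two different Hostid values.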

Here is the output of "show np 1 interface iflookup" from my lab. In this case it is Hostid: 0.

MR0317-6500-2-ACE-8/Admin# show np 1 interface iflookup
First burnt-in MAC: 00:30:f2:75:79:fb
Last  burnt-in MAC: 00:30:f2:75:79:ff
No of burnt-in MACs: 7
Hostid: 0
Shared vlan macs currently in use (offset from 0): 0-15
Vlan-vmac indexes currently in use: 0-4
Flags:  Valid shared bridged ftstatus ssl-test normalization icmp-guard switch-mode ftvlan remove-eth-pad no-of-lifs

The hostid is 8 on the primary and 4 on the secondary. VLANs 32 and 33 have been shut down on the ACE, though, as everything has been moved back to the CSM.

I also noticed that interface vlan 32 exists in the Admin context with no IP address and is admin down (this is probably something someone forgot to remove). Another context also has VLAN 32 allocated but not defined in the context (that is, no interface vlan 32, no IP address, etc.).

Looking at the diagram and based on the traces, my guess is that it has to be related to the switching infrastructure. When the probe fails, does the SYN get to the rserver?

Matthew

Matthew:

Since I see the SYN on the span ports of SW3 and SW4, I assume it makes it to routers R1 and R2, as those are directly connected cables.

Nothing has changed beyond that. Also, when the cables (using different cables) get moved back to SW1 and SW2, everything works fine, with a full three-way handshake in the trace.
