Ask the Expert: Understanding and Troubleshooting ACE Loadbalancer

ciscomoderator · ‎08-03-2012

With Sivakumar Sukumar

Welcome to the Cisco Support Community Ask the Expert conversation. This is an opportunity to learn and ask questions about configuration and troubleshooting the Cisco Application Control Engine (ACE) loadbalancer with Sivakumar Sukumar. The Cisco ACE Application Control Engine Module for Cisco Catalyst 6500 Series Switches and Cisco 7600 Series Routers is a next-generation load-balancing and application-delivery solution. A member of the Cisco family of Data Center 3.0 solutions, the module:

Helps ensure business continuity by increasing application availability
Improves business productivity by accelerating application and server performance
Reduces data center power, space, and cooling needs through a virtualized architecture
Helps lower operational costs associated with application provisioning and scaling

Sivakumar Sukumar is an experienced support engineer with the High Touch Technical Support content team, covering all Cisco content delivery network technologies including Cisco Application Control Engine (ACE), Cisco Wide Area Application Services (WAAS), Cisco Content Switching Module, Cisco Content Services Switches, and other content products. He has been with Cisco for more than 2 years, working with major customers to help resolve their issues related to content products. He holds CCNP and DCASI certifications.

Remember to use the rating system to let Sivakumar know if you have received an adequate response.

Sivakumar might not be able to answer each question due to the volume expected during this event. Remember that you can continue the conversation on the Data Center sub-community discussion forum shortly after the event. This event lasts through August 24, 2012. Visit this forum often to view responses to your questions and the questions of other community members.

sivaksiv · ‎08-14-2012

Hi Philip,

It has to be done on the server itself; changing the server's default gateway to the ACE IP address should work here.

Regards,
Siva

cscherb · ‎08-15-2012

Is the ACE load balancer capable, or will it become capable, of handling the WebSocket protocol as defined in RFC 6455?

How should stickiness be handled in this area? My own lab experiments show that IP-based stickiness works with ACE software version A4(1.0), but SessionID-based stickiness is not possible.

sivaksiv · ‎08-15-2012

Hi,

Thanks for your question.

There are no immediate plans to support WebSocket on ACE, and no roadmap is available yet. From previously documented cases and from my personal experience on cases I've handled, I can tell you there is one requirement that seems to be very important for WebSocket traffic.

WebSocket requires stickiness, so that all connections from a single user stick to one server. This is particularly effective (and sometimes strictly necessary) when the application requires user authentication, as otherwise traffic would bounce between two or more servers.

The type of stickiness that you would implement depends entirely on your network requirements.

Since ACE does not have any specific knowledge of the WebSocket protocol, it cannot do deeper protocol inspection, but it seems to work for generic connection-based Layer 3 and Layer 4 load balancing, which I believe you have already tested in your lab.

You can also get in touch with your Cisco contact and share the use case and more details, so that they can assist with your requirement.
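As a rough illustration of the source-IP stickiness discussed above, here is a minimal Python sketch of a sticky table. It is a toy model under stated assumptions, not how ACE implements stickiness: the class name `StickyTable`, the server names, and the round-robin choice for new clients are all hypothetical.

```python
from itertools import cycle


class StickyTable:
    """Toy source-IP sticky table (illustrative only, not ACE's implementation).

    A new client IP is assigned the next server in round-robin order;
    later connections from the same IP reuse that server.
    """

    def __init__(self, servers):
        self._next_server = cycle(servers)  # round-robin for unseen clients
        self._table = {}                    # client IP -> chosen server

    def pick(self, client_ip):
        if client_ip not in self._table:
            self._table[client_ip] = next(self._next_server)
        return self._table[client_ip]


# Hypothetical server names and documentation-range client addresses.
table = StickyTable(["server-a", "server-b"])
first = table.pick("192.0.2.10")
second = table.pick("192.0.2.11")
again = table.pick("192.0.2.10")  # same client, so same server as `first`
```

Because the lookup key is only the client IP, this works without any knowledge of the application protocol, which is why IP-based stickiness can carry WebSocket traffic while cookie or session-ID stickiness needs Layer 7 parsing.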

Regards,

Siva

Akhtar Samo · ‎08-15-2012

Hi Siva,

Good to see the ACE discussion in the Experts Corner. My query is whether there is any permanent fix for CSCsz65679, which causes the ACE-20 to crash a couple of times a year. I have noticed that neither an RMA nor an image upgrade fixes the problem. One of our customers had tens of ACE-20s, and neither RMAs nor upgrades fixed the 'NP Control Store Parity Error'; so far they have observed around 10 ACE-20 crashes in total, on different modules, over 3 years. The upgrades only reduce the crash frequency, probably because the explicit reload during an upgrade refreshes all the buffers.

I believe this might be an issue with the ACE-20 architecture. Similar issues have not been observed on ACE-30.

Regards,

Akhtar

sivaksiv · ‎08-16-2012

Hi Akhtar,

Thanks for your question.

Sorry that your customer has experienced so many crashes due to the parity issue.

First, let me explain a few things about SRAM. The SRAM parity error reported in the core file is not due to a software issue. It is the result of a "bit flip" within the SRAM itself, which can occur as a result of environmental conditions. The bit flip is cleared by a simple reboot of the system, which happens when the core file is generated. Our testing has shown that these types of issues occur with very low frequency. If a particular module experiences a significantly higher failure rate and you are running a version that has all the possible workarounds for CSCsz65679, then a proactive RMA could be in order.

ACE20 is susceptible to this because of the way it uses SRAM to store control information and packet data as opposed to scratch-pad storage. Almost any 1-bit flip will be detected as a parity error.
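To make the parity point concrete, here is a minimal Python sketch of single-bit parity checking. It only illustrates the general technique; the actual parity scheme and word width used on the ACE20 hardware are not described in this thread, and the function names and the 8-bit sample word are hypothetical.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") % 2


def has_parity_error(word: int, stored_parity: int) -> bool:
    """A mismatch between recomputed and stored parity signals corruption."""
    return parity_bit(word) != stored_parity


original = 0b10110010            # hypothetical 8-bit control word
stored = parity_bit(original)    # parity saved alongside the word

one_flip = original ^ (1 << 5)   # a single bit flipped in memory
two_flips = original ^ 0b11      # two bits flipped in memory

single_detected = has_parity_error(one_flip, stored)
double_detected = has_parity_error(two_flips, stored)
```

A single flipped bit always changes the parity and is detected, while an even number of flips goes unnoticed. Parity can only detect the error, not correct it, which fits with the reload being the recovery action.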

Unfortunately, SRAMs are very sensitive to light, dust, radiation, shock, temperature, and so on, so it is possible to get an SRAM parity error on a healthy ACE.

You are right about ACE30: neither the ACE4710 nor the ACE30 is affected by these issues, as their design does not use SRAM or Nitrox.

Also note that there is an EOL notice for ACE10/20:

http://www.cisco.com/en/US/prod/collateral/modules/ps2706/end_of_life_c51-674430.html

Regards,

Siva

eng__mohamed · ‎08-18-2012

Hi Siva

I have two ACE modules, and the standby one reloaded suddenly. How can I find out the cause of this?

sivaksiv · ‎08-18-2012

Hi,

Thanks for your question.

I understand the standby ACE had an unexpected reload. Do you see any crash info generated under "dir core:" after the reload? If so, please send those files to me so that I can determine the reason for the reload.

Otherwise, you can raise a TAC case and attach the following information so that we can analyze it and determine the root cause.

1- 'show tech' on the switch

2- 'show tech' on the Admin context on the ACE

3- Logs on the switch covering the period when the reload happened.

4- Crash files from the ACE located under "dir core:"

Let me know if you have any questions.

Regards,

Siva

eng__mohamed · ‎08-19-2012

Hi Siva

I attached the required files, but regarding the crash info, I didn't find any crash info for the reload date (18 Aug 2012 11:41 PM).

Thanks Siva

Best Regards

Mohamed Abd EL Razik

sivaksiv · ‎08-19-2012

Hi,

Thanks for providing the data.

This looks like a silent reboot, with the SUP initiating the reload.

However, the information doesn't really explain why it happened. Silent reboots are tricky, as they don't leave much data to work with.

Here is the defect that we logged to track the silent reboot. A software upgrade will most likely be necessary, as a few bugs related to silent reloads have been fixed in A2(3.3) (the current version is A2(3.5)); after upgrading, monitor the device.

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCsy91540

There is an action plan to determine whether this was related to L7 traffic or to management traffic (ANM, XML, SNMP, ...) that may have been exhausting resources on the ACE and caused the reload.

I can send you the detailed action plan via PM if required.

Let me know if you have any questions.

Regards,
Siva

Paul Pinto · ‎08-20-2012

Hi Siva,

Just a note on versions available. We recently appear to have run into the following Bug and had to downgrade to version A2(3.3) as removing our HTTP health probes did not seem like a workable solution for us.

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtz47825

Once downgraded, the paired modules stabilised (they no longer reloaded continuously). Both modules had been in this state.

Just thought I would provide some input.

Thanks.

Paul

sivaksiv · ‎08-20-2012

Hi Paul,

Thanks for your question.

It's good to know that the devices have been stable since the downgrade to A2(3.3), and I was able to track down the TAC case you recently opened for this issue.

Looking into the bug, this issue has mainly been reported on version A2(3.5) in the past, and we are working on reproducing it on different code versions to find the reason for the memory corruption.

We will have the fix after we successfully reproduce the problem; the bug has been updated with A2(3.7) as the fixed version.

Let me know if you have any questions.

Regards,
Siva

eng__mohamed · ‎08-20-2012

Hi Siva

Thanks for the information.

Kindly send me the detailed action plan to determine whether this was related to L7 traffic or to management traffic (ANM, XML, SNMP).

Regards,

Mohamed

sivaksiv · ‎08-20-2012

Hi Mohamed,

Sent you the information via PM. Please check.

Regards,
Siva

eng__mohamed · ‎08-20-2012

Thanks Siva for your support

Regards

Mohamed

ganessub · ‎08-20-2012

Hi Siva,

I am attaching the running-config of the ACE which is currently under test in the lab.

As you can see, VLAN 20 is configured on the client side and VLAN 30 is configured on the server side.

I am not able to ping the ACE interface IP address 2092:dead:beef:cafe::3 from the Cisco switch (7k) whose interface is connected to the ACE on VLAN 20.

Any idea whether this is normal behavior, or is there a configuration mistake?

Thanks !!

hostname ACE-4710

interface gigabitEthernet 1/1
  description *** Interface connecting to the UUT-Switch-7k (WS-C7206X) ***
  switchport access vlan 20
  no shutdown
interface gigabitEthernet 1/2
  description *** Interface connecting to the serverfarm ***
  switchport access vlan 30
  no shutdown
interface gigabitEthernet 1/3
  description *** UNUSED***
  no shutdown
interface gigabitEthernet 1/4
  description *** UNUSED***
  no shutdown

access-list everyone extended permit ip any any
access-list everyone extended permit pim any any
access-list everyone extended permit icmp any any

rserver host CNR
  ip address 2092:dead:beef:cafe::90
  inservice
rserver host CNR-IPv4
  ip address 172.27.167.13
  inservice
rserver host NMS
  ip address 2092:dead:beef:cafe::999
  inservice

serverfarm host LABSERVERS
  rserver CNR
    inservice
  rserver CNR-IPv4
    inservice
  rserver NMS
    inservice

! Layer-3 Traffic
class-map type management match-any MGMT
  match protocol telnet any
  match protocol https any
  match protocol http any
  match protocol xml-https any
  match protocol ssh any
  match protocol icmp any

! Layer-4 Traffic
class-map match-all slb-vip-LABSERVERS
  match virtual-address 2092:dead:beef:cafe::1 any

! Layer-3 Class-Map defining source traffic. This traffic matches server initiated
policy-map type management first-match MGMT_POLICY
  class MGMT
    permit

policy-map type loadbalance first-match LB_POLICY_LABSERVERS
  class class-default
    serverfarm LABSERVERS

policy-map multi-match CLIENT-VIPS_LABSERVERS
  class slb-vip-LABSERVERS
    loadbalance vip inservice
    loadbalance policy LB_POLICY_LABSERVERS
    loadbalance vip icmp-reply active
    loadbalance vip advertise active

interface vlan 20
  description "Client Interface"
  bridge-group 1
  access-group input everyone
  service-policy input CLIENT-VIPS_LABSERVERS
  service-policy input MGMT_POLICY
  no shutdown
interface vlan 30
  description "Server Farm"
  bridge-group 1
  service-policy input CLIENT-VIPS_LABSERVERS
  service-policy input MGMT_POLICY
  no shutdown

interface bvi 1
  ipv6 enable
  ip address 2092:dead:beef:cafe::3/64
  description "Client-Server Bridge Group"
  no shutdown

ip route ::/0 2092:dead:beef:cafe::2

username admin password 5 $1$Hh4K/EuN$J9mu8qUJbebWixnC5Wxpo1 role Admin domain default-domain
username www password 5 $1$9yHPLof8$RZrtAsMV26WtOp/q8Ou8L. role Admin domain default-domain

*******************************************************************

On the 7200 switch that connects to the ACE:

!
interface GigabitEthernet0/3
 description Connected to ACE-E1
 no ip address
 ip pim sparse-mode
 ip igmp version 3
 ip ospf 1 area 0
 shutdown
 duplex auto
 speed auto
 media-type rj45
 negotiation auto
 ipv6 enable
 ipv6 ospf 1 area 0
!
interface GigabitEthernet0/3.20
 encapsulation dot1Q 20 native
 ipv6 address 2093:DEAD:BEEF:CAFE::2/64
!
ipv6 route 2092:DEAD:BEEF:CAFE::/64 2092:DEAD:BEEF:CAFE::1

*********************************************************************************************************

I am starting with a basic management setup and will later progress to enabling more functionality on the ACE.

Please let me know if there are any mistakes or corrections I need to make in the configuration.

Thanks !
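One way to sanity-check the addressing in the two configurations above is a short Python sketch using the standard `ipaddress` module. The addresses are copied from the pasted configs; the variable names are hypothetical, and this checks only whether the addresses share a subnet, not the rest of the setup (for example, GigabitEthernet0/3 is also shown as `shutdown` in the paste).

```python
import ipaddress

# Addresses copied from the configurations pasted above.
ace_bvi = ipaddress.ip_interface("2092:dead:beef:cafe::3/64")        # ACE "interface bvi 1"
switch_subif = ipaddress.ip_interface("2093:DEAD:BEEF:CAFE::2/64")   # switch Gi0/3.20
ace_default_gw = ipaddress.ip_address("2092:dead:beef:cafe::2")      # ACE "ip route ::/0"

# Do the ACE BVI and the switch subinterface sit in the same /64?
same_subnet = ace_bvi.network == switch_subif.network

# Is the ACE's default gateway on the BVI's own subnet?
gw_on_link = ace_default_gw in ace_bvi.network

# Does that gateway address match the address configured on the switch?
gw_is_switch = ace_default_gw == switch_subif.ip

print(f"same /64 subnet:        {same_subnet}")
print(f"gateway on BVI subnet:  {gw_on_link}")
print(f"gateway == switch addr: {gw_is_switch}")
```

With the values as pasted, the switch subinterface prefix (2093:...) differs from the ACE prefix (2092:...), so the two interfaces are not in the same /64 even though the ACE's default gateway is on its own subnet.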
