Datacenter troubleshooting guide - day 6

"Datacenter troubleshooting guide" - a blog by Gilles Dufour.

Day 6 - Understanding me-stats

This week I was asked to give some information regarding the me-stats.

This is a large topic, but it is indeed an important part of the troubleshooting process.

I could simply send you the meaning of each counter, but I think that would confuse you or create some panic.

Instead, I'm going to focus on the most important counters and give you the steps I use to identify them.

First, we need to review the design of the ACE module: to know which counters to look at and when, we need to understand the path a packet follows inside the blade.

The module is divided into two parts, as I mentioned in a previous blog: the Control Plane, or CP, and the Data Plane, or DP.

The DP is itself divided into several Micro Engines, or MEs.

Each ME has a specific function:

ME          Function
RX          Receives all incoming traffic and buffers data
FastPath    Processes all incoming packets, tries to match them to existing connections, and directs traffic to the other MEs
ICM         Inbound Connection Manager: receives all packets not matching an existing connection and applies the configured action
OCM         Outbound Connection Manager: processes outgoing packets, applies outbound ACLs and NAT
TCP         For connections terminated by the ACE, handles the TCP 3-way handshake, TCP options, ...
HTTP        All HTTP functions: matching cookies, URLs, ...
Reassembly  Processes fragmented IP packets

So, when troubleshooting your ACE module, looking at its status, or checking performance, you typically follow the MEs in the same order as the path of a packet inside the blade.

The first ME is RX, which buffers all incoming traffic.

switch/Admin# show np 1 me-stats "-srx -v"
Receive Statistics: (Current)
------------------
Idle:                                      37253614        100861
Frames Received:                           69012481            73
Control Frames Received:                   25552296            39
Forward Buffered:                          69012481            73
Post stalls:                                      0             0
Packet drops:                                     0             0
Error(bad rbuf):                                  0             0
Error(rbuf parity):                               0             0
Error(rbuf skip):                                 0             0
Error(missing eop):                               0             0
Error(missing sop):                               0             0
Error(data buf alloc fail):                       0             0
Error(control buf alloc fail):                    0             0
Last bad RBUF control word:                       0             0

From this first ME we can collect some very important information, like the traffic rate for this IXP.

In this example we can see 73 packets/sec.

We can also see that all "Frames Received" were successfully buffered ("Forward Buffered").

When your box is under heavy load, you start seeing "Post stalls" and "Packet drops".

If you do get post stalls and packet drops, it means the other MEs can't keep up with the level of traffic.

You need to move some rules from L7 to L4, or reduce the amount of traffic - by going active-active, for example.
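To make this check repeatable, here is a minimal sketch (not an official Cisco tool) of parsing the "show np 1 me-stats" output above and flagging the overload counters; the parser and function names are mine, while the counter names come from the RX output shown earlier:

```python
import re

# Counters whose growth indicates the downstream MEs can't keep up.
RX_ALERT_COUNTERS = ("Post stalls", "Packet drops")

def parse_me_stats(output):
    """Return {counter_name: (total, per_second)} from me-stats output lines."""
    stats = {}
    for line in output.splitlines():
        m = re.match(r"^(.+?):\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            stats[m.group(1).strip()] = (int(m.group(2)), int(m.group(3)))
    return stats

def rx_overloaded(stats):
    """True if RX reports any post stalls or packet drops."""
    return any(stats.get(c, (0, 0))[0] > 0 for c in RX_ALERT_COUNTERS)

# Three lines copied from the RX output above, as sample input.
sample = """Frames Received:                           69012481            73
Post stalls:                                      0             0
Packet drops:                                     0             0"""
stats = parse_me_stats(sample)
print(stats["Frames Received"][1])  # 73 packets/sec (rightmost column)
print(rx_overloaded(stats))         # False
```

The same parser works for the Fastpath and common outputs further down, since all me-stats sections share the "name: total rate" layout.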

The next ME in the path is fastpath.

switch/Admin# show np 1 me-stats "-sfp -v"
Fastpath Statistics: (Current)
-------------------
Errors:                                           4             0
FPTX Hi Priority receive:                  25557857            39
Fastpath pkt received:                     76857092            80
FPTX receive:                              43469595            35
FastTX receive:                             7575207             6
SlowTX receive:                              254859             0
Packets transmit to hyperion:              12263591             9
Packets punt to CP:                        13835061            13
Packets punt to Nitrox:                      254800             0
Packets punt to Daughtercard:                     0             0
Packets punt to other IXP:                    17357             1

Packets transmitted (loopback):                   0             0
Debug packet copy to CP:                          0             0
Packets forward to ICM:                     8617882             6
Packets forward to OCM:                           0             0
Packets forward to TCP:                           0             0

Packets forward to Fragmentation:                 0             0
Packets IPCP forward:                           102             0
Large buffer TX count:                            0             0
WARN: TX Packet too small:                        0             0
DROP: Packet too big error:                       0             0
DROP: Connection Miss:                            0             0
DROP: Bad connection route:                       0             0
DROP: RX Interface miss:                   12632166            11
DROP: Out of buffers:                             0             0
DROP: Unknown Msg received:                24079409            37
DROP: Bandwidth rate policed:                     2             0
Close request Sent:                         1061629             0
Packets dropped (encap invalid):                  0             0
Close request Sent: (encap mismatch):             0             0
Packets forward to SSL-ME:                        0             0
Packets forward to SSL-XScale:               254800             0
Ack trigger msgs sent:                            0             0
DROP: TO CP rate policed:                         0             0
Wait for empty TFIFO:                           306             0
FastQ Transmit Backpressure:                      0             0
SlowQ Transmit Backpressure:                      0             0
Hyperion Transmit Backpressure:                   0             0
Drop: Transmit Backpressure:                      0             0
Drop: Virtual MAC packets to standby:          1660             0
Drop: Shared MAC in non-shared interface          0             0
Drop: Next-Hop queue full:                        0             0
Drop: Diag to SSL-ME:                             0             0
Diag packets forwarded to SSL-ME:                 0             0
Drop: Invalid IMPH Destination:                   0             0
Drop: Invalid IMPH Next-Hop:                      0             0
Drop: IP DF bit set:                              0             0
Drop: No fragmentation of L3 Encap :              0             0
FastPath Jumbo pkt retransmit on BP :             0             0
Drop: exceed buffer threshold limit:              0             0
(Context ALL Statistics)
Packets forward to Reassembly:                    0             0
Packets forward to XScale:                  4900258             3
DROP: Connection Route:                        1660             0
Packets forward, reproxy:                         0             0
Packets forward, reproxy w/trigger:               0             0
Drop: Invalid connection hit:                     0             0
Drop: Reproxy out of order:                       0             0

All traffic has to go through RX and Fastpath.

But after FP, packets can go in different directions.

For example, they can be sent to the CP (e.g. probes, ssh, telnet, ...) - this is counted under "Packets punt to CP".

Once again, the number in the rightmost column is the rate in packets/sec.

Another possible direction is "Packets transmit to hyperion".  This is the traffic sent out of the module, back to the cat6k.

Packets can also be forwarded to other MEs like ICM ("Packets forward to ICM"), OCM ("Packets forward to OCM") or TCP ("Packets forward to TCP").


Two interesting counters to monitor for indications of performance issues are:

Drop: Next-Hop queue full:

Drop: exceed buffer threshold limit:

The first one is an indication that one of the next MEs (ICM, OCM, or TCP) is not draining its queue fast enough, and therefore traffic is dropped.

The second counter indicates that we're running out of buffers and traffic is dropped as a preventive measure to avoid total collapse.
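A simple way to watch these two counters is to take periodic snapshots and alert on any growth. A minimal sketch, where the counter names come from the Fastpath output above and the snapshot dictionaries are illustrative:

```python
# Performance-critical drop counters from "show np 1 me-stats '-sfp -v'".
WATCH = ("Drop: Next-Hop queue full", "Drop: exceed buffer threshold limit")

def growing_drops(before, after):
    """before/after: {counter_name: total}. Return the counters whose totals grew."""
    return [c for c in WATCH if after.get(c, 0) > before.get(c, 0)]

# Illustrative snapshots taken a few minutes apart.
snap1 = {"Drop: Next-Hop queue full": 0, "Drop: exceed buffer threshold limit": 0}
snap2 = {"Drop: Next-Hop queue full": 12, "Drop: exceed buffer threshold limit": 0}
print(growing_drops(snap1, snap2))  # ['Drop: Next-Hop queue full']
```

Comparing totals rather than rates avoids missing short bursts of drops that happen between polls.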

You can see the level of buffer utilisation with the following me-stats command:

switch/Admin# show np 1 me-stats "-scommon"
Common Statistics: (Current)
------------------
Internal buffers allocated:                70643567            72
Internal buffers released:                 70640804            71
External buffers allocated:                  767167             0
External buffers released:                   763567             0
Hash lock contention count:                      50             0
X TO ME Pkt count:                          4861866             3

switch/Admin#

To know the number of buffers currently used by the system, subtract the number of buffers released from the number of buffers allocated.

In this case: 70643567 - 70640804 = 2763.

We have 256k buffers and 2 thresholds.  We drop new connections at 192k buffers used and we drop packets at 224k.
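The arithmetic above can be sketched as follows; the 256k pool size and the 192k/224k thresholds are the figures quoted in this post, and the helper name is mine:

```python
# Buffer-utilisation arithmetic for "show np 1 me-stats -scommon".
BUF_TOTAL = 256 * 1024            # total internal buffers
CONN_DROP_THRESHOLD = 192 * 1024  # new connections dropped beyond this
PKT_DROP_THRESHOLD = 224 * 1024   # packets dropped beyond this

def buffers_in_use(allocated, released):
    """Buffers currently held = allocated - released."""
    return allocated - released

# Values from the "Internal buffers" lines in the output above.
in_use = buffers_in_use(70643567, 70640804)
print(in_use)                         # 2763
print(in_use >= CONN_DROP_THRESHOLD)  # False: well below the first threshold
```

At 2763 buffers in use out of 262144, this box is nowhere near either threshold.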

Next I will continue with ICM, OCM, TCP, ...

Gilles Dufour

1 Comment

Hi Gilles,

As usual excellent document. Very helpful and informative.

Regards,

Kanwal
