Nexus 3000/9000 Toolkit for Device and Network Performance Monitoring - AMA

ciscomoderator
Community Manager

This event is an opportunity to have your questions answered on the tools available for Nexus 3000 and 9000 Series device/network performance monitoring. Michal will cover usage, configuration, and output analysis of available CLI features such as iCAM, telemetry, Catena, PLB, latency monitoring, and more.

To participate in this event, please use the Join the Discussion: Cisco Ask the Expert button below to ask your questions.

Ask questions from Monday 7th to Friday 18th of December, 2020

Featured expert
Michal Kilar is a Technical Consulting Engineer for the Data Center Routing and Switching Team (EMEAR). Michal joined the Cisco Technical Assistance Center (TAC) in May 2012 and since then has acquired extensive knowledge and experience in Nexus products and troubleshooting. He holds a master’s degree in telecommunications, CCIE #38746 in Enterprise Infrastructure, and other design-related certifications.
Michal might not be able to answer each question due to the volume expected during this event. Remember that you can continue the conversation on the Data Center Switches community.
 
**Helpful votes encourage participation!**
Please be sure to rate the Answers to Questions

Sarah Staker
Level 1

Hello Michal,

What’s the difference between Catena “bypass” mode and regular traffic forwarding?

Thank you.

- Sarah

Hello Sarah,

 

If traffic matches a specific element of the Catena chain, it is forwarded according to the rules specified in that element.

The mode defined at the end of the element applies during a failure scenario, when one of the elements in a configured chain fails:

  1. In forward mode, traffic is forwarded using regular L2/L3 forwarding, just as if Catena were not configured.
  2. In bypass mode, only the faulty element of the chain is bypassed; traffic is still sent through the remaining (operational) elements of the chain.
  3. In drop mode, the traffic is dropped.
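As a rough configuration sketch (the names ACL1, DG1, PG-IN, and MyChain are placeholders, and the exact keyword syntax may differ per platform/release, so please verify against the Catena configuration guide), the failure behavior is selected with the mode keyword at the end of each chain element:

```
feature catena

catena device-group DG1
  vrf default
  node ip 10.1.1.2

catena port-group PG-IN
  interface Eth1/5

catena MyChain
  chain 10
    10 access-list ACL1 ingress-port-group PG-IN egress-device-group DG1 mode forward
  no shutdown
```

Replacing `mode forward` with `mode bypass` or `mode drop` selects the corresponding failure behavior described above.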

Best regards,


Michal

John Ventura
Level 1

Hello,

I have a concern... How can I correctly interpret the output of Active Buffer Monitoring statistics?

TY

Hi John,

 

Let’s use the following example output:

 

 

 

Nexus3500# show hardware profile buffer monitor interface Eth1/7 detail
Detail CLI issued at: 09/30/2013 19:47:01

Legend -
384KB  - between   1 and 384KB of shared buffer consumed by port
768KB  - between 385 and 768KB of shared buffer consumed by port
307us  - estimated max time to drain the buffer at 10Gbps

Active Buffer Monitoring for Ethernet1/7 is: Active
KBytes                 384  768 1152 1536 1920 2304 2688 3072 3456 3840 4224 4608 4992 5376 5760 6144
us @ 10Gbps            307  614  921 1228 1535 1842 2149 2456 2763 3070 3377 3684 3991 4298 4605 4912
                      ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
09/30/2013 19:47:01      0    0    0    0    0    0    0    0    0    0    0    0    0  250    0    0
09/30/2013 19:47:00      0    0    0    0    0    0    0    0    0    0    0    0    0  252    0    0
09/30/2013 19:46:59      0    0    0    0    0    0    0    0    0    0    0    0    0  253    0    0
09/30/2013 19:46:58      0    0    0    0    0    0    0    0    0    0    0    0    0  250    0    0
09/30/2013 19:46:57      0    0    0    0    0    0    0    0    0    0    0    0    0  250    0    0
09/30/2013 19:46:56      0    0    0    1    0    0    0    0    0    0    0    0    0  250    0    0
09/30/2013 19:46:55      0    0    0    0    0    0    0    0    0    0    0    0    0  251    0    0
09/30/2013 19:46:54      0    0    0    0    0    0    0    0    0    0    0    0    0  251    0    0
09/30/2013 19:46:53      0    0    0    0    0    0    0    0    0    0    0    0    0  250    0    0
09/30/2013 19:46:52      0    0    0    0    0    0    0    0    0    0    0    0    0  253    0    0
09/30/2013 19:46:51      0    0    0    0    0    0    0    0    0    0    0    0    0  249    0    0

 

 

 

 

The buffer occupancy statistics are presented with one-second granularity (one line of output for each second).

For each second we are presented with multiple counters, each representing a certain buffer-utilization range.

E.g., the third counter indicates buffer utilization between 769 and 1152 KB.

 

The buffer utilization statistics are polled every sampling interval (4 ms by default).

If during a poll the buffer utilization for a port reached a certain level, the counter under the corresponding range is increased by 1.

E.g., within the second 19:46:56, during one of the polls the buffer utilization for port Eth1/7 was between 1153 and 1536 KB.

 

During each poll, only one of the available counters may increase.

This means that the sum of the counters for each second (each line in the output) will be equal to or lower than the number of polls (with a 4 ms sampling interval there are 250 polls within each second).

 

Also please be advised that each counter is 8 bits wide, so it can store a value of up to 255.

If you lower the sampling interval (e.g. to 1 ms, i.e. 1000 samples per second) and the buffer utilization stays continuously at the same level (within the same range), more than 255 polls may return the same buffer utilization. In that case the corresponding counter reaches 255 and does not increase further within the given second.
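As a quick sanity check on the legend above (note the legend treats 1 KB as 1000 bytes), the drain-time estimate and the per-second poll budget work out as:

```latex
t_{\text{drain}} = \frac{384 \times 10^{3}\ \text{B} \times 8\ \text{bit/B}}{10 \times 10^{9}\ \text{bit/s}} \approx 307\ \mu\text{s},
\qquad
\text{polls per second} = \frac{1\ \text{s}}{4\ \text{ms}} = 250
```

This matches the 307 us shown for the 384 KB column and explains why each line of counters sums to at most 250 at the default sampling interval.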

crazy.xploring
Level 1

Context: FHRP

Protocol: VRRP

Platform : Nexus

A VRF is configured per customer; all SVI interfaces of a customer are in the same VRF.

In classical networking, VRRP peers are expected to be in the same group ID. But on the Nexus platform, why does it expect SVIs belonging to the same VRF to use an identical group ID? Here is the example.

 

Nexus-1:

 

vrf context cus-ID-1

!

interface vlan 10

ip 10.10.10.1/24

vrf member cus-ID-1

vrrp 1

address 10.10.10.254

 

!

interface vlan 20

ip 20.20.20.1/24

vrf member cus-ID-1

vrrp 1

address 20.20.20.254

 

Nexus-2:

 

vrf context cus-ID-1

!

interface vlan 10

ip 10.10.10.2/24

vrf member cus-ID-1

vrrp 1

address 10.10.10.254

 

!

interface vlan 20

ip 20.20.20.2/24

vrf member cus-ID-1

vrrp 1

address 20.20.20.254

 

If I try to use a different VRRP group ID for an SVI interface (for example VLAN 20 with vrrp 2), I’m unable to ping the VIP of that SVI.

 

Hello,

 

Could you please elaborate a bit on the details of your scenario?

1. What Nexus platform/software are you using?

2. Is it a VPC setup?

3. Are you trying your ping from VRRP standby or some other host in vlan 20 (maybe you can attach a small topology diagram)?

4. Is the VRRP group number for vlan 20 the only thing you change between working and non-working scenario?

 

In general there is no requirement for all SVIs in a given VRF to use the same VRRP group number.

My apologies for the belated reply.

 

It is a lab setup, using VIRL images.

 

1. What Nexus platform/software are you using?

 

Nexus 9000v is a demo version of the Nexus Operating System

Software
BIOS: version
NXOS: version 9.3(6)
BIOS compile time:
NXOS image file is: bootflash:///nxos.9.3.6.bin
NXOS compile time: 11/9/2020 23:00:00 [11/10/2020 11:00:21]


Hardware
cisco Nexus9000 C9500v Chassis ("Supervisor Module")
with 7836768 kB of memory.
Processor Board ID 9VPDK0NTLFO

Device name: nxs95K-02
bootflash: 4287040 kB
Kernel uptime is 0 day(s), 1 hour(s), 29 minute(s), 0 second(s)

Last reset
Reason: Unknown
System version:
Service:

plugin
Core Plugin, Ethernet Plugin

Active Package(s):

 

2.  Is it a VPC setup?

Yes, back-to-back vPC.

 

3. Are you trying your ping from VRRP standby or some other host in vlan 20 (maybe you can attach a small topology diagram)? Attached

 

4. Is the VRRP group number for vlan 20 the only thing you change between working and non-working scenario?

     With a different group ID, the SVI VIPs do not respond to ICMP echoes from a host.

 

 

 

 

Hello,

 

As I mentioned in my previous post, there is no limitation in VRRP that would require the same group number to be used on multiple SVIs.

  • I ran a few quick tests in the lab with N9K-C93180YC-FX running different software versions: 7.0(3)I7(8), 7.0(3)I7(4), 9.2(2), 9.3(5).
  • I matched the topology to yours: two vPC pairs connected back-to-back.
  • My test PC was connected in VLAN 20 via an orphan port to the device which corresponds to nxs93k-02 in your topology.
  • VRRP active is on nxs95k-01 (both VLAN 10 and 20 are in the same, non-default VRF and use VRRP group 1).

I am able to ping the SVI IPs of all four devices and the VRRP virtual IP with no issues.

I believe that in your case the issue might be specific to the fact that you are using a virtualized environment.

 

If you would like to further narrow down the issue, I would:
1. Focus on connectivity between the server (A) and the SVI of the directly connected Nexus (B).
2. Check whether both ends have ARP entries resolved and whether they map IP to MAC correctly.
3. When initiating the ping from the server, check whether it arrives at the Nexus (e.g. by doing a packet capture with Ethanalyzer).
4. Check whether it is formatted correctly (VLAN tag, src/dst MAC/IP).
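For step 3, a minimal on-box capture could look like the following (the display filter and frame limit are illustrative; `inband` captures traffic punted to the supervisor, which includes pings destined to the SVI/VIP):

```
nexus# ethanalyzer local interface inband display-filter icmp limit-captured-frames 50
```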

 

Best regards,


Michal

whitlelisa
Level 1

What’s new with iCAM?

Thanks,

Lisa

Hello Lisa,

 

Starting from release 9.3(5), iCAM is enabled by default – you can verify this with “show run all | i icam”.

 

iCAM now also integrates monitoring of memory usage and leak detection.

This allows warnings to be generated about possible process reloads, so users can prepare and take proper action before unexpected incidents occur.

 

 

N9k# show icam system                                 <<== new command introduced for monitoring memory usage

Retrieving data.  This may take some time ...

==================================================

Info Threshold =  80 percent (default)           |    <<== default threshold values

Warning Threshold =  90 percent (default)        |

Critical Threshold = 100 percent (default)       |

All timestamps are in UTC                        |

==================================================

 

Process Memory

==============

 

Process                  Instance                                Unit           Limit        Value   Util.     Alarm   Timestamp

------------------------------------------------------------------------------------------------------------------------------------------

aaa                      VDC:1 MOD:1 UUID:0x000000B5 PID:30854   Bytes     4294967295    451297280   10.50      None   2020-10-22 22:17:57

acllog                   VDC:1 MOD:1 UUID:0x0000023C PID:1328    Bytes    25223122944    446713856    1.77      None   2020-10-22 22:17:57

aclmgr                   VDC:1 MOD:1 UUID:0x00000182 PID:30859   Bytes    25223122944    462151680    1.83      None   2020-10-22 22:17:57

 

Legend:

acllog – process name

 

 

 

Threshold values can be tuned with:

 

N9k(config)# icam monitor system threshold info 40 warning 60 critical 90

N9k(config)# show icam system

Retrieving data.  This may take some time ...

==================================================

Info Threshold =  40 percent (configured)        |

Warning Threshold =  60 percent (configured)     |

Critical Threshold =  90 percent (configured)    |

All timestamps are in UTC                        |

==================================================

 

Best regards,


Michal

Olipo
Level 1

Hi,

How can PLB be used for analytics?

Thanks for your reply,

Oliver

mkilar
Cisco Employee

Hello Oliver,

PLB (Pervasive Load Balancing) is a feature that allows the fabric to perform as a load balancer.

The main functionality of the feature is to distribute load between resources (servers, VMs, containers) scattered throughout the entire fabric.

In parallel to the actual load balancing, the feature can also collect a significant amount of analytics data.

To do so, you need to enable data collection on a per-PLB-service basis using “plb analytics service_name”.

 

The information can be viewed using “show plb analytics …” commands.
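As an illustrative sketch (the service name, device-group, and VIP below are hypothetical, and the exact service configuration syntax should be checked against the PLB configuration guide for your release), analytics is enabled per service:

```
feature plb

plb WebService
  device-group Servers
  virtual ip 192.0.2.100 255.255.255.255
  no shutdown

! enable analytics collection for this service
plb analytics WebService
```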

Best regards,


Michal

Didier M
Level 1

Hi everybody !

I have one question: can we configure Catena to redirect between multiple VRFs?


Didier

Hello Didier,

 

There is a strict requirement that all nodes within a device-group belong to the same VRF – the VRF name needs to be specified under the device-group configuration. So in this respect we are not allowed to “mix” multiple VRFs.

 

However, the inter-vrf routing is supported with Catena.

What I mean by that is that traffic arriving on a port-group belonging to one VRF (A) can be redirected to a device-group belonging to another VRF (B).

 

For example, in the picture below, in one of the chain sequences, traffic ingressing on interface Eth1/5, which belongs to VRF A, will be redirected out interface Eth1/7 in VRF B.

 

Catena.png
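A minimal configuration sketch matching this inter-VRF example (the interface names come from the text above; the ACL name, node IP, and VRF names A/B are placeholders, and the exact keyword syntax may vary by release):

```
! Eth1/5 (ingress) is a member of vrf A
catena port-group PG-A
  interface Eth1/5

! the device-group - reachable via Eth1/7 - lives in vrf B
catena device-group DG-B
  vrf B
  node ip 172.16.1.2

catena InterVrfChain
  chain 10
    10 access-list ACL1 ingress-port-group PG-A egress-device-group DG-B mode forward
  no shutdown
```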

Best regards,


Michal
