How to determine API load on an APIC

sdavids5670
Level 2

I have a controller which is very slow to respond via web GUI and I suspect that maybe one or more devops or ops teams is hammering it with API calls.  Is there a way to quickly determine how much of the system resources are getting consumed by API calls from external sources?


Robert Burns
Cisco Employee

A few questions to help you out.

1. Describe your environment: software versions, APIC platform (M2/M3/M4, etc.), size of fabric, and so on.

2. Is this a single-controller cluster or a multi-controller one (3+)? If it is multi-controller, do you see the same "slowness" in the UI of each controller? I ask because API calls are processed only by the controller they are directed to, so an ops team pounding one controller with API calls would not significantly impact the others.

There is a way to throttle API requests, but first let's get the above answered. https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/cisco-aci-support-for-nginx-rate-limit.html

Robert

 

@Robert Burns 

 

We have 5 controllers (in a 2-pod setup) running 5.2(7f), and only the first controller is sluggish. That's why I suspect the ops/devops people are all picking on the first controller and leaving the others alone. I'm hoping there's a statistic somewhere that shows API call counts, which I can compare across all of the APICs to either prove or disprove my theory.

OK, good data point. So let's prove your assumption; then you can focus on having your DevOps folks spread their API requests across the controllers (either statically or using a load balancer).

A simple check you can run on APIC1 vs. APIC2/3/4/5 is to look at the resource usage of NGINX (the service that processes API requests) using 'top'.

DCVLab-APIC1# top | grep nginx
22142 root 20 0 10.4g 9.7g 276708 S 0.9 15.6 5196:50 nginx.b+
22142 root 20 0 10.4g 9.7g 276708 S 2.4 15.6 5196:50 nginx.b+
22142 root 20 0 10.4g 9.7g 276708 S 1.0 15.6 5196:50 nginx.b+
22142 root 20 0 10.4g 9.7g 276708 S 1.3 15.6 5196:50 nginx.b+
22142 root 20 0 10.4g 9.7g 276708 S 5.6 15.6 5196:50 nginx.b+

In each line, the ninth column (0.9 to 5.6 here) is CPU % usage and the tenth column (15.6 here) is memory % usage.
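To make the APIC1 vs. APIC2/3/4/5 comparison less eyeball-driven, the samples can be averaged. The following is only a rough sketch: it assumes the default `top` column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and that the lines were captured in batch mode, for example with `top -b -n 5 | grep nginx`. The function name `summarize_top` is made up for illustration.

```python
from statistics import mean

# Sample lines copied from the APIC1 output above.
SAMPLE = """\
22142 root 20 0 10.4g 9.7g 276708 S 0.9 15.6 5196:50 nginx.b+
22142 root 20 0 10.4g 9.7g 276708 S 2.4 15.6 5196:50 nginx.b+
22142 root 20 0 10.4g 9.7g 276708 S 1.0 15.6 5196:50 nginx.b+
22142 root 20 0 10.4g 9.7g 276708 S 1.3 15.6 5196:50 nginx.b+
22142 root 20 0 10.4g 9.7g 276708 S 5.6 15.6 5196:50 nginx.b+
"""


def summarize_top(text, name="nginx"):
    """Return (mean %CPU, max %CPU, last %MEM) for processes whose
    command column starts with `name`. Assumes the default top layout."""
    cpu, mem = [], []
    for line in text.splitlines():
        fields = line.split()
        # Field 8 is %CPU, field 9 is %MEM, field 11 is COMMAND.
        if len(fields) < 12 or not fields[11].startswith(name):
            continue
        cpu.append(float(fields[8]))
        mem.append(float(fields[9]))
    if not cpu:
        raise ValueError(f"no lines found for process {name!r}")
    return mean(cpu), max(cpu), mem[-1]


mean_cpu, max_cpu, mem_pct = summarize_top(SAMPLE)
print(f"nginx mean CPU {mean_cpu:.2f}%, peak {max_cpu:.1f}%, memory {mem_pct:.1f}%")
```

Running the same capture on each controller and comparing the three numbers gives a like-for-like view, keeping in mind that a handful of one-second samples can easily miss bursts.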

Another option is to check the hits in access.log, which records all API requests coming into the nginx service.
DCVLab-APIC1# pwd
/var/log/dme/log
DCVLab-APIC1# tail -f access.log

127.0.0.1 (::ffff:10.0.0.1) - - [14/May/2024:10:28:44 -0400]"GET /api/class/fabricPod.json?page-size=75000&page=0 HTTP/1.1" 200 236 "-" "python-requests/2.31.0"
127.0.0.1 (::ffff:10.0.0.1) - - [14/May/2024:10:28:44 -0400]"GET /api/mo/topology/pod-1.json?query-target=children&target-subtree-class=fabricNode&page-size=75000&page=0 HTTP/1.1" 200 2921 "-" "python-requests/2.31.0"
127.0.0.1 (::1) - - [14/May/2024:10:28:46 -0400]"GET /api/node/class/topSystem.json?query-target-filter=eq(topSystem.role, \x22controller\x22) HTTP/1.1" 200 1337 "-" "DC/Go-http-client/1.1"
//snip

log_format proxy_ip '$remote_addr ($http_x_real_ip) - $remote_user [$time_local]'
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"';
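Since the `log_format` above fixes the field order, the log can be broken down by who is calling rather than just counted. The sketch below is an assumption-laden illustration, not an official tool: the regular expression is hand-written to match the sample lines shown here, the function name `tally_clients` is hypothetical, and it keys on the `$http_x_real_ip` field because `$remote_addr` is 127.0.0.1 in every sample line.

```python
import re
from collections import Counter

# Mirrors: $remote_addr ($http_x_real_ip) - $remote_user [$time_local]
#          "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
LINE_RE = re.compile(
    r'^(?P<remote>\S+) \((?P<real_ip>[^)]*)\) - (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\]\s*"(?P<request>.*)" (?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)

# Sample lines copied from the tail output above (raw string keeps \x22 as-is).
SAMPLE = r'''127.0.0.1 (::ffff:10.0.0.1) - - [14/May/2024:10:28:44 -0400]"GET /api/class/fabricPod.json?page-size=75000&page=0 HTTP/1.1" 200 236 "-" "python-requests/2.31.0"
127.0.0.1 (::ffff:10.0.0.1) - - [14/May/2024:10:28:44 -0400]"GET /api/mo/topology/pod-1.json?query-target=children&target-subtree-class=fabricNode&page-size=75000&page=0 HTTP/1.1" 200 2921 "-" "python-requests/2.31.0"
127.0.0.1 (::1) - - [14/May/2024:10:28:46 -0400]"GET /api/node/class/topSystem.json?query-target-filter=eq(topSystem.role, \x22controller\x22) HTTP/1.1" 200 1337 "-" "DC/Go-http-client/1.1"
'''


def tally_clients(lines):
    """Count requests per (real client IP, user agent) pair.
    Lines that do not match the expected format are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m["real_ip"], m["agent"])] += 1
    return counts


for (ip, agent), n in tally_clients(SAMPLE.splitlines()).most_common(10):
    print(f"{n:6d}  {ip:20s}  {agent}")
```

On a controller, the same function could be fed the open file (`with open("access.log") as fh: tally_clients(fh)`); a single source IP or script user agent dominating the list on APIC1 but not on the others would support the "everyone targets APIC1" theory.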

Again, you're trying to compare APIC1 with other controllers activity levels.  Let us know what you find.

Robert

@Robert Burns 

user@APIC1:log> cat access.log | grep --count "14.May.2024.*GET"
11502
user@APIC2:log> cat access.log | grep --count "14.May.2024.*GET"
7028
user@APIC3:log> cat access.log | grep --count "14.May.2024.*GET"
5496
user@APIC4:log> cat access.log | grep --count "14.May.2024.*GET"
7157
user@APIC5:log> cat access.log | grep --count "14.May.2024.*GET"
6059

From this it doesn't look like APIC 1 is being used that much more heavily than the others.  At least not to the point where it'd account for the significantly more laggy behavior that I'm seeing in the web GUI.
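For reference, 11502 on APIC1 is roughly 1.8 times the average of the other four controllers (6435). A whole-day total of GETs can also hide short bursts and ignores POST/DELETE traffic, so it may be worth bucketing by hour and method before ruling the API out. The following is a minimal sketch under the assumption that the timestamp and request fields look like the sample lines earlier in the thread; `requests_per_hour` is a hypothetical helper name.

```python
import re
from collections import Counter

# Captures "14/May/2024:10" (day plus hour) and the HTTP method.
HOUR_RE = re.compile(
    r'\[(?P<hour>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}):\d{2}:\d{2} [^\]]*\]\s*'
    r'"(?P<method>[A-Z]+) '
)

# Two sample lines copied from the access.log excerpt earlier in the thread.
SAMPLE = r'''127.0.0.1 (::ffff:10.0.0.1) - - [14/May/2024:10:28:44 -0400]"GET /api/class/fabricPod.json?page-size=75000&page=0 HTTP/1.1" 200 236 "-" "python-requests/2.31.0"
127.0.0.1 (::1) - - [14/May/2024:10:28:46 -0400]"GET /api/node/class/topSystem.json?query-target-filter=eq(topSystem.role, \x22controller\x22) HTTP/1.1" 200 1337 "-" "DC/Go-http-client/1.1"
'''


def requests_per_hour(lines):
    """Count requests per (hour bucket, HTTP method)."""
    counts = Counter()
    for line in lines:
        m = HOUR_RE.search(line)
        if m:
            counts[(m["hour"], m["method"])] += 1
    return counts


for (hour, method), n in sorted(requests_per_hour(SAMPLE.splitlines()).items()):
    print(f"{hour}  {method:6s} {n}")
```

Comparing the busiest hour on APIC1 with the busiest hour on the other controllers, over the same time window, is a fairer test than comparing daily totals.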

I don't think you can. The UI and the API use the same resource pool, so all you would see is overall CPU and memory usage, not how it breaks down between UI and API.

Connect with me https://bigevilbeard.github.io

sdavids5670
Level 2

This doesn't seem right...

 

APIC1# show controller detail | egrep ^ID
ID : 1*
ID : 2
ID : 3
ID : 4
ID : 5
ID : 21~
APIC1# show stats granularity 1h communication controller 1
Start Time Counter Value Unit
-------------------- ---------------------------------------- -------------------- ------------------------------
2024-05-14 09:03:53 Current active connections 2 connections
2024-05-14 09:03:53 Current reading connections 0 connections
2024-05-14 09:03:53 Current waiting connections 0 connections
2024-05-14 09:03:53 Current writing connections 1 connections
2024-05-14 09:03:53 Total accepted connections 1,477 connections
2024-05-14 09:03:53 Total handled connections 1,477 connections
2024-05-14 09:03:53 Total requests 1,481 requests
APIC1# show stats granularity 1h communication controller 2
Start Time Counter Value Unit
-------------------- ---------------------------------------- -------------------- ------------------------------
2024-05-14 09:03:55 Current active connections 0 connections
2024-05-14 09:03:55 Current reading connections 0 connections
2024-05-14 09:03:55 Current waiting connections 0 connections
2024-05-14 09:03:55 Current writing connections 0 connections
2024-05-14 09:03:55 Total accepted connections 140,736,074,219,641 connections
2024-05-14 09:03:55 Total handled connections 129,128,783,236,545 connections
2024-05-14 09:03:55 Total requests 211 requests
APIC1# show stats granularity 1h communication controller 3
Start Time Counter Value Unit
-------------------- ---------------------------------------- -------------------- ------------------------------
2024-05-14 09:04:00 Current active connections 0 connections
2024-05-14 09:04:00 Current reading connections 0 connections
2024-05-14 09:04:00 Current waiting connections 0 connections
2024-05-14 09:04:00 Current writing connections 0 connections
2024-05-14 09:04:00 Total accepted connections 399 connections
2024-05-14 09:04:00 Total handled connections 399 connections
2024-05-14 09:04:00 Total requests 415 requests
APIC1# show stats granularity 1h communication controller 4
Start Time Counter Value Unit
-------------------- ---------------------------------------- -------------------- ------------------------------
2024-05-14 09:03:59 Current active connections 0 connections
2024-05-14 09:03:59 Current reading connections 0 connections
2024-05-14 09:03:59 Current waiting connections 0 connections
2024-05-14 09:03:59 Current writing connections 0 connections
2024-05-14 09:03:59 Total accepted connections 165 connections
2024-05-14 09:03:59 Total handled connections 165 connections
2024-05-14 09:03:59 Total requests 181 requests
APIC1# show stats granularity 1h communication controller 5
Start Time Counter Value Unit
-------------------- ---------------------------------------- -------------------- ------------------------------
2024-05-14 09:03:54 Current active connections 1 connections
2024-05-14 09:03:54 Current reading connections 0 connections
2024-05-14 09:03:54 Current waiting connections 0 connections
2024-05-14 09:03:54 Current writing connections 1 connections
2024-05-14 09:03:54 Total accepted connections 178 connections
2024-05-14 09:03:54 Total handled connections 178 connections
2024-05-14 09:03:54 Total requests

The connection stats listed for controller #2 are insane.  That's got to be a bug.  Controller #2 isn't even the one that is sluggish.  It's controller #1 that I have problems with.

 

Remi-Astruc
Cisco Employee

Hi @sdavids5670 ,

Build a curl request similar to an operation that is slow in the GUI (e.g. retrieving the list of tenants), and run it from the APIC1 CLI against localhost.

Do the same from the CLI of the other APICs.

If the response time is similar on all of them, your problem lies between your browser and the APIC1 management interface; in that case use "traditional" troubleshooting (continuous 1400-byte pings, traceroute, searching for drops on the path, ...).

If the response time is higher on APIC1, the problem definitely lies on that controller. Let us know and we can help further.
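The same timing comparison can be scripted. This is a hedged Python sketch rather than the curl approach described above: it logs in through the APIC REST API (`/api/aaaLogin.json`), then times repeated queries for the tenant class (`/api/class/fvTenant.json`). The host names, credentials, and helper names are placeholders, and certificate verification is disabled on the assumption of a self-signed lab certificate. Where the script runs determines what it measures: from a workstation it includes the network path, while running against localhost on each APIC isolates the controller itself.

```python
import json
import ssl
import time
import urllib.request
from statistics import median


def time_call(fn, repeats=5):
    """Call fn() `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def make_tenant_query(host, user, password):
    """Log in once, then return a zero-argument callable that fetches the
    tenant list. Assumption: self-signed certificate, so verification is off."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    base = f"https://{host}"
    body = json.dumps(
        {"aaaUser": {"attributes": {"name": user, "pwd": password}}}
    ).encode()
    login = urllib.request.Request(base + "/api/aaaLogin.json", data=body)
    with urllib.request.urlopen(login, timeout=30, context=ctx) as resp:
        token = json.load(resp)["imdata"][0]["aaaLogin"]["attributes"]["token"]
    req = urllib.request.Request(
        base + "/api/class/fvTenant.json",
        headers={"Cookie": f"APIC-cookie={token}"},
    )

    def query():
        with urllib.request.urlopen(req, timeout=30, context=ctx) as resp:
            resp.read()

    return query


# Example usage (hypothetical host names and credentials):
# for host in ["apic1.example.com", "apic2.example.com"]:
#     samples = time_call(make_tenant_query(host, "admin", "secret"))
#     print(f"{host}: median {median(samples) * 1000:.0f} ms over {len(samples)} runs")
```

Using the median of several runs keeps one slow outlier from skewing the comparison between controllers.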

Regards

Remi Astruc
