Cisco PI - Syslog Streaming Performance

Michael Redbourne · 12-02-2022

Good Evening,

Note: I don't work for an organization that sells Cisco equipment, so my knowledge is limited to the documentation Cisco makes publicly available. If this needs to be escalated to Cisco support (TAC), I can have our client open a case directly. I've already opened a ticket with our SIEM provider to review performance settings and options.

I'm hoping someone can provide guidance on an issue a client is encountering. They have a substantial portfolio of Cisco assets, but the problematic one is their Cisco Prime Infrastructure (PI) device. They asked us to ingest that asset's logs into a SIEM solution. Shortly after we allowed the logs through, we were alerted that large numbers of UDP packets from the appliance (10-20% day over day) were being dropped. After some digging, we gathered the following baseline information.

General Problem (CMD: netstat -su)
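On Linux, `netstat -su` builds its UDP summary from `/proc/net/snmp`. The sketch below assumes the standard layout of that file (a `Udp:` header line followed by a `Udp:` value line) and pulls out the counters most relevant to receive-side drops, such as `InErrors` and `RcvbufErrors`, so they can be sampled over time instead of read off a single snapshot. The function names are illustrative, not part of any tool.

```python
def parse_udp_snmp(text: str) -> dict[str, int]:
    """Parse the two 'Udp:' lines of /proc/net/snmp into a {counter: value} dict."""
    # "UdpLite:" lines do not match because the prefix check includes the colon.
    udp_lines = [line.split()[1:] for line in text.splitlines() if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp: header/value pair found")
    header, values = udp_lines[0], udp_lines[1]
    return {name: int(value) for name, value in zip(header, values)}


def read_udp_counters(path: str = "/proc/net/snmp") -> dict[str, int]:
    """Read the live counters (Linux only). RcvbufErrors counts datagrams
    dropped because a socket's receive buffer was full."""
    with open(path) as f:
        return parse_udp_snmp(f.read())
```

Calling `read_udp_counters()` twice, a minute apart, and subtracting `RcvbufErrors` gives a drop rate that can be lined up against the capture timeline.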

Specific Bottleneck (CMD: cat /proc/net/udp)

Note: This shows a performance bottleneck on lo:25224 (port 25224 on the loopback interface). That port belongs to Microsoft's OMS Agent for Linux package.
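Each row of `/proc/net/udp` stores the local address in hex: the IPv4 address as a raw 32-bit value in host byte order, and the port as a plain hex number (25224 is `0x6288`). On reasonably recent kernels the last column is a per-socket `drops` count, and the `rx_queue` half of `tx_queue:rx_queue` is the number of bytes waiting to be read. The sketch below assumes a little-endian (x86) host, IPv4 only (`/proc/net/udp6` uses a different address format), and that column layout; it decodes the rows so the busiest sockets can be ranked by drops.

```python
import socket
import struct
from typing import NamedTuple


class UdpSocket(NamedTuple):
    local: str     # "a.b.c.d:port"
    rx_queue: int  # bytes waiting in the socket's receive queue
    drops: int     # datagrams dropped for this socket


def decode_addr(hex_addr: str) -> str:
    """Decode e.g. '0100007F:6288' to '127.0.0.1:25224' (assumes a little-endian host)."""
    ip_hex, port_hex = hex_addr.split(":")
    # The kernel prints the address in host byte order; on little-endian hosts
    # the bytes appear reversed relative to dotted-quad order.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return f"{ip}:{int(port_hex, 16)}"


def parse_proc_net_udp(text: str) -> list[UdpSocket]:
    """Parse /proc/net/udp text into UdpSocket rows, skipping the header."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 13:  # not a full socket row
            continue
        _tx_hex, rx_hex = fields[4].split(":")  # "tx_queue:rx_queue", both hex
        rows.append(UdpSocket(decode_addr(fields[1]), int(rx_hex, 16), int(fields[-1])))
    return rows
```

For example, `sorted(parse_proc_net_udp(open("/proc/net/udp").read()), key=lambda s: s.drops, reverse=True)[:5]` lists the five sockets with the most drops.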

UDP Checking (CMD: dropwatch -l kas)

This is only a general overview, but it's a good indication that the OMS Agent isn't draining its queue fast enough. The number of drops dropwatch reports varies substantially: sometimes 10k in a run, sometimes 500k. The drops also happen at evenly spaced times, which suggested that one or more log sources were dumping a buffer of logs to the Syslog agent in bulk instead of streaming them in real time.
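To check the "evenly spaced" observation rather than eyeball it, one option is to collect the start times of the large drop bursts (from dropwatch runs or the capture) and compare the gaps between them. The sketch below assumes those burst timestamps are already available in seconds; the 10% tolerance on the coefficient of variation is an arbitrary illustrative cutoff, not a standard threshold.

```python
from statistics import mean, pstdev


def burst_intervals(timestamps: list[float]) -> list[float]:
    """Gaps, in seconds, between consecutive burst start times."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def looks_periodic(timestamps: list[float], tolerance: float = 0.1) -> bool:
    """True if the gaps vary by less than `tolerance` as a fraction of the mean gap."""
    gaps = burst_intervals(timestamps)
    if len(gaps) < 2:  # need at least three bursts to judge regularity
        return False
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg < tolerance
```

A regular schedule (bursts every few minutes, give or take a second) points toward a sender flushing a buffer or a scheduled job; irregular gaps would point more toward event-driven spikes.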

TCPDump - Identify the Asset (CMD: tcpdump -i ens192 -w dump.pcap port 514, then Wireshark analysis)

This is only a single capture, but normally we average 1k-2k EPS without any performance issues. Then, around the 210-211 s mark in the capture, something starts sending 17k-18k EPS over a short timespan. Right around that mark, the PCAP shows several thousand packets from a single source, all carrying the same type of Syslog message: "$TIMESTAMP ERROR [wirelessuser] [seqtaskexecutor-$PID] ERROR: Station entry NULL for ----<MAC>.\n".
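The same analysis can be scripted once the packets are exported from Wireshark. The sketch below assumes that export already exists as a list of `(timestamp, source IP, message)` tuples. It buckets events into whole seconds, flags seconds at or above a chosen EPS threshold, and groups the messages inside those seconds by source and a masked "template", so thousands of "Station entry NULL" lines collapse into a single row. The MAC regex assumes colon/hyphen or Cisco dotted notation, since the actual MAC format in the capture isn't shown.

```python
import re
from collections import Counter
from typing import Iterable

Event = tuple[float, str, str]  # (capture timestamp in seconds, source IP, syslog message)

# Colon/hyphen form (aa:bb:cc:dd:ee:ff) or Cisco dotted form (aabb.ccdd.eeff).
MAC = re.compile(r"(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}|(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}")
NUM = re.compile(r"\d+")


def template(message: str) -> str:
    """Mask MACs, then numbers, so messages differing only in PID/MAC group together."""
    return NUM.sub("<N>", MAC.sub("<MAC>", message))


def events_per_second(events: Iterable[Event]) -> Counter:
    """Count events in each whole-second bucket of the capture."""
    return Counter(int(ts) for ts, _src, _msg in events)


def burst_report(events: Iterable[Event], threshold: int, top_n: int = 5):
    """Return the seconds at or above `threshold` EPS and the top (source, template) pairs in them."""
    events = list(events)
    eps = events_per_second(events)
    hot = sorted(sec for sec, count in eps.items() if count >= threshold)
    hot_set = set(hot)
    senders = Counter((src, template(msg)) for ts, src, msg in events if int(ts) in hot_set)
    return hot, senders.most_common(top_n)
```

With a threshold somewhere between the normal 1k-2k EPS and the 17k-18k EPS burst, the top entry should name the PI appliance and the masked "Station entry NULL" template.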

I reached out to the client, who confirmed the asset is a Cisco Prime Infrastructure device. Given how regularly these bursts show up, I'm assuming the asset stores these logs and then sends them in one go (at least this type of log; it's possible that other logs are streamed in real time). So, I have a few questions:

1. Is it Cisco PI's default logging behaviour to buffer some logs and then send them in bulk?

2. If it is, is there a way to change that default logging behaviour?

3. If it isn't, why is it doing this?

4. If there isn't [a way to change the logging behaviour], is it possible to stop logging specific messages?
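If PI itself cannot be told to suppress these messages, one fallback is to discard most of them on the collector before they reach the agent. The sketch below is a hypothetical pre-filter, not a feature of Cisco PI or the OMS agent: it forwards everything that doesn't match a configurable pattern and, for matching messages, forwards only one in every `keep_every` so the underlying condition stays visible in the SIEM. The class name and the sampling rate are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SuppressFilter:
    """Hypothetical collector-side filter: drop messages matching `pattern`,
    but forward one in every `keep_every` of them as a sample."""
    pattern: re.Pattern
    keep_every: int = 1000
    suppressed: int = field(default=0, init=False)  # matching messages seen so far

    def should_forward(self, message: str) -> bool:
        if not self.pattern.search(message):
            return True
        self.suppressed += 1
        # Forward the 1st, (keep_every + 1)th, ... matching message.
        return (self.suppressed - 1) % self.keep_every == 0
```

For example, `SuppressFilter(re.compile(r"Station entry NULL for"))` would let roughly 18 of 18k burst messages through. Sampling rather than dropping outright keeps evidence of the wireless issue without flooding the agent.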