I'd like to consult with you guys regarding an issue I have on an N5K device:
I have an N5K device running software version 5.1(3)N2(1), on which SSH access gets locked out every couple of days.
When the issue occurs, I connect to it using a console cable and see a lot of open connections from our Cisco Prime Infrastructure 1.2.1 system.
In the "show users" output on the N5K, I see connections that have been open for almost a month, so probably either the N5K or PI 1.2.1 does not clear them.
When I log in to the Linux shell of PI 1.2.1 and run the command netstat -ano, I also see a lot of open SSH sessions to the IP address of the Nexus device.
These connections remain open unless I shut down the services of PI or disable/enable the SSH feature on the N5K.
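For anyone who wants to quantify this from a saved netstat -ano capture, here is a minimal sketch in Python. It assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, Timer) and plain IPv4 addresses. The function name and the 192.0.2.x addresses in the sample are hypothetical placeholders, not values from my attached netstat_ano.txt.

```python
from collections import Counter


def count_ssh_sessions(netstat_text, device_ip, port=22):
    """Count TCP connections to device_ip:port, grouped by state.

    Assumes Linux `netstat -ano` style lines:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State [Timer ...]
    """
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, UDP/UNIX socket lines, and anything too short to parse.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        if foreign == f"{device_ip}:{port}":
            states[state] += 1
    return states


# Hypothetical sample data; 192.0.2.20 stands in for the Nexus device.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address     Foreign Address   State       Timer
tcp        0      0 192.0.2.10:40112  192.0.2.20:22     ESTABLISHED off (0.00/0/0)
tcp        0      0 192.0.2.10:40388  192.0.2.20:22     ESTABLISHED off (0.00/0/0)
tcp        0      0 192.0.2.10:443    192.0.2.99:51200  ESTABLISHED keepalive (7100.12/0/0)
tcp        0      0 192.0.2.10:40500  192.0.2.21:22     TIME_WAIT   timewait (12.00/0/0)
"""

if __name__ == "__main__":
    print(count_ssh_sessions(SAMPLE, "192.0.2.20"))
```

Running the same function on the real file at intervals should show whether the number of ESTABLISHED sessions to the Nexus keeps growing between lockouts.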
To further investigate the issue, I also captured traffic on the PI server using tcpdump for about a minute.
I saw no SSH traffic, so these sessions have clearly been idle for a long time and are no longer in use; there is no reason for them to remain open.
This leads me to the conclusion that either PI 1.2.1 or the N5K (or both) does not clear idle sessions that are no longer in use, and that this, in my opinion, is what causes the N5K to lock out SSH access to the device.
Looking at the output of "show process cpu" on the N5K during the issue, I see the CPU 98% idle and the sshd process consuming 0%, so it is not stuck.
I found four known bugs which seem to be related:
CSCty90928 - Cisco Prime LMS related bug (not PI 1.2.1) and it is already fixed.
Our LMS system that we use in parallel with PI 1.2.1 is already upgraded to the release which fixes this bug.
I believe that this is the reason why we don't see stale SSH sessions on the N5K, coming from the LMS IP address (we saw that before the upgrade).
On the other hand, another customer of mine has LMS upgraded to the fixed version and still sees this issue with its N5K devices.
So maybe it is not yet fixed on the LMS side.
CSCty00044 - Categorized as LMS related, although my Cisco SE said that is a mistake and it should be N5K related. In any case, it is in the open (waiting) state.
CSCuc90628 - N5K related, in the open (postponed) state.
My Cisco SE said that these two bugs were postponed or terminated because Cisco thinks that the issue was fixed in LMS with bug CSCty90928.
Do you know what the ETA is for a fix for these two bugs (CSCuc90628 and CSCty00044), and whether the fix will be on the N5K devices or on the LMS/PI systems?
Or maybe there are some other related bugs?
Attached are the following files:
ssh_limit_200213173389.gz -> tac-pac from the N5K during the issue.
pi-tech-support -> show tech-support from the PI CLI during the issue.
pi-tcpdump.pcap -> tcpdump taken from the Linux shell of PI during the issue, for a period of about a minute.
netstat_ano.txt -> netstat -ano output from the Linux shell of PI, taken during the issue.
5596-firewall2-usesr+sockets -> output of "show users" and "sh sockets connection tcp" commands from the N5K, taken during the issue.
I'll be grateful for any advice.