06-04-2020 08:43 AM - edited 06-04-2020 08:43 AM
Hi all,
(Sorry if this is not the best place to post this question - I don't usually get involved with servers)
We have a C220 M5 (DN2-HW-APL) running DNA Center. On this particular server, after an application update the main Enterprise NIC stops working for a random amount of time and then eventually starts working again.
In the past this has always been < 24 hours but it's currently still down after 1.5 days. :-(
The interface is admin up but operationally down. You can't get the interface to start working using the usual Linux commands.
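For anyone wanting to confirm the same "admin up, operationally down" symptom from the OS console, here is a minimal Python sketch. It assumes a standard Linux sysfs layout under `/sys/class/net`. The interface name `enp1s0` in the usage note is hypothetical, and whether the DNA Center shell allows running a script like this is also an assumption.

```python
from pathlib import Path

IFF_UP = 0x1  # administrative "up" bit in the interface flags


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return the admin and operational state of one interface.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; on a real host the default is the usual location.
    """
    base = Path(sysfs_root) / iface
    # "flags" holds a hex string such as 0x1003; bit 0 is the admin state.
    flags = int((base / "flags").read_text().strip(), 16)
    # "operstate" holds a word such as up, down, or unknown.
    operstate = (base / "operstate").read_text().strip()
    return {"admin_up": bool(flags & IFF_UP), "operstate": operstate}
```

Calling `link_state("enp1s0")` on the affected host would be expected to report `admin_up` as true with an `operstate` of `down` while the fault is present.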
The switch port and cables are fine.
I have a TAC case open for that particular issue but we need to grab a large log file from the OS.
With the main NIC down I can't pull a file using SCP (or any IP protocol!)
I am not geographically close to the server but I do have full access via the CIMC.
Is there any funky way I can use the CIMC to grab a file off the OS?
I'm guessing no, but there is no harm in asking the experts!
Thanks!
Matt.
06-05-2020 05:09 AM - edited 06-05-2020 10:38 AM
The CIMC (in special debug mode) would only be able to use its mgmt connection to go out to the network and come back in on the network link the OS is using. The CIMC does not have access to the running OS (that would be viewed as a huge security risk if it did).
I'm not sure which NIC the OS is using, but it sounds like an OS-related issue if the problem is sporadic.
You might want to try booting a live Linux ISO from a USB stick and see if you can reproduce the problem, provided the server/appliance isn't locked down via Secure Boot.
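Since the fault is sporadic, it may help to leave something running under the live distro that timestamps each link flap. Below is a rough Python sketch, assuming the live environment has Python 3 and exposes the interface under `/sys/class/net`. The function names and the 5-second polling interval are illustrative choices, not anything from Cisco tooling.

```python
import time
from pathlib import Path


def read_operstate(iface, sysfs_root="/sys/class/net"):
    """Read the kernel's operational state word for one interface."""
    return (Path(sysfs_root) / iface / "operstate").read_text().strip()


def transitions(samples):
    """Given (timestamp, state) pairs in time order, return the state changes."""
    changes = []
    previous = None
    for stamp, state in samples:
        if previous is not None and state != previous:
            changes.append((stamp, previous, state))
        previous = state
    return changes


def watch(iface, interval_s=5.0, sysfs_root="/sys/class/net"):
    """Poll until interrupted, printing a line whenever the link state changes."""
    previous = None
    while True:
        state = read_operstate(iface, sysfs_root)
        if previous is not None and state != previous:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {iface}: {previous} -> {state}")
        previous = state
        time.sleep(interval_s)
```

Running `watch("enp1s0")` (hypothetical interface name) in a console session would give timestamps to line up against the switch port log.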
Also, you might want to check what HUU/firmware package can be applied, as that will contain firmware for the Intel X710.
Those are 10Gb copper, right?
Does the upstream switch port show any counters of interest?
On a few rare occasions, I've seen the riser boards that the NIC cards plug into either not seated correctly or faulty. That's assuming you still have the same issue after replacing the X710-DA2 NIC, and have exhausted firmware/driver options.
Kirk...
06-07-2020 11:26 PM
Hi Kirk,
Thank you for the information and suggestions.
The ultimate question was whether the CIMC had access to the running OS which you answered. Thanks.
This server is our lab server although it's not been 'messed' with so is just like any other prod install.
We have customers with the same hardware and software who are not affected.
It's running the latest HUU/firmware (4.1f) as I recently updated it.
The affected port is 10G SFP, currently connected by a 3 m twinax cable.
Nothing in the switch port logs other than the port going down and not coming back up - no errors.
The port has come back up since I raised this. It was down for about 27 hours for no apparent reason.
I still have a TAC case open and they are investigating.
Matt.