hyperflex 220-M5 error loading esxi

laaaaiiit
Level 1

Hello. In short, ESXi does not boot after a hot reboot.

Instead, it drops to a shell, as shown in the attached picture.

This started when the HyperFlex virtual machine (stCtlVM) froze. It did not respond and consumed 0 resources. As a result, the entire host stopped responding, and I did not understand why. Because of this, none of the virtual machines on this host worked, although they appeared to be active in vSphere.

I had to do a hot reboot of the host. After that, it does not boot.

The question is: how do I restore the ESXi bootloader? Ideally, I would like to avoid completely reinstalling ESXi, since then I would have to work out how to add the host back to HyperFlex.

Thank you all in advance for the answers!

3 Replies

naveeku6
Cisco Employee

Hi @laaaaiiit,

I'm sorry to hear that you're experiencing issues with your HyperFlex virtual machine freezing. Virtual machine freezes can be caused by various factors, such as resource constraints, software bugs, or configuration issues. Here are a few steps you can take to troubleshoot the problem:

1. Check resource utilization: Verify if the virtual machine has enough CPU, memory, and disk resources allocated to it. You can use your virtualization management software to monitor the resource utilization.

2. Review logs: Check the logs of the virtual machine and the underlying hypervisor for any error messages or warnings that might indicate the cause of the freeze. Look for any specific error codes or patterns that can help narrow down the issue.

3. Update software and drivers: Ensure that the virtual machine's operating system, as well as the hypervisor and its associated drivers, are up to date. Software bugs or compatibility issues can often be resolved by applying the latest updates.

4. Check for conflicts: Determine if there are any other virtual machines or processes running on the same host that might be causing resource contention or conflicts. Temporarily suspending or migrating other VMs can help isolate the issue.

5. Review virtual machine settings: Verify the virtual machine's configuration settings, such as the number of virtual CPUs, memory allocation, and disk settings. Adjusting these settings might help improve performance and prevent freezing.

6. Consider hardware issues: If the problem persists, it's worth checking the hardware health of the host server and the storage infrastructure that the virtual machine is running on. Faulty hardware components can cause freezing or performance issues.

If you're still unable to resolve the freezing issue, it might be helpful to contact the technical support team for your virtualization platform or HyperFlex system. They can provide more specific guidance based on your environment and configuration.

 


Thanks for your reply.

The problem is not with something running inside ESXi, but with ESXi itself, which was installed on bare metal on one of the Cisco HyperFlex 220 M5 nodes.

The steps you described mostly do not apply to what I wrote in my first message.

This is also not the first time ESXi has failed on a cluster of this type.

The failure of a single host puts a heavy load on whoever administers the cluster, because restoring the host is too troublesome a procedure.

The only thing I found when I started investigating is that one of the M.2 disks (or something like it), on which ESXi appears to have been installed, failed after 1.5 years of use.

Correct me if I'm wrong. I haven't verified this physically yet, since I need to be on site to open up the host and check.

I don't quite understand how Cisco IMC is organized, so I can't locate the physical disks from software and confirm that all of them are alive. It may be that the disk configuration was simply reset (the virtual disks created through the RAID controller, and the like). I also can't figure out how to work with the RAID controller.

I then tried to install ESXi over the existing installation, since the boot shell did not show a single fs disk. I found that the ESXi installer image does not see an existing ESXi installation on any of the host's disks. So either one of the physical disks died, or the virtual disk on which ESXi was installed did. I'm talking about the M.2 disks because I think there should be 2 of them, and on the faulty host only 1 M.2 disk was present.

In short, every time a problem like this occurs, I or another administrator has to take on the HyperFlex recovery as extra work. It was decided to switch completely to the VMware + vSAN solution. Yes, it has some problems when the cluster is heavily overloaded over the network. But if a similar or other problem occurs, it is solved simply by reinstalling ESXi in 5-10 minutes without any problems or additional settings. It also avoids consuming roughly 50 GB of RAM per host, which of course is also a huge plus.

All I want to know, and can't find on the Internet, is how to access the RAID controller via the BIOS or some other way, so that I can see all the physical disks rather than the logical or virtual disks created on them. Is there any way to do this?

Hi @laaaaiiit,

I understand that ESXi is failing on the HyperFlex server, and there could be multiple reasons for this type of issue. I highly recommend raising a TAC case so that our technical assistance team can investigate and help you fix the issue.