UCS C-Series has high CPU Load and IO Delay issues with Proxmox VE

radensun
Level 1

Dear All Experts,

I'm having high CPU Load and IO Delay issues on UCS C-Series.
I'm using Proxmox VE (required for our existing environment) on several types of servers, and I don't see IO Delay and CPU Load issues like these on any of them except the UCS servers.

As a result of the IO Delay and CPU Load issues, memory utilization is not optimal and data read/write throughput is only around 10-30 MB/s. Even data transfers within the same server's disk, or between servers on the LAN, only reach 10-30 MB/s.

I have tried and verified several actions to deal with this, but none of them solved it:
1. Reconfiguring RAID and Virtual Drives
2. Resetting BIOS and CIMC configurations
3. Upgrading BIOS and CIMC
4. Replacing the HDDs with new ones
5. Using JBOD mode without RAID
6. Installing Debian 11/12 manually
7. Tuning the system for IO delay issues

I have also attached the output of the atop command, which shows high disk IO activity as the cause of the above issue, even though the SAS HDDs are new.

I hope a Cisco expert can suggest a solution to this issue. Thank you.

Best regards,

Sunardi

marce1000
VIP

  - FYI: https://forum.proxmox.com/threads/high-io-delay.122858/
                Or post in https://forum.proxmox.com/#proxmox-virtual-environment.11 (for example).

 M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

Hi Marce1000,

Thank you for the response.

I had previously read several forum threads on that topic, and I've tried configurations from the BIOS level (ASPM, IOMMU and SR-IOV) to the OS level (tuning vm.swappiness, vm.dirty_ratio, etc.), but the IO Delay problem still occurs. I have only seen IO Delay this persistent on UCS; it has happened on non-UCS servers too, but there it could be solved by system-level tuning.
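When comparing tuning across servers, it helps to record the sysctl values actually in effect on each host so the UCS and non-UCS machines can be compared side by side. A minimal sketch, assuming a Linux host where the vm.* sysctls appear as files under /proc/sys/vm (the knob list is only an example); `sysctl vm.swappiness vm.dirty_ratio` prints the same information from the shell:

```python
from pathlib import Path

# Example knobs only; vm.* sysctls appear as files under /proc/sys/vm on Linux.
VM_KNOBS = ["swappiness", "dirty_ratio", "dirty_background_ratio"]


def read_vm_sysctls(base: Path = Path("/proc/sys/vm"), knobs=VM_KNOBS) -> dict:
    """Return {"vm.<knob>": value} for every knob file that exists under `base`."""
    values = {}
    for knob in knobs:
        path = base / knob
        if path.exists():
            values[f"vm.{knob}"] = path.read_text().strip()
    return values


if __name__ == "__main__":
    for name, value in read_vm_sysctls().items():
        print(f"{name} = {value}")
```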

Apart from changing SAS to SSD, do you have any other solutions?

Thank You for the advice.

Sunardi

                    >...Apart from changing SAS to SSD, do you have any other solutions?
   No. My response was found through a search engine using the subject of your post; I have no experience with UCS servers or with Proxmox.

 M.

Steven Tardy
Cisco Employee

What is your exact server generation / model / PID?
(Serial number would be great so I can see what that server shipped with hardware-wise if nothing has changed since.)
What is the disk model / PID?
What is the network NIC/VIC model / PID?
Is the RAID configured with write-back (faster) or write-through (slower)?
How many disks in the RAID group?

Does this issue happen with another (supported) OS installed?

I often work on customer issues like "copying this file over the network is slow".
This issue needs to be isolated a bit, because this one thing is really 3 things:
+ Reading from source
+ Copying bits over the network
+ Writing to destination
Isolation is often what helps uncover/resolve any given bottleneck.
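The read and write legs can be timed separately on the server itself before the network is involved. A rough sketch using only the Python standard library; the path, total size and block size are arbitrary choices, and the read figure will mostly reflect the Linux page cache unless caches are dropped first (as root, `echo 3 > /proc/sys/vm/drop_caches`) or the test file is larger than RAM:

```python
import os
import time


def write_throughput_mb_s(path: str, size_mb: int = 256, block_kb: int = 1024) -> float:
    """Sequentially write about size_mb of random data to `path`, fsync, and return MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # wait until the data reaches the device, not just the page cache
    elapsed = time.perf_counter() - start
    return (blocks * block_kb / 1024) / elapsed


def read_throughput_mb_s(path: str, block_kb: int = 1024) -> float:
    """Sequentially read `path` and return MB/s (may be served from the page cache)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(block_kb * 1024):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / (1024 * 1024)) / elapsed
```

For example, call `write_throughput_mb_s("/path/on/the/raid1/volume/io_test.bin")` and then `read_throughput_mb_s` on the same (placeholder) path, and delete the file afterwards. The network leg can then be measured on its own with a tool such as iperf3, so each of the three parts gets its own number.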

During a test, what does `iostat -x -d 2 5` look like?
What size IOs are you sending?
This could simply be that HDDs only provide about 150 IOPS, but without more details that is difficult to determine.
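IO size matters because throughput is roughly IOPS multiplied by the average IO size. A back-of-the-envelope sketch, assuming a disk limited to the ~150 IOPS mentioned above; the IO sizes are arbitrary examples, and the model ignores transfer time, so it overstates what a real HDD delivers with large IOs:

```python
def throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Throughput implied by an IOPS rate and an average IO size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024


if __name__ == "__main__":
    # ~150 IOPS is the rough HDD figure from the post; the IO sizes are arbitrary examples.
    for size_kb in (4, 16, 64):
        print(f"{size_kb:>3} KB IOs at 150 IOPS -> {throughput_mb_s(150, size_kb):5.2f} MB/s")
```

At 4 KB per IO this is well under 1 MB/s, and even 64 KB IOs only reach roughly 9 MB/s, which is why the IO size changes how a 10-30 MB/s result should be read.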

Would be best to open a TAC case and let TAC help isolate your issue.

Hi Steven Tardy,

Thank you for your attention, and sorry for my late response. Here are the answers to your questions.

ASK: What is your exact generation / model / PID server?
ANS: PID: UCSC-C220-M4S & S/N: FCH2147V1WC

ASK: What is the disk model / PID?
ANS: PID: UCSC-MRAID12G (Ctlr) > UCS-HD1T7K12G with HDD Model: ST1000NX0453

ASK: What is the network NIC/VIC model / PID?
ANS: Intel(R) I350 1 Gbps Network Controller

ASK: Is the RAID configured with write-back (faster) or write-through (slower)?
ANS: No, only RAID1 with the default configuration. I've also tried converting to JBOD, but the IO Delay issue still occurs.

ASK: How many disks in the RAID group?
ANS: RAID1 with two disks.

ASK: Does this issue happen with another (supported) OS installed?
ANS: Haven't tried it yet, but on the older Proxmox VE versions 5 and 6 the IO Delay problem doesn't appear; only the throughput problem remains, with data transfers stuck at 10-30 Mbps even when copying data on the same disk and machine. The same thing happens when transferring data over the LAN.

ASK: During a test what does `iostat -x -d 2 5` look like?
ANS: As attached.

ASK: What size IOs are you sending?
ANS: Sorry, I don't understand the size of the IOs in question.

ADVICE: Would be best to open a TAC case and let TAC help isolate your issue.
RESPONSE: I haven't opened a TAC ticket for this issue yet.
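The thread quotes the throughput both as 10-30 MB/s and as 10-30 Mbps, which differ by a factor of 8, so it is worth confirming which unit was measured. A small conversion sketch (decimal units, protocol overhead ignored) shows that a 1 Gbps I350 port tops out around 125 MB/s, so either figure is well below what the link can carry; together with the report that copies on the same disk are equally slow, that suggests the bottleneck is on the storage side rather than the network:

```python
def link_mb_s(gbps: float) -> float:
    """Theoretical payload ceiling of a link in MB/s (decimal units, no protocol overhead)."""
    return gbps * 1000 / 8


def mbps_to_mb_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (decimal units)."""
    return mbps / 8


if __name__ == "__main__":
    print(f"1 Gbps link ceiling: {link_mb_s(1):.0f} MB/s")
    print(f"10-30 Mbps is {mbps_to_mb_s(10):.2f}-{mbps_to_mb_s(30):.2f} MB/s")
```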

Thank you.

Sunardi

Your `iostat` output shows:

  • sda 75 %util -=> This is high. Anything above about 10-20% the user / app will "feel" slow.
  • sda 70550 rkB/s -=> So this disk is providing about 70 MB/s.
  • sda 43.93 rareq-sz -=> IO size is about 44 KB/IO.
  • sda r/s 1606 -=> 1600 IOPS is good for sequential reads.

In this iostat output, rareq-sz is already reported in kilobytes (the older avgrq-sz column was in 512-byte sectors), so it does not need to be multiplied by 512. The other two columns confirm the IO size:

70550 rkB/s / 1606 r/s = 43.93 KB per read
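To repeat this check on the other servers, a minimal sketch can parse `iostat -x -d` text using its own header row (column names differ between sysstat versions, so only the names used in the test are assumed) and derive the average read size as rkB/s divided by r/s. The sectors_to_kb helper only applies to the older avgrq-sz column, which was expressed in 512-byte sectors; in sysstat releases that print rareq-sz, that column is already in kilobytes:

```python
def parse_iostat_x(text: str) -> dict:
    """Map device -> {column: value} from `iostat -x -d` text.

    Column names come from each 'Device' header row, so the parser follows whatever
    columns the installed sysstat prints. With several reports (e.g. `iostat -x -d 2 5`),
    later reports overwrite earlier ones, so the last sample wins.
    """
    rows, header = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):  # matches both "Device" and older "Device:"
            header = fields
            continue
        if header and len(fields) == len(header):
            try:
                rows[fields[0]] = {col: float(v) for col, v in zip(header[1:], fields[1:])}
            except ValueError:
                pass  # e.g. locale-specific decimal commas; skipped in this sketch
    return rows


def avg_read_kb(rkb_per_s: float, reads_per_s: float) -> float:
    """Average read size in KB from iostat's rkB/s and r/s columns (0 when idle)."""
    return rkb_per_s / reads_per_s if reads_per_s else 0.0


def sectors_to_kb(sectors: float) -> float:
    """Convert the older avgrq-sz value (512-byte sectors) to KB (1 KB = 1024 bytes)."""
    return sectors * 512 / 1024
```

Fed the captured text of `iostat -x -d 2 5`, `avg_read_kb(rows["sda"]["rkB/s"], rows["sda"]["r/s"])` should land close to the rareq-sz value on versions that print that column.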

Looks like 70 MB/s with 1600 IOPS is at/near the limit of a 7200 rpm drive:
    https://en.wikipedia.org/wiki/IOPS#Mechanical_hard_drives
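The random-IO ceiling of a spinning disk can be estimated from its mechanics: each random IO pays an average seek plus, on average, half a revolution of rotational latency. A rough sketch; the 8.5 ms average seek is an assumed, typical-looking figure and not a datasheet value for the ST1000NX0453:

```python
def hdd_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS for a spinning disk: 1 / (average seek + average rotational latency).

    Transfer time and command overhead are ignored, so treat this as an upper-bound estimate.
    """
    rotational_latency_ms = (60_000 / rpm) / 2  # on average the head waits half a revolution
    return 1000 / (avg_seek_ms + rotational_latency_ms)


if __name__ == "__main__":
    # 8.5 ms average seek is an assumption for illustration only.
    print(f"7200 rpm, 8.5 ms seek: ~{hdd_random_iops(7200, 8.5):.0f} random IOPS")
```

Under those assumptions a 7200 rpm disk lands around 80 random IOPS, so 1600 r/s is only reachable because the reads are largely sequential (adjacent sectors need no seek), consistent with the note that 1600 IOPS is good for sequential reads.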

What disk firmware are the drives running?

I see CDETS CSCvo58565, where the disk returns sense errors on firmware N0A3.
Does the Linux kernel / dmesg show "unexpected sense" or any "sense" errors?
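The quickest shell check is `dmesg | grep -i sense`. The same filter as a small Python sketch, for collecting results from several servers; reading dmesg may require root where kernel.dmesg_restrict is set, and with the disks behind the MegaRAID controller some errors may only show up in the controller's own logs:

```python
import subprocess


def sense_lines(log_text: str) -> list:
    """Return kernel log lines mentioning SCSI sense data (e.g. 'Sense Key', 'unexpected sense')."""
    return [line for line in log_text.splitlines() if "sense" in line.lower()]


def read_dmesg() -> str:
    """Read the kernel ring buffer with the `dmesg` CLI (may need root if dmesg_restrict is set)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
```

Usage: `for line in sense_lines(read_dmesg()): print(line)`.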

I also helped on an old TAC case where Seagate disks performed slower when reading/writing to multiple areas of the hard drive, due to a poor elevator seeking algorithm in the disk firmware, as detailed in CSCve59476.

Try up-to-date disk firmware to see if that resolves your issue.
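To see which firmware each disk currently runs, smartmontools can query disks behind the MegaRAID controller with its `-d megaraid,N` device type. A minimal sketch, assuming smartctl is installed and that N is the controller's device ID for the physical disk (the 0 below is only a placeholder). ATA disks print a "Firmware Version" line and SCSI/SAS disks a "Revision" line, and the parser accepts either; it cannot tell which firmware is current, and N0A3 is only the version named in CSCvo58565:

```python
import re
import subprocess
from typing import Optional

# "Firmware Version:" appears for ATA disks, "Revision:" for SCSI/SAS disks in `smartctl -i`.
FW_RE = re.compile(r"^(?:Firmware Version|Revision):\s*(\S+)", re.MULTILINE)


def parse_firmware(smartctl_info: str) -> Optional[str]:
    """Extract the firmware string from `smartctl -i` output, or None if not found."""
    m = FW_RE.search(smartctl_info)
    return m.group(1) if m else None


def disk_firmware(device: str = "/dev/sda", megaraid_id: Optional[int] = None) -> Optional[str]:
    """Run `smartctl -i`; pass megaraid_id for a physical disk behind a MegaRAID controller."""
    cmd = ["smartctl", "-i", device]
    if megaraid_id is not None:
        cmd[1:1] = ["-d", f"megaraid,{megaraid_id}"]  # -> smartctl -d megaraid,N -i /dev/sda
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return parse_firmware(out)
```

For example, `fw = disk_firmware("/dev/sda", megaraid_id=0)` followed by a comparison of `fw` against "N0A3" flags the firmware named in the bug.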

Hi Steven Tardy,

It looks like the IO Delay problem really points to the HDD firmware.
By the way, which HUU version includes an HDD firmware update that fixes this problem?
I ask because I had previously upgraded the CIMC and BIOS firmware using HUU version 4.1(2l) and C220M4.4.1.2e.0.0615220033, but the IO Delay issue still occurs.

Thank you.

Sunardi

I have applied the latest HUU firmware update on 1 of the 8 servers (as attached):

https://software.cisco.com/download/home/286281345/type/283850974/release/4.1(2l)

but the IO Delay issue still occurs.

Sunardi
