I have two identical UCS C240 M3 units. Both are configured exactly the same, same firmware, etc.
On one unit I have a "marginal" fault (overall status is green) that points me to the RAID controller: a yellow state on the two RAID sets.
In the general section I can see: Current write cache policy state is: direct.
Battery state is green and there are no other errors. In my past life with similar products, most of the time I had to find the error in the battery area.
But since nothing here reports an error state, what else could it be?
Thanks for a hint.
Which version of firmware?
Which RAID Controller does the system have?
Do you have a Battery Backup Unit (BBU) for the RAID controller?
Cache Degraded is a state that happens normally with some models of RAID controller Battery Backup Units (BBUs). In normal operations, data written to a virtual drive is cached in RAM on the RAID controller for a few seconds before being written to the drive. This lets the virtual drive run faster and gives the application better write latency. The BBU is there to keep the RAM powered on in the event of a power failure so that the data is not lost. Under some circumstances, the BBU is not able to provide power to the RAM. When that happens, the RAID controller changes its caching strategy from "write back" (save to RAM, write to disk eventually) to "write-through" (save to disk immediately). Write-through is slower but still completely functional.
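If you want to confirm which policy the controller is actually using right now, the LSI MegaCli tool can report it per logical drive (the binary is often named MegaCli64 on 64-bit Linux, and paths vary by install):

# MegaCli64 -LDGetProp -Cache -LAll -aALL

A line such as "Current Cache Policy: WriteThrough" in the output would match the degraded/direct state described above, even when the configured default policy is write-back.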
The BBU is not able to provide power if:
(a) it is missing
(b) it has failed or is about to fail
(c) it is undergoing a "learn cycle". A learn cycle involves intentionally draining and recharging the battery to check how much its capacity has dropped. The RAID controller schedules learn cycles automatically about once a month. They last several hours.
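To narrow down which of these reasons applies, you can also query the BBU directly with MegaCli (adapter number 0 assumed here):

# MegaCli64 -AdmBbuCmd -GetBbuStatus -a0

The output typically includes fields such as "Battery State", "Learn Cycle Active" and "Battery Replacement required", which usually point to missing, failing, or learn-cycling batteries respectively.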
We need to determine the exact reason. You can do this by going to the BBU page in the CIMC Web GUI:
In the left pane, choose "Storage"; in the right pane, choose "Battery Backup Unit". There you should be given more information as to why the cache is degraded.
Thanks for your feedback.
The RAID controller is the LSI MegaRAID 9266-8i.
I had the same suspicion about the BBU. But as you can see from the screenshot, everything is green (sorry, I don't have the detailed BBU screen this time).
The learn cycle is not running; I saw that when I checked it.
Time to open a TAC case? :-)
Thanks and regards
What was the verdict in that TAC case?
I have created a TAC case. I have the same cache degraded state after I replaced a defective RAID controller (LSI 9271-8i). Everything was working fine, but then three hours later I got the error.
Firmware 2.0(1a), UCS C240-M3.
It also somehow changed my RAID from RAID 5 to RAID 0. I don't know why, but that happened.
I have found this link (Fig. 13) on changing from RAID 0 to RAID 5 through MegaCli, but I can't find the right switches for the RAID 0 to RAID 5 migration.
The problem is what the impact would be and how long it would take to migrate the RAID configuration.
The problem is that it is a production server.
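For what it's worth, MegaCli can do an online RAID level migration with its -LDRecon option. A sketch, assuming adapter 0, logical drive 0, and a hypothetical added drive in enclosure 252, slot 4 (going from RAID 0 to RAID 5 requires at least one additional drive for parity):

# MegaCli64 -LDRecon -Start -r5 -Add -PhysDrv[252:4] -L0 -a0
# MegaCli64 -LDRecon -ShowProg -L0 -a0

The reconstruction runs with the volume online, but on a large array it can take many hours and will add I/O latency, so it would be wise to confirm the exact command and timing with TAC before running it on a production server.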
The latency is currently about 1-4 ms, sometimes peaking at 10-15 ms.
During office peak hours it averages about 6 ms.
The array is 14 x 300 GB 15K SAS disks.
Thanks for your input.
Hello, I opened a TAC SR.
It seems that when I did a HUU firmware upgrade, the FPGA did not upgrade, and that was the reason for the cache degraded state.
I had to take the host OS offline.
Then I issued the following through a CIMC SSH connection. Once the CIMC was up and running again, I could boot the server:
# scope cimc
# scope firmware
# activate
# (answer 'y' to reboot the CIMC)
We have two identical UCS C260s at the RCDN9 building in the Richardson, TX Cisco Data Center. One exhibits the same 'Cache degraded' warning. The RAID card is an LSI MegaRAID 9261-8i.
Both systems do NOT have the BBUs.
However, the other one does NOT show this warning. The C260 that shows 'cache degraded' reports all RAID 1 drives as healthy, and ESXi 5.5 is serving all virtual machines fine. vCenter has no warning on it.
Please give your comments.