RAID Controller Battery Failure Impact – Ask the Engineer [VIDEO]
Ask the Engineer
Jack Kauter and Park Place Technologies Field Service Engineer, Yogen Balakrishnan, discuss the question – “What is a cache battery failure and what are the ramifications?” in this month’s Ask the Engineer.
Jack: Hi guys, Jack here from Park Place, and it’s time for Ask the Engineer. Joining me today is field service engineer Yogen Balakrishnan. Yogen, how’s it going?
Yogen: Yeah I’m doing great.
What Does a RAID Controller Battery Do?
Jack: It’s great to hear, Yogen. Today’s question is regarding cache battery failures. It seems this is a common problem, but what is a cache battery failure and what are the ramifications?
Yogen: OK, so this is a very interesting question. So, the function of a RAID controller cache is to help increase IO write performance on a server (learn more about RAID groups in storage and IOps vs. throughput on our blog).
It temporarily caches data from the whole system until it has been successfully written to the disk.
Impact of Controller Cache Battery Failure
So, without the cache, the controller has to stop the flow of data until the disk acknowledges that it is ready to receive new data, and this slows down the data transfer process.
Now the function of the cache battery here is to actually hold or protect the data in the cache in case of any power failures.
This battery is capable of holding power for up to 72 hours. So, without a backup battery, a power failure means the data in the cache will be permanently lost, and this can lead to data corruption.
Hence, as a protection feature, when a battery failure has been detected by the controller, it actually disables the cache. This avoids data loss from happening, but once the cache is disabled we start noticing that the IO operations on the server slow down.
And there was a study done in a lab environment where we could see performance metrics drop by up to 70% with the write cache disabled.
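The protection behaviour Yogen describes can be sketched as a small model. This is only an illustration, not any vendor's firmware logic: the `Controller` class, the mode names (borrowed from the common "write-back" and "write-through" terminology), and the latency figures are all assumptions chosen to make the trade-off visible.

```python
from dataclasses import dataclass
from enum import Enum


class CacheMode(Enum):
    WRITE_BACK = "write-back"        # cache enabled: acknowledge once data is in cache
    WRITE_THROUGH = "write-through"  # cache disabled: acknowledge only after the disk write


@dataclass
class Controller:
    """Hypothetical controller model; real controllers expose this differently."""
    battery_healthy: bool

    def cache_mode(self) -> CacheMode:
        # Only cache writes while the battery can protect them across a power failure.
        if self.battery_healthy:
            return CacheMode.WRITE_BACK
        return CacheMode.WRITE_THROUGH


def write_latency_ms(mode: CacheMode, cache_ms: float = 0.1, disk_ms: float = 5.0) -> float:
    # Placeholder latencies for illustration only, not measured values.
    return cache_ms if mode is CacheMode.WRITE_BACK else disk_ms


healthy = Controller(battery_healthy=True)
failed = Controller(battery_healthy=False)
print(healthy.cache_mode().value, write_latency_ms(healthy.cache_mode()))
print(failed.cache_mode().value, write_latency_ms(failed.cache_mode()))
```

The point of the model is that a failed battery does not lose data by itself; it makes the controller fall back to the slower path, and every write then waits for the disk.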
How to Avoid RAID Controller Card Battery Failure
Jack: Yogen, how would you avoid a cache battery failure?
Yogen: So, in the field we often see cache battery failures, especially on the older generation servers.
The management software actually fails to highlight these failures, so we miss them.
The cache battery lasts up to three years, so for all the servers I would recommend you periodically log into the servers to do a health check just to make sure that the battery is healthy, and the cache is enabled.
But alternatively, you can actually configure e-mail alerting via SMTP, or better still, use a monitoring solution to pick these faults up (like the Entuity fault management software).
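A periodic health check with an e-mail alert might look like the sketch below. It assumes the battery state, cache state, and install date have already been collected into a plain dictionary; how you obtain them depends on your controller's management tool and is not shown. The field names, host name, and e-mail addresses are hypothetical, and the three-year threshold comes from the battery lifetime mentioned above.

```python
import smtplib
from datetime import date
from email.message import EmailMessage
from typing import List, Optional

MAX_BATTERY_AGE_YEARS = 3  # batteries last up to three years, per the interview


def find_problems(status: dict, today: date) -> List[str]:
    """Return human-readable problems found in a (hypothetical) status dict."""
    problems = []
    if status["battery_state"] != "ok":
        problems.append(f"battery state is {status['battery_state']}")
    if not status["write_cache_enabled"]:
        problems.append("write cache is disabled")
    age_years = (today - status["battery_installed"]).days / 365.25
    if age_years >= MAX_BATTERY_AGE_YEARS:
        problems.append(f"battery is {age_years:.1f} years old")
    return problems


def build_alert(host: str, problems: List[str],
                sender: str, recipient: str) -> Optional[EmailMessage]:
    """Build an alert e-mail, or return None when there is nothing to report."""
    if not problems:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"RAID cache battery warning on {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    return msg


def send_alert(msg: EmailMessage, smtp_host: str, smtp_port: int = 25) -> None:
    """Send the alert through an SMTP relay (host and port are site-specific)."""
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(msg)
```

Run from a scheduler, a script like this covers the "log in periodically" advice without relying on someone remembering to do it. A dedicated monitoring tool remains the more complete option.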
So, there are many monitoring tools available in the market right now. They are capable of proactively detecting these faults, both before they occur and as they occur.
This gives us time to actually plan the downtime and have the battery replaced to avoid any outage. That’s about it from my side.
Jack: Yeah, I mean, great. Thank you again for sharing your knowledge on this topic. There were some really interesting areas covered, and I’m sure it’ll be useful for many out there.
If you’d like to just suggest a question to one of our engineers, please feel free to reach out to us.
We’ll see you next time on Ask the Engineer.