Check your mainboard battery

Build Log:

So let’s talk for a moment about a small part on your mainboard that might have a lot more influence over your system’s stability, and even performance, than is readily apparent. Given the title, I’m talking about the humble little BIOS battery: the simple, easy-to-overlook CR2032 coin cell on nearly every mainboard.

So why am I bringing this up? Because of Nasira, which is why this post is filed under that category.

Nasira currently runs on the X99 mainboard I bought when I upgraded my wife’s system to X99 back in… 2016. So about 8 years ago. I moved Nasira from its original 990FX platform to X99 only a touch over a year ago, and it had been performing more-or-less solidly until recently.

This past Saturday I tried to install a TrueNAS update through the web UI. The installation appeared to go well until the system went down for reboot. It never came back up. And the symptoms made me wonder, at first, whether the TrueNAS update had failed again. Immediately I started to worry I was about to lose a fourth SSD.

Fourth SSD?!?

Recall that back in August last year I was diagnosing some rather odd issues with the system not wanting to boot after trying to install yet another TrueNAS update. At first I thought the issue was the 10+ year-old power supply, when it was actually the 5+ year-old ADATA SSD. So that’s SSD #1.

I replaced it with an Inland 128GB SSD from Micro Center, SSD #2. That died in November, after all of three months, and I replaced it with a Crucial BX500 240GB SSD, also from Micro Center, SSD #3. And that didn’t even make it a month before dying, though I think it was showing SMART errors from day 1. But DOA parts happen.

So now the boot SSD is the HP NVMe SSD that I initially installed as an SLOG, so SSD #4. Anyway…

Like I said, the symptoms made me wonder if I was about to lose that SSD as well. The system still detected it without issue, and GRUB would load. But attempting to boot into TrueNAS caused the system to hang, similar to what I was seeing before.

Except it wasn’t hanging on the way into the BIOS, which told me the drive was initializing as expected. So this shouldn’t be a hardware failure. (Mostly.)

Corrupted update?

So did the TrueNAS update corrupt the system? That was my initial thought.

So I prepared a Rocky Linux boot drive with the intent of migrating Nasira to it and jettisoning TrueNAS entirely. Except… it wouldn’t boot. The UEFI boot option failed to load, and the non-UEFI boot option would halt the system with either an “uncompression error” or “32-bit relocation outside of kernel”. Google searches on the latter pointed to a hardware problem.

Strange… Aside from the mainboard, processor, and graphics card, none of the other hardware was really all that old. And there’s no reason to think anything became unseated.

But it would also occasionally fail to POST entirely: no picture, and no way into the BIOS. Resetting or fully powering off kicked it back to life, and I was then able to get into the BIOS to access the boot menu. And sometimes I’d get an error that the overclock had failed.

So I pulled Nasira out of the rack and opened her up.

That’s when the glint of the CR2032 caught my attention. How old was that battery? Had it ever been replaced? Likely not. So I popped it out (the system was unplugged, so this also reset the BIOS) and replaced it with a fresh Energizer CR2032.

And on a fresh boot with a fresh battery, the system was a lot more responsive. And stable. The Rocky Linux installer loaded without issue, but I decided to let it boot into TrueNAS, which also came up without issue.

The system has been stable since, and actually performing better than before. Replacing that battery is something I really should’ve done last year before putting this board into service as Nasira.

So yeah, check your mainboard battery!

Another unexpected benefit! (Update: 2024-04-01)

Today was the first scrub since changing the mainboard battery and… the first without any checksum errors.

Recall that Nasira is running on a consumer X99 mainboard with an i7-5820K, so no ECC RAM. And I’d been getting checksum errors with every monthly scrub since putting it into service. Until today. So something about the dying CR2032 may have contributed to the checksum errors during scrubs, which is odd, to say the least.

If I get checksum errors on the May 1 scrub, I’ll update accordingly.

How often to replace it?

The lifespan of the CR2032 on your mainboard depends on several factors. The quality of the cell is easily the biggest one. How often the system is running versus powered off is another.

And the CR2032 batteries that come with most mainboards aren’t the best quality available. But they also don’t really need to be.

One thing that stood out when I bought the Machinist mainboards that went into my router and virtualization machine: no CR2032 cells were included. That makes sense, since these coin cells are lithium cells, which generally cannot be shipped by air freight. But it also meant I could more-or-less guarantee each board got a high-quality battery.

So how can you know if you have a quality CR2032? Look up the brand and see if there is any information on how long the battery can last in storage. All cells lose energy over time (that’s just the nature of it), and lower quality cells degrade faster. While your system is powered on, the battery is sitting idle, so it’s effectively in storage and ever so slowly degrading.

Typically these cells should last 5 years in storage. Higher quality cells are generally rated for longer: Duracell advertises a 10-year lifespan under ideal storage conditions. While the system is powered off, that battery powers the volatile CMOS memory holding the BIOS settings, along with the real-time clock, so it’ll drain faster. But it isn’t drawing a lot of current (we’re talking microamps), so it’ll still take a couple years for an idle mainboard to drain it.
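To put “microamps” and “a couple years” side by side, here is a rough back-of-the-envelope sketch. It assumes a CR2032 rated at about 225mAh (ratings commonly land somewhere around 220 to 240mAh, so check your cell’s datasheet) and a few example CMOS draws in microamps. None of these numbers were measured on Nasira’s board, and the estimate ignores self-discharge and the voltage sag near end of life.

```python
# Back-of-the-envelope estimate: how long a CR2032 can keep CMOS settings
# alive on an idle, powered-off board. The capacity and draw values used
# below are ASSUMPTIONS for illustration, not measurements.

HOURS_PER_YEAR = 24 * 365


def years_to_drain(capacity_mah: float, draw_ua: float) -> float:
    """Years until the rated capacity is used up at a constant draw.

    capacity_mah: rated capacity in milliamp-hours.
    draw_ua: continuous current draw in microamps.
    Ignores self-discharge and end-of-life voltage sag.
    """
    if draw_ua <= 0:
        raise ValueError("draw_ua must be positive")
    draw_ma = draw_ua / 1000.0      # microamps -> milliamps
    hours = capacity_mah / draw_ma  # mAh / mA = hours
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    ASSUMED_CAPACITY_MAH = 225      # assumed typical CR2032 rating
    for draw in (3, 5, 10):         # assumed CMOS/RTC draws, in microamps
        years = years_to_drain(ASSUMED_CAPACITY_MAH, draw)
        print(f"{draw:>2} uA -> {years:.1f} years")
```

At an assumed 10µA draw, that works out to roughly 2.6 years, in line with the “couple years” figure; a lower draw stretches it out considerably.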

And 5 years has generally been the guideline for how often to replace the battery.

One thing to also point out: you can’t always use idle voltage to determine whether the battery is dead. The nominal voltage for a CR2032 is 3V. Sure, if it tests well below that with no load, then it’s definitely dead. But it might also read close to 3V with no load when it’s actually dead.

Instead, you need a resistive load on the battery when testing the voltage; 1kΩ works fine. You can find DIY solutions online for this. You just need a coin cell adapter, a 1kΩ resistor, and a voltage display of some kind.
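The logic of that loaded test can be sketched the same way. Through a 1kΩ resistor, Ohm’s law puts the draw at roughly 3mA for a healthy 3V cell, far more than the microamps the CMOS normally pulls, and that heavier load is what makes a weak cell sag. The 2.0V “replace it” cutoff below is an assumption (it mirrors the end-point voltage many coin-cell datasheets rate capacity to), and the example readings are made up for illustration.

```python
# Sketch of interpreting a CR2032 voltage reading taken across a 1 kOhm load.
# ASSUMED_CUTOFF_V is a hypothetical threshold, not a value from the post.

LOAD_OHMS = 1000.0        # the 1 kOhm resistor from the DIY tester
ASSUMED_CUTOFF_V = 2.0    # assumed "replace it" threshold under load


def load_current_ma(volts: float, ohms: float = LOAD_OHMS) -> float:
    """Ohm's law: current in milliamps flowing through the load resistor."""
    return volts / ohms * 1000.0


def needs_replacing(loaded_volts: float, cutoff_v: float = ASSUMED_CUTOFF_V) -> bool:
    """True if the voltage measured *under load* is at or below the cutoff."""
    return loaded_volts <= cutoff_v


if __name__ == "__main__":
    # Made-up example readings, for illustration only.
    for reading in (3.0, 2.6, 1.8):
        print(f"{reading:.1f} V under load -> "
              f"{load_current_ma(reading):.1f} mA, "
              f"replace: {needs_replacing(reading)}")
```

The cutoff is a parameter precisely because there’s no single right number; some boards may start misbehaving well before a cell reaches it.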

It’s generally a good idea to replace the battery if the mainboard has been sitting unused for a very extended period of time, e.g. Greg Salazar’s attempt to build an X58 system from a brand new, unused mainboard (the battery absolutely needed replacing, but that wasn’t the only problem). If you’re buying a mainboard on eBay that’s more than a few years old and the seller doesn’t mention in the listing that they replaced the battery, just replace it as soon as you get it to avoid any issues.

Different mainboards will react differently to a dead or dying battery. Some will still POST but may display a warning that the battery needs to be replaced, and those boards will also display a message that the BIOS was reset if the battery completely dies. Others will not POST at all.

And then you might get stability issues like I noticed with the ASUS X99 board in question. Given some stability issues I’ve had playing with the Sabertooth X99 I still have lying around, I might just see if replacing its battery alleviates them.

So if you’re noticing weird system stability issues, it might be worthwhile to change out that battery. It’s a simple, cheap little part that’s quick to replace, and something a lot of us likely overlook. Especially if the mainboard is older (again, I bought Nasira’s mainboard in 2016) and you don’t recall ever replacing the battery or you know it’s never been replaced.

Make sure, though, to replace it with a quality brand (Duracell and Energizer are who I typically go for) to ensure the new battery will also last.

Also make sure the system is unplugged when you replace the battery. This isn’t a safety concern; it’s so the CMOS gets cleared as well. Yes, this means you’ll need to change all your settings back and reapply any overclock, but it’ll also ensure the CMOS doesn’t hold any potentially corrupt data.