Cordelia

Seems kind of odd that just a few months after writing about giving my virtualization server a 2TB NVMe drive that I’m now writing about it again. And this time, it’s a platform upgrade. So what gives?

With pretty much everything else on my home network now on X99, I decided to fast-track an upgrade to my virtualization server as well.

In terms of performance, I’ve tended to lean on the side of more cores and threads over single-threaded performance. Given the VMs I typically had running at any given time, there wasn’t much point in going for single-thread performance over thread count. With this X99 upgrade, though, I’m getting both more threads and better single-thread performance.

My first dedicated virtualization server was a refurbished HP Z600 with dual Xeon E5520 processors. This provided, overall, 8 cores and 16 threads. It had 3 memory slots per processor that could take up to 48 GB RAM max. It’s now completely retired, and I’ll be figuring out what to do with it later.

About 5 years ago I replaced that with the dual-Opteron 6278 setup. This gave me double the threads – 32 overall, 16 per processor – and a lot more memory capacity. The mainboard I chose could take 16GB RDIMMs or 8GB UDIMMs, maxing out at 256GB or 128GB, respectively. As of this writing, it had 64GB (8x8GB) Registered ECC.

“Cordelia” is the name I gave this server after migrating it to Fedora Server with the NVMe installation to run VirtualBox and Docker CE.

Current specs

So to recap, here are the specifications before the upgrade:

CPU: 2 x AMD Opteron 6278 – 16 cores, 16 threads each
CPU cooler: Noctua NH-U9DO A3
Mainboard: ASUS KGPE-D16
RAM: 64GB (8x8GB) Registered ECC DDR3-1600
Storage: 500GB Samsung 850 EVO M.2 SATA; Inland QN322 2TB QLC NVMe
OS: Fedora Linux with Docker CE and VirtualBox

Onward to X99

CPU: Intel Xeon E5-2697 v4 – 18 cores, 36 threads
CPU cooler: ThermalTake TH120
Mainboard: Machinist MR9S (buy on eBay)
RAM: 256GB (8x32GB) Registered ECC DDR4-2400
GPU: Zotac GTX 1060 3GB

So DDR3-1600 to DDR4-2400. Dual CPU to single CPU with slightly more threads overall. Slightly lower clocks on the Xeon, but far newer platform. PCI-E 3.0. A lot more memory. And quad-channel!

Dual-CPU to single-CPU eliminates the NUMA node barrier and also reduces the server’s overall power consumption (dual 115W TDP vs single 145W TDP) – though adding in the GTX 1060 kind of offsets that.

Speaking of, while I am giving up onboard video for a dedicated video card, I’m actually not giving up much. The onboard video for the ASUS dual-Opteron board has only 8MB of VRAM. No, I didn’t make a typo. Only 8 megabytes. It works fine for a text console. Don’t try to use it for anything even remotely graphically intense.

I did consider the E5-2699 v4 (buy on eBay), which is 22 cores, 44 threads. But it’s also about 3x the price on eBay. For just 4 more cores and 8 more threads. I paid just 85 USD for the E5-2697 v4. And at the time I bought it, the E5-2699 v4 was going for 250 USD minimum. So no thanks.

An interesting addition to this server, though, is a Google Coral AI PCI-E module, which allowed me to migrate my home camera monitoring to Frigate. Frigate can do object detection instead of merely detecting motion, which should vastly reduce how many false positives I get. While the Google Coral module isn’t required for Frigate, it’s highly, highly recommended. And to further aid Frigate’s functionality with hardware video decoding/encoding, I added a GTX 1060 I had laying around rather than just any graphics card.

I also had to change this over from Fedora to Ubuntu.

Fedora 38 was newly released when I did this migration – it came out on April 26, so new that Docker hadn’t yet been released for it. So while I considered going with Fedora 37, which is what I was using prior to the migration, with the plan to eventually in-place upgrade it to Fedora 38, I opted to install Ubuntu 22.04 LTS instead to get everything up and running sooner.

About the AIO and Micro Center’s error

Before the ThermalTake AIO, I had an NZXT M22 mounted to it. But the pump started making noise – likely due to coolant evaporation – and I needed to replace it. It was also… a week out of warranty, so I couldn’t RMA it.

So I went looking for a more-or-less direct replacement.

Micro Center had two options in stock: the ThermalTake TH120 and the CoolerMaster MasterLiquid ML120L. Both were listed on Micro Center’s website as supporting Intel 2011, 2011v3, and 2066 sockets. So I picked the TH120 since it was a little less expensive.

Only to discover when getting it home that there was no 2011v3 hardware included. And ThermalTake’s website does NOT list 2011v3 as one of the supported sockets.

But I was able to use the 2011v3 hardware from the NZXT M22 to mount this. And all indications are that it works fine. So the TH120 can support 2011v3. ThermalTake just is not including hardware for it. The CoolerMaster cooler, though, does support 2011v3 out of the box according to their website.

And I went with the M22 initially as I just had it lying around unused. I didn’t have anything else readily available for 2011v3 that would fit into a 4U chassis. It was only a couple days into service that it started making noise.

Hands-off script for installing Apache Guacamole for Docker

So what’s different with this over other methods of setting up Apache Guacamole?

The main thing is it’s entirely hands-off. It’ll pull the images, set up the network, create the containers, initialize the MySQL database… Everything. Including generating secure random passwords for you using Random.org and writing those to the console for you to store off for later updates. (See sections below.) Just copy the script to a .sh file and run it.

And speaking of later updates, the script sets up the containers on their own network with static IPs assigned to each, rather than relying on the legacy “--link” option. This allows for very easy updates down the line since the containers – especially the MySQL container – can be recreated onto the same IP address as before.

Change what you need to avoid conflicts with any existing Docker networks or if you want the main Guacamole container to be accessible on a different port. Hopefully you won’t need to extend the 30-second wait for the MySQL container to initialize (a more robust wait loop is sketched after the script). Bear in mind as well that the guacd container takes a few minutes to fully start up and for its status to show “Healthy”.

Once everything is running, the default admin login (as of this writing) for the Guacamole web interface is guacadmin/guacadmin.

#!/bin/bash

echo Pulling latest Docker images.

sudo docker pull guacamole/guacamole
sudo docker pull guacamole/guacd
sudo docker pull mysql

echo Creating volumes for MySQL data

sudo docker volume create guac-mysql-data

echo Creating network the containers will use.

sudo docker network create \
--subnet=192.168.10.0/24 \
--gateway=192.168.10.1 \
guacamole-net

echo Contacting Random.org for new 16-character passwords for MySQL root and Guacamole users.

root_secure_password=$(curl -s "https://www.random.org/strings/?num=1&len=16&digits=on&upperalpha=on&loweralpha=on&unique=on&format=plain&rnd=new")
guac_secure_password=$(curl -s "https://www.random.org/strings/?num=1&len=16&digits=on&upperalpha=on&loweralpha=on&unique=on&format=plain&rnd=new")

sql_create="\
ALTER USER 'root'@'localhost' \
IDENTIFIED BY '$root_secure_password'; \
CREATE DATABASE guacamole_db; \
CREATE USER 'guacamole_user'@'%' \
IDENTIFIED BY '$guac_secure_password'; \
GRANT SELECT,INSERT,UPDATE,DELETE \
ON guacamole_db.* \
TO 'guacamole_user'@'%'; \
FLUSH PRIVILEGES;"

echo Creating MySQL container

sudo docker run -d \
--name guac-mysql \
-e MYSQL_ROOT_PASSWORD="$root_secure_password" \
-v guac-mysql-data:/var/lib/mysql \
--network guacamole-net \
--ip 192.168.10.2 \
--restart unless-stopped \
mysql

echo Let\'s wait about 30 seconds for MySQL to completely start up before continuing.
sleep 30

echo Initializing MySQL database

sudo docker exec guac-mysql \
mysql --user=root --password="$root_secure_password" -e "$sql_create"

sudo docker exec guac-mysql \
mysql --user=root --password="$root_secure_password" \
--database=guacamole_db \
-e "$(sudo docker run --rm guacamole/guacamole /opt/guacamole/bin/initdb.sh --mysql)"

echo Creating guacd container

sudo docker run -d \
--name guacd \
--network guacamole-net \
--ip 192.168.10.3 \
--restart unless-stopped \
guacamole/guacd

echo Creating main Guacamole container

sudo docker run -d \
--name guacamole \
--network guacamole-net \
--ip 192.168.10.4 \
--restart unless-stopped \
-e GUACD_HOSTNAME=192.168.10.3 \
-e MYSQL_HOSTNAME=192.168.10.2 \
-e MYSQL_DATABASE=guacamole_db \
-e MYSQL_USER=guacamole_user \
-e MYSQL_PASSWORD="$guac_secure_password" \
-p 8080:8080 \
guacamole/guacamole

echo Done.

echo MySQL root password: $root_secure_password
echo MySQL guacamole_user password: $guac_secure_password

echo Store off these passwords as they will be needed for later container updates.
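If the 30-second wait in the script above ever proves too short on slower hardware, one alternative – a sketch, not something I’ve needed – is to poll the MySQL container until it actually answers before continuing:

# Replace the "sleep 30" line with a loop that polls MySQL until it responds over TCP.
until sudo docker exec guac-mysql \
    mysqladmin ping --host=127.0.0.1 --user=root --password="$root_secure_password" --silent
do
    echo Waiting for MySQL to finish starting up...
    sleep 5
done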

Update Guacamole containers

Just copy off this script and keep it on your server to update the containers with the latest Guacamole images.

#!/bin/bash

read -s -p "MySQL Guacamole user password: " guac_secure_password
echo

sudo docker pull mysql
sudo docker pull guacamole/guacamole
sudo docker pull guacamole/guacd

sudo docker stop guacamole
sudo docker stop guacd
sudo docker stop guac-mysql

sudo docker rm guac-mysql
sudo docker rm guacd
sudo docker rm guacamole

sudo docker run -d \
--name guac-mysql \
-v guac-mysql-data:/var/lib/mysql \
--network guacamole-net \
--ip 192.168.10.2 \
--restart unless-stopped \
mysql

sudo docker run -d \
--name guacd \
--network guacamole-net \
--ip 192.168.10.3 \
--restart unless-stopped \
guacamole/guacd

sudo docker run -d \
--name guacamole \
--network guacamole-net \
--ip 192.168.10.4 \
--restart unless-stopped \
-e GUACD_HOSTNAME=192.168.10.3 \
-e MYSQL_HOSTNAME=192.168.10.2 \
-e MYSQL_DATABASE=guacamole_db \
-e MYSQL_USER=guacamole_user \
-e MYSQL_PASSWORD="$guac_secure_password" \
-p 8080:8080 \
guacamole/guacamole

Goodbye, MikroTik

Build Log:

Four (4) years ago, I bought the MikroTik CRS317 after seeing it retailing for… around 300 USD. And it’s a great switch, so long as you use it as a switch. Later I also acquired a MikroTik CSS610. And I recently replaced both with switches from the TP-Link lineup.

A couple details pushed me to do this, but it mostly stems from the upgrade to Google Fiber’s 5Gb Internet service. Other choices made here were also about consolidating. Prior to this changeover, I had three switches in my network rack: the MikroTik CRS317, the MikroTik CSS610, and a BV-Tech POE-SW801.

And I consolidated to two switches, both from TP-Link and manageable via the Omada Controller software: the TL-SG3210XHP-M2 and the TL-SX3008F.

TP-Link TL-SG3210XHP-M2

This switch replaced two in my network rack: the MikroTik CSS610 and BV-Tech POE-SW801. The latter is an 8-port 10/100 POE switch with a 100Mbps uplink. I bought it to support the security cameras I have, but never used all the ports on it.

The TP-Link TL-SG3210XHP-M2 has eight (8) 2.5GbE ports, all of which are Active POE+ enabled. This allows me to consolidate my security cameras and the TP-Link EAP670 wireless access point. Previously I had the EAP670 connected to the CRS317 via a TP-Link 10GbE RJ45 module and powered using its included DC power supply.

And being Active POE+ allows me to consolidate the couple Gigabit connections from the CSS610, putting me in a position to upgrade those connections to 2.5GbE. Since 2.5GbE runs across Cat5E without issue, it’s a drop-in upgrade.

One has already been upgraded as of this writing, that being the connection to my work laptop. My wife’s laptop is the other connection that will be upgraded.

I’ve considered swapping out the Gigabit switch on my living room rack for a 2.5GbE switch, but what’s connected to it isn’t really making me all that enthused about doing it. For the curious, all that’s connected is the uninterruptible power supply, mail server, and IP-KVM. And the only one that’s even slightly bandwidth intense is the latter, but it doesn’t saturate a GbE connection.

This switch also means the wireless access point is now powered from the TP-Link switch, removing a connection on the CRS317. My security cameras are as well, and if I expand my security camera setup more, I’ll add one of TP-Link’s Gigabit POE+ switches to the rack.

I’ve already swapped the stock fans with Noctua NF-A4x20 FLX fans as well. Omada is reporting a fan fault with it, but that’s merely due to the RPM of the Noctua fans being far lower compared to the stock fans. But that also means it’s practically inaudible from my office.

Note: I’m aware of a lot of negative reviews on this switch that indicate an… oddly short lifespan. And I’ll definitely be posting an update if this switch dies sooner rather than later. (Given the fan swap, I doubt TP-Link will honor the warranty.)

TP-Link TL-SX3008F

The CRS317 has 16 SFP+ ports. This is overkill for my home network. At the time I bought it, though, my only other option for reasonably-priced 10GbE was the MikroTik CRS305, which has only 4 SFP+ ports plus a Gigabit uplink port.

MikroTik wouldn’t introduce the 8-port CRS309 till later in 2019. And there really was no point in changing over to it at the time. After adding the aforementioned 2.5GbE switch to my rack, removing the connection for the EAP670, only six (6) ports were being used:

  1. Mira
  2. Amethyst
  3. Nasira
  4. Virtualization server
  5. Uplink from TL-SG3210XHP-M2
  6. Uplink to router

But most of the ports sitting dormant isn’t my reason for changing this out. Performance is the main reason here. The issue is either the 98DX8216 switch controller or MikroTik’s SwitchOS.

Either way, my 5Gb Internet connection revealed the MikroTik’s limitations when it comes to its switching functions. After putting the TP-Link switch into service, I was finally able to get 5Gb from my desktop with other clients also connected to the switch.

And other random speed tests I’ve done since taking that screenshot have produced similar results. Even one taken in the middle of a Saturday afternoon.

So the MikroTik’s switching capability is a massive bottleneck. I could easily do 10Gb to Nasira, such as when I’m syncing my camera card dump folder (especially after the platform upgrade), but I’m typically the only one accessing it at any given time. But the speed test screenshot above shows that it chokes off when multiple clients are trying to tunnel through a single port – such as the one linking up to the router.

Now don’t get me wrong. The MikroTik CRS317 is a decent switch. And it was an inexpensive way to get 10GbE in a quiet package – especially if you change out the fans.

Its initial MSRP was about 400 USD, but you could easily find it for less. But MikroTik bumped that to 500 USD, with a lot of sellers making that the shelf price, making it difficult to recommend this switch when there are better options available at the same price point. Just as an immediate example, TP-Link has their own fanless 16-port 10GbE SFP+ switch for about 500 USD (as of this writing) – the TL-SX3016F – that, if similar to the TL-SX3008F, is likely to also perform much better.

There may be tweaks you can make in RouterOS – configuring it in “bridge mode” – to allow it to perform better. But the TP-Link switch is performing better out-of-the-box.

MikroTik started the trend of bringing 10GbE to the home lab in an inexpensive package that was also very quiet. Both with the CRS305 and CRS317. But competition at MikroTik’s price point revealed its weaknesses. To stay competitive, MikroTik should consider releasing a new switch to replace the CRS317 that performs much better.

“Flesh Cult of Carnism”

“Flesh Cult of Carnism”? Tell me you’ve lost sight of reality without telling me you’ve lost sight of reality.

In all seriousness, manufacturing phrases like that means you really need to take a few miles worth of steps back and re-evaluate your psychological standing. This shows you’re so deep in your ideology, the Mariana Trench is like a crack in the pavement. Being vegan is one thing. But manufacturing phrases like this and posting propaganda like this to the Internet is more about re-justifying to yourself the choice to be vegan and shows a massive choice-supportive bias that is well beyond the point of delusion.

Get help.

Its final (for now) form

Build Log:

New specs:

The chassis and power supply previously served in Mira before I upgraded to the beQuiet Dark Base 900 and EVGA 1000 G6. The former was to get more space for HDDs. The latter was because I upgraded to the RTX 3070 and needed additional PCI-E connectors but couldn’t find my original cable kit. The power supply is overkill for this use case, but at least it’s being put to use again.

I chose the Machinist mainboard after learning about it through the Craft Computing YouTube channel. Specifically the below video, which also made my decision on the CPU.

The mainboard supports everything you’d expect from an X99 mainboard, including “Above 4G decoding”. It does NOT support bifurcation, at least not that I could find, but you might be able to mod the BIOS to include that – at your own risk, of course. So I’m not going to be buying another one of these for my NAS unless I need more than 64GB RAM for some reason.

But I will eventually buy one for upgrading my virtualization machine later this year since it supports up to 256GB RAM (8x32GB) and the E5-2699v4 (22 cores/44 threads), which will make for one hell of a home lab. I might even consolidate the NAS and virtualization together to one box. Not using TrueNAS SCALE’s virtualization, but putting TrueNAS in a VM, which would require going back to Proxmox since VirtualBox does not support PCI passthrough with version 7.

As mentioned previously, using ECC RAM isn’t required here, but it’ll help merely because of how much bandwidth will be going through this. Plus the price (at the time I bought it) was only 10 USD per stick on eBay brand new. So… why NOT use it?

So for now, this is the form this router will take. It’ll be interesting to see how long this will last, and hopefully I won’t need to make any hardware changes unless there is some incompatibility with OPNsense. Which shouldn’t happen unless the FreeBSD developers go off their rocker and start removing support for older hardware from their operating system.

* * * * *

Update (2023-03-31):

Consider this kind of a lateral move. I pulled the MR9S board for the Machinist MR9A Pro MAX (buy it on eBay). It’s a slightly smaller motherboard with only four (4) DDR4 slots instead of 8, but still able to operate in quad-channel. The MR9S will be going into my virtualization server, so keep an eye out for that. (Update: Check that out here.)

The road to 5Gb

Build Log:

In a previous article about my custom router, I talked about how Google Fiber would soon be introducing 5Gb and 8Gb service. Recently I was upgraded to the 5Gb service and… let’s just say it isn’t what I expected.

I had a feeling this would be the outcome as well.

The service overall felt snappier compared to the 2Gb service. But maintaining line speeds above 2Gb or 3Gb was proving difficult, even during off-peak hours. Running a speed test from the router using the SpeedTest CLI demonstrated this. Upload speeds would have no issue breaking 4Gb or even 5Gb, but download speeds would typically max out at 3Gb.

So what gives? In short, it’s the router itself. I just don’t think the APU can keep up with the demand. It has no issue keeping up with 2Gb service or less. But beyond 2Gb it becomes inconsistent.

But I’m not about to switch back to using Google’s router. For one, that would require adding back in the 10GbE RJ45 SFP+ module, which runs hot, and the active cooling to go with it. Or using a media converter.

So instead, I need to upgrade my custom router. The big question is what platform to jump to: 990FX or X99? Now reading that question, you’re probably already shouting “How is that even up for debate?”

Current specs

Before going too far, here’s what I’m starting with.

CPU: AMD A8-7600 APU with Noctua NH-D9L
Mainboard: Gigabyte GA-F2A88X-D3HP
RAM: 16GB DDR3-1600
PSU: EVGA 650 G2
Storage: Inland Professional 128GB 2.5″ SATA SSD
WAN NIC: 10Gtek X540-10G-1T-X8 10GbE RJ45
LAN NIC: Mellanox ConnectX-2 10GbE SFP+
Chassis: Silverstone GD09
Operating system: OPNsense (with latest updates as of this writing)

Which path forward?

In a previous article about doing a platform upgrade on Nasira, I mentioned I have a 990FXA-UD3 mainboard from Gigabyte. Talking specifically about how it does its PCI-E lane assignments before revealing, ultimately, that I went with a spare X99 board for Nasira due to memory prices. And that gave the benefit of PCI-E 3.0 as well, which was important for the NVMe drive I was using as an SLOG.

For a router, PCI-Express 3.0 isn’t nearly as important so long as you pay attention to lane assignments. Though for a gigabit router, even that doesn’t matter much. Both cards had at least 4 lanes and were running at their full speed – 5.0GT/s for both.

So if lanes aren’t the problem, that leaves the memory or processor. And there isn’t much benefit to bumping from DDR3-1600 to DDR3-1866 for this use case. The memory just isn’t going to make much of a difference here since it already provides more than enough bandwidth to handle this use case.

So that leaves the processor.

990FX with FX-8320E

Compared to even an FX-8320E, the AMD A8-7600 APU is underpowered. The onboard GPU is the only benefit in this use case. The FX-8320E doesn’t provide much of a bump on clocks, starting out at 3.2GHz but boosting to 4GHz. Performance metrics put the FX-8320E as the better CPU by a significant margin. The FX-8350 would be better still, but not by much over the FX-8320E.

So while it’s the better CPU and platform on paper compared to the APU and the A88X chipset, is it enough to serve as a 5Gb router?

Well I didn’t try that. I decided to jump to the other spare X99 board instead.

Or, rather, the X99 with i7-5820k

So again you were probably asking why I was even considering the 990FX to begin with? And it’s simply because I had one lying around not being used. Specifically the Sabertooth 990FX from Nasira still assembled with its 32GB DDR3-1600 ECC, FX-8350, and 92mm Noctua cooler. And I actually have a few 990FX boards not being used.

But I also had the Sabertooth X99 board that was in Mira still mostly assembled. It hadn’t been used in a while and just never torn down, so it was relatively easy to migrate for this.

So why the leap to the X99 over the 990FX? In short, it’s the specifications for the official pfSense and OPNsense appliances.

The Netgate 1537 and 1541 on the pfSense front are built using the Xeon D-1537 and D-1541, respectively, which are 8-core/16-thread processors, and DDR4 RAM. Both are rated for over 18Gb throughput.

And OPNsense’s appliances use either quad-core or better AMD Ryzen Embedded or Epyc processors. The DEC740 uses a 4-core/8-thread Ryzen with only 4GB DDR4, while the slightly better DEC750 doubles the RAM. Both are rated for 10Gb throughput.

But their DEC695 has a 4-core/4-thread AMD G-series processor and DDR3 RAM, and is rated for only 3.3Gb of throughput. Hmm… that sounds very familiar…

Quad-channel memory is where the X99 platform wins out, compared to dual-channel support for the aforementioned Ryzen and Xeon CPUs. But to get started, I ran with dual-channel since two sticks of DDR4-3200 is all I had available at the moment. If everything worked out, that would be replaced with 4x4GB for quad-channel RAM and a Xeon E5-2667 v4, which should yield overkill performance.

Tell someone this is your router, and they likely won’t believe you.

Here’s the temporary specs:

CPU: Intel i7-5820k with NZXT Kraken M22
Mainboard: ASUS Sabertooth X99
Memory: 16GB (2x8GB) DDR4-3200 running at XMP

Side note: I was able to move the SSD onto the new platform without having to reinstall OPNsense. It booted without issue. I still backed up the configuration before starting just. in. case.

So was this able to more consistently sustain 5Gb? Oh yeah!

One rather odd thing I noticed with the speed test, both on the old and new router setups: when trying to speed test against Google Fiber’s server, it capped out at 2Gb. But in talking to the Misaka Network server, shown in the screenshot, it now consistently gets 5Gb at the router.

Note: The command-line tool allows you to specify a server to test against. So going forward with my speed testing from the router, I’ll need to remember that.
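For reference, assuming the Ookla speedtest CLI, picking a server looks something like this (the server ID below is just a placeholder):

speedtest --servers            # list nearby server IDs
speedtest --server-id=12345    # run the test against a specific server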

With the AMD APU, it wasn’t getting close. And the FX-8320E or FX-8350 on the 990FX probably would’ve done better, but clearly it was best that I jumped right to the X99 board.

So what does this mean going forward?

Road forward

So with the outstanding test results, this will be getting a few hardware changes.

The CPU and memory are the major ones, and the mainboard will also get changed out. Something about either the processor or mainboard isn’t working right, and none of the memory slots to the left of the CPU are working – as the image above shows. This tells me it’s the mainboard, the CPU socket specifically (e.g. bent pins), but it could be the CPU as well.

Either way it means I can’t run quad-channel memory. And while the above speed test shows that quad-channel memory isn’t needed, I’d still rather have it, honestly.

But I have an X99 mainboard and Xeon processor on the way which will become the new router. Quad-channel memory is the more important detail here since Xeons do not support XMP. That does mean saving money on the memory, though, since DDR4-2400 is less expensive.

The Xeon on the way is the aforementioned E5-2667 v4. That’s a 40-lane CPU with 8 cores and 16 threads. Definitely overkill, and I’m not going to see any performance improvement compared to the i7-5820k. As mentioned, it does not support XMP, so the fastest RAM I’ll be able to run is DDR4-2400. But in quad-channel.

The Xeon does also allow me to use ECC RAM, and the mainboard that is on the way supports it. While the router chugs along perfectly fine with non-ECC RAM, ECC is just going to be better given the much higher bandwidth this router needs to support.

Throwing a short pass

Build Log:

In the previous iteration, I mentioned my intent to add more NVMe drives to Nasira. Right now there is only one that is being used as an SLOG, which I’m debating on removing. But the desire to add more is so I can create a metadata vdev.

Unfortunately doing that with the Sabertooth 990FX mainboard currently in Nasira is going to be more trouble than it’s worth. So to find something easier to work with, I considered ordering in a Gigabyte GA-990FXA-UD5 through eBay. But I realized I had a GA-990FXA-UD3 lying around unused. So I did some research into whether that would suit my needs.

And it looks like it will.

What’s the issue?

First, let’s discuss what’s going on here.

With the AMD FX processors, the chipset controlled the PCI-E lanes, not the CPU. This was a significant difference between AMD and Intel at the time. Though the CPU now controls the PCI-E lanes and lane counts with Ryzen.

And the 990FX chipset has 42 PCI-E lanes. This surpasses the lane count available on any Intel desktop processor at the time. The Intel i7-5960X had 40 lanes. Only Intel’s Xeon surpassed it, and only if you used more than one of them.

How they were divvied up between slots was up to the motherboard manufacturers, but generally every 990FX board gave you two (2) x16 slots so you could use Crossfire or SLI. What you could run and at what speed it ran depended heavily on the mainboard, since the mainboard determined lane assignments to slots. I’ve previously discussed how the Sabertooth 990FX assigns PCI-E lanes, showing the counter-intuitive chart from the user manual, so now let’s look at the Gigabyte lineup.

Gigabyte released three 990FX board models (with several revisions thereto) as part of their “Ultra Durable” lineup: the GA-990FXA-UD3, -UD5, and -UD7. And each has different lane assignments. The -UD7 is easily the most flexible, guaranteeing four (4) full-length slots at x8 or better. The UD5 guaranteed three (3) slots at x8 or better.

The -UD3 is a little different. That board also has 6 PCI-E slots: 2 x16, 2 x4, and 2 x1. And unlike the -UD5 and -UD7, the -UD3 does not share lanes between any of the slots or onboard features. Each slot has its own dedicated lanes. What you see is what you get. Or, at least, that is what the specifications heavily imply.

Why does this matter?

Obviously lane counts matter when you’re talking about high-bandwidth devices. You shouldn’t just randomly insert cards into slots without paying attention to how many lanes they’ll receive.

While any PCI-E device can operate on as little as just one lane – something anyone familiar with crypto-mining can attest to – you definitely want to give bandwidth-critical devices all the lanes they require. SAS cards. 10GbE NICs. NVMe SSDs. You know, the hardware in Nasira.

So when the NVMe SSD I installed as an SLOG reported up that it had only a x1 link, I needed to swap slots to get it running at a full x4. The Sabertooth 990FX divvies up its PCI-E lanes in a very counter-intuitive way, leading me to believe the NVMe drive would have its needed 4 lanes in the furthest-out slot where I wanted to run it. And it turned out that wasn’t the case.
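If you’d rather check what link a card actually negotiated than find out the hard way, lspci will show it. A quick sketch – the device address here is a placeholder you’d get from the plain lspci listing:

# Find the NVMe drive's PCI address
lspci | grep -i "non-volatile"

# LnkCap shows what the card supports; LnkSta shows what it actually negotiated
sudo lspci -vv -s 03:00.0 | grep -E "LnkCap|LnkSta"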

Had I swapped out the board sooner for the -UD3 I have on hand (it wasn’t available when I initially built Nasira), I wouldn’t have run into that issue.

That this was all on a 990FX mainboard is immaterial. Indeed the issue is more acute on many Intel mainboards unless you’re running one of the Extreme-edition processors or a Xeon due to PCI-E lane count limitations.

And many mainboards have a mix of PCI-E versions, so you need to pay attention to that as well to avoid, for example, a PCI-E 3.0 card being choked off by PCI-E 2.0 speeds. This is why many older 10GbE NICs are PCI-E 2.0×8 cards. PCI-E 2.0×8 has enough bandwidth for two (2) 10GbE ports, but 1.0×8 really has enough bandwidth for only one (1). While PCI-E 1.0×8 should, on paper, allow for dual 10GbE ports, in practice you won’t see that saturated on such PCI-E 1.0 mainboards.

And 3.0 x4 10GbE NICs, such as the Mellanox ConnectX-3 MCX311A, will run fine in a 2.0 x4 slot – such as the slots in my virtualization server and the X470 mainboard in Mira. And I think it’s only a matter of time before we see PCI-E 4.0×1 10GbE NICs, though they’ll more likely be PCI-E 4.0×2 or x4 cards to allow them to be used in 3.0 slots.

Thermals are the other consideration. You typically want breathing room around your cards for heat to dissipate and fans to work. SAS cards can run hot, so much so that I wanted to add a fan to the one in Nasira after realizing how to add one to the 10GbE NICs in my OPNsense router. And even for 10mm-thick fans, I need at least one slot space available to give room for the fan and airflow.

So with all of that in mind, I swapped out the Sabertooth 990FX board for the ASUS X99-PRO/USB 3.1.

Wait, hang on a sec…

So after initially jettisoning the idea of a platform upgrade, why am I doing a platform upgrade? In short… memory prices right now. I was able to grab 64GB of DDR4-3200 RAM from Micro Center for about 200 USD (plus tax) – about 48 USD for each 2x8GB kit. Double the memory, plus quad-channel.

And PCI-E 3.0. That was the detail that pushed me to upgrade after looking at the PCI-E lane assignments with the 5820k, which is a 28-lane CPU. Fewer lanes compared to the 990FX, but still enough for the planned NVMe upgrade. (4 lanes to the 10GbE NIC, 8 to the SAS card, 16 to the NVMe carrier card.) While upgrading to the 5960X is an option to get more PCI-E lanes – they’re going for around 50 USD on eBay as of when I write this – it isn’t something I anticipate needing unless I upgrade the SAS card.

It’s also kind of poetic that it’s my wife’s X99 mainboard and i7-5820k that will be the platform upgrade for Nasira. Since acquiring that board and processor freed up her Sabertooth 990FX and FX-8350 to build Nasira in the first place.

Performance

So how does the new platform perform compared to the old? Well this probably speaks for itself:

That is a multi-threaded robocopy of picture files from a WD SN750 1TB to one of the Samba shares on Nasira. That’s the first time I’ve ever seen near. full. 10GbE. saturation. That transfer rate is 1,025,054,911 bytes per second, which is about 977.6 MiB per second. I never saw anything near that with the Sabertooth 990FX. Sure I got somewhat better performance after adding the SLOG, but it’s clear the platform was holding it back.
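For the curious, the copy was kicked off with something along these lines – the paths and share name are placeholders, and /MT is what makes robocopy multi-threaded:

robocopy D:\Pictures \\nasira\photos\card-dump /E /MT:16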

More and faster memory. Faster processor. PCI-E 3.0.

But… ECC….!!!

Hopefully by now the religious zealotry and doomsday catastrophizing around not using ECC with ZFS has died down. Or does it persist because everyone is copying and pasting the same posts from 2013? It seems a lot of people got a particular idea in their heads and just ran with it merely because it made them sound superior.

The move to the 5820k does mean moving to non-ECC RAM. And no, there isn’t nearly the risk to my pool that people think… I went with ECC initially merely because the price at the time wasn’t significantly more expensive than non-ECC, and the mainboard/processor combination I was using supported it.

And when I wrote the initial article introducing Nasira, I said to use ECC if you can. Here, though, I cannot. The X99 board in question doesn’t support ECC, and neither does the processor. And getting both plus the ECC DDR4 is not cheap. It’d require an X99 mainboard that supports it, plus a Xeon processor. Probably two Xeons depending on PCI-E lane counts and assignments. And as of when I write this, the memory alone would be over 50 USD per 8GB stick, whereas, again, the memory I acquired was under 50 USD per pair of 8GB sticks.

But, again, by now the risk of using non-ECC with ZFS has likely been demonstrated to have been well and truly overblown. Even Matt Ahrens, one of the initial devs behind the ZFS filesystem, said plainly there is nothing about ZFS that requires ECC RAM. So I’m not worried.

And if your response to this is along the lines of, “Don’t come crying when your pool is corrupted!”, kindly fuck off.

Because let’s be honest here for a moment, shall we? It’s been 7 years since I built Nasira. In that time, there have probably been thousands of others who’ve taken up a home NAS project using FreeNAS/TrueNAS and ZFS. With a lot of those likely also using non-ECC simply to avoid the expense needed to get a platform that supports ECC RAM along with the memory itself. A lot of them likely followed a similar story to how I first built out Nasira: a platform upgrade that freed up a mainboard/processor, so they decided to put it to use. Meaning a desktop or gaming mainboard, desktop processor or APU, and non-ECC DDR3 or DDR4.

Now presuming a small percentage of those systems suffered pool corruption or failures, how many of those could be legitimately attributed to being purely because of non-ECC RAM with no other cause?

In all likelihood – and let’s, again, be completely honest here – it’s NEXT. TO. NONE. OF. THEM.

And with Nasira, if anything is going to cause data corruption, it’s likely to be the drive cables, power cables, or the 10+ year-old power supply frying something when it gives up the ghost. Which is why I’m looking to replace it later this year for the same reason as the other pair of 4TB hard drives: age.

Again, use quality parts. Use a UPS. Back up the critical stuff, preferably offsite.

Now that’s not to say there is no downside to not using ECC, as there is one: you’ll get quite a lot of checksum errors during scrubs.

Current specs and upgrade path

So with the upgrade, here are the current specifications.

CPU: Intel i7-5820k with Noctua NH-D9DX i4 3U cooler
RAM: 64GB (8x8GB) G-Skill Ripjaws V DDR4-3200 (running at XMP)
Mainboard: ASUS X99-PRO/USB 3.1
Power: Corsair CX750M green label
Boot drive: ADATA ISSS314 32GB
SLOG: HP EX900 Pro 256GB
HBA: LSI 9201-16i with Noctua NF-A4x10 FLX attached
NIC: Mellanox ConnectX-3 MCX311A-XCAT with 10GBASE-SR module

The vdevs are six (6) mirrored pairs totaling about 54TB.

Soon I will be adding a metadata vdev, which will be two NVMe mirrored drives on, likely, a Sonnet Fusion M.2 4×4 carrier card. The SLOG will be moved to this card as well. That card doesn’t require PCI-E bifurcation, unlike other NVMe expansion cards like the ASUS Hyper M.2 x16 and similar cards, since it uses a PLX chip. But that’s why the Sonnet Fusion card is also more expensive. (X99 mainboards almost always require a modded BIOS to support bifurcation.)

There’s also the SuperMicro AOC-SHG3-4M2P carrier card. But that is x8, compared to x16 for the Sonnet Fusion. And the manual says it may require bifurcation whereas, again, the Sonnet Fusion explicitly does not.

There are off-brand cards as well. And 10Gtek sells NVMe carrier cards as well that do or do not need bifurcation. Most of what you’ll find is x8, though. 10Gtek has a x16 card, but I can’t find it for sale anywhere. And I may opt for a x8 card over the Sonnet Fusion since overall performance is unlikely to completely saturate the x8 interface under typical use cases. And PCI-E 3.0×8 is far, far more bandwidth than can be saturated with even 10GbE.

So stay tuned for updates.

Pool corruption!

So in the course of this upgrade, I suffered pool corruption. Talk about bad timing on it as well since it happened pretty much as I was trying to get the new mainboard online with my ZFS pool attached to it. So was it the non-ECC RAM? Have I been wrong this entire time and will now repent to the overlords who proclaim that one must never use non-ECC RAM with ZFS?

Yeah, no.

Initially I thought it was a drive going bad. TrueNAS reported one of the Seagate 10TB drives experienced a hardware malfunction – not just an “unrecoverable read error” or something like that. A lot of read errors and a lot more write errors being reported in the TrueNAS UI. And various error messages were showing on the console screen as well with the drive marked as “FAULTED”.

Thankfully Micro Center had a couple 10TB drives on hand, so I was able to pick up a replacement. Only to find out the drive wasn’t the issue as the new drive showed the exact same errors. The problem? The drive cable harness. If only I’d thought to try that first.

Something about how I was pulling things apart and putting them back together damaged the cable. And that it affected only one of the drives on the harness was the confusing bit. I’m sure most seeing what I observed would’ve thought the same, that the drive was going instead of the cable harness.

Unfortunately the back and forth of trying to figure that out resulted in data corruption errors on the pool, but thankfully only to files that I could rebuild, re-download from external sources, or restore from a backup. An automatic second resilver on the drive, which started immediately after the first finished, saved me from needing to do that and corrected the data corruption issue. At the cost of another 16-hour wait to copy about 8TB of data, about the typical 2 hours per TB I’ve seen from 7200RPM drives. (5400RPM drives tend to go at 2.5 hours per TB.)

So lesson learned: if TrueNAS starts reporting all kinds of weird drive errors out of the blue, replace the drive cable harness first and see if that solves the problem.

On the plus side, I have a spare 10TB drive that I thought was dead. But it came at a cost I wouldn’t have had to spend if I was a bit more diligent in my troubleshooting. Again, lesson learned.

Since the resilver finished, the pool has been working just fine. Better, actually, than when it was attached to the AMD FX, though the cooling fan on the SAS card is probably helping there, too.

Coming full circle

Build Log:

When I first built Nasira almost 7 years ago, I knew the day would come when the first pair of 4TB hard drives would be pulled and replaced. Whether due to failure or wanting to evict them for larger capacity drives. In late 2021 I wrote about needing to replace one of the second pair of 4TB drives due to a drive failure.

Now it’s for needing more storage space. First, here are the current specifications:

CPU: AMD FX-8350 with Noctua NH-D9L
Memory: 4x8GB Crucial DDR3-1600 ECC
Mainboard: ASUS Sabertooth 990FX R2.0
Chassis: Rosewill RSV-L4500 with three 4-HDD hot-swap bays
Power: Corsair CX750M (green label)
OS: TrueNAS SCALE 22.12
Storage: 2x 16 TB, 2x 4 TB, 4x 6 TB, 2x 10 TB, 2x 12 TB

Somehow, despite its bad reputation, the Corsair CX750M green label I bought back in 2013 is still chugging along with no signs of failure. Yet. But it’s connected to a pure sine wave UPS and running under a modest load at best, so that “yet” is likely a ways off.

Due to our ever-expanding collection of movies and television shows – of which Game of Thrones on 4K was the latest acquisition, at around 300GB per season – plus the push to upgrade our 1080p movies to 4K releases, where available, we were fast running out of room. Plus my photography really took off last year, so I had a lot more RAW photo files than in previous years.

All of that adds up to terabytes of data.

So when I saw that I could get a pair of 16TB drives for 500 USD – yes, you read that right – I just couldn’t pass them up. A single 16TB drive for less than I paid for a pair of 4TB drives 7 years ago.

So out with the old, and in with the new.

Swapping ’em out

Replacing the drives was straightforward using TrueNAS’s user interface. It’s the same process you’ll follow to replace a dead drive. The only difference is you’re doing it for all drives in a vdev. And since my pool is made up of nothing but mirrored pairs, I’m replacing just two drives.

Here’s where having a drive map will come in very handy. I mentioned in my aforementioned article about the drive failure that you should have a chart you can readily reference that shows you which drive bay has which HDD so you eliminate the need to shut down the system to find it. And it’s difficult to overstate how handy that was during this exercise.
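If you’re building such a chart, the serial numbers are the key detail since that’s what’s printed on the drive labels. From the TrueNAS shell, something like this (a sketch) will list them for matching against bay positions:

# List whole disks with model, serial, and size to match against bay labels
lsblk -d -o NAME,MODEL,SERIAL,SIZE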

The first resilver finished in about 9 hours, 46 minutes, or about 107 MiB/s to copy 3.59 TiB. The second resilver went a little quicker, though, finishing in a little over 6-1/2 hours and running at an average shy of 160 MiB/s. The new drives are Seagate Ironwolf Pro drives, ST16000NE000 specifically, which their data sheet lists as having a max sustained transfer rate of 255 MB/s.

So now the pool has a total raw capacity of 54 TB, effective capacity (as reported by TrueNAS) of 48.66 TiB.

The pool also showed the new capacity immediately after the second 4TB drive was replaced and the resilver had just started. If this was a RAID-Zx vdev, it wouldn’t show the newer capacity till the last drive was replaced. This was one of the central arguments for going with mirrored pairs I raised in my initial article.

Replacing more drives

It’s quite likely that later this year I’ll replace the other 4TB pair with another 16TB pair. Less for needing space, more because of the age of the drives. That second pair is where one had to be replaced, and the other drive is approaching 7 years old. Sure, no signs of dying that I can see, no SMART errors being reported on it, but probably still a good idea to replace it before ZFS starts reporting read errors with it.

And when I replace those, I’ll have a much faster option: removing the mirrored pair from the pool rather than replacing the drives in-place. This will ultimately be much faster since the remove operation will copy all the data off to the other vdevs – meaning it’s only copied once. Then just pop out the old drives and pop in the new ones, as if I was adding more drives to the pool instead of merely replacing existing ones.

Had I realized that option was already there, I would’ve used it instead of relying on rebuilding each disk individually.

And while the option of removing a vdev entirely isn’t available for RAID-Zx vdevs, it’ll likely be coming in a later ZFS update. Removing mirrored vdevs was likely a lot easier to implement and test up front.
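From the command line, the removal itself is a single operation – a sketch, assuming a pool named tank and a vdev labeled mirror-1 (the real names come from zpool status); TrueNAS also exposes this in the UI:

sudo zpool status tank          # identify the vdev to evict, e.g. mirror-1
sudo zpool remove tank mirror-1
sudo zpool status tank          # the status output shows evacuation progress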

Why replace when you can just add?

Let’s take a brief aside to discuss why I’m doing things this way. Replacing an existing pair of drives rather than adding new drives to the pool. There are two reasons.

The main reason is simply that I don’t have any more available drive bays. Adding more drives would require finding an external JBOD enclosure or migrating everything – again! – into another 4U chassis that can support more hot-swap bays. Or pulling out the existing hot-swap enclosures for 5×3 enclosures, which is just kicking the can down the road. Or… any other multitude of things just to get two more drives attached to the pool.

No.

But the secondary reason is the age of the drives that I replaced. The two drives in question had been running near continuously for almost 7 years. They probably still have a lot of life in them, no doubt, especially since they were under very light load when in service, and will be repurposed for lesser-critical functions.

Yes I’m aware that meant getting 12TB additional storage for the price of 16TB, something I pointed out in the article describing moving Nasira to its current chassis. But then if you’ve ever swapped out existing storage for new, you’re also getting less additional storage than what you paid for. Paying for a 2TB SSD to replace a 1TB, for example.

Next steps

I’ve been considering a platform upgrade. Not out of any need for performance, but merely to get higher memory capacities. But ZFS in-memory caching seems to be a lot more under control migrating from TrueNAS Core to SCALE. And the existing platform still works just fine with no signs of giving up the ghost.

But the next step for Nasira is taking advantage of another new ZFS feature: metadata vdevs. And taking full advantage of that will come with another benefit: rebalancing the pool, since it will require moving files off and back onto the pool.

And special vdevs are a great feature to come to ZFS since they allow for a hybrid SSD/HDD setup, meaning the pool’s metadata (and, optionally, small blocks) lands on high-speed storage. Deduplication has the same benefit with a dedup vdev.

Whether you’ll benefit is, of course, dependent on your use case.

In my instance, two of my datasets will benefit heavily from the metadata vdev: music and photos. Now I do need to clean up the photos dataset since I know there are plenty of duplicate files in there. I have a main “card dump” folder along with several smaller folders to where I copy the files specific to a photo shoot. Overall that dataset contains… several tens of thousands of files.

And the music folder is similar. Several hundred folders for individual albums, meaning several thousand tracks. And since my wife and I tend to stream our music selection using a Plex playlist set to randomize, the benefit here is reduced latency jumping between tracks since the metadata will be on higher-speed, lower-latency storage. The TV folder is similar to the music folder in that we have several thousand individual files, but contained in fewer folders.

The movies folder, though, won’t really benefit since it’s only a few hundred files overall.

Really any use case where you have a LOT of files will benefit from a metadata vdev. And it’ll be better than the metadata caching ZFS already does since it won’t require accessing everything first before you see the performance benefit. Nor do you have to worry about that cached data being flushed later and needing to be refreshed from slow disks since you’re supposed to build the special vdev using SSDs.
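At the ZFS level, adding a metadata vdev is straightforward – a sketch with placeholder pool, device, and dataset names; on TrueNAS you’d do this through the pool UI instead:

# Add a mirrored pair of NVMe drives as the special (metadata) vdev
sudo zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# Optionally also push small blocks for a given dataset onto the special vdev
sudo zfs set special_small_blocks=64K tank/photos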

Now I just need to figure out how to get more NVMe drives onto Nasira’s AMD 990FX mainboard…

Virtualization server gets more storage

An NVMe solid-state drive in a dual-Opteron server… Just ponder that for a moment. Why in the world would anyone do that?

The big reason: storage is cheap. And for 80 USD, a 2TB NVMe solid-state drive is really cheap. And given this is a much older virtualization server, there is no need to go with anything high end.

Specs:

  • CPU: 2x AMD Opteron 6278
  • RAM: 64GB Registered ECC DDR3-1600
  • Storage: Samsung 850 EVO M.2 500GB

Recall that back in March 2018, I replaced an older dual-Xeon HP workstation with a dual-Opteron server setup for virtualization. Going away from a system made in the late 2000s to one with hardware from the early 2010s. But in doing that I was doubling the available core count. From a dual quad-core with HyperThreading, so 8 logical cores per processor, to two processors with 16 cores each. Later I upgraded the RAM to 64GB Registered ECC – after I accidentally bought registered sticks for Nasira and couldn’t sell them off.

And in building the system, I wanted to eliminate cables as best as possible. The CPU and ATX power connectors to the mainboard were unavoidable. But if a power or data cable could be avoided, I wanted to avoid it. The fans are powered off the mainboard, the GPU is onboard, so that leaves the storage.

And here, an SSD was the obvious choice. I had a 500GB Samsung 850 EVO I mistakenly bought for my wife’s upgrade to an i7-5820k for a mainboard that wouldn’t support it, and a StarTech M.2 to 2.5″ enclosure to use it in something else. But the enclosure still requires a power and data cable. So how to get around that? Thankfully I was able to buy a PCI-E adapter board that handled the power and data, so no additional cables.

Storage requirements

For most virtualization setups, 500GB is more than enough. My Plex VM sits on 32GB storage and uses about… half of it. (It runs off Fedora Server.) I have an OpenVPN instance on another VM that’s also 32GB and also running off about half of the space. And my only other virtual machine (at this moment, at least) is a mail server sitting on 64GB, but using 1/4th of that.

I’d been planning to upgrade the storage for a while as there are other projects I want to get into. And when I saw Micro Center having a sale on their Inland NVMe SSDs, and saw a 2TB NVMe SSD for only 80 USD, there was no way I could say No.

Alongside that I found an adapter board that could take one each of SATA M.2 and NVMe M.2 on the same board. It does require a SATA cable for the SATA M.2, unlike the previous adapter board, but nothing more. Both drives are powered by the PCI-E slot.

Wait, it works? But… bottleneck!

So did the system even recognize the drive? Well of course it did. And I had no reason to think it wouldn’t.

NVMe SSDs are PCI-Express devices after all, and the PCI-Express specification means that a PCI-Express 3.0 device can be used in a PCI-Express 2.0 slot. I already have that in Nasira, actually, where I’m using an NVMe drive as an SLOG.

But how well does it perform? Better than the SATA drive, I’ve definitely noticed. Plex is a lot snappier and the VMs load much faster. System updates on each VM are faster, too. And that along with the much better capacity was the point of that exercise.

It’s also a QLC drive with a rated top sequential read speed only just a little higher than what PCI-E 2.0×4 can provide, so it was never going to saturate a PCI-E 3.0×4 connection anyway. And under this use case it will never saturate a 2.0×4 connection. But it’s still far better than a SATA SSD and doesn’t need any cables.

I was after the storage real estate, primarily. That it came in an NVMe SSD that I could install with an interface board and not have to worry about additional cables is the major bonus.

Cooling everything down

10GbE cards can run hot. Very hot, actually. So much so that I’ve actually considered watercooling the one in Mira. But as I discovered building my OPNsense router, the solution is simple: quiet 40mm fan and VHB tape to stick it to the heatsink. Problem solved. You don’t need to use a Noctua fan specifically, as there are plenty of quiet 40mm fans on the market. I just happened to have a Noctua 40mm fan that I wasn’t using for anything.

Goodbye, Proxmox!

As of the time I installed the new NVMe SSD, the server was still running Proxmox 5. And not even the latest minor version of that. Merely upgrading it to the latest 5.x version, let alone installing Proxmox 7 – the latest version as of this article – would require… a lot of work.

The easiest route would be to jettison the VMs and install Proxmox 7 clean. Trying to upgrade in-place would’ve been… “time consuming” wouldn’t adequately explain it. But that would only get me up to the latest version. Keeping it up to date is the greater chore.

Without a support subscription – €190 (€95 per CPU socket) per year for this box for the lowest tier – the only way to get minor version updates to keep Proxmox updated is through the DVD image. Then there’s the continual nagging whenever I log in that I don’t have a subscription:

So… I’m done with it. Just completely done with it.

So back to VMware, then, or what?

Hello, VirtualBox!

I was jettisoning the existing VMs regardless. Plex is easy to migrate, I no longer use the OpenVPN VM since building an OPNsense router, and the mail server was migrated to a physical box.

But for a much smoother and more flexible upgrade path going forward, I moved to VirtualBox and Docker. And I went the full headless route, meaning creating and controlling the VMs through the command line. Sure, it means creating VMs is a little more of a chore without a script to automate the process – something that’ll be relatively easy to set up since my VMs will usually have pretty similar settings, with core count, storage space, or memory varying as needed. (A rough sketch of such a script is at the end of this section.) But the upgrade path is a LOT more flexible.

How so?

Ubuntu and Fedora (among others) allow for in-place upgrade to the next major version. My Plex VM, for example, had been getting upgraded in-place (using the dnf-plugin-system-upgrade package) since I first built this virtualization server with a fresh VM for Plex. That was Fedora 27. Didn’t need to touch it till now when I created the new VM with VirtualBox.
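For reference, an in-place Fedora upgrade with that plugin looks roughly like this (release number as an example):

sudo dnf upgrade --refresh
sudo dnf install dnf-plugin-system-upgrade
sudo dnf system-upgrade download --releasever=38
sudo dnf system-upgrade reboot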

And VirtualBox can be upgraded via the official repository or – as is the case already with Plex, unless you enable the repository – manually on my own watch. Docker containers allow similar flexibility. Being able to use Windows Remote Desktop instead of the browser to interact with the VM’s terminal is also a bonus.

Now sure, updates on the bare metal system do mean shutting down all the VMs. But I’d have to do that with Proxmox or any virtualization system anyway.
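As for the VM-creation script mentioned above, here’s roughly what it’ll look like – a sketch with placeholder VM name, sizes, bridge interface, and ISO path, not the final script:

#!/bin/bash
# Create and start a headless VirtualBox VM -- name, sizes, NIC, and ISO are placeholders
vmname="plex"

VBoxManage createvm --name "$vmname" --ostype Fedora_64 --register
VBoxManage modifyvm "$vmname" --cpus 4 --memory 8192 --nic1 bridged --bridgeadapter1 eno1
VBoxManage createmedium disk --filename ~/"VirtualBox VMs/$vmname/$vmname.vdi" --size 32768
VBoxManage storagectl "$vmname" --name "SATA" --add sata --controller IntelAhci
VBoxManage storageattach "$vmname" --storagectl "SATA" --port 0 --device 0 --type hdd \
    --medium ~/"VirtualBox VMs/$vmname/$vmname.vdi"
VBoxManage storageattach "$vmname" --storagectl "SATA" --port 1 --device 0 --type dvddrive \
    --medium ~/isos/Fedora-Server-netinst.iso
VBoxManage startvm "$vmname" --type headless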

Building a router

Build Log:

Amazing that it’s been… 6 years (as of this writing) since I decided to pursue 10GbE.

First trying to build a custom switch, then dropping all that when I learned that a lot of retired Quanta 10GbE switches had dropped onto eBay. Then dropping that switch two years later for the far quieter, lighter, and just better overall MikroTik CRS317. Even ordering it direct from Latvia. And then last year replacing the fans with the far quieter Noctua NF-A4x20 FLX.

So why am I now talking about building a router?

Google Fiber’s buggy interface

Before Google Fiber, I was with Time Warner Cable (now Spectrum), and I used my own cable modem and router. Never had any issues as a result. With Google Fiber, though, we were given their router box from the outset. As much as I don’t like not being able to use my own hardware, I didn’t really have a choice here. (Or so I thought, actually… Apparently I could’ve used my own router from the outset, but their documentation didn’t make it look that way.)

Google Fiber has changed how their routers are configured a few times. Initially, like most every router out there, you connected to it directly via the IP address. Then they made it so everything is configured through the Google Fiber site. The latter was better since it allowed you to handle things remotely but still securely, such as enabling or disabling port forwarding more-or-less on demand from anywhere.

Recently this has become more frustrating and buggy. Port forwarding in particular. Plus I didn’t have nearly as much control over other aspects as I would like.

Thankfully Google Fiber has an account option allowing me to use my own router and put theirs into “bridge mode”. So I did just that and switched over to using the MikroTik CRS317 as the router.

[Insert Nuke’s Top 5 voice-over]: It did not go well.

RouterOS performance

Sure port forwarding was far easier than using Google Fiber’s buggy interface. But performance… fell off a cliff. Instead of getting 2Gb down, I was getting around 500Mb. Something my research told me was largely unavoidable. Both with RouterOS versions 6 and 7.

Hardware is the primary reason. It's just too underpowered with a dual-core 32-bit ARM processor running at only 800MHz. That's more than capable as a 10GbE switch, especially if you're not loading up all of the ports. (I'm using 7 of 16 as of this writing, one being a link to a MikroTik CSS610.) As a router, though… not so much.

So the solution then is… building my own router using spare hardware I have lying around.

Requirements and Specs

The requirements are simple: a gateway between the MikroTik switch and the Google Fiber box that can handle 2Gb down, 1Gb up without a problem. So what level of hardware would work?

Linus Tech Tips' most recent video about building a router used an old Dell OptiPlex 7010 with an Intel i5-3770. And with that being just a Gigabit gateway, the CPU was barely being touched.

And the hardware for the official pfSense appliances is also very lightweight. The Netgate 4100 is the lightest that would still meet my requirements, and it has an Intel Atom C3338R 1.8GHz dual-core processor with 4GB RAM while sipping only a few watts of power.

I’m going a little overkill merely because I have this lying around not being used:

CPU:AMD A8-7600 APU with Noctua NH-D9L
Mainboard:Gigabyte GA-F2A88X-D3HP
RAM:16GB DDR3-1600
PSU:EVGA 650 G2
Storage:Inland Professional 128GB 2.5″ SATA SSD
WAN NIC:10Gtek X540-10G-1T-X8 10GbE RJ45
LAN NIC:Mellanox ConnectX-2 10GbE SFP+
Chassis:Silverstone GD09
Operating system:OPNsense (with latest updates as of this writing)

Okay, not all of it was lying around. I needed to acquire the 10Gtek card, along with replacing the fans in the chassis, but that was it.

Now why a 10GbE card for the WAN link when I only have 2Gb service? So I don’t need to upgrade it later.

Google Fiber is rolling out 5Gb and 8Gb full-duplex service starting early 2023, so I’m already set for either option. I don’t need to swap out any hardware to support it. And with the 10GbE switch as the backbone of my home network with a 10GbE card in mine and my wife’s desktop systems, we’re already well positioned to take full advantage of it.

And if your router needs to handle faster-than-Gigabit traffic to the Internet, pay attention to the PCI-E lanes of your mainboard and processor combination – in particular, how slot bandwidth changes when certain slots are populated – to ensure you're not cutting off bandwidth to your card(s). 2.5GbE NICs should run in a PCI-E 2.0 x1 slot without issue. 5GbE and 10GbE cards require additional consideration.

Thankfully the FM2+ board and APU have enough lanes. The PCI-Express slot with the Mellanox card is wired for a full x16 while the full-length slot with the 10Gtek card is wired for x4. PCI-E 2.0 is good for roughly 4Gb/s of usable bandwidth per lane, so x4 is more than enough to handle 10GbE.
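
If you want to verify what a slot actually negotiated rather than trusting the manual, the tooling is there – pciconf on FreeBSD/OPNsense, lspci on Linux:

# FreeBSD / OPNsense: look for the "link xN" entry under the NIC's PCI-Express capability
pciconf -lvc
# Linux: compare LnkCap (what the card supports) against LnkSta (what it negotiated)
sudo lspci -vv | grep -E 'LnkCap|LnkSta'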

And to keep the NICs running at peak performance and cooler temperatures while still remaining nearly silent, I used 3M VHB to attach a Noctua 60mm fan to the 10Gtek NIC, and a Noctua 40mm fan to the Mellanox.

And I went with OPNsense because it runs on a newer version of FreeBSD – pfSense still uses FreeBSD 12 as of this writing and won't move to version 14 until its next major release, which isn't slated until July 2023.

OPNsense and Mellanox

The Mellanox card wasn’t being used out of the gate. Some searching led me to an obscure article mentioning the solution. I needed to create the file /boot/loader.conf.local with this line, which comes from the FreeBSD documentation:

mlx4en_load="YES"
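
If you don't want to wait for a reboot, you can load the driver immediately and confirm it came up – my understanding is the mlx4en driver exposes interfaces named mlxenN:

kldload mlx4en      # load it now; loader.conf.local handles future boots
kldstat | grep mlx  # confirm the module is loaded
ifconfig            # the card should appear as a new interface (e.g. mlxen0)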

But that leaves the question of why OPNsense does not have support for Mellanox cards enabled by default. Given how popular Mellanox cards are with DIY and homelab setups, they really need to enable that by default in future releases. TrueNAS has that support by default. And I'm pretty sure pfSense has it, too.

So why did OPNsense not do that?

Router-hosted VPN

I have been relying on OpenVPN for a while. First installing it in a Docker container, then moving to a dedicated virtual machine. Neither was optimal, but it was really the only way I could have a self-hosted VPN.

OPNsense let me move the VPN service to the router itself, allowing me to jettison one of my VMs. This cuts out the extra hop of the router sending traffic to what is, in essence, a second router just to determine where to send it.

OpenVPN is installed by default with OPNsense, but I took this as a chance to change over to the lightweight and better-performing WireGuard. VPN performance has been much snappier as a result, though the switch to WireGuard was probably a smaller part of that jump than having the VPN service on the router itself.
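
The actual WireGuard setup is done through the OPNsense web interface, but the moving parts are small. Generating a keypair is a one-liner wherever the wg tool is available, and a road-warrior client config is only a handful of lines – every value below is a placeholder:

wg genkey | tee peer.key | wg pubkey > peer.pub

# Example client config:
# [Interface]
# PrivateKey = <contents of peer.key>
# Address = 10.10.10.2/32
#
# [Peer]
# PublicKey = <the router's public key>
# AllowedIPs = 0.0.0.0/0
# Endpoint = vpn.example.com:51820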

Going wireless

WiFi 6 is integrated into the Google Fiber router. I do have an older Tenda AC1900 wireless router, but I wanted to keep the WiFi 6 capability. Enter TP-Link and their EAP670 WiFi 6 access point. It has a 2.5Gb RJ45 port and can be powered via PoE+ or the included 12V adapter. I have it connected directly to the 10GbE switch through another RJ45 adapter.

The beauty here is not just cost – I found it for about $150 at Micro Center – but expansion. If I need greater coverage of my house, I can install a second one and set up a virtual machine as an Omada controller to handle roaming hand-off, with all of that configuration staying local. It also supports guest networks, though I haven't used that yet.

Performance and recommendations

My network configuration is now back to what it once was, but with a couple of slight improvements.

The first is the custom router itself. Objectively and subjectively, it allows for a much better connection to the Internet. The speed test when I put the new router into service was about 15% better than the initial tests after the service upgrade, and it was the first time I saw >2000Mbps on the downlink.

And there are two reasons for that improvement: the custom router, which simply performs a lot better than the Google Fiber router, and the hardware providing the physical connections.

In my last article about the CRS317, I said I used a MikroTik S+RJ10 module to connect the switch to the Google Fiber router. That's a very high-latency connection, even with a Cat7 cable – higher than using dedicated RJ45 hardware. It's just the nature of the beast.

This changeover allowed me to use an optical fiber connection between the switch and router – the first time I’ve been able to do that. Optical fiber has virtually zero latency across short runs.

And the connection from the router to the Google Fiber box is going through dedicated RJ45 hardware, not an SFP+ RJ45 module that gets very hot. No, seriously. Even with a fan, it was running at over 60°C continuously while the optical fiber modules had no issue with temperature. And with this upgrade, I was able to remove the fan I had blowing down onto the SFP+ module.

So what can you take away from this if you want to build your own router?

1. Have a high-performance switch as the backbone for your network

Avoid the cheap desktop switches. Like the ones that are under $30 for 8 ports.

Two things to look for are 1. whether it supports full-duplex and 2. the switch bandwidth. The switch bandwidth should be higher than all the ports combined at half-duplex – e.g., an 8-port GbE switch should have switch bandwidth higher than 8Gbps. If the switch specifications don't even mention "switch bandwidth", don't bother with it as your network's backbone.

The uplink of the switch will also matter as you’ll need to make sure it’s faster than your Internet connection. So if you’re sticking with Gigabit Ethernet but have a faster-than-Gigabit Internet connection, then something like the MikroTik CSS610 will be perfect as a backbone switch. Just make sure, again, to use an optical fiber connection between that switch and your custom router.

2. Build the router with only one (1) WAN and LAN port, if possible

Don’t build your custom router to also act as a switch. Build it only as a router. This means one port for the LAN, one for the WAN. The LAN port goes to your backbone, the WAN port to your modem or, in my case, ISP-provided router configured to act as a bridge. Even if you want to segment your network so one part is isolated from another, you can generally accomplish that far better and still maintain line-speed or near line-speed performance with a managed switch – e.g., the MikroTik CSS610.

Both ports should also be faster than your Internet connection. For example, if you have a Gigabit Internet connection, buy 2.5GbE NICs. This should ensure you're able to max out your Internet connection. And if you have less-than-Gigabit Internet, don't rely on any onboard Ethernet controller unless it's an Intel chip.

Your custom router will rely on software for moving packets around, so keep it relegated to just one task – moving packets into and out of your home network while blocking everything else you didn’t explicitly request. Having it also move packets between other interfaces will only degrade performance.

So if you’re acquiring hardware to make your custom router, stick with a single dual-port card. I have two separate cards only because I’m using different media – optical fiber between the router and switch, Cat7 between the router and the Google Fiber box. Just make sure the mainboard and processor combination will have enough PCI-E lanes to allow for it. Use an AMD APU or integrated Intel graphics where possible to free up slots and lanes.

3. Connect only the switch to the router. Nothing else.

Sure this kind of seems like a duplicate of #2, but I’m mentioning it in case you decide to use a card with more than two ports.

The switch will handle everything about funneling traffic to and from your router. And if you have any other services on your network, it can prevent traffic from clashing so you can still access those services (e.g., a Plex Media Server) without impacting or being impacted by anyone else’s Internet activity. Provided you aren’t relying on a cheap switch.

4. Don’t forget the UPS

Unfortunately, OPNsense appears to support only APC units via a plugin you can install, but that only matters if you require monitoring and auto-shutdown. Make sure to get one rated for about double what your router requires to operate, and pay attention to the half-load battery runtime.
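
As a sketch, assuming an APC unit with the apcupsd plugin installed, checking on the UPS from the router's shell is a single command:

apcaccess status    # LOADPCT and TIMELEFT are the fields worth watching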