Revisiting bottlenecking, or why most don’t use the term correctly

Previously I wrote a long, rant-ish article about the term “bottleneck”, particularly with regard to AMD vs Intel. In that article I tried to demonstrate, among other things, how the term is misused and, frankly, overused. I didn’t exactly do the greatest job in that article, so I’ll revisit the term here, especially in light of some things I’ve learned over the last year.

Working on the Colony West project changed my perspective on the topic, or rather informed it significantly. The project’s ultimate goal was a rack hosting three systems. One of the systems was a Minecraft server built on an AMD Athlon 64 X2 3800+. The other two systems would be distributed computing systems, one running an AMD Athlon 64 X2 4200+ and the other an AMD FX-8320E.

In terms of computing, the word “bottleneck” is very disproportionately levied against the AMD FX processor, and typically with regard to higher-end graphics cards. This became especially true after the GTX 900 series was introduced. Anyone who posts anywhere that they are building a system with an AMD FX processor and a GTX 900 series graphics card will likely get the term “bottlenecking” thrown at them — typically by someone who doesn’t understand the term, which seems to be practically everyone who uses it.

The ready assumption is easy to paraphrase: the AMD FX processor is always a bottleneck, period, end of story, no discussion. And the adjacent assumption is this: Intel processors are never a bottleneck, period, end of story, no discussion. The number of people in threads regarding the AMD FX processor who say “Stop defending AMD” is telling on that mark.

The more accurate statement is simply this: all components in a computer can be a bottleneck. The Intel i7-5960X can be a bottleneck. The Xeon processor can be a bottleneck. The AMD FX-9590 can be a bottleneck. The Titan X can be a bottleneck. The almost 300,000 cores (16-core Opteron processors) and 18,688 GK110 Tesla GPUs that comprise the Titan supercomputer can be bottlenecks.

The speed of light can be a bottleneck.

This was recognized and demonstrated by the late Grace Hopper, RADM, USN (ret.). One of Adm. Hopper’s many contributions to computing was simply to recognize that computers must become smaller to become faster — something that today seems so obvious. She was famous for carrying around “nanosecond wires” — strands of wire about 30cm long, the distance light travels in one nanosecond. Initially she used them to demonstrate why the speed of light is a limitation to satellite communication, but they also served to demonstrate why all components have a ceiling with regard to bandwidth and processing power.
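The arithmetic behind those wires is straightforward: light travels at roughly 3.0 × 10⁸ m/s, so in one nanosecond it covers

    (3.0 × 10⁸ m/s) × (1 × 10⁻⁹ s) = 0.3 m ≈ 30 cm

No signal inside a computer can do better, so the physical distance between components puts a hard floor on latency.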

It is one of the reasons multi-core processors have become the norm, and why multi-processor mainboards are the norm in high-performance servers: when you can’t expand performance linearly, you expand it laterally through parallel processing.

So now that we have that out of the way, let’s talk about how the term is typically applied: CPUs and graphics cards. I will also explain why my Athlon 64 X2 processor is not “bottlenecking” the GTX 680.

What is a bottleneck?

So what exactly is a bottleneck, in the proper sense of the term? In short, it is an inefficiency in a process.

An optimal process is one wherein no part of the process must wait to do work. So if you have three steps to make a widget, and one person for each step, you would want to make sure that persons 1, 2, and 3 all take about the same amount of time to perform their steps. The goal is to minimize idle time.

If person 1 takes significantly longer than person 2, then person 1 is said to be an inefficiency — a “bottleneck” — in the process. Now if person 2 takes significantly longer than person 1 to do their task, then person 2 is an even worse inefficiency. Not only is he holding up person 3, but he’ll actually force person 1 to slow down to prevent a backlog of work.

Note that whether a process has a bottleneck has nothing to do with the overall time it takes to perform the process — only the time to complete one or more parts of the process in relation to the whole.

A complementary question, however, is whether that happens to be the nature of the process being performed. Will that task always take longer? If so, the process may need to be redesigned or re-implemented by bringing on additional personnel. If person 1’s task takes only 5 minutes, but person 2’s task requires 10 minutes, then you’ll want two workers on step 2 of the process for every person on step 1.
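To make that concrete, here’s a minimal sketch in C — assuming, hypothetically, that person 3’s step also takes 5 minutes — showing that a serial process’s steady-state throughput is set by its slowest step, with everyone else idling to match:

    /* Steady-state throughput of a serial pipeline is set by its
       slowest stage; stage times are the hypothetical 5/10/5-minute
       example from above. */
    #include <stdio.h>

    int main(void) {
        double stage_minutes[] = { 5.0, 10.0, 5.0 };  /* persons 1, 2, 3 */
        int n = sizeof stage_minutes / sizeof stage_minutes[0];
        double slowest = 0.0;

        for (int i = 0; i < n; i++)
            if (stage_minutes[i] > slowest)
                slowest = stage_minutes[i];

        printf("One widget every %.0f minutes.\n", slowest);
        for (int i = 0; i < n; i++)
            printf("Person %d is idle %.0f%% of the time.\n",
                   i + 1, 100.0 * (1.0 - stage_minutes[i] / slowest));
        return 0;
    }

It reports one widget every 10 minutes, with persons 1 and 3 each idle 50% of the time. Doubling up on step 2 halves the effective stage time to 5 minutes and eliminates the idle time.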

The primary focus of managing the process and the workers at each step is minimizing idle time and ensuring the process can move smoothly. Some idle time may be desirable, especially when you’re talking about people, but too much is detrimental.

In alleviating inefficiencies or improving the process, the manager (typically with the help of consultants) will look at each step. To determine whether an inefficiency is ultimately unacceptable, the process manager will evaluate their options to determine if correcting it could be more costly than just living with it.

Typically the improvements need to be significant to justify making them; otherwise correcting an inefficiency yields barely any gain, or even an overall loss.

Bottlenecks and computers

It should be quite easy to see how this applies to a computer. The “people” are the hardware, and the task they are trying to perform is the application or game you are trying to run. But it’s not a clean analogy. In most processes, the manager is not part of the process; instead the manager delegates tasks and oversees the operation.

In a computer, on the other hand, the manager — the CPU — is not only overseeing the process, but doing a significant amount of the work. This is why it is not proper to apply the term “bottleneck” to a computer, especially the central processing unit (CPU), since the central processing unit is the process. At the least it is not proper to say that a CPU will bottleneck a graphics card, or any other hardware for that matter; rather, it is all the other hardware that “bottlenecks” the CPU. Sit back and relax, we’re about to get very, very technical, so do try to keep up.

There are two types of operations a CPU performs: blocking and non-blocking.

The colloquial term for a “non-blocking” operation is “fire and forget”. The CPU tells the hardware to do something and then goes off and does something else immediately. It doesn’t care about the result and won’t wait for one, so the hardware is not blocking the CPU.

Then there are “blocking” operations — which comprise the vast, vast majority of tasks and instructions a CPU carries out. If “non-blocking” means the CPU won’t wait for a result, “blocking” means the CPU will wait however long is necessary for that result. Some operations have a “timeout” value associated with them, meaning there is a ceiling to how long the CPU will wait. Once you see how these work, it’s easy to understand why the most restrictive “bottlenecks” in your system will almost always be storage and network devices.
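Here is a minimal sketch of what those look like at the system-call level, using POSIX calls — the device path is hypothetical, standing in for any device or socket:

    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>

    int main(void) {
        char buf[64];
        int fd = open("/dev/ttyS0", O_RDONLY);  /* hypothetical device */
        if (fd < 0)
            return 1;

        /* Blocking: the calling thread waits, however long it takes,
           for the device to produce data. */
        ssize_t n = read(fd, buf, sizeof buf);

        /* Blocking with a timeout: wait at most 500 ms for data. */
        struct pollfd p = { .fd = fd, .events = POLLIN };
        if (poll(&p, 1, 500) > 0)
            n = read(fd, buf, sizeof buf);

        /* Non-blocking: read() returns immediately (with EAGAIN if no
           data is ready) and the CPU moves on to other work. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
        n = read(fd, buf, sizeof buf);

        (void)n;
        close(fd);
        return 0;
    }

The blocking read is where a slow device holds the CPU’s thread hostage; the non-blocking variant is the “fire and forget” flavor described above.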

So again, the CPU is the process. This is why you will almost always see performance improvements in your system by going with a better processor.

Central processing units work through an instruction set — a set of instructions that have been hard-wired into the CPU. Here’s the interesting part: setting aside the legacy port I/O instructions on x86 (IN and OUT), the CPU has no instructions for talking to specific devices. Instead CPUs talk to memory addresses, and every instruction the CPU carries out is about manipulating memory addresses. If you don’t believe me, look up the AMD64 and Intel x86 instruction sets and you’ll see that there isn’t even an instruction for writing text to the screen. Instead everything happens by manipulating data at memory addresses.

That is at the instruction set level — i.e. the actual instructions the CPU is running. If you’ve never studied assembly language, consider yourself lucky to have remained insulated from the granularity of all of that.
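To see the memory-address principle in action, here is a minimal sketch — not something you can run as a normal user-space program, but the kind of code a kernel or freestanding environment on a legacy PC would use — where “printing” a character is nothing more than a memory store into the VGA text buffer mapped at physical address 0xB8000:

    #include <stdint.h>

    /* Write one character to the 80-column VGA text screen. Each cell
       is 16 bits: low byte is the character, high byte is the color
       attribute (0x07 = light grey on black). */
    void vga_putc(int row, int col, char c) {
        volatile uint16_t *vga = (volatile uint16_t *)0xB8000;
        vga[row * 80 + col] = (uint16_t)((0x07 << 8) | (uint8_t)c);
    }

No special “display” instruction is involved — just an ordinary store to a memory address that the hardware happens to watch.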

So then, why is it improper to say that a CPU bottlenecks a graphics card?

What’s often omitted from the discussion is simply that the CPU and other hardware work in tandem, never in isolation. It is rare that upgrading either the processor or graphics card, or going with multiple graphics cards, will result in no performance improvement. On LinusTechTips, I said this:

Upgrading the CPU or GPU will always improve performance in a gaming system. I don’t know of a situation where this won’t be true. You could have a GTX 480 with a Skylake CPU and, provided you don’t run into any incompatibility concerns, I’d wager it’ll outperform a Sandy Bridge with that same graphics card, and likely quite significantly. Sure we can argue there’s a ceiling, but I’d wager that you’ll run into incompatibilities before that ceiling becomes a concern.

Same with graphics cards. Pair a Titan X with a Sandy Bridge and it’ll outperform a GTX 480 with a Sandy Bridge. And if you have two Titan Xs in SLI with a Sandy Bridge — again, assuming no compatibility concerns — it’ll likely outperform a single Titan X with a Sandy Bridge, though the performance certainly won’t cleanly scale. But take those two Titan Xs and put them with a Skylake and you’ll see significantly better performance compared to the same on a Sandy Bridge. Knowing this makes the question of “bottlenecking” not an easy one to answer, and also shows the massive misuse of the term, because the question of whether there is a bottleneck still comes down to the process you’re trying to perform.

Specifically, the question concerns the output and requirements of the process. Will the hardware combination deliver an adequate level of performance? That depends on how “adequate” is defined, and comes down to the other variables involved — the monitor, its resolution and refresh rate, and the FPS and response level the combination can deliver. If the answer is no, then figure out what to upgrade.

And that brings me to the main point: the question is really one of an optimal level of performance.

Optimal level of performance means meeting or exceeding a desired level of output. For any process, that should be the focus — not how well any one part of the process performs (though that does matter), nor whether there is something better out there (there will always, or eventually, be something better), but whether the process is meeting or exceeding your expectations.

In the case of an application, the desired result is typically defined as completing a particular task within a particular period of time — the lower the better. This will often be defined in requirements when determining what equipment to purchase. For a game, the desired result is typically defined by the frame rate, refresh rate of the monitor, resolution, and visual quality settings for the game.

Whether a system achieves the desired result is up to its owner. And if the system does not achieve the desired result, it is also up to the owner to decide whether the deviation is acceptable or needs to be corrected, and to evaluate what correcting it would take.

The AMD FX processor

But whether a system can achieve a particular level of performance seems to be cause for confusion. For one, there are a lot of AMD FX naysayers who seem willing to just make shit up. I’ve seen numerous statements about the AMD FX processor that make me wonder whether the individuals making them have ever actually owned an FX processor, and in what kind of configuration. I’ve seen it written several times that the FX processor shouldn’t be used for MMOs or online play. Another commenter said that the FX processor cannot deliver anything more than 25 or 30 frames per second at 1080p in AAA titles.

My only thought seeing statements like that is “where the hell are they getting that information?” As I said in the previous article, if you believe what some of these people say, the level of performance I get from my system, and that my wife got from her previous system, is impossible.

As such, the question of what level of performance a system — in particular an AMD FX system — can provide has been subject to the “shifting the goalposts” fallacy: “well, the FX is fine up to [insert graphics chip here], but bottlenecks everything beyond that.” And all the while it is presumed that no Intel processor “bottlenecks” high-end graphics cards (setting aside for a moment the incorrect use of the word). Meanwhile the working definition of whether a graphics card is “bottlenecked” becomes whether it runs at 100% during a game, with the card declared “bottlenecked” if it never does.

Whether the overall system achieves a desired level of performance — the true definition of whether a system has any undesirable inefficiencies (i.e. bottlenecks) — seems to never be part of the discussion.

The Athlon 64 X2 and the GTX 680

I said that the GTX 680 is not being “bottlenecked” by the processor, going by the incorrect usage. Many would certainly dispute that, mainly because they likely won’t ask what I’m doing with this setup, or understand how it could possibly work. The PCI-Express 1.0a link is fast enough that it isn’t constricting the card for the tasks being performed, so a newer PCI-Express standard won’t provide any improvement. A faster processor would only provide a faster transition between tasks, and the difference is likely to be insignificant.
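For reference, the arithmetic on that link: PCI-Express 1.0 signals at 2.5 GT/s per lane with 8b/10b encoding, so

    2.5 GT/s × 8/10 = 2 Gbit/s = 250 MB/s per lane
    250 MB/s × 16 lanes = 4 GB/s per direction for a x16 slot

which is far more than these compute tasks push across the bus.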

Because the demand on the PCI-Express bus is relatively small, the CPU is able to transfer data to the GTX 680 without breaking a sweat, and the system runs through the Berkeley tasks without any problem and with minimal CPU usage — unfortunately, the Linux GeForce driver doesn’t provide any GPU usage statistics.

For the requirements and demands that are placed on this system, the only system bottleneck is the GTX 680. I could alleviate that bottleneck by purchasing a GTX 980, 980Ti, or Titan X, even an nVidia Tesla — provided such cards aren’t limited by the PCI-Express 1.0 link. But the gains to me — a quicker accumulation of BOINC credits — are not worth that expense (especially for the Titan X and Tesla). Again, CPU usage is minimal, so a faster processor won’t help. The Ethernet connection is faster than my Internet connection, so there is no improvement to be made on that front. And only 1GB of RAM is being used, so switching to faster RAM likely won’t improve anything either.

Compared to a GTX 770 on an FX board (the GTX 770 is a re-branded GTX 680), the X2 won out because of the operating system setup (Linux on the X2 versus Windows 8.1 Pro on the FX), finishing a similar task about 25 seconds faster than the FX/GTX 770 combination (265 seconds compared to 290, about an 8.6% difference).

So for the tasks in question, a faster CPU or even a newer platform would be a waste of money since there would likely be no significant gain in performance.

Inefficiencies in your system

So when it comes to your system, again the question to ask is whether you are getting a desired level of performance. If you are not, evaluate what to upgrade. Would more memory be better? Would a newer graphics card be better, or would you be better off replacing the mainboard and processor with a newer platform?

Unfortunately you are largely not going to find a good answer to these questions in any online forum. As soon as you post the specifications of your system, they will be cherry-picked, and you will be told to upgrade a certain way. Not suggested, told. Don’t even consider anything else. Buy only what they tell you to. Think I’m joking on that? I’ve seen what happens when someone mentions they have an AMD processor. They are told to buy Intel.

This problem is especially evident on the Linus Tech Tips forum. One person who talked about water cooling an AMD system was met with the “have you considered upgrading?” response twice before a more reasonable person chimed in with “how about not just shooting people down, but also offering to help…that way they don’t feel like you’re insulting them”.

Another person was told “Instead of wasting money on water cooling, spend money on a real processor that is capable of pushing two R9 290Xs.” Yet another person who wanted a custom loop for an FX-6300 was also met with “Don’t waste your money” and “Your money will be much better spent buying an Intel CPU”. Apparently people forget that water cooling a budget system can be more about learning than about the benefit the custom loop will provide. I learned a ton water cooling my own and my wife’s systems, and that knowledge was poured into building a high-end system for a friend.

And more recently another person who asked about putting an AIO on an FX-8320 was basically given the treatment of “that money is better spent switching to Intel”, a response I called “more condescending than helpful” by comparing it to “pushing someone to buy a new car when all they need is to have their HVAC or cooling system repaired”. I guess everyone forgot that virtually every AIO on the market includes mounting hardware for both AMD and Intel sockets.

With any potential upgrade path, there is, obviously, going to be a ceiling — a point beyond which you won’t see any significant improvements to performance — and a point of diminishing returns, wherein the benefit starts to decline dramatically the higher up you go. And if what you’re attempting to run is very poorly implemented software, the point of diminishing returns is hit much, much sooner. As I said in the previous article, you’ll need significantly faster hardware to overcome poorly implemented software — something we are seeing with DirectX 12 benchmarks.

When it comes to gaming performance, the important question is actually the frame rate versus your monitor’s refresh rate. If you have a 60Hz monitor (or television) and your frame rates are consistently over 60 frames per second for what you currently play, it’s pointless to upgrade because you won’t actually see the performance improvement. Go too high on the frame rate and you’ll start to observe a phenomenon called “tearing”.
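The arithmetic behind that is simple. A 60Hz display refreshes once every

    1000 ms ÷ 60 ≈ 16.7 ms

so any frames rendered faster than one per 16.7 ms can never be fully shown — and without vsync, pieces of more than one frame land on screen during a single refresh, which is the tear.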

Gaming performance, and improvements thereto, is also not a straightforward topic to address because the CPU and graphics systems work in tandem. Improving either will improve your gaming performance. If you have an SLI or Crossfire capable graphics card, you will see performance improvements by going from one card to two, even if you’re running an FX processor.

Instead, again, what to look for is idle time, as that points out inefficiencies relative to your expectations for the system. For example, if you’re running a program that is very CPU intensive but not very GPU intensive, then you’d be focusing only on the CPU. If you’re running a program that is heavily GPU intensive, such as a game or 3D modeler, you’d look at both, but the GPU would be more important, and you’d likely want to see GPU usage higher than CPU usage depending on what you’re doing — though I realize there are applications that will tax both the CPU and GPU to pretty significant degrees.
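On Linux, a couple of stock tools make that idle time visible — assuming your driver exposes GPU statistics at all, which, as noted earlier, isn’t always the case:

    top              # overall and per-process CPU usage
    nvidia-smi -l 1  # GPU utilization, refreshed every second, where supported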

In the X2 graphics host, I would expect CPU usage to remain minimal because I don’t have any Berkeley CPU tasks running on that system, only OpenCL tasks. In a game, however, I would expect the graphics usage to be significantly higher than the CPU. I should not expect the CPU usage to ever max out.

This isn’t to say there is no benefit to upgrading. It just may not be nearly as pronounced as one might expect — unless it’s been at least 5 years since your last upgrade. In terms of gaming, I’ve posited before that there is no point to switching from an AMD FX processor to any Intel processor if your system is already delivering frame rates that exceed your monitor’s refresh rate for the games you play with all quality settings maxed out — or at quality settings with which you are satisfied.

In the case of the X2 graphics host, again, I’ve already demonstrated there won’t be any gains from a faster processor or newer platform. For what it does, the X2 and its PCI-Express 1.0a link suffice.