Why Batman: Arkham Knight’s PC Version Will Not Be Fixed
When Warner Brothers and Rocksteady pulled Batman: Arkham Knight from Steam and other online PC storefronts this summer, the message seemed clear: we understand there is a problem, and we are going to fix it. Four months later, the promised fixes feel like a half step, with Warner Brothers all but admitting defeat. To those unfamiliar with Unreal Engine 3 and the differences between modern console and PC architecture, it looks as though the effort simply wasn't put in, and that PC players deserved more respect. That latter part is true, but the respect should have been given from the beginning. Based on the state of the game both at launch and now after the relaunch, it is clear that the problems facing Arkham Knight are too dire to be fixed with a simple patch. Understanding why is the point of this article.
A couple of things up front before I begin. First, I am using the data provided by Eurogamer's Digital Foundry division for my metrics. Why? I don't have the money or resources to measure those numbers myself, and Digital Foundry has proven highly accurate time and again. Second, understand that I have no direct knowledge of what actually occurred during the development of this game. That said, I have worked with Unreal Engine 3, the engine behind Arkham Knight, and based on the nature of the problem, the nature of the fixes, and the state the game is in, I am fairly certain that my assessments are largely on point. But please be aware that in the end all I am doing is guessing. It's a highly educated guess, based on direct experience developing with Unreal Engine 3 and other predictive streaming engines, but I could be wrong. Even so, I hope this article will prove useful to those trying to understand how a game can have so much trouble running on systems that seem so much more powerful than current generation consoles, and will provide a little technical perspective on what makes open world games so difficult to build in the first place.
The very first thing we have to discuss is the concept of predictive streaming. In game engine terms, a streaming system takes an asset (say, a barrel) from its place on the hard drive and puts it into the video memory the graphics card uses to render that barrel in the game. All game assets are initially stored on the hard drive when installed, and are moved into video memory when the graphics card needs them. Early games simply built entire levels that stayed under the graphics memory limits of the time. But as processors and graphics cards grew more powerful while RAM (random access memory) capacities stagnated, developers needed a way to stream new graphics assets into a level incredibly rapidly. The problem was that no matter how fast the transfer from hard drive to RAM occurred, it would always be a fraction of a second too late if the game only streamed in new assets at the moment the player approached them. The whole game would trail a tiny fraction of a second behind the player's actions, which can cause motion sickness. Anyone who tried early VR demos has experienced this issue firsthand; it is the exact same problem developers faced all over again when working on that technology.
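To make the idea concrete, here is a minimal sketch of a predictive streamer. Everything in it (the region layout, the two-second lookahead, the function names) is my own illustration, not code from any shipping engine; the point is simply that the predictor projects the player's movement forward and loads a region's assets before the renderer asks for them.

```cpp
// Minimal sketch of predictive streaming (hypothetical names, not engine code).
// The predictor looks at the player's position and velocity, guesses which
// region they will enter next, and queues that region's assets for loading
// *before* the renderer needs them.
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

struct Vec2 { float x, y; };

struct Region {
    std::string name;
    Vec2 center;
    std::vector<std::string> assets;   // e.g. "barrel_diffuse.tex"
};

class Streamer {
public:
    void ensureLoaded(const Region& r) {
        for (const auto& a : r.assets)
            if (resident_.insert(a).second)                  // not yet in video memory
                std::cout << "streaming in: " << a << "\n";  // stands in for disk -> VRAM copy
    }
private:
    std::unordered_set<std::string> resident_;
};

// Predict the region the player is heading toward by projecting their
// velocity a short distance ahead and picking the nearest region center.
const Region& predictNext(const Vec2& pos, const Vec2& vel,
                          const std::vector<Region>& regions) {
    Vec2 ahead{pos.x + vel.x * 2.0f, pos.y + vel.y * 2.0f};  // look ~2 seconds ahead
    const Region* best = &regions.front();                   // assumes a non-empty world
    float bestD = 1e30f;
    for (const auto& r : regions) {
        float dx = r.center.x - ahead.x, dy = r.center.y - ahead.y;
        float d = dx * dx + dy * dy;
        if (d < bestD) { bestD = d; best = &r; }
    }
    return *best;
}

int main() {
    std::vector<Region> world = {
        {"docks", {0, 0},   {"crate.tex", "barrel.tex"}},
        {"plaza", {100, 0}, {"fountain.tex", "bench.tex"}},
    };
    Vec2 playerPos{60, 0}, playerVel{20, 0};   // moving toward the plaza
    Streamer s;
    s.ensureLoaded(predictNext(playerPos, playerVel, world)); // plaza assets load early
}
```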
Arguably the first team that truly mastered the art of predictive streaming was Naughty Dog with Jak and Daxter. Wanting to build a large world without any load times, they realized they would have to stream in new assets and remove unneeded ones as the player moved through the game. To do this quickly enough, they built an algorithm that predicted where the player would most likely go from any point in the game. Using that in combination with fine tuning by hand, they were able to have assets in RAM just before the graphics card needed to render them. Criterion's RenderWare technology, used extensively on the PS2, also made use of a predictive system. While predictive streaming was largely confined to PS2 games during that era, the next generation saw a massive shift in development. The PS3 and Xbox 360 both had incredibly powerful CPUs, but their 512 MB of RAM made it impossible to make use of all that power without streaming. The developer that first cracked this very difficult nut was Epic Games with Unreal Engine 3, and they cracked it so early and so well that a massive number of games last generation made use of the engine.
Unreal Engine 3, though, had its limits. It was very good at streaming small areas' worth of assets in and out. It was not very good at streaming out large numbers of assets very rapidly. Hence Unreal Engine 3 was rarely used to develop large scale open world games, and this one fact largely explains the prevalence of linear shooters last generation. They could render incredibly detailed worlds, as long as they could get those worlds into and out of memory fast enough. Nearly every engine last generation focused heavily on improving streaming. It was the single largest limiting factor of the era, and it was a limit from the day the consoles launched.
Noting these issues, Microsoft and Sony both decided not to make the same mistake twice, so each put a massive 8 GB of memory in its new system. And both consoles use an APU structure, a close cousin of the system on a chip designs developed for tablets and smartphones. An APU combines the CPU and the graphics processor on a single piece of silicon, with both sharing one unified pool of RAM. The core advantage of an APU is that it takes up less space for the power it gives you, while also using less energy and therefore running cooler. For a handheld device it is a perfect fit. But it also solves a significant streaming problem for game developers, and solving that problem is what has allowed open world games to become so prevalent this generation compared to last.
To understand the solution you first have to understand the problem, so let me explain it using our barrel from above. In a traditional computer, the graphics card and the CPU each have their own pool of memory: a graphics card might have 2 GB, for example, while the CPU has 8 GB. CPU RAM and GPU RAM are different, each built to work most effectively with the kind of data it handles. The two pools are also physically separate, and that separation can present an issue, since the RAM used in graphics cards is much more expensive and computers therefore usually include less of it. The trouble comes when the CPU has predicted that the graphics card will need an asset, but the graphics card does not have enough memory to store it. On a console, the developer would simply redesign the level so that asset was needed later. On a computer, where users have any number of hardware configurations, it is left to the player to lower graphics settings to a point where performance isn't impacted. With an APU, though, all of the RAM in the system, in this case about 5 GB (roughly 3 GB is reserved for the operating system and various apps), can be used by either the CPU or the GPU. So if the graphics card needs more than 2 or 3 GB of RAM, it can tap into the unused RAM left by the CPU. And since both consoles have the same amount (though not the same type) of RAM, and both are built around AMD's Jaguar-based APUs, a developer like Rocksteady can carefully design their game to use the very maximum amount of RAM available at any one moment, no more and no less.
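A toy calculation shows why the unified pool matters. The numbers below are invented, chosen only to set the roughly 5 GB console pool against a common 2 GB card of the era; the point is that a single unified budget either fits or it doesn't, while split pools leave an overflow that has to live somewhere.

```cpp
// Toy illustration (numbers and names are mine, not Rocksteady's) of why a
// unified pool is easier to budget for than split CPU/GPU pools.
#include <cstdio>

int main() {
    // Console-style unified pool: one number to budget against.
    const int unifiedMB = 5120;            // ~5 GB usable on the consoles
    int consoleNeedMB = 3800 /*textures+meshes*/ + 900 /*game data*/;
    std::printf("console: need %d of %d MB -> %s\n", consoleNeedMB, unifiedMB,
                consoleNeedMB <= unifiedMB ? "fits" : "over budget");

    // PC-style split pools: the same art has to fit a fixed VRAM size.
    const int vramMB = 2048;               // a common card in 2015
    int gpuNeedMB = 3800;                  // textures sized for a 5 GB pool
    int overflowMB = gpuNeedMB - vramMB;   // must be staged in system RAM
    if (overflowMB > 0)
        std::printf("pc: %d MB of assets don't fit VRAM and must wait in system RAM\n",
                    overflowMB);
}
```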
Now imagine that you are not Rocksteady, but the handful of programmers at some distant studio tasked with bringing the game to PC. On PC, around 99% of players are not going to have 5 GB of graphics RAM. In fact, only a handful of graphics cards in the world have that much memory, and all but one of them arrived this year, some after the launch of Arkham Knight. That one card that was out when Arkham Knight launched? The GeForce Titan Black, which runs $1,000. And while you can now get a graphics card with 6 GB of RAM for around $500, that simply means 1% of players now have access to as much memory as console gamers, rather than 0.001%. So, as the programmers assigned this task, how do you solve the problem? You store any assets that can't fit into graphics RAM in the CPU's memory, then transfer them over the moment graphics memory becomes free. That moment it takes to move an asset between system memory and graphics memory? Those are the tiny hitches players have been complaining about. You can artificially create those hitches in any Unreal Engine 3 game by going into the game's configuration file and limiting how much graphics memory it can use, and you can reduce or remove them by going into the same file and increasing the pool of available memory.
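For the curious, the knob in question lives in the engine's configuration files. The exact file, section, and values vary from game to game, so treat this as a pattern rather than a recipe, but many Unreal Engine 3 titles expose a texture pool size along these lines:

```ini
; Hypothetical excerpt in the style of a UE3 Engine.ini -- section and key
; names vary by game, so treat this as illustrative, not exact.
[SystemSettings]
PoolSize=2048   ; texture pool in MB; shrink it to reproduce the hitches,
                ; grow it (if your card has the VRAM) to reduce them
```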
So now, hopefully, you understand the problem, the cause of these hitches. Now let's examine why a solution is so tough to create. When designing a game for the PC, most developers create every asset at a variety of detail levels, so that players with less memory can simply use lower detail versions. But if Rocksteady never took the PC version into account when designing assets, such low quality models might not exist. Furthermore, an APU has a major drawback, a rather massive one in fact: due to the space required to put every part of a computer on a single chip, no single component can be especially powerful. That is why desktop computers have a separate graphics card; it allows for more processing power. So Rocksteady, designing for an APU, wants to build its world to use as few CPU and GPU resources as possible.
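Here is roughly what that detail selection looks like in practice. This is a sketch under my own assumptions (the asset names, sizes, and budget are invented); the mechanism it shows is why a console-first pipeline that only authored one detail level leaves the PC port with nothing to fall back on.

```cpp
// Sketch of level-of-detail (LOD) selection under a memory budget -- the
// mechanism the PC port would have needed pre-authored low-detail assets for.
// All names and sizes here are illustrative.
#include <cstdio>
#include <vector>

struct LodLevel { const char* name; int sizeMB; };

// Pick the most detailed version of an asset that still fits the budget.
const LodLevel* pickLod(const std::vector<LodLevel>& lods, int budgetMB) {
    for (const auto& lod : lods)             // ordered highest detail first
        if (lod.sizeMB <= budgetMB) return &lod;
    return nullptr;                          // nothing fits: must evict or skip
}

int main() {
    std::vector<LodLevel> barrel = {
        {"barrel_lod0_4k", 64}, {"barrel_lod1_2k", 16}, {"barrel_lod2_1k", 4},
    };
    if (const LodLevel* lod = pickLod(barrel, 20))
        std::printf("loading %s (%d MB)\n", lod->name, lod->sizeMB);
    // With only the 64 MB lod0 authored, a 20 MB budget leaves nothing to
    // load: that is the bind a console-first asset pipeline creates.
}
```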
Now, a key problem with streaming engines is that a lot of CPU time is spent issuing the commands, called draw calls, that tell the graphics card to drop certain assets and render new ones. In the previous generation, and in every console generation before it, those draw calls were the lesser of two evils. There was more spare CPU power than spare memory, so developers would split larger, more complex assets into smaller chunks and load only the specific pieces a player could see at any one time. On the PS3, Sony's internal teams went one step further, storing assets multiple times on the disc so that whenever the game needed an asset, the disc had to spin the shortest possible time before reaching the data. The PS3 and its Cell processor had a ton of CPU power but very, very limited memory. This generation, the problem is the exact opposite. So to limit the number of draw calls and leave more CPU resources open for other tasks, developers might take several smaller assets that are always rendered together and merge them into a single, larger asset. That way the CPU only has to make a single call to draw them, and the extra RAM they occupy is less valuable than the freed CPU cycles.
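The trade-off reads clearly in code. In this sketch (all names and vertex counts are invented), drawing four props individually costs four calls, while the merged version costs one, at the price of becoming a single indivisible blob that can no longer be streamed out piece by piece.

```cpp
// Sketch of the draw-call trade-off described above: one call per small mesh
// versus merging meshes that always appear together into one larger asset.
#include <cstdio>
#include <vector>

struct Mesh { const char* name; int vertices; };

// One draw call per mesh: cheap on memory, expensive on CPU.
void drawIndividually(const std::vector<Mesh>& meshes) {
    for (const auto& m : meshes)
        std::printf("draw call: %s (%d verts)\n", m.name, m.vertices);
}

// Merge into one mesh offline: a single call, but the whole blob must stay
// resident and can no longer be streamed out piece by piece.
Mesh mergeForBatching(const std::vector<Mesh>& meshes) {
    Mesh merged{"street_corner_merged", 0};
    for (const auto& m : meshes) merged.vertices += m.vertices;
    return merged;
}

int main() {
    std::vector<Mesh> streetCorner = {
        {"hydrant", 800}, {"mailbox", 600}, {"lamp", 1200}, {"bench", 900},
    };
    drawIndividually(streetCorner);                 // 4 draw calls
    Mesh merged = mergeForBatching(streetCorner);
    std::printf("draw call: %s (%d verts)\n", merged.name, merged.vertices); // 1 call
}
```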
So if you think about all this and put yourself in Rocksteady's shoes, you would design a system that takes everything I've just described and turns it up to its most extreme. The result is a streaming system that was never designed to move lots of individual assets in and out of memory, working on assets that were never split into smaller pieces. How do you overcome that? Either you build an entirely new streaming system while cutting apart all of the existing assets and building lower quality versions of them, or you don't. There really is no other option, which is why I said this summer that Rocksteady would not be able to patch Arkham Knight with a fix by the end of the year. By my estimation it would take either a large team of artists and programmers about half a year, or a smaller team at least a year if not two, to build that system and create all the new art assets required to make it work. Either option would cost millions of dollars and delay whatever other projects Rocksteady was working on.

Knowing this, I predicted that at best they would have a team of programmers try to improve the streaming system using the current assets, and considering the degree of improvement shown since launch, that is what they have done. But no matter how good your predictive code, nor how extensive your fine tuning, if you don't have the assets there is a limit to what you can do. A handful of programmers can fix code, but cutting up art assets requires a large team of artists, and that was money I always doubted Warner Brothers would spend. That left one other option: require Windows 10 and make a DirectX 12 port to alleviate some of the issues caused by DX11 overhead, while also requiring that those users have a ton of video RAM. Making a DX12 version, though, still seemed beyond Rocksteady's capabilities, and so their solution is even more extreme: require Windows 10 and 12 GB of system RAM to play the game on the highest settings.

Now you might ask how that fixes the problem. It takes a step out of the process. As I said at the start of the article, all assets begin on a hard drive before being transferred into RAM as needed, and 12 GB is the amount of space required to store every asset in the game in system RAM at once. Basically, Rocksteady didn't solve the problem. And even this workaround frays over time: as a game runs, certain assets might not get deleted, for any number of reasons. Sometimes this problem, called a memory leak, is easily solved; someone simply forgot to tell the game to delete an object. Other times, deleting an object is a very complicated process, and the time needed to fix some of these leaks seems to have been too much to ask of the few programmers tasked with building the new streaming system. So over time, performance degrades back to the levels seen at launch, or worse.
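For readers who haven't met the term, here is the shape of a memory leak in miniature. The pool, the district names, and the sizes are all my own invention; the bug is the single missing release call.

```cpp
// Minimal sketch of the kind of leak described above: an asset whose owner
// forgets to release it, so the pool fills up over a play session.
#include <cstdio>
#include <string>
#include <unordered_map>

class AssetPool {
public:
    void acquire(const std::string& name, int mb) { used_[name] = mb; }
    void release(const std::string& name)         { used_.erase(name); }
    int  usedMB() const {
        int total = 0;
        for (const auto& entry : used_) total += entry.second;
        return total;
    }
private:
    std::unordered_map<std::string, int> used_;   // asset name -> size in MB
};

int main() {
    AssetPool pool;
    for (int district = 0; district < 3; ++district) {
        pool.acquire("district_" + std::to_string(district), 400);
        // Bug: the previous district is never released when we move on, so
        // usage only grows -- after hours of play the pool is exhausted and
        // performance degrades back to launch levels.
        std::printf("after district %d: %d MB resident\n", district, pool.usedMB());
    }
}
```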
But what if you didn't need to stream any data from the hard drive at all? What if you just stored it all in system RAM and swapped it into graphics RAM as needed? That solution, the brute force solution, seems to have been the only option left to Rocksteady, given the staggering scale of the task and the very limited time they were given to achieve it.
So is there anything you can do as a player if you don't have 12 GB of RAM? If you have a solid state drive, make sure the game is installed entirely on it. If you are building a new computer right now and have selected one of Intel's new CPUs, include an SSD that connects through a PCIe slot and use the new DDR4 RAM. Those aren't solutions, just brute force ways of lessening the impact. In the end, the streaming system needs to be fixed for the RAM requirement to go away, and I highly doubt anything will solve the issues facing those with less than 4 GB of graphics RAM. Simply put, solving that problem won't be worth the money to Warner Brothers.