Comparison with PC parts


In this article we didn’t provide examples of what a similarly configured AMD GPU in a PC could produce, or how fast it ran. The reason is simple: it would be meaningless. And not meaningless in the way GigaFLOPS values are taken out of context and thrown left and right as if they were the one and only true benchmark (they’re not). No, knowing how a certain part performs inside a Windows PC, with its architecture, the size of the OS layer, the various API layers and the driver quality, tells you nothing about how that chip would perform inside an embedded machine. The Wii U will have its own software layers but, unlike a PC’s, these won’t have to accommodate a large spectrum of different hardware and configurations.

In fact, developers’ opinions on API bloat have recently sprung back into the news, with chatter from AMD’s head of developer relations Richard Huddy and id Software’s John Carmack, to name two examples, saying that API overhead on PC (and iOS) could claim as much as 50% of a part’s performance. Obviously, that is not to say that because Wii U is a closed system its API calls and OS overhead will be slim and responsive. It’s a new platform, and new platforms need some time to get their software stacks optimized. With that said, Nintendo has the power (pun intended) to let developers on Wii U circumvent any API shortcomings by allowing low-level access to the hardware. Of course, you cannot put ease of development and low-level access in the same sentence without a negation somewhere in there as well.

Another plus of a closed platform is the possibility of using a given architecture to its fullest, without being held back by limitations that exist only in the API software. And by that I’m not solely talking about the usual issues with limited draw calls and such; I’m also hinting at being able to use parts of the chip that are not exposed through Direct3D or OpenGL (sans proprietary extensions). In the case of RV7x0 and other pre-DX11 AMD GPU hardware, the main candidate for such a moment in the spotlight is the tessellator engine. That tessellator might not be the most impressive once compared to NVIDIA’s, but it could carve itself a place inside every Wii U developer’s toolbox.
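To put the draw-call issue mentioned above into numbers, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration: the function name `max_draw_calls`, the share of a frame spent submitting work, and the per-call costs are hypothetical and are not measurements of any PC API or of the Wii U.

```python
def max_draw_calls(frame_rate_hz, cpu_share_for_submission, cost_per_call_us):
    """Rough upper bound on draw calls per frame.

    frame_rate_hz            -- target frame rate (e.g. 60)
    cpu_share_for_submission -- fraction of the frame the CPU may spend
                                issuing draw calls (hypothetical value)
    cost_per_call_us         -- CPU cost of one draw call in microseconds
                                (hypothetical value, not a measurement)
    """
    frame_budget_us = 1_000_000 / frame_rate_hz
    submission_budget_us = frame_budget_us * cpu_share_for_submission
    return int(submission_budget_us // cost_per_call_us)


# Hypothetical costs: a "thick" API layer versus a "thin" one.
thick_api = max_draw_calls(60, 0.25, 40)  # 104 calls per frame
thin_api = max_draw_calls(60, 0.25, 5)    # 833 calls per frame
print(thick_api, thin_api)
```

The point of the sketch is only the shape of the relationship: with a fixed frame budget, the number of draw calls a game can afford scales inversely with the per-call cost of the software stack, whatever the real figures turn out to be.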

DirectX 10, 10.1, 11 architecture: does it matter to GPU U?


With RV7x0 and its DirectX 10.1 architecture as the starting point for building GPU U, DX11 compliance doesn’t matter much. The computational building blocks of the GPU, the SIMDs, are more than capable of taking on GPGPU workloads in RV7x0, and RV7x0, just like the R6x0 architecture, has a hardware tessellator. The API refinements that exist in DX11 may seem irrelevant to Wii U, given that it doesn’t use DirectX as its API. However, we must keep in mind that some of these refinements go hand in hand with what the hardware can do. For example, the tessellation mechanism implemented in pre-DX11 ATI parts is somewhat clunky, requiring two passes. Another example: DX11 also introduces atomic operations, which are useful on quite a few occasions, and pre-DX11 parts do not include such functionality.
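To show why the atomic operations mentioned above are useful, here is a toy Python simulation of a histogram, a typical case where many GPU threads update the same memory location. It is not GPU code: the function names are hypothetical, and the "lockstep" model (all lanes read, then all lanes write) is a simplifying assumption about how concurrent lanes behave.

```python
def histogram_without_atomics(bin_indices, num_bins):
    """Simulate lanes that read-modify-write in lockstep with no atomics."""
    counts = [0] * num_bins
    # Phase 1: every lane reads the current value of its bin.
    loaded = [counts[b] for b in bin_indices]
    # Phase 2: every lane writes back (old value + 1).
    # Lanes that hit the same bin overwrite each other: updates are lost.
    for b, old in zip(bin_indices, loaded):
        counts[b] = old + 1
    return counts


def histogram_with_atomics(bin_indices, num_bins):
    """Simulate atomic increments: each read-modify-write is indivisible."""
    counts = [0] * num_bins
    for b in bin_indices:
        counts[b] += 1
    return counts


samples = [0, 1, 1, 2, 2, 2]
print(histogram_without_atomics(samples, 3))  # [1, 1, 1] (updates lost)
print(histogram_with_atomics(samples, 3))     # [1, 2, 3] (correct)
```

Without atomics, a developer typically has to restructure such an algorithm, for instance by giving each thread its own output slot and reducing the results in an extra pass.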

The refinements made in Cayman compared to RV7x0, to the SIMDs (VLIW4 instead of VLIW5) and to the texture sampling units, can, in contrast, have an impact on the performance of GPU U.
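As a rough illustration of why the VLIW width can matter, the Python sketch below uses a toy model of instruction packing. The model is an assumption, not a description of AMD’s shader compiler: each entry in `group_sizes` is a hypothetical count of independent scalar operations that may share bundles, every slot is treated as identical (the special role of the fifth unit in VLIW5 is ignored), and the function name `slot_utilization` is made up for this example.

```python
def slot_utilization(group_sizes, vliw_width):
    """Fraction of issue slots doing useful work in a toy VLIW model.

    group_sizes -- number of independent scalar ops in each dependency
                   group (hypothetical workload description)
    vliw_width  -- slots per bundle (5 for VLIW5, 4 for VLIW4)
    """
    total_ops = sum(group_sizes)
    # Each group needs ceil(size / width) bundles; -(-a // b) is integer ceil.
    total_bundles = sum(-(-size // vliw_width) for size in group_sizes)
    return total_ops / (total_bundles * vliw_width)


# A workload dominated by 4-wide vector math (e.g. RGBA or XYZW).
vec4_heavy = [4] * 10
print(slot_utilization(vec4_heavy, 5))  # 0.8
print(slot_utilization(vec4_heavy, 4))  # 1.0

# A mixed, made-up workload.
mixed = [4, 4, 3, 1, 5, 2]
print(round(slot_utilization(mixed, 5), 3))  # 0.633
print(round(slot_utilization(mixed, 4), 3))  # 0.679
```

In this toy model, a narrower bundle wastes fewer slots when the code rarely offers five independent operations at once, which is the general idea behind trading one slot per unit for more units in the same area.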