Game Tests 2 & 3

Vertex Skinning

Game Tests 2 & 3 both utilise virtually the same rendering path, the primary difference being the look and content of the scene. NVIDIA has a couple of reservations about these tests, the first of which is the rendering process Futuremark has adopted.

Let's take a look at the first of NVIDIA's reservations for these two tests:

"The portion of this algorithm labeled "Skin Object in Vertex Shader" is doing the exact same skinning calculation over and over for each object. In a scene with five lights, for example, each object gets re- skinned 11 times. This inefficiency is further amplified by the bloated algorithm that is used for stencil extrusion calculation. Rather than using the Doom method, 3DMark03 uses an approach that adds six times the number of vertices required for the extrusion. In our five light example, this is the equivalent of skinning each object 36 times! No game would ever do this. This approach creates such a serious bottleneck in the vertex portion of the graphics pipeline that the remainder of the graphics engine (texturing, pixel programs, raster operations, etc.) never gets an opportunity to stretch its legs.

It's unfortunate that 3DMark03 does not truly emulate Doom or any other game by skinning each object only once per frame, caching the skinned result, and using that cached result in the multiple passes required for shadows. This would have been a balanced approach that allows both the vertex and pixel/raster portions of the graphics engine to run at full speed. Designing hardware around the approach used in 3DMark03 would be like designing a six lane on ramp to a freeway in the freak case that someone might drive an earthmover on to it. Wasteful, inefficient benchmark code like 3DMark03 force these kinds of designs that do nothing to benefit actual games."

It's quite easy to see NVIDIA's point here. Futuremark has adopted a method whereby the geometry for the objects is stored in video RAM and skinned via the Vertex Shader, and that skinning has to occur on every pass. Some titles will instead do the character skinning on the CPU and upload the result to the graphics board, which means the skinning only has to be done once per frame, but it requires CPU cycles to do it.
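
To make the difference concrete, below is a minimal C++ sketch of the two render-loop structures being compared. The types and function names are hypothetical stand-ins rather than actual 3DMark03, Direct3D or NVIDIA code; the pass structure (one ambient pass, plus a shadow-volume pass and a lighting pass per light) is simply chosen to be consistent with NVIDIA's figure of 11 skinning operations for five lights.

// Structural sketch only -- hypothetical types and functions, not actual
// 3DMark03, Direct3D or NVIDIA code.
#include <vector>

struct Light {};
struct Object {};
struct SkinnedVertices {};   // skinned geometry cached for reuse within a frame
struct Scene { std::vector<Object> objects; std::vector<Light> lights; };

enum Pass { AmbientPass, ShadowVolumePass, LightingPass };

// Stub draw calls standing in for the real rendering work.
void drawWithVertexShaderSkinning(const Object&, Pass) {}
void drawWithVertexShaderSkinning(const Object&, Pass, const Light&) {}
SkinnedVertices skinOnCPU(const Object&) { return {}; }
void drawCached(const SkinnedVertices&, Pass) {}
void drawCached(const SkinnedVertices&, Pass, const Light&) {}

// The approach described above: the vertex shader re-skins the object on
// every pass -- one ambient pass, then a shadow-volume pass and a lighting
// pass per light (1 + 2*5 = 11 skinning operations with five lights).
void renderFrameGpuSkinning(const Scene& scene)
{
    for (const Object& obj : scene.objects) {
        drawWithVertexShaderSkinning(obj, AmbientPass);
        for (const Light& light : scene.lights) {
            drawWithVertexShaderSkinning(obj, ShadowVolumePass, light);
            drawWithVertexShaderSkinning(obj, LightingPass, light);
        }
    }
}

// The alternative NVIDIA describes: skin each object once per frame, cache
// the result, and reuse the cached vertices for every subsequent pass.
void renderFrameCachedSkinning(const Scene& scene)
{
    for (const Object& obj : scene.objects) {
        SkinnedVertices cached = skinOnCPU(obj);   // skinning done once per frame
        drawCached(cached, AmbientPass);
        for (const Light& light : scene.lights) {
            drawCached(cached, ShadowVolumePass, light);
            drawCached(cached, LightingPass, light);
        }
    }
}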

There are advantages and disadvantages to both approaches. One of the main disadvantages of skinning on the CPU is that the number of characters on screen, and their complexity, is directly limited by the CPU power of the machine the game is played on; with elements such as physics calculations and possibly visibility culling systems also running on the CPU, this can quickly become a bottleneck. On high end boards such as the Radeon 9700 PRO and GeForce FX, most titles available today are CPU limited all the way up to the highest resolutions. Another issue with CPU skinning is that the geometry has to be uploaded across the AGP bus every frame, which is generally known to be a bottleneck within the system, especially if textures are being addressed from system RAM as well. On the flipside, skinning in the fashion adopted here puts an increased load on the graphics board, which will stress low end boards heavily.
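
For reference, the skinning work that the vertex shader repeats on every pass is essentially a weighted blend of bone matrix transforms per vertex. The following is a minimal C++ sketch of that calculation for illustration only; the structure layout and the four-bones-per-vertex assumption are mine, not details taken from 3DMark03's shaders.

// Minimal sketch of matrix-palette skinning -- the per-vertex work that is
// repeated for every pass when skinning is done in the vertex shader.
// Types and the four-bone limit are illustrative, not taken from 3DMark03.
#include <array>
#include <cstddef>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };

// Transform a position by a 4x4 matrix (w assumed to be 1).
Vec3 transform(const Mat4& m, const Vec3& v)
{
    return {
        m.m[0][0]*v.x + m.m[0][1]*v.y + m.m[0][2]*v.z + m.m[0][3],
        m.m[1][0]*v.x + m.m[1][1]*v.y + m.m[1][2]*v.z + m.m[1][3],
        m.m[2][0]*v.x + m.m[2][1]*v.y + m.m[2][2]*v.z + m.m[2][3],
    };
}

// Each vertex references up to four bones, with blend weights summing to 1.
struct SkinnedVertex {
    Vec3 position;
    std::array<int, 4>   boneIndex;
    std::array<float, 4> boneWeight;
};

// skinnedPos = sum_i( weight_i * boneMatrix_i * position )
Vec3 skinVertex(const SkinnedVertex& v, const Mat4* bonePalette)
{
    Vec3 result{0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < 4; ++i) {
        const Vec3 p = transform(bonePalette[v.boneIndex[i]], v.position);
        result.x += v.boneWeight[i] * p.x;
        result.y += v.boneWeight[i] * p.y;
        result.z += v.boneWeight[i] * p.z;
    }
    return result;
}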

However, NVIDIA's statement claims that the method Futuremark has employed increases the geometry load to such a point that the test is bottlenecked by the vertex processor. Our testing would appear to suggest otherwise: the fill-rate graphs from GT2 do not rise in the straight line you would expect from a purely Vertex Shader limited benchmark (as shown here), but have more of a curve to them, indicating a reasonable balance between vertex limited and fill-rate limited areas of the benchmark, with the test obviously becoming more fill-rate limited at higher resolutions. Let's single out the GeForce4 Ti4600's benchmark scores:

GT2 scores in frames per second, resolutions increasing left to right from 640x480:

GeForce4 Ti4600    18.5    14.1    10.7     7.6     5.7
Vertex Limited     18.5    18.5    18.5    18.5    18.5

I've added an extra, generated line of data points to the benchmark results. If the test were purely Vertex Shader limited then the FPS would stay roughly constant across the resolutions, which is what the second line illustrates. As the fill-rate graph shows, by 1024x768 we'd expect to see roughly twice as many pixels displayed per second as are actually being displayed, which indicates that the test is already fairly fill-rate bound at this resolution; obviously the gap widens as the resolutions scale further upwards.
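
Since pixels drawn per second is simply frame rate multiplied by pixels per frame, the gap between the generated vertex-limited line and the measured scores can be expressed as a ratio of frame rates at each resolution (the resolution term cancels out). A small C++ sketch of that arithmetic, using the Ti4600 figures from the table above:

// Back-of-the-envelope check of the vertex-limited comparison above.
// At any given resolution, pixels/second = FPS * pixels-per-frame, so the
// ratio of the hypothetical vertex-limited throughput to the measured
// throughput is just 18.5 / measured FPS.
#include <cstdio>

int main()
{
    const double vertexLimitedFps = 18.5;                        // flat line from the table
    const double measuredFps[] = { 18.5, 14.1, 10.7, 7.6, 5.7 }; // Ti4600 GT2 scores

    for (double fps : measuredFps) {
        std::printf("measured %4.1f fps -> vertex-limited line sits %.2fx higher\n",
                    fps, vertexLimitedFps / fps);
    }
    // Ratios come out at 1.00, 1.31, 1.73, 2.43 and 3.25: the measured pixel
    // throughput falls progressively further below the vertex-limited line as
    // resolution rises, i.e. the test becomes increasingly fill-rate bound.
    return 0;
}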

Futuremark indicates that they have chosen this method in order to make the benchmark as dependent on the graphics card as possible, and to cut down the dependency on other elements of the PC system for the actual 3D rendering. This leaves the CPU free to concentrate on the other elements that would be required in such a gaming scenario. If we look at the wide spread of performance from the various cards, with their differing fill-rates and vertex processing abilities, at 640x480 (traditionally the most CPU limited situation) in the GT2 fill-rate graph from our 3DMark performance article, we can see that Futuremark does appear to have achieved this with some degree of success. If the test were highly dependent on the CPU, the starting points for all those boards would be much closer together.

Both methods appear to be valid, with neither being the be-all-and-end-all way of handling skinning. In fact, as graphics processors increase in power and the amount of work they can do per pass grows, the method Futuremark has used is likely to be adopted more widely, because with fewer passes there is less need to re-skin the objects. NVIDIA's issue appears to stem from the fact that on current DX8 PS1.1 class processors a large number of passes is required under the method Futuremark has adopted.

An interesting discussion has broken out at OpenGL.org on the skinning methods brought up by the 3DMark debate, and this can be seen here. The discussion features personnel from both NVIDIA and ATI.