Beyond3D - T&L Investigated

T&L Investigated - Page 18

Published on 11th Jan 1999, written by Kristof Beets for Consumer Graphics - Last updated: 27th Apr 2007

12. I was looking at NVIDIA's tree demo and I maxed everything out. Switching it to wire-frame mode, I saw massive amounts of overdraw, something I expected but never really realized. How will the issue be addressed as more and more smaller polygons are used with T&L and more overdraw occurs?

Well, overdraw is otherwise known as "depth complexity." The only way to really handle more depth complexity is to increase fill-rates, an issue I'll discuss below…
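To put a rough number on that relationship: the fill rate a scene needs scales with resolution, depth complexity, and target frame rate. The sketch below is illustrative only. The function name and the sample values (a depth complexity of 3 and a 60 fps target) are assumptions, not measurements from the tree demo.

```python
def required_fill_rate_mpixels(width, height, depth_complexity, fps):
    """Approximate fill rate (in Mpixels/s) needed to draw every pixel
    `depth_complexity` times per frame at the given frame rate.

    This ignores texturing passes, hidden-surface tricks and bandwidth
    limits; it only shows how overdraw multiplies the pixel workload.
    """
    pixels_per_frame = width * height
    return pixels_per_frame * depth_complexity * fps / 1e6


# Hypothetical example values, chosen only for illustration.
base = required_fill_rate_mpixels(1024, 768, depth_complexity=3, fps=60)
heavy = required_fill_rate_mpixels(1024, 768, depth_complexity=6, fps=60)

print(f"depth complexity 3: {base:.1f} Mpixels/s")
print(f"depth complexity 6: {heavy:.1f} Mpixels/s")
```

Doubling the depth complexity doubles the fill rate required for the same resolution and frame rate, which is why more overdraw can only be absorbed by more fill rate.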

Other Issues:

1. The SDR/DDR memory issue. There are now two types of synchronous memory being developed: single data rate (SDR) memories, which are widely used on today's 3D accelerators, and double data rate (DDR) memories, which are only now being developed and are not yet in use. SDR memories transfer one piece of data on the rising edge of each clock, whereas DDR memories transfer one piece of data on the rising edge and another on the falling edge (i.e. two data transfers per clock). As a result, at least in theory, DDR memories should be roughly twice as fast as SDR memories when run at the same clock frequency. The problem right now is that DDR memories are not widely available and are very expensive. From talking with various memory vendors, we believe that the majority of the NV10-based boards available this year will use only SDR memory.

This issue has not been widely discussed, but it is very important. As the NV10 has only 128 physical data bits to memory (just like the TNT2), using SDR memory means that the NV10 has only the same memory bandwidth, clock for clock, as the TNT2. This means that for fill-rate limited applications (see discussion below) the GeForce will likely perform very similarly to a TNT2-based board. The really interesting thing is that, since the TNT2 chips were "binned" for very high frequencies (the Hercules TNT2 Ultra product runs at 183 MHz core, 200 MHz memory, I believe…) and the NV10 will not be "binned", some games may actually run faster on a fast TNT2 Ultra board than on a GeForce board (as there will be more memory bandwidth available on a high-speed TNT2 Ultra board). This is going to be interesting to watch unfold.

But the key message here is that DDR memories do not look likely to be available in any sort of volume in the near future, so look for most NV10 boards to use SDR memories and, as a result, not to exhibit substantially improved performance over TNT2 Ultra boards. And, while on the subject, recognize that an NV10 board which uses SDR memory really should be called a "128-bit" board (because it transfers only 128 bits of data per clock), not the "256-bit" board that NVIDIA would like everyone to believe all NV10-based boards are.
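The bandwidth argument above comes down to simple arithmetic: bus width times memory clock times transfers per clock. Here is a minimal sketch of that calculation. The function name is made up for illustration, and the 166 MHz and 150 MHz memory clocks for the NV10 configurations are hypothetical values, not figures from this article; only the 128-bit bus and the 200 MHz TNT2 Ultra memory clock come from the text.

```python
def peak_bandwidth_gb_s(bus_bits, memory_clock_mhz, transfers_per_clock):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock is 1 for SDR (rising edge only) and
    2 for DDR (rising and falling edge).
    """
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * memory_clock_mhz * 1e6 * transfers_per_clock / 1e9


SDR, DDR = 1, 2

# TNT2 Ultra: 128-bit bus, 200 MHz SDR memory (figures quoted in the text).
tnt2_ultra = peak_bandwidth_gb_s(128, 200, SDR)

# Hypothetical NV10 boards: same 128-bit bus, assumed memory clocks.
nv10_sdr = peak_bandwidth_gb_s(128, 166, SDR)   # assumed 166 MHz SDR
nv10_ddr = peak_bandwidth_gb_s(128, 150, DDR)   # assumed 150 MHz DDR

print(f"TNT2 Ultra (SDR, 200 MHz): {tnt2_ultra:.2f} GB/s")
print(f"NV10 (SDR, assumed 166 MHz): {nv10_sdr:.2f} GB/s")
print(f"NV10 (DDR, assumed 150 MHz): {nv10_ddr:.2f} GB/s")
```

Under these assumed clocks, the SDR NV10 board ends up with less peak bandwidth than the binned TNT2 Ultra, while the DDR board pulls clearly ahead. It also shows why 128 bits on both clock edges is where a "256-bit" label would come from.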

2. Fill-rate versus geometry acceleration. Many factors go into the overall performance of a 3D accelerator, but two of the bigger ones are geometry throughput (basically, how many triangles per second the graphics subsystem can sustain) and fill-rate throughput (basically, how many pixels or texels per second the graphics subsystem can store). In general terms, and based on today's fast CPUs and 3D accelerators, games are typically geometry limited at low resolutions (e.g. 640x480) and fill-rate limited at higher resolutions (e.g. 1024x768 and above). To make matters worse, games that use 32-bit rendering at higher resolutions are even more fill-rate limited. Let's take a look at Quake3 frame rates running on a TNT2 board (Pentium3-600, Quake3 v1.08, q3testdemo1, vsync off):

Frames per second   640x480   1024x768   1600x1200
16bpp               84.5      49.7       19
32bpp               56.4      29.8       11

Note that performance is highest at 640x480 with 16bpp color. This is because the fill-rate requirements are lowest at that resolution and color depth. Also note that the 640x480, 16bpp result is the maximum frame rate achievable on that CPU, with no additional geometry acceleration, at any resolution and pixel depth. Remembering that resolution and pixel depth do not increase CPU or geometry performance requirements, notice that performance drops as either the resolution or the color depth increases. The drop occurs because the 3D accelerator becomes fill-rate limited.

Another way to look at this is that even with an "infinitely fast" CPU or geometry processor, the overall frame rate will not increase at all if we are fill-rate limited. For example, the frame rate at 1024x768 and 32bpp (29.8) will not increase substantially, if at all, if a faster CPU or hardware geometry were used instead of the 600 MHz CPU used for the testing.

This is a very important point relative to the NV10 announcement. Given its modest fill-rate improvements, and remembering from the discussion above that the NV10's memory bandwidth, and thus its resulting fill-rate, will be limited by SDR-only memory availability, the NV10 may not show substantial real-world performance improvements at the resolutions and color depths that gamers want to play! The data available suggests that, when using SDR memories, the geometry acceleration capability in the NV10 really only buys performance for very low resolution (640x480) and low color depth (16bpp) configurations. But clearly gamers are not going to "step backward" to 640x480 and 16bpp rendering. Gamers want 1024x768 and higher, with 32bpp rendering. And it appears that the first SDR-only NV10 boards will not substantially improve upon the performance of a TNT2 at those video settings.
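The "infinitely fast geometry" argument can be sketched as a simple bottleneck model in which the frame rate is capped by whichever stage is slower. This is a deliberately crude illustration, not a performance predictor. The function name is made up, and treating the measured 1024x768, 32bpp result as the fill-rate ceiling is an assumption that follows the reasoning above.

```python
import math

# Frame rates from the Quake3 table above (TNT2, Pentium3-600).
MEASURED_FPS = {
    ("640x480", 16): 84.5, ("1024x768", 16): 49.7, ("1600x1200", 16): 19.0,
    ("640x480", 32): 56.4, ("1024x768", 32): 29.8, ("1600x1200", 32): 11.0,
}


def frame_rate(geometry_limit_fps, fill_limit_fps):
    """Simplified model: the slower of the two stages sets the frame rate."""
    return min(geometry_limit_fps, fill_limit_fps)


# The text treats the 640x480, 16bpp result as the CPU/geometry ceiling.
geometry_limit = MEASURED_FPS[("640x480", 16)]
# Assumption: the 1024x768, 32bpp result is the fill-rate ceiling there.
fill_limit = MEASURED_FPS[("1024x768", 32)]

today = frame_rate(geometry_limit, fill_limit)
infinite_geometry = frame_rate(math.inf, fill_limit)
# Hypothetical: what doubling the fill-rate ceiling would allow instead.
double_fill = frame_rate(geometry_limit, 2 * fill_limit)

print(f"600 MHz CPU, measured fill limit: {today} fps")
print(f"infinitely fast geometry:         {infinite_geometry} fps")
print(f"same CPU, doubled fill limit:     {double_fill} fps")
```

In this model, removing the geometry limit entirely leaves the 1024x768, 32bpp frame rate unchanged, whereas raising the fill-rate ceiling moves it directly.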
