12. I was looking at NVIDIA's tree demo and I maxed everything
out. Switching it to wire-frame mode, I saw massive amounts of overdraw,
something I expected but had never really appreciated. How will this issue be addressed
as more and smaller polygons are used with T&L and more overdraw
occurs?
Well, overdraw is otherwise known as "depth complexity." The only real way to
handle more depth complexity is to increase fill-rates,
an issue I'll discuss below…
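As a rough illustration of why depth complexity translates directly into fill-rate demand, here is a minimal sketch. The function name and the 60 fps target are assumptions chosen for illustration and do not come from the answer above; the model simply multiplies screen pixels by frames per second and by the average number of times each pixel is drawn.

```python
def required_fill_rate_mpix(width, height, fps, depth_complexity):
    """Millions of pixels per second that must be drawn to hold `fps`.

    depth_complexity is the average number of times each screen pixel is
    drawn per frame (1.0 means no overdraw). Hypothetical helper.
    """
    return width * height * fps * depth_complexity / 1e6


if __name__ == "__main__":
    # Assumed target: 1024x768 at 60 frames per second.
    for dc in (1, 2, 4):
        need = required_fill_rate_mpix(1024, 768, 60, dc)
        print(f"depth complexity {dc}: {need:.1f} Mpixels/s")
```

Under this simple model, doubling the overdraw doubles the fill-rate needed to hold the same frame rate, regardless of how fast the geometry stage is.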
Other Issues:
1. The SDR/DDR memory issue. There are now two types of synchronous memory:
single data rate (SDR) memories, which are widely used
on today's 3D accelerators, and double data rate (DDR) memories, which
are just now being developed and are not yet in use. The SDR memories
transfer one piece of data on the rising edge of each clock, whereas DDR
memories transfer one piece of data on the rising edge of each clock and
one piece of data on the falling edge of each clock (i.e. 2 data transfers
occur per clock). As a result, at least in theory, DDR memories should be
roughly twice as fast as SDR memories when run at the same clock frequency.
The problem right now is that DDR memories are not widely available and
are very expensive. Based on our conversations with various memory vendors, we
believe that the majority of the NV10-based boards available this year will
use only SDR memory. This issue has not been widely discussed, but it
is very important. Because the NV10 has only a 128-bit physical data path to memory
(just like the TNT2), using SDR memory means that the NV10 has only the
same amount of memory bandwidth, clock for clock, as the TNT2. This means that,
for fill-rate limited applications (see discussion below), the GeForce
will likely perform very similarly to a TNT2-based board. The really interesting
possibility is this: the TNT2 chips were "binned" for very
high frequencies (the Hercules TNT2 Ultra product runs at 183 MHz core,
200 MHz memory I believe…), and the NV10 will not be "binned," so
there may be some games which actually run faster on a fast TNT2 Ultra board
than on a GeForce board (as there will be more memory bandwidth available
on a high-speed TNT2 Ultra board). This is going to be interesting to watch
unfold. The key message here is that DDR memories do not look likely
to be available in any sort of volume in the near future, so expect
most NV10 boards to use SDR memories and, as a result, not to exhibit substantially
improved performance over TNT2 Ultra boards. While on the subject,
recognize that an NV10 board which uses SDR memory really should be called
a "128-bit" board (because it transfers only 128 bits of data per clock),
not a "256-bit" board, as NVIDIA would like everyone to believe of
all NV10-based boards.
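The bandwidth argument above comes down to simple arithmetic: bus width times memory clock times transfers per clock. The sketch below works it through. The 128-bit bus and the 200 MHz TNT2 Ultra memory clock are taken from the text; the 166 MHz figure for an NV10 board is purely an assumption for illustration, since no NV10 memory clock is stated here, and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(bus_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock is 1 for SDR (rising edge only) and
    2 for DDR (rising and falling edge).
    """
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


# 128-bit bus and 200 MHz memory clock as quoted in the text.
tnt2_ultra_sdr = peak_bandwidth_gb_s(128, 200, 1)

# ASSUMPTION: 166 MHz is an illustrative NV10 memory clock, not a quoted spec.
nv10_sdr = peak_bandwidth_gb_s(128, 166, 1)
nv10_ddr = peak_bandwidth_gb_s(128, 166, 2)

if __name__ == "__main__":
    print(f"TNT2 Ultra, SDR @ 200 MHz: {tnt2_ultra_sdr:.2f} GB/s")
    print(f"NV10, SDR @ 166 MHz (assumed): {nv10_sdr:.2f} GB/s")
    print(f"NV10, DDR @ 166 MHz (assumed): {nv10_ddr:.2f} GB/s")
```

With these numbers, the SDR NV10 configuration ends up with less peak bandwidth than the higher-clocked TNT2 Ultra, while the same board with DDR would have exactly twice its SDR figure. These are theoretical peaks only; sustained bandwidth in a real system will be lower.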
2. Fill-rate versus geometry acceleration. It is important to recognize
that many factors go into the overall performance of a 3D
accelerator, but two of the bigger ones are geometry
throughput (basically, how many triangles per second the graphics subsystem
can sustain) and fill-rate throughput (basically, how many pixels or texels
per second the graphics subsystem can store). In general terms, and based
on today's fast CPUs and 3D accelerators, games are typically geometry limited
at low resolutions (e.g. 640x480), and fill-rate limited at higher resolutions
(e.g. 1024x768 and above). To make matters worse, games which utilize
32-bit rendering at higher resolutions are even more fill-rate limited.
Let's take a look at Quake3 frame rates running on a TNT2 board (Pentium3-600,
Quake3 v1.08, q3testdemo1, vsync off):
         640x480   1024x768   1600x1200
16bpp    84.5      49.7       19
32bpp    56.4      29.8       11
Note that performance is highest at 640x480, 16bpp color.
This is because the fill-rate requirements are lowest for that
resolution and color depth. Also note that the 640x480, 16bpp color test
result is the maximum frame rate achievable on that CPU, with no additional
geometry acceleration, at any resolution and pixel depth. Remembering
that resolution and pixel depth do not increase CPU and/or geometry performance
requirements, notice that performance drops as either the resolution or the color
depth increases. The performance drop occurs because
the 3D accelerator becomes fill-rate limited. Another way to look at this
is that even with an "infinitely fast" CPU or geometry processor,
our overall frame rate will not increase at all if we are fill-rate limited.
For example, the 1024x768 at 32bpp frame rate (29.8) would not increase substantially,
if at all, if a faster CPU or hardware geometry were used instead of the
600 MHz CPU used for the testing. This is a very important point relative
to the NV10 announcement. Given the NV10's modest fill-rate improvements,
and remembering from the discussion above that its memory bandwidth,
and thus its resulting fill-rate, will be limited by SDR-only memory availability,
the NV10 may not show substantial real-world performance improvements at
the resolutions and color depths that gamers want to play! The available
data suggests that the geometry acceleration capability in
the NV10 really only buys performance for very low resolution (640x480)
and low color depth (16bpp) configurations when using SDR memories. But
clearly gamers are not going to "step backward" to 640x480 and 16bpp
rendering. Gamers want 1024x768 and higher, 32bpp rendering. It appears, however,
that the first SDR-only NV10 boards will not substantially improve upon the
performance of a TNT2 at those video settings.
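One way to sanity-check the fill-rate argument against the Quake3 table above is to convert each frame rate into screen pixels delivered per second. This is only a sketch: it counts visible screen pixels and ignores overdraw and texel traffic, and the dictionary and function names are hypothetical. If a configuration is fill-rate limited, the delivered pixel rate should stay roughly constant as resolution rises, because the frame rate falls in proportion to the pixel count.

```python
# Pixels per frame at each tested resolution.
PIXELS = {
    "640x480": 640 * 480,
    "1024x768": 1024 * 768,
    "1600x1200": 1600 * 1200,
}

# Frame rates copied from the Quake3 / TNT2 table in the text.
FPS = {
    16: {"640x480": 84.5, "1024x768": 49.7, "1600x1200": 19.0},
    32: {"640x480": 56.4, "1024x768": 29.8, "1600x1200": 11.0},
}


def delivered_mpix_per_s(bpp, resolution):
    """Screen pixels delivered per second, in millions (overdraw ignored)."""
    return FPS[bpp][resolution] * PIXELS[resolution] / 1e6


if __name__ == "__main__":
    for bpp in (16, 32):
        for res in PIXELS:
            rate = delivered_mpix_per_s(bpp, res)
            print(f"{bpp}bpp {res:>9}: {rate:6.2f} Mpixels/s")
```

At 16bpp this gives roughly 26, 39, and 36 Mpixels/s for the three resolutions, and at 32bpp roughly 17, 23, and 21. The near-flat figures at 1024x768 and 1600x1200 are consistent with the board running into a fill-rate ceiling at those settings, while the lower figure at 640x480 suggests that fill-rate is not the only limit there.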