1. T&L can have a negative impact on the render efficiency
of an accelerator mainly because most traditional renderers are optimized
to handle larger polygons. Now T&L introduces small triangles and
that makes the "external" memory access very inefficient... With today's
efficiency with large triangles hovering around 80% what will happen to
the efficiency if we introduce T&L? How can we avoid such a drop?
There is no doubt that as the triangle size decreases, the overall memory
bandwidth efficiency can also decrease. In fact, as a general rule, the
wider the memory interface, the less efficient that interface will be
for small triangles. So, for something like the GeForce using DDR memories
(which gives you an effective 256-bit path to the memory), the memory
efficiency will be lower for small triangles. Hardware vendors have recently
attempted to address this issue in two ways: (1) pixel caches, and (2) tile-based/region-based
rendering architectures. The first way of improving memory efficiency,
pixel caches, basically includes a cache between the backend of the pixel
pipeline and the memory interface. This cache behaves much the same way
a CPU cache works, in that, in theory, a large number of small writes is
converted into a smaller number of larger writes. For small triangles,
this results in increased memory efficiency. The second method,
resorting to tile-based or region-based rendering, is certainly more radical.
In a scheme like this, an entire scene full of triangle information is
stored in the driver or in the hardware, and then screen-space "chunks"
of the scene are then rendered. Since all rendering happens to memory
on-chip, the memory efficiency can be quite good. However, there are numerous
problems with tile-based architectures that probably deserve a whole
interview unto themselves…. Note that the recent BitBoys announcement of Glaze3D
is NOT a tile-based rendering architecture in the pure definition. What
they are talking about when they claim "tile rendering" is a method
used to map X,Y spatial coordinates into physical memory pages. By using
a tiled memory mapping, greater memory efficiency can be achieved. We've
been using tiled memory mapping since the original Voodoo Graphics so
it's certainly nothing revolutionary. The only products which are a pure
"tile renderer" that we are aware of are the PowerVR chips.
2. Many people hope that T&L will remove the need to upgrade
to the newest and latest processors and motherboards from Intel. Now NVIDIA
mentions in its Fast Writes paper a bandwidth of 90 bytes per triangle between
the CPU and the 3D card. At 10 million polygons per second this turns into
a 900MB/sec stream, which is way more than any memory subsystem can handle...
Is NVIDIA exaggerating? Are there ways to lower this number? Maybe a local
geometry cache?
Sure, there are lots of ways to reduce this amount of bandwidth. The first
is, as you mention, a geometry cache (also referred to as a "vertex cache"
by some hardware vendors). In this scheme, the frontend of a 3D accelerator
will hold enough of the previous vertices (for example, vertex caches
in the range of 16-32 entries are now prevalent) such that when triangle
strips or meshes are used, the information in the cache can be reused instead
of requiring that data to be re-transferred over the AGP or PCI bus. This
does, however, require good programming practices by the application itself
so that the vertex cache hit rate is sufficiently high.
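A toy model shows how such a cache cuts bus traffic. The FIFO replacement policy, 16-entry size, and 32-byte vertex below are assumptions for illustration; real hardware details vary:

```python
from collections import deque

def bus_traffic(indices, cache_size=16, bytes_per_vertex=32):
    """Bytes sent over the AGP/PCI bus for an indexed triangle list,
    given a FIFO vertex cache (assumed sizes). A hit means the cached
    vertex is reused; a miss means it must be transferred again."""
    cache = deque(maxlen=cache_size)   # FIFO: oldest entry evicted first
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses * bytes_per_vertex

strip = [0, 1, 2, 2, 1, 3]   # two triangles sharing an edge, 4 unique vertices
print(bus_traffic(strip))     # 128 bytes (4 misses x 32)
print(len(strip) * 32)        # 192 bytes with no cache at all
```

This is also why the answer stresses good programming practices: the savings only appear when the application orders its triangles so that shared vertices are still resident when they are referenced again.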
A second technique used to reduce geometry traffic is to compress the
geometry data itself. I do not believe anyone is currently doing this,
but there are several geometry compression schemes which have been published
in the past which can dramatically lower bandwidth required for geometric
data. Likely both techniques will need to be used in order to really achieve
10+M tris/sec sustained in a real-world game (assuming, of course, that
the 3D accelerator has enough geometry and fill rate to actually render
that many tris/sec…).
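As a rough illustration of the kind of published scheme referred to (coordinate quantization is one element of Deering-style geometry compression), here is a toy quantizer. The 10-bit width and coordinate range are arbitrary assumptions:

```python
def quantize(value, lo, hi, bits=10):
    """Map a float in [lo, hi] to a bits-wide integer (lossy)."""
    levels = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * levels)

def dequantize(q, lo, hi, bits=10):
    """Map the integer back to an approximate float in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + q / levels * (hi - lo)

# A position as three 32-bit floats is 12 bytes; three 10-bit integers
# fit in under 4 bytes -- roughly a 3x saving on positions alone, at the
# cost of precision limited to one quantization step.
x = 12.345
q = quantize(x, 0.0, 100.0)
assert abs(x - dequantize(q, 0.0, 100.0)) < 100.0 / 1023  # one step of error
```

Whether the precision loss is acceptable depends on the coordinate range of the model, which is why schemes like this quantize relative to a per-object bounding box rather than world space.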
3. Lightmaps and dual texturing allow great effects, and T&L now
introduces new types of lights... Will vertex lighting replace lightmaps?
Can you achieve the same effects and quality? Or should they exist next
to each other and be used depending on the situation?
It is possible, with very fine triangle tessellation, to have vertex lighting
replace dual-texturing when used for lightmaps. However, the tessellation
factor can be quite large in order to generate the same visual effect.
As an example, a wall composed of a single quad utilizing dual texturing
for lighting can likely be reproduced using vertex lighting, but that
single wall would then have to be composed of maybe 10-20 triangles.
So, not using light maps can result in a geometric explosion if you're
not careful. So, I think that what you'll see is that developers continue
to use light maps and dual texturing capability for the "world" environment,
and spend their increased triangles on greater character complexity and
definition (you'll notice in almost all games today that light maps are
not used for the polygonal characters, but are used all over the place
for the "world" environments….).
Another thing to keep in mind here is that dual texturing is used for
many more things than just light maps. Detail texturing, for example,
is now being used quite commonly and simply cannot be done using vertex-based
lighting. So, I don't think there's any question that multi-texturing
and vertex lighting will certainly co-exist moving forward…
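For reference, the dual-texturing lightmap path discussed above amounts to a per-pixel modulate in the texture combiner; a minimal sketch with assumed normalized [0, 1] intensities:

```python
def lightmap_modulate(base_texel, lightmap_texel):
    """One combiner stage: the base texture is multiplied per-pixel by a
    (usually lower-resolution) lightmap. Inputs are intensities in [0, 1]."""
    return base_texel * lightmap_texel

# A bright wall texel in deep shadow vs. in full light:
print(lightmap_modulate(0.9, 0.1))   # ~0.09: heavily darkened
print(lightmap_modulate(0.9, 1.0))   # ~0.9: fully lit
```

Detail texturing uses the same multi-texture hardware with a different second map, which is why dropping dual texturing in favor of vertex lighting would cost more than just lightmaps.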