1. T&L can have a negative impact on the render efficiency of an accelerator, mainly because most traditional renderers are optimized to handle larger polygons. T&L introduces small triangles, and that makes the "external" memory access very inefficient... With today's efficiency for large triangles hovering around 80%, what will happen to efficiency if we introduce T&L? How can we avoid such a drop?

There is no doubt that as triangle size decreases, overall memory bandwidth efficiency can also decrease. In fact, as a general rule, the wider the memory interface, the less efficient that interface will be for small triangles. So, for something like the GeForce using DDR memories (which gives you an effective 256-bit path to the memory), memory efficiency will be lower for small triangles. Hardware vendors have recently attempted to address this issue in 2 ways: (1) pixel caches, and (2) tile-based/region-based rendering architectures.

The first way of improving memory efficiency, pixel caches, basically places a cache between the backend of the pixel pipeline and the memory interface. This cache behaves much the same way a CPU cache works, in that, in theory, a large number of small writes are converted into a smaller number of larger writes. For small triangles, this results in increased memory efficiency.

The second method, resorting to tile-based or region-based rendering, is certainly more radical. In a scheme like this, an entire scene's worth of triangle information is stored in the driver or in the hardware, and then screen-space "chunks" of the scene are rendered one at a time. Since all rendering happens to on-chip memory, memory efficiency can be quite good. However, there are numerous problems with tile-based architectures that probably deserve a whole interview unto themselves...

Note that the recent BitBoys announcement of Glaze3D is NOT a tile-based rendering architecture in the pure definition. What they are talking about when they claim "tile rendering" is a method used to map X,Y spatial coordinates into physical memory pages. Using a tiled memory mapping, greater memory efficiency can be achieved. We've been using tiled memory mapping since the original Voodoo Graphics, so it's certainly nothing revolutionary. The only products which are a pure "tile renderer" that we are aware of are the PowerVR chips.
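The tiled memory mapping mentioned above can be sketched in a few lines. This is a toy model with assumed numbers (page size, tile size, screen width are illustrative, not any shipping chip's actual parameters): it counts how many DRAM pages the pixels of one small triangle touch under a linear, scanline-order framebuffer layout versus a tiled layout.

```python
# Toy model: why a tiled framebuffer layout helps small triangles.
# All sizes below are assumptions for illustration, not real hardware values.

PAGE_SIZE = 2048          # bytes per DRAM page (assumed)
BYTES_PER_PIXEL = 4       # 32-bit color
SCREEN_WIDTH = 1024
TILE = 16                 # 16x16-pixel tiles, stored contiguously

def linear_addr(x, y):
    """Row-major framebuffer: the address jumps by a full scanline per row."""
    return (y * SCREEN_WIDTH + x) * BYTES_PER_PIXEL

def tiled_addr(x, y):
    """Tiled framebuffer: each 16x16 tile occupies one contiguous block."""
    tiles_per_row = SCREEN_WIDTH // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    offset = ((y % TILE) * TILE + (x % TILE)) * BYTES_PER_PIXEL
    return tile_index * (TILE * TILE * BYTES_PER_PIXEL) + offset

def pages_touched(addr_fn, pixels):
    return len({addr_fn(x, y) // PAGE_SIZE for x, y in pixels})

# Pixels covered by a small ~8x8 right triangle somewhere on screen.
tri = [(x, y) for y in range(100, 108) for x in range(200, 200 + (y - 99))]

print("linear pages:", pages_touched(linear_addr, tri))  # one page per scanline
print("tiled pages: ", pages_touched(tiled_addr, tri))   # all rows share a tile
```

With the linear layout every scanline of the triangle lands in a different page (8 pages here), while the tiled layout keeps the whole triangle inside a single page, so far fewer page opens are needed per triangle.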

2. Many people hope that T&L will remove the need to upgrade to the newest and latest processors and motherboards from Intel. Now NVIDIA mentions in its Fast Write paper a bandwidth of 90 bytes per triangle between the CPU and the 3D card. With 10 million polygons this turns into a 900 MB/s stream, which is way more than any memory structure can handle... Is NVIDIA exaggerating? Are there ways to lower this number? Maybe a local geometry cache?

Sure, there are lots of ways to reduce this amount of bandwidth. The first is, as you mention, a geometry cache (also referred to as a "vertex cache" by some hardware vendors). In this scheme, the frontend of a 3D accelerator holds enough of the previous vertices (for example, vertex caches in the range of 16-32 entries are now prevalent) that when triangle strips or meshes are used, the information in the cache can be reused instead of requiring that data to be re-transferred over the AGP or PCI bus. This does, however, require good programming practices by the application itself so that the vertex cache hit rate is sufficiently high.
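A small simulation makes the vertex-cache savings concrete. The numbers here are assumptions, not measurements: a 16-entry FIFO cache, and 30 bytes per vertex (roughly the question's 90 bytes per triangle divided across three unshared vertices).

```python
# Sketch of a FIFO vertex cache: vertices already in the cache do not
# need to be re-sent over the AGP/PCI bus. Sizes are assumed for
# illustration (16-entry cache, 30 bytes per vertex).
from collections import deque

def bytes_transferred(triangles, cache_entries=16, bytes_per_vertex=30):
    cache = deque(maxlen=cache_entries)   # FIFO replacement
    sent = 0
    for tri in triangles:
        for v in tri:
            if v not in cache:            # miss: vertex crosses the bus
                sent += bytes_per_vertex
                cache.append(v)
    return sent

# A 10-triangle strip over vertices 0..11: each new triangle shares two
# vertices with the previous one.
strip = [(i, i + 1, i + 2) for i in range(10)]

print(bytes_transferred(strip))  # 12 unique vertices * 30 bytes = 360 bytes
print(10 * 3 * 30)               # same triangles, no sharing: 900 bytes
```

For a well-stripped mesh the cache cuts traffic to roughly a third (one new vertex per triangle instead of three), which is exactly why the hit rate, and hence how the application orders its triangles, matters so much.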

A second technique used to reduce geometry traffic is to compress the geometry data itself. I do not believe anyone is currently doing this, but several geometry compression schemes have been published in the past which can dramatically lower the bandwidth required for geometric data. Likely both techniques will need to be used in order to really achieve 10+M tris/sec sustained in a real-world game (assuming, of course, that the 3D accelerator has enough geometry and fill rate to actually render that many tris/sec...).
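One common building block in published geometry compression schemes is position quantization. The sketch below is a generic illustration of that idea, not any vendor's actual format: 32-bit float coordinates are mapped to 16-bit integers relative to a bounding range, halving the position data at the cost of a small, bounded error.

```python
# Generic sketch of position quantization (one ingredient of geometry
# compression): floats in a known range become 16-bit integers.
import struct

def quantize(coords, lo, hi, bits=16):
    """Map each float in [lo, hi] to an integer in [0, 2**bits - 1]."""
    scale = (2**bits - 1) / (hi - lo)
    return [round((c - lo) * scale) for c in coords]

def dequantize(ints, lo, hi, bits=16):
    scale = (hi - lo) / (2**bits - 1)
    return [lo + i * scale for i in ints]

coords = [0.0, 1.25, -3.5, 7.9]            # sample vertex coordinates
q = quantize(coords, lo=-10.0, hi=10.0)

raw_bytes = len(struct.pack(f"{len(coords)}f", *coords))  # 4 bytes/coord
packed_bytes = len(struct.pack(f"{len(q)}H", *q))         # 2 bytes/coord
print(raw_bytes, packed_bytes)

# Worst-case reconstruction error is half a quantization step
# (here 20 / 65535 / 2, about 0.00015 world units).
err = max(abs(a - b) for a, b in zip(coords, dequantize(q, -10.0, 10.0)))
```

Halving positions alone would not reach the compression ratios the published schemes report (they also delta-encode connectivity and normals), but it shows why the approach can cut the per-triangle byte count substantially without visible loss.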

3. Lightmaps and dual texturing allow great effects, T&L now introduces new types of lights... will vertex lighting replace lightmaps? Can you achieve the same effects and quality? Or should they exist next to each other and be used depending on the situation?

It is possible, with very fine triangle tessellation, to have vertex lighting replace dual texturing when used for lightmaps. However, the tessellation factor must be quite large in order to generate the same visual effect. As an example, a wall composed of a single quad utilizing dual texturing for lighting can likely be reproduced using vertex lighting, but that single wall would then likely have to be composed of maybe 10-20 triangles. So, not using light maps can result in a geometric explosion if you're not careful. I think what you'll see is that developers continue to use light maps and dual texturing for the "world" environment, and spend their increased triangle budget on greater character complexity and definition (you'll notice in almost all games today that light maps are not used for the polygonal characters, but are used all over the place for the "world" environments...).
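The need for fine tessellation can be shown with toy numbers (the light position and falloff below are made up for illustration): vertex lighting is only evaluated at the vertices and interpolated across the triangle, so a coarse wall misses any lighting feature that falls between its corners.

```python
# Toy example: a point light hovering over the center of a unit wall.
# With only the quad's 4 corner vertices lit, interpolation across the
# face misses the bright spot at the center entirely.

def point_light_intensity(px, py, lx=0.5, ly=0.5, lz=0.2):
    """Simple inverse-square falloff from a light above the wall (toy model)."""
    d2 = (px - lx) ** 2 + (py - ly) ** 2 + lz ** 2
    return 1.0 / d2

center_true = point_light_intensity(0.5, 0.5)   # true brightness at center
corner_avg = sum(point_light_intensity(x, y)
                 for x in (0.0, 1.0) for y in (0.0, 1.0)) / 4  # what Gouraud gives

print(round(center_true, 2), round(corner_avg, 2))  # hotspot vs. washed-out

# Capturing that hotspot means tessellating the wall into an n x n grid,
# and triangle count grows as 2 * n**2:
for n in (1, 3, 4):
    print(n, "x", n, "grid ->", 2 * n * n, "triangles")
```

The 3x3 and 4x4 grids land right in the "10-20 triangles" range mentioned above for a single wall, while a lightmap texel grid captures the same hotspot with the original 2 triangles.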

Another thing to keep in mind here is that dual texturing is used for many more things than just light maps. Detail texturing, for example, is now being used quite commonly and cannot simply be done using vertex-based lighting. So, I don't think there's any question that multi-texturing and vertex lighting will co-exist moving forward...