Milton's question - NVIDIA's claim that fewer lights are better doesn't make clear whether they mean "less than eight" or "more than eight but less than 12", for example. So, which is it? In real-world use, is there a situation where you can use up to eight light sources without a performance penalty, or is this just a marketing trick? Under which circumstances does a developer get eight "free" hardware lights?

Again, this shows how difficult these things are. Yes, there are situations where using 8 lights is free, with free meaning: there will be no fps slowdown. If we take a look at the overview graph from NVIDIA, it shows that a bit less than 2 million vertices per second can be handled with full lighting calculations (8 local lights). Now, imagine a game that uses only 1 million polygons per second, for example because it is CPU-limited (very good AI) or because your 3D accelerator doesn't have sufficient fill-rate (there are many other possible reasons for a limited throughput). If the game runs at only 1 million polygons per second, then it will run just as fast with 1 light as with 8 lights, because the lighting limit of 2 million is never reached. But if a game runs at 6 million polygons per second, then we have a problem: 6 million with 8 lights active is more than the lighting hardware can handle. The lighting part of the hardware thus becomes the bottleneck, and your throughput will be capped at just below 2 million, since that is the peak the lighting hardware can deliver.
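
To make that concrete, here is a little C sketch of the "slowest stage wins" idea. The two limits are just the rough numbers read off NVIDIA's graph (about 15 million vertices per second transform-only, about 2 million with 8 local lights); treat them as illustrative assumptions, not measured figures:

    #include <stdio.h>

    /* Rough stage limits in vertices per second, read off NVIDIA's
       overview graph; purely illustrative assumptions.               */
    #define TRANSFORM_LIMIT 15000000.0  /* transform only             */
    #define LIGHT_LIMIT_8    2000000.0  /* transform + 8 local lights */

    /* The pipeline runs at the speed of its slowest stage. */
    static double effective_rate(double demanded)
    {
        double rate = demanded;
        if (rate > TRANSFORM_LIMIT) rate = TRANSFORM_LIMIT;
        if (rate > LIGHT_LIMIT_8)   rate = LIGHT_LIMIT_8;
        return rate;
    }

    int main(void)
    {
        /* CPU- or fill-rate-limited game: the 8 lights are "free". */
        printf("1M demanded: %.1fM/s\n", effective_rate(1e6) / 1e6);

        /* Geometry-heavy game: capped at the lighting limit. */
        printf("6M demanded: %.1fM/s\n", effective_rate(6e6) / 1e6);
        return 0;
    }

The first case comes out at the full 1 million per second, the second is clamped to 2 million; that clamp is the "hard" limit discussed below.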

Now, this introduces something interesting: a hardware limit is a "hard" limit. You can't move around it and you can't do anything to make the hardware lighting go faster, except raise the clock frequency. So, if you want more you'll have to overclock. With software you can use tricks, you can take shortcuts to make things go faster; hardware does not allow shortcuts. A limit is a limit. This is why people talk about scaling. Hardware T&L cannot scale, it has a fixed limit. CPUs and software don't have this problem. Today's engines scale with better CPUs: you can buy an Athlon and see increased speed, you can buy a 700MHz Pentium III and it will be faster than one at 500MHz. With hardware T&L that doesn't work. You might be able to tweak out another 10MHz, but you can't go out and get a 500MHz GeForce - and you can't use a hack either, since the hardware is hardwired.

So, there are situations where lights can be for free, but there are also cases where they are not. Say you run the 3DMark2000 T&L Throughput benchmark at a resolution of 2048x1536. That resolution will probably be fill-rate limited, so you'll see little to no performance drop with 4 or 8 lights relative to the one-light situation.
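
Here is a rough sketch of why that works out, in C again. All the numbers are assumptions for illustration: a peak fill-rate of 480 Mpixels/s (roughly the quoted figure for a GeForce-class card), an average overdraw of 2, and a hypothetical 20,000 triangles per frame:

    #include <stdio.h>

    int main(void)
    {
        /* Assumed numbers, for illustration only. */
        double fill_rate  = 480e6;           /* pixels/s, assumed peak fill-rate */
        double pixels     = 2048.0 * 1536.0; /* pixels per frame at 2048x1536    */
        double overdraw   = 2.0;             /* average times each pixel is drawn */
        double tris_frame = 20000.0;         /* hypothetical scene complexity    */

        double max_fps    = fill_rate / (pixels * overdraw);
        double tris_per_s = max_fps * tris_frame;

        printf("fill-rate limited fps: %.0f\n", max_fps);
        printf("triangles per second : %.1fM\n", tris_per_s / 1e6);
        /* ~76 fps and ~1.5M triangles/s: well below the ~2M/s 8-light
           limit, so the extra lights cost nothing in this situation.  */
        return 0;
    }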

The same discussion is possible for many other things. Actually, one of my key statements is that "nothing comes for free." Everything you do costs something, usually memory bandwidth, and if that resource (be it bandwidth, lighting calculations or even memory) has run out, you'll slow down. However, if that resource is still available, you won't slow down. Take "free" trilinear filtering: trilinear can be free if you have bandwidth left over when doing bilinear. The same is true for 32-bit color: it can be free if you have a lot of bandwidth left even after doing 16-bit color. So everything can be for free, but it never is for free. The answer to the question "is it for free?" should always be followed by "under these specific conditions." Without knowing the conditions you can't say that an effect is for "free." Marketing calls everything free, but it almost never is; it's usually only true under some special condition.
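
The 16-bit versus 32-bit case can be sketched with the same budget thinking. The numbers below are made up but plausible (a card with roughly 2.7 GB/s of memory bandwidth, 60 fps, overdraw of 3, and only color and Z traffic counted, no texture reads), purely to show when 32-bit costs nothing and when it doesn't:

    #include <stdio.h>

    /* Very rough frame-buffer traffic per rendered pixel: one color
       write plus one Z read and one Z write (texture reads ignored). */
    static double gbytes_per_sec(double pixels, double fps, double overdraw,
                                 int color_bytes, int z_bytes)
    {
        return pixels * fps * overdraw * (color_bytes + 2.0 * z_bytes) / 1e9;
    }

    int main(void)
    {
        double bandwidth = 2.7;  /* GB/s, assumed 128-bit SDR-class card */
        double fps = 60.0, overdraw = 3.0;

        printf("640x480   16-bit: %.2f  32-bit: %.2f  (have %.1f GB/s)\n",
               gbytes_per_sec(640*480.0, fps, overdraw, 2, 2),
               gbytes_per_sec(640*480.0, fps, overdraw, 4, 4), bandwidth);
        printf("1600x1200 16-bit: %.2f  32-bit: %.2f  (have %.1f GB/s)\n",
               gbytes_per_sec(1600*1200.0, fps, overdraw, 2, 2),
               gbytes_per_sec(1600*1200.0, fps, overdraw, 4, 4), bandwidth);
        /* At 640x480 both fit in the budget, so 32-bit is "free";
           at 1600x1200 the 32-bit case blows the budget and the
           frame rate has to drop.                                  */
        return 0;
    }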

Milton's question - Doesn't NVIDIA claim that the GeForce GPU can handle up to 15 million triangles per second? I thought I read that a CPU like the Pentium III at 500MHz could push only 3 million triangles per second at best. So why is the CPU suddenly so much faster than the GPU? Is it because we fell short of the limits of either (150,000 triangles)? If so, has NVIDIA produced a product that is too far ahead of its time?

It all depends on how tricky your software engine is. I talked to a hardware developer (I won't name them, to avoid unnecessary biased comments) and they said that if you want, you can generate 10 million polygons per second on a P3 set-up without T&L; it all depends on how smartly you handle things. There are object formats and ways of describing a scene that are much faster than the general-purpose math that Direct3D hardware and software uses. So I have no doubt that if you really wanted to, you could come up with a benchmark that shows an Athlon or a P3 being faster than a GeForce. The big catch, of course, is that this would not be general; it would be one of those dreaded tech demos. So, 3 million or 5 million or whatever, it really depends on what you are doing and how you are doing it. Coders find tricks and hacks every day. The issue is this: adding extra lights in a software engine is much less problematic, you can do a lot more caching (CPUs have a huge cache compared to a 3D chip) and data re-use can be handled much more cleverly, and this is exactly why the software engine results are so fast. NVIDIA's GeForce is limited by an unbalanced design: its transform hardware is much faster than its lighting hardware. Essentially, their peak graph shows it: 15 million vertices can be transformed, but only 2-6 million can be lit, depending on the situation. And this brings me to yet another point: T&L doesn't handle polygons, it handles vertices. Polygons are a confusing unit when you talk about T&L speed (we'll have an article about this soon), as the small example below already hints.
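
As a quick illustration of why "polygons per second" is ambiguous while "vertices per second" is not, here is a small sketch using nothing more than standard triangle-list versus triangle-strip counting (nothing GeForce-specific):

    #include <stdio.h>

    int main(void)
    {
        int tris = 1000;                 /* triangles in a mesh       */
        int list_verts  = 3 * tris;      /* independent triangle list */
        int strip_verts = tris + 2;      /* one long triangle strip   */

        printf("%d triangles as a list : %d vertices through T&L\n",
               tris, list_verts);
        printf("%d triangles as a strip: %d vertices through T&L\n",
               tris, strip_verts);
        /* Same polygon count, roughly 3x difference in T&L work,
           which is why a vertices-per-second figure is the honest one. */
        return 0;
    }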

Overall, the whole 3D acceleration thing has become very complex. Explaining why an Athlon is faster than a P3 is not an easy thing, since there are very few people out there who actually know how it all works. The same is going to become true (or is already true) for 3D acceleration. These 3D chips are getting more and more complex every day, and understanding why one is slower or faster than another is going to be very difficult. Even worse, we will get situations where A is faster than B, while in another situation B is faster than A. So making comments and drawing conclusions will not get easier, and a lot of people will have to dive into tech papers to understand just what is going on. If we don't make this effort, then we will be at the mercy of the companies. If NVIDIA says something… is it true? The same goes for 3dfx, S3, Matrox and all the others. Do you want press people to question and analyse it, or do you want to find out that it isn't true yourself when a game runs like a dog? Reviewing a 3D accelerator is serious business and it cannot be done in 2 days (although some websites don't seem to agree with that… but the content of those articles also shows it). We live in a complex world, and deciding what is better or what is true is very difficult. This is especially true when you have various versions of the truth. I can say:

GeForce has a Peak Throughput of 5 million polygons per second

But I can also say: 

GeForce has a Peak Throughput of 15 million polygons per second

Both are true and can be proven. It's just how you measure and define it (how it works will be explained here soon). It's confusing, and a lot of people will drop out because it gets too complex. But there is a risk in that: if you don't understand it, they will tell you anything they want.
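
One plausible way to square those two statements, assuming the 15 million figure is the transform-only vertex rate mentioned earlier and reusing the triangle-counting example from above:

    15 million vertices/s / 3 vertices per triangle (independent triangles)    =  5 million triangles/s
    15 million vertices/s / ~1 vertex per triangle (ideal strips, shared mesh) = ~15 million triangles/s

Under that reading the two figures simply assume different amounts of vertex sharing, and neither says anything about lighting. Take it as a sketch of the kind of definition game being played, not as the final word; the promised article will go into the real details.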