Battle of the Compilers

From all the topics discussed so far, it should be clear that different hardware can have very different optimal programs, and this is where compilers come in. AMD and Intel are both very keen to make sure that any program used extensively for benchmarking is compiled into an executable optimised for their CPU; if a benchmark program is not optimised for their hardware, they will shout “foul” as loudly as they can and declare the benchmark unfair. With vertex shaders things are a bit different from full CPUs: the application delivers vertex shader programs to the driver in a non-compiled form. This form can be a high-level language or a low-level language, but either way the program has to be compiled into the internal format of the hardware. An overview:

[Figure: compilation flow, top to bottom. Application; High-Level Language (Cg, HLSL, OGL2.0, ...); (Pre-)Compiler (Cg, Microsoft DirectX); Low-Level Language (Vertex Shader Assembly Language); Driver; Compiler (to Native Hardware Format); Hardware. Part of the flow is labelled “Optional” in the figure.]

The driver level compiler determines how effectively the hardware is utilised, since this compiler not only translates the input to the hardware format but should also try to generate optimised instructions for that format. This can mean re-ordering instructions (remember latency), changing register usage (re-using registers, or using different registers to take the hardware structure into account), recognising patterns that match functionality directly supported by the hardware (e.g. replacing a MUL and ADD by a MAD, replacing multiple MULs by a POW, etc.), unrolling constant-based loops and branches (to avoid wasting cycles on jumping around in instruction memory), inlining function calls (inserting the code multiple times to avoid a costly jump and return), and so on.
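To make the MUL and ADD to MAD replacement concrete, here is a minimal sketch of such a pattern-matching pass in Python. It is an illustration only, not how any real driver is written: the `Instr` type, the `fuse_mul_add` function, and the register naming (temporaries start with `r`, and are assumed to be dead at the end of the program) are all assumptions made for this example, and swizzles and write masks are ignored.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Toy instruction: opcode, destination register, source registers."""
    op: str
    dst: str
    srcs: Tuple[str, ...]


def _read_before_written(program: List[Instr], start: int, reg: str) -> bool:
    """True if `reg` is read at or after index `start` before being overwritten."""
    for ins in program[start:]:
        if reg in ins.srcs:
            return True
        if ins.dst == reg:
            return False
    # Assumption: temporaries are not live once the program ends.
    return False


def fuse_mul_add(program: List[Instr]) -> List[Instr]:
    """Replace 'MUL t, a, b' directly followed by 'ADD d, t, c' with 'MAD d, a, b, c'.

    The pair is fused only when t is a temporary (name starts with 'r'),
    is used exactly once by the ADD, and is not needed afterwards.
    """
    out: List[Instr] = []
    i = 0
    while i < len(program):
        cur = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        fusable = (
            nxt is not None
            and cur.op == "MUL"
            and nxt.op == "ADD"
            and cur.dst.startswith("r")
            and nxt.srcs.count(cur.dst) == 1
            and (nxt.dst == cur.dst
                 or not _read_before_written(program, i + 2, cur.dst))
        )
        if fusable:
            # The ADD operand that is not the MUL result becomes the addend.
            other = nxt.srcs[0] if nxt.srcs[1] == cur.dst else nxt.srcs[1]
            out.append(Instr("MAD", nxt.dst, (cur.srcs[0], cur.srcs[1], other)))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


program = [
    Instr("MUL", "r0", ("v0", "c0")),
    Instr("ADD", "r1", ("r0", "c1")),   # r0 is dead afterwards: fused
    Instr("MUL", "r2", ("r1", "c2")),
    Instr("ADD", "r3", ("r2", "c3")),   # r2 is read again below: kept as is
    Instr("MOV", "oPos", ("r2",)),
]
optimised = fuse_mul_add(program)
```

In this example the first MUL/ADD pair collapses into a single MAD, while the second pair is left alone because its intermediate result is still needed later. Even a pass this small needs to know which registers are still live, which hints at why the quality of the driver level compiler matters so much.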

In the end, this is where the real battle for performance will be won or lost. No matter how good the hardware is, if the driver level compiler delivers crap code then the hardware will perform like crap. Mediocre hardware with a clever driver level compiler can quite easily beat excellent hardware with a poor driver level compiler. So if hardware that looks good on paper does not meet your expectations, then there is still some hope… that a new driver release will result in massive gains.

Now obviously, the driver level compiler is no excuse to write poor code. These compilers can only do so much, so don’t expect them to turn poor code into perfect code… in the end you can only rely on your own hand coding to make sure that things really do run optimally.

Conclusion

At the end of all this, I hope that it’s clear that there is no such thing as “a vertex shader”: all kinds of different implementations are possible, and each has its own advantages and disadvantages. It’s important to realise how essential a good driver level compiler is, and how essential it is to educate developers about hardware efficiency. At the very least, I hope we can move away from thinking that 2 shaders are always better than 1 shader...

