The Rendering Modes
As with NVIDIA's SLI, ATI's CrossFire operates via a selection of different rendering modes, two of which will be fairly familiar from NVIDIA's SLI rendering modes. The first is Alternate Frame Rendering (AFR), which has one board rendering one frame and the second board rendering the next. In this mode the slave and master graphics processors send alternate frames to the compositing chip on the master board, which readies them for display in sequence. This rendering mode operates with one frame of latency: in order to interleave the frames correctly and gain performance, the image being rendered is one frame behind the frame the CPU is processing. AFR is least effective where data (usually render-to-texture data) is rendered and then kept for a number of frames, as is common in effects such as motion blur, because that data needs to be passed between the boards each frame, slowing the potential rendering performance - in this case one of the other rendering modes is likely to be adopted. As both boards are dealing with entirely separate frames, both geometry and fill-rate scale in this mode.
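The frame interleaving described above can be sketched in a few lines. This is only an illustration of the idea, not ATI's driver logic: the function names are hypothetical, and which board takes the first frame (board 0 standing in for the master here) is an assumption.

```python
def afr_board_for_frame(frame_index, num_boards=2):
    """Return the index of the board that renders a given frame.

    Assumption: board 0 is the master and takes even-numbered frames,
    board 1 is the slave and takes odd-numbered frames.
    """
    return frame_index % num_boards


def afr_schedule(num_frames, num_boards=2):
    """List the rendering board for each of the first num_frames frames."""
    return [afr_board_for_frame(i, num_boards) for i in range(num_frames)]


if __name__ == "__main__":
    # Frames alternate between the two boards.
    print(afr_schedule(6))
```

Because each board owns whole frames, nothing in this scheme needs to know about screen regions, which is why both geometry and fill-rate scale.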
ATI are also using a rendering mode which they have dubbed "Scissor Mode", which bears similarities to NVIDIA's Split Frame Rendering mode. This mode has one board rendering a certain portion of the screen and the other rendering the rest. Each board effectively clips the screen at the crossover point, so work on geometry that has already been transformed to screen-space, along with pixel fill-rate, scales. Pre-transformed data, however, isn't related to screen-space, hence its processing needs to be completed on both boards; ergo, vertex shader processing will not scale.
ATI's scissor mode differs in two ways from NVIDIA's. First, although the split level can be adjusted such that each board renders a different proportion of the screen, dependent on how the load in the game is apportioned across the screen, ATI will not dynamically adjust the split on a per-frame (or per multiple of frames) basis; instead there is a default and a per-application split value that stays fixed for the duration of a particular application's run. ATI believe that the CPU overhead of doing this calculation could reduce performance, especially since multi-board rendering is going to be more CPU limited anyway. Second, ATI's scissor mode can operate with either a horizontal or a vertical screen split.
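As a rough illustration of a fixed split, the sketch below computes the screen rectangle each board would own for a given split value and orientation. The function name, the 0-to-1 split fraction and the reading of "horizontal" as a horizontal dividing line (top and bottom regions) are assumptions made for the example, not details ATI have given.

```python
def scissor_regions(width, height, split=0.5, orientation="horizontal"):
    """Return two (x, y, w, h) rectangles, one per board.

    split is the fraction of the screen given to the first board. It is
    chosen once (default or per application) and never changed per frame.
    Assumption: "horizontal" means a horizontal dividing line, so the
    first board gets the top region and the second the bottom.
    """
    if not 0.0 < split < 1.0:
        raise ValueError("split must be strictly between 0 and 1")
    if orientation == "horizontal":
        cut = round(height * split)
        return (0, 0, width, cut), (0, cut, width, height - cut)
    if orientation == "vertical":
        cut = round(width * split)
        return (0, 0, cut, height), (cut, 0, width - cut, height)
    raise ValueError("orientation must be 'horizontal' or 'vertical'")


if __name__ == "__main__":
    print(scissor_regions(1600, 1200))                   # even top/bottom split
    print(scissor_regions(1600, 1200, 0.4, "vertical"))  # uneven left/right split
```

Note that nothing here looks at the scene: a triangle's vertices still have to be transformed before either board can tell which rectangle it lands in, which is the reason vertex work is duplicated.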
Another mode of rendering over multiple boards that ATI offer is the much talked about "Supertile" mode, which is taken directly from the hardware within the R300/R420 line that enables them to operate in the likes of the Evans & Sutherland massively scalable RenderBeast image render-farms. As the name suggests, the screen is divided into multiple alternating tiles that are distributed between the graphics boards in the rendering system (two in the case of current CrossFire implementations). On completion of the frame, both sets of tiles are sent to the composite engine on the master board, which marries the two sets of tiles to generate a single image.


As the tile sizes are fairly small, 32x32 pixels in fact, load balancing of the image is effectively handled implicitly: objects that are more difficult to render, or regions of the screen that have more overdraw, will inevitably span multiple tiles, which automatically results in the load being distributed across the two boards. With no need for any calculated load balancing on a per-frame basis, and with the tiling mechanism handled by the graphics chips themselves, effective load balancing should be achieved without much of an impact in terms of CPU overhead.
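To see why small tiles balance the load without any per-frame calculation, the sketch below assigns 32x32 tiles to two boards and counts how many tiles of an arbitrary screen region each board ends up with. The checkerboard assignment and the function names are assumptions for illustration; the exact tile pattern the hardware uses is not stated here.

```python
TILE = 32  # tile edge in pixels, as described for Supertile mode


def board_for_pixel(x, y, tile=TILE, num_boards=2):
    """Return the board that owns the tile containing pixel (x, y).

    Assumption: tiles alternate in a checkerboard pattern.
    """
    return ((x // tile) + (y // tile)) % num_boards


def tiles_per_board(x0, y0, w, h, tile=TILE, num_boards=2):
    """Count the tiles touched by a screen rectangle, grouped by board."""
    counts = [0] * num_boards
    for ty in range(y0 // tile, (y0 + h - 1) // tile + 1):
        for tx in range(x0 // tile, (x0 + w - 1) // tile + 1):
            counts[(tx + ty) % num_boards] += 1
    return counts


if __name__ == "__main__":
    # A 200x150 pixel "expensive" region at (100, 100) touches 35 tiles,
    # which fall 18 to one board and 17 to the other.
    print(tiles_per_board(100, 100, 200, 150))
```

Any region more than a few tiles across splits close to evenly under this pattern, wherever it sits on screen, which is the implicit load balancing the text describes.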
Although this mode may suggest that texture cache coherency will be reduced, it is not actually any worse than what is already implicit in the rendering pipelines of the R300/R420 series graphics processors: the quad rendering pipelines on these parts are already MIMD, each operating on its own separate 16x16 pixel tile, so texture cache coherency under Supertiling is no worse than is already the case for ATI's 12 or 16 pipeline single boards. Supertiling has the same scalability properties as Scissor mode, with rendering power increasing only for work after the screen-space transformation.
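The scaling behaviour shared by Scissor and Supertile modes can be put into a deliberately crude model: vertex work is repeated in full on every board, while pixel work is divided between them. The sketch below assumes the two kinds of work simply add up within a frame, which real hardware does not do (the stages overlap), so treat the numbers as an upper-level intuition rather than a prediction. The function name and time units are hypothetical.

```python
def estimated_speedup(vertex_time, pixel_time, num_boards=2):
    """Estimate speedup when only post-transform (pixel) work scales.

    Assumption: per-frame cost is vertex_time + pixel_time on one board,
    and vertex_time + pixel_time / num_boards when split across boards.
    """
    single = vertex_time + pixel_time
    multi = vertex_time + pixel_time / num_boards
    return single / multi


if __name__ == "__main__":
    # Frame dominated by pixel work: close to ideal scaling.
    print(estimated_speedup(vertex_time=0.0, pixel_time=10.0))
    # Frame dominated by vertex work: almost no gain.
    print(estimated_speedup(vertex_time=10.0, pixel_time=0.0))
```

Under this model a fill-rate-bound frame approaches a doubling with two boards, whereas a geometry-bound frame gains nothing, which is the contrast with AFR, where both kinds of work scale.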
ATI's preferred rendering mode is the Supertiled mode, and in DirectX this is enabled by default when all the CrossFire elements are put into place, such that all titles will operate; ATI believe a default gain of between 10% and 60% over one board will be visible under certain rendering scenarios. Supertiling takes a little more effort to get working under OpenGL, and as the majority of games utilise DirectX, ATI have put most of their efforts into DirectX for the time being; one of the other modes should be enabled by default for OpenGL titles. For games that ATI have tested and found to gain more performance or be more compatible in one rendering mode over another, ATI will hardcode that rendering mode in the drivers and, at present, it will not be user selectable - hopefully ATI will alter this in the future to allow end users to try their own settings through the profiling mechanism.
All of these modes, however, are more or less subject to the same multiple-rendering issues that we outlined in our SLI article here.