AMD R600 Overview


We use an image display overlay for image enlargement at Beyond3D (if your browser supports it), but it might be worth opening the enlargement in a new browser window or tab so you can refer to it as the article goes on. The image is a simplified (fairly heavily, in places) overview of R600. The datapaths aren't complete by any means, but they show the usual flow of data from the front of the chip to its back end, where final pixels are output to the framebuffer and drawn on your screen. Hopefully all major processing blocks and on-chip memories are shown.

R600 is a unified, fully-threaded, self load-balancing shading architecture that complies with and exceeds the specification for DirectX Shader Model 4.0. The major design goals of the chip are high ALU throughput and maximum latency hiding, achieved via the shader core, the threading model and distributed memory accesses via the chip's memory controller. A quick glance at the architecture, especially when compared with AMD's previous-generation flagship, shows that the emphasis is on the shader core and on maximising available memory bandwidth for 3D rendering (and non-3D applications too).

The shader core features ALUs that are single precision and IEEE 754 compliant in terms of rounding and precision for all math ops, with integer processing ability as well. Not all R600 SPUs are created equal: each SPU group has a fifth, more capable ALU that handles special functions and some extra integer processing ops. R600, and the other GPUs in the same architecture family, also sport a programmable tessellation unit, very similar to the one found in the Xbox 360. While DirectX doesn't support it in any of its render stages, it's nonetheless programmable using that API with minimal extra code. The timeframe for that happening is unclear, though.
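To make the "not all SPUs are equal" point concrete, here is a toy Python sketch of how independent ops might be packed into one SPU group for a single issue slot. It is an illustration only, not AMD's scheduler or compiler: the op names, the lane ordering, the `pack_bundle` helper and the assumption that the fifth ALU can also run ordinary ops are all ours, chosen to mirror the description above.

```python
# Toy model of one R600 SPU group: four ordinary ALUs plus a fifth,
# more capable ALU. Everything below is hypothetical and illustrative.

BASIC_OPS = {"add", "mul", "mad"}                  # assumed: any lane can run these
SPECIAL_OPS = {"sin", "cos", "log", "exp", "rcp"}  # assumed: fifth lane only

LANES_PER_GROUP = 5  # from the text: five ALUs per SPU group
SPECIAL_LANE = 4     # assumed ordering: the fifth ALU sits in the last slot


def pack_bundle(ops):
    """Assign independent ops to the lanes of one group for one issue slot.

    Returns (lanes, leftover): lanes is a list of five entries (an op name
    or None), leftover holds the ops that did not fit this slot.
    """
    lanes = [None] * LANES_PER_GROUP
    leftover = []

    # First pass: special-function ops can only go to the fifth ALU.
    for op in ops:
        if op in SPECIAL_OPS:
            if lanes[SPECIAL_LANE] is None:
                lanes[SPECIAL_LANE] = op
            else:
                leftover.append(op)
        elif op not in BASIC_OPS:
            raise ValueError(f"unknown op: {op}")

    # Second pass: basic ops fill the ordinary lanes first, then the
    # fifth lane if it is still free.
    for op in ops:
        if op in BASIC_OPS:
            for i in range(LANES_PER_GROUP):
                if lanes[i] is None:
                    lanes[i] = op
                    break
            else:
                leftover.append(op)

    return lanes, leftover


if __name__ == "__main__":
    print(pack_bundle(["add", "mul", "sin", "mad", "add", "cos"]))
```

In this sketch a second special-function op in the same slot has to wait, because only one lane can execute it. That is the kind of constraint a fifth, more capable ALU implies, although the real hardware's rules may well differ.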

Those are the basics of the processor, so we'll now look at the details, starting with the front end of the chip, where the action begins.