192xGT200 + 1068xBloomfield: 295 TFlops Supercomputer

Wednesday 23rd April 2008, 02:00:00 PM, written by Arun

French website PC INpact broke the news of an upcoming GPU-accelerated supercomputer, ordered by France's CEA from Bull for delivery in early 2009. The cluster's performance confirms that GT200 will be rated at 1 TFlop and that Nehalem/Bloomfield will clock up to at least 3GHz.

PC INpact claims that the overall machine will sport peak performance of 295 TFlops, with 103 TFlops coming from the CPUs and 192 TFlops coming from the GPUs. Le Monde further confirms the performance target and indicates delivery will take place in early 2009. This would make it one of the world's first (if not the first) large-scale GPU-accelerated supercomputers.

Sadly, PC INpact got the specifics wrong. There won't be an 8-core Nehalem in that timeframe, nor could there be a 2 TFlops single-chip GPU, especially given the qualification times in this market. Just like Conroe, Nehalem/Bloomfield will sport a 128-bit ADD and a 128-bit MUL per core; at 3GHz, that means 24 single-precision GFlops per core. Multiply that by the four cores per chip, then by 1068, and you get to ~103 TFlops.
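The CPU figure can be checked with a few lines of arithmetic. This is only a sketch: it assumes single-precision operation (four 32-bit lanes per 128-bit unit), which is what makes 24 GFlops per core work out, and the function and variable names are purely illustrative.

```python
# Peak-throughput check for the CPU side of the cluster.
# Assumption: each 128-bit unit handles four 32-bit (single-precision)
# lanes per cycle, and each core has one ADD unit plus one MUL unit.

def peak_gflops_per_core(clock_ghz, simd_bits=128, lane_bits=32, units=2):
    """Peak GFlops for one core with `units` SIMD pipes of `simd_bits` width."""
    lanes = simd_bits // lane_bits
    return clock_ghz * lanes * units

per_core = peak_gflops_per_core(3.0)         # 3 GHz * 4 lanes * 2 units
total_tflops = per_core * 4 * 1068 / 1000    # 4 cores per chip, 1068 chips
print(f"{per_core:.0f} GFlops per core, {total_tflops:.1f} TFlops total")
```

Calling `peak_gflops_per_core(3.0, lane_bits=64)` gives the double-precision equivalent under the same assumptions, which is half the single-precision figure.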

As for the GPUs, obviously the 1U Tesla module sports 4 GPUs, not 2. So assuming this supercomputer is indeed based on GT200 and the per-module config is similar, that gives us 1 TFlop per chip/board. Alternatively, if the TDP were substantially higher than G80's (which seems unlikely given the level of binning possible in that market), the module might only sport 3 GPUs; that would result in 1.33 TFlops per GPU. Once again, that is not the most likely scenario.
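The two module scenarios can be compared directly. The sketch below assumes 48 Tesla modules, which is the count that 192 GPUs at four per module implies; the constant and function names are illustrative.

```python
# Per-GPU peak implied by the 192 TFlops GPU total.
# Assumption: 48 modules, i.e. 192 GPUs at 4 GPUs per module.
GPU_TFLOPS_TOTAL = 192
MODULES = 48

def tflops_per_gpu(gpus_per_module):
    """Peak TFlops each GPU must deliver for the modules to reach the total."""
    return GPU_TFLOPS_TOTAL / (MODULES * gpus_per_module)

for n in (4, 3):
    print(f"{n} GPUs per module: {tflops_per_gpu(n):.2f} TFlops per GPU")
```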

So, that gets us right back to where we were in... May 2007, with Michael Hara's claim at an investor conference that NVIDIA's next-generation chip would be 'close to 1 TFlop'. Of course, we were all assuming that referred to G92, and that obviously didn't turn out to be the case. We also assumed G92 would be the first chip supporting FP64; it has now become clear, however, that it's GT200, which is also what makes the chip more attractive in such supercomputer deals.

In practice, this doesn't represent a lot of revenue for NVIDIA; it's very likely less than one million dollars for a company which is consistently delivering sales of more than one billion dollars per quarter lately. However, it does highlight Tesla's momentum in the GPGPU market. How soon, if ever, will that represent a substantial part of NVIDIA's profits? Nobody knows.





Latest Thread Comments (12 total)
Posted by Megadrive1988 on Thursday, 24-Apr-08 00:14:17 UTC
How do we compare this to G80? Is it like G80's estimated 1/2 TFLOP (Nvidia's programmable figure) or like its more 'realistic' estimated peak of 1/3 TFLOP? Obviously one would hope GT200's 1 TFLOP of SP floating point performance is comparable to G80's 1/3 TFLOP, instead of the 1/2 TFLOP, thus making GT200 roughly 3x G80 instead of 2x.

Posted by CarstenS on Thursday, 24-Apr-08 16:01:26 UTC
Quoting Arun
Yeah, it's SP.
That'd be highly unusual in SC-space, wouldn't it?

Posted by Arun on Thursday, 24-Apr-08 16:34:50 UTC
Quoting Quasar
That'd be highly unusual in SC-space, wouldn't it?
SP for the quoted numbers; that doesn't mean the chips don't support DP. As for your comment in the other thread, G80 doesn't support DP obviously but for GT200 it's rumoured to be 1/4 the performance of SP, presumably for MULs. Maybe ADDs will be 1/2, which would bring MADDs to 1/6. Also, for MULs, if you consider the hidden MUL and assume it's properly exposed in GT200 and nothing else changed (both of which I doubt), then that'd mean MUL is 1/6 and ADD is 1/2. But once again, 1/4 would still be the most quoted figure I suspect.
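One way to read the 1/6 MADD figure is that a DP MADD would be issued as a MUL followed by an ADD on the same unit, so the two costs add up. That reading is an assumption, not something stated in the post, and `combined_rate` is an illustrative helper:

```python
from fractions import Fraction

def combined_rate(*rates):
    """Rate of operations run back to back on one unit.

    Each rate is throughput relative to SP; the per-operation costs
    (1 / rate) are summed and inverted. Assumption: no overlap between
    the operations.
    """
    return 1 / sum(1 / r for r in rates)

dp_mul = Fraction(1, 4)   # rumoured DP MUL rate relative to SP
dp_add = Fraction(1, 2)   # speculated DP ADD rate relative to SP
dp_madd = combined_rate(dp_mul, dp_add)
print(dp_madd)
```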

Posted by CarstenS on Friday, 25-Apr-08 08:22:07 UTC
Quoting Arun
SP for the quoted numbers; that doesn't mean the chips don't support DP.
I'm pretty sure, those TFLOPs mentioned are in fact DP ones - everything else would be totally unusual in the Supercomputing space. See also (http://www.top500.org/project/linpack)


Quoting Arun
As for your comment in the other thread, G80 doesn't support DP obviously but for GT200 it's rumoured to be 1/4 the performance of SP, presumably for MULs. Maybe ADDs will be 1/2, which would bring MADDs to 1/6. Also, for MULs, if you consider the hidden MUL and assume it's properly exposed in GT200 and nothing else changed (both of which I doubt), then that'd mean MUL is 1/6 and ADD is 1/2. But once again, 1/4 would still be the most quoted figure I suspect.
I'm still not sure what to make of all this. AFAIK AMD did promote only the 64 "fat ALUs" to be 64-bit capable - thus having one fifth of the SP-FLOPS available.

As for GT200, I think everything from as low as 1/8th (as mentioned by Farhan (http://forum.beyond3d.com/showpost.php?p=1039031&postcount=18)) to as high as 50 percent (as described in this PDF (http://www.cs.virginia.edu/~ktd3q/docs/ssr_gpu_ije.pdf), linked by Jawed some time ago) is possible.

If it's indeed one eighth, and I think that's definitely possible given Nvidia's track record of making a feature available first with almost no focus on speed, the DP-figure of 192 TFLOPS would nicely break down to 1 SP-TFLOPS per single GPU.

edit:
As for G80 - that was bad thinking on my part.

Posted by aaronspink on Friday, 25-Apr-08 08:55:11 UTC
Quoting Quasar
I'm pretty sure, those TFLOPs mentioned are in fact DP ones - everything else would be totally unusual in the Supercomputing space. See also (http://www.top500.org/project/linpack)
You are either way too optimistic OR way too trusting of PR blurbs if you think they are going to get 1 DP TFLOPs per card with GT200.

That or you think that 400 watt graphics cards are going to be mainstream.

Aaron Spink
speaking for myself inc.

Posted by randomhack on Friday, 25-Apr-08 09:11:27 UTC
Here's the Bull press release: http://www.wcm.bull.com/internet/pr/rend.jsp?DocId=350329&lang=en
They mention 1068 8-core nodes. That probably means 2 quad-core CPUs per node, which is perfectly capable of giving 103 DP TFlops.
Then they also mention 48 GPU nodes with 512 cores each. GPUs will provide 192 TFlops. So they are expecting 192/(48*4)=1 TFlops per 128 cores. Now the question is whether these TFlops are DP or SP. Very confusing :(

Posted by CarstenS on Friday, 25-Apr-08 09:14:04 UTC
Quoting aaronspink
You are either way too optimistic OR way too trusting of PR blurbs if you think they are going to get 1 DP TFLOPs per card with GT200.

That or you think that 400 watt graphics cards are going to be mainstream.

Aaron Spink
speaking for myself inc.
No, there are 48*512 GPUs. Together they give 192 DP-TFLOPs, that'd be 128 GFLOPS (edit: DP) per GPU.

Posted by randomhack on Friday, 25-Apr-08 09:26:32 UTC
Quoting Quasar
No, there are 48*512 GPUs. Together they give 192 DP-TFLOPs, that'd be 128 GFLOPS (edit: DP) per GPU.
:O
Check your math :)
Nobody puts over 24000 GPUs together either.

Posted by CarstenS on Sunday, 27-Apr-08 09:47:08 UTC
Quoting randomhack
:O
Check your math :)
Nobody puts over 24000 GPUs together either.
To the first one: Right *doh*
But 24k processors - regardless of whether they're CPUs or GPUs - doesn't sound like all that much in SC-space, my apparent inability to correctly use the Windows calculator aside. ;)

Posted by CarstenS on Monday, 26-May-08 18:51:35 UTC
So, how about 4 TFLOPS for each of those 48 units, which would then be some kind of Tesla parts? If there really are 512 'processors' in each of them, that'd be about 8 GFLOPS per 'processor', which I highly doubt in any way I can imagine right now.
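For reference, the per-unit and per-'processor' numbers discussed in the thread work out as follows. This is a quick sketch using only the figures quoted above (48 units, 512 'processors' each, 192 TFLOPS in total); the variable names are illustrative.

```python
# Breakdown of the 192 TFLOPS GPU total using the press-release figures
# quoted in the thread. Whether these are SP or DP flops is left open.
UNITS = 48
PROCESSORS_PER_UNIT = 512
GPU_TFLOPS_TOTAL = 192

tflops_per_unit = GPU_TFLOPS_TOTAL / UNITS
gflops_per_processor = tflops_per_unit * 1000 / PROCESSORS_PER_UNIT
total_processors = UNITS * PROCESSORS_PER_UNIT

print(f"{tflops_per_unit:.0f} TFLOPS per unit")
print(f"{gflops_per_processor:.4f} GFLOPS per 'processor'")
print(f"{total_processors} 'processors' in total")
```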

