Old 01-Oct-2012, 18:14   #1
Dade
Member
 
Join Date: Dec 2009
Posts: 171
Default Parallella: A Supercomputer For Everyone

Not strictly related to GPUs but most people working on GPGPU computing are in the HPC field so you may be interested.

There is a new project looking for money on Kickstarter: http://www.kickstarter.com/projects/...r-for-everyone

With the HPC scene pretty much dominated by GPUs (i.e. SIMD or SIMT or whatever they are called today), trying a pure MIMD route could be interesting. Porting existing applications will still not be a trivial task (i.e. Parallella looks a lot like the old Transputer) but it could be a lot easier than with SIMD.
Old 02-Oct-2012, 01:24   #2
Grall
Invisible Member
 
Join Date: Apr 2002
Location: La-la land
Posts: 4,984
Default

They have a very wonky way of counting chip performance by adding up the clock speed of all the processor cores. That's just not proper, from any point of view.
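
To put rough numbers on the complaint: a peak-throughput estimate multiplies core count, clock speed and the work each core does per cycle, and that last factor is exactly what a summed clock figure throws away. The sketch below compares two made-up chips; the `Chip` class and every number in it are assumptions for illustration, not figures for Parallella or any real part.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    """Hypothetical chip description; none of these numbers describe a real part."""
    name: str
    cores: int
    clock_ghz: float
    flops_per_cycle: int  # per core, assumed for illustration


def summed_ghz(chip: Chip) -> float:
    """The marketing figure: clock speed added up over all cores."""
    return chip.cores * chip.clock_ghz


def peak_gflops(chip: Chip) -> float:
    """Rough peak throughput: cores x clock x FLOPs per cycle per core."""
    return chip.cores * chip.clock_ghz * chip.flops_per_cycle


many_small = Chip("many small cores", cores=16, clock_ghz=1.0, flops_per_cycle=2)
few_wide = Chip("few wide cores", cores=4, clock_ghz=4.0, flops_per_cycle=16)

for chip in (many_small, few_wide):
    print(f"{chip.name}: {summed_ghz(chip):.0f} 'GHz' summed, "
          f"{peak_gflops(chip):.0f} GFLOPS peak")
```

Both invented chips advertise the same summed figure, yet their estimated peak throughput differs by a factor of eight, so the summed number alone cannot rank them.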
__________________
"If I were a science teacher and a student said the Universe is 6000 years old, I would mark that answer as wrong (why? Because it is)."
-Phil Plait
Old 02-Oct-2012, 03:23   #3
Brad Grenz
Senior Member
 
Join Date: Mar 2005
Location: Oregon
Posts: 1,692
Default

The performance per watt and price don't seem that impressive.
Old 02-Oct-2012, 08:09   #4
pcchen
Moderator
 
Join Date: Feb 2002
Location: Taiwan
Posts: 2,346
Default

Its off-chip bandwidth seems to be just too low. Of course, such a project is probably still good for people who want to experiment with MPI or OpenMP (i.e. not for real work, but for learning how to do proper parallel programming).
Old 02-Oct-2012, 09:55   #5
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,016
Default

Quote:
Originally Posted by Grall View Post
They have a very wonky way of counting chip performance by adding up the clock speed of all the processor cores. That's just not proper, from any point of view.
Yeah, that was a forgivable marketing misstep in the early days of dual-core processors, when conveying the advantage of a dual-core was deemed difficult, but now it just makes them look amateurish and really hurts their credibility.
__________________
"Well, you mentioned Disneyland, I thought of this porn site, and then bam! A blue Hulk." —The Creature
My (currently dormant) blog: Teχlog
Old 02-Oct-2012, 10:05   #6
Simon F
Tea maker
 
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,379
Default

Quote:
Originally Posted by Dade View Post
(i.e. Parallella looks a lot like the old Transputer)
oooh. Deja vu indeed.
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson

"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay
Old 02-Oct-2012, 15:06   #7
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,816
Default

Quote:
Originally Posted by Grall View Post
They have a very wonky way of counting chip performance by adding up the clock speed of all the processor cores. That's just not proper, from any point of view.
That reminds me of the times when Intel's competitors were slapping the silly "P-rating" numbers all over the place.
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
Old 02-Oct-2012, 15:47   #8
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,553
Default

If you think their added-up MHz numbers are bad, you should look at the performance "comparison" video they made:

http://www.youtube.com/watch?feature...&v=4sMWbaV1sRQ

I think this could be the worst benchmark test I've ever seen.

As for the actual product.. can anyone think of any applications where pure MIMD is a win, but only with 32KB of fast memory per core, and an emphasis on single-precision floating point arithmetic? It would be interesting to learn more about their core uarch, but unfortunately you're told to contact their sales division if you want their hardware manual (way to put your best foot forward in embracing that open hardware philosophy you go on about on your KS page, guys). They do say somewhere that it's superscalar; I wonder if this is a simple VLIW. You'd have to think that the alleged power budget allocated for these cores wouldn't allow for much, if anything, in the way of branch prediction, so I doubt they'd be any good at general purpose code even if it did exhibit extreme and explicit memory locality.
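
For a feel of how tight 32KB per core is, here is a back-of-the-envelope sketch. It assumes, optimistically, that the whole 32KB holds single-precision data; code and stack would presumably have to share the same memory, so the real budget is smaller. The function names and the three-tile matrix example are illustrative choices, not anything from Adapteva's material.

```python
import math

LOCAL_MEMORY_BYTES = 32 * 1024   # per-core figure quoted in the post
FLOAT_BYTES = 4                  # single-precision float


def floats_that_fit(memory_bytes: int = LOCAL_MEMORY_BYTES) -> int:
    """Upper bound on resident floats, assuming all memory holds data."""
    return memory_bytes // FLOAT_BYTES


def max_square_tile(num_tiles: int, memory_bytes: int = LOCAL_MEMORY_BYTES) -> int:
    """Largest n such that num_tiles n-by-n float tiles fit in local memory."""
    return math.isqrt(floats_that_fit(memory_bytes) // num_tiles)


print(floats_that_fit())      # 8192
print(max_square_tile(3))     # 52
```

So a core working on C = A * B with all three tiles resident tops out at roughly 52x52 blocks under this assumption, and anything larger has to be streamed in from elsewhere.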
Old 02-Oct-2012, 16:32   #9
McHuj
Member
 
Join Date: Jul 2005
Location: Austin, Tx
Posts: 408
Default

I think you guys would be surprised how often companies add up MHz. I've seen it at least in two different companies now. They're both in the signal processing fields so it may be specific to that industry.

From my experience, these types of processors are a good match for prototyping work of signal processing tasks. For example, in a wireless receiver you may have a very simple processing chain that looks like: resampler->FFT->Channel Estimate->Equalizer->Symbol Demapper->Decoder. In theory, you can parallelize those components and map the whole chain onto the processor. This approach is basically FPGA-like, but written in C (yeah right, it's all ASM) instead of RTL. If you can actually run your tasks in real time, you can actually test your performance before taping out an ASIC. Many companies do this by running the ASIC RTL on FPGA. For a final product I don't think this makes sense.
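
The chain-mapped-onto-cores idea can be sketched structurally as follows. This is only a toy under stated assumptions: Python threads stand in for cores, `queue.Queue` stands in for whatever on-chip links connect them, and the stage functions are placeholders that tag each block with the stage name instead of doing any signal processing. It shows the shape of the mapping, not its performance.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(func, inbox, outbox):
    """Run one pipeline stage: pull a block, process it, pass it on."""
    while True:
        block = inbox.get()
        if block is STOP:
            outbox.put(STOP)  # forward the shutdown signal downstream
            return
        outbox.put(func(block))


def run_pipeline(funcs, blocks):
    """Give every stage its own thread, chained by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for block in blocks:
        queues[0].put(block)
    queues[0].put(STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Placeholder stages: each one just appends its own name to the block.
STAGE_NAMES = ["resampler", "fft", "channel_estimate",
               "equalizer", "symbol_demapper", "decoder"]


def make_stage(name):
    return lambda block: block + [name]


trace = run_pipeline([make_stage(n) for n in STAGE_NAMES], [[0], [1]])
print(trace[0])
```

Each block passes through the six stages in order while later blocks can already be entering earlier stages, which is the property that makes the approach feel FPGA-like.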

However, if a task is highly parallel and requires a lot of processors, you are better off using one processor with a wide SIMD vector since it will be much more power efficient. Chances are that a task spread out over a whole processor like this will be bottlenecked by the interprocessor communications network. If that communication isn't deterministic, well, good luck both debugging and designing it in a way that it works efficiently without overhead.

If your tasks are serial, chances are they will be bottlenecked by the single-thread performance of each core. It's doubtful these types of cores are powerful at all.
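
One common way to quantify the serial-task worry is Amdahl's law, which the post does not name; it is brought in here only as a framing. The sketch assumes all cores are equally fast and ignores communication cost entirely, and the core counts and parallel fractions are picked purely for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.5, 0.9, 0.99):
    print(f"parallel fraction {p:.2f}: "
          f"16 cores -> {amdahl_speedup(p, 16):.2f}x, "
          f"64 cores -> {amdahl_speedup(p, 64):.2f}x")
```

With half the work serial, going from 16 to 64 cores barely moves the result, so the speed of a single core ends up setting the ceiling.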

Without a revolutionary BW/Shared memory innovation, I don't think these things will ever work. Transputer right.
Old 02-Oct-2012, 22:05   #10
Vitaly Vidmirov
Member
 
Join Date: Jul 2007
Location: Russia
Posts: 96
Default

Quote:
Originally Posted by Exophase View Post
It would be interesting to learn more about their core uarch
So far the most detailed description is here:
http://www.adapteva.com/wp-content/u...apteva_mpr.pdf
Old 03-Oct-2012, 10:39   #11
Dade
Member
 
Join Date: Dec 2009
Posts: 171
Default

Quote:
Originally Posted by Vitaly Vidmirov View Post
So far the most detailed description is here:
http://www.adapteva.com/wp-content/u...apteva_mpr.pdf
Thanks, the paragraph "Silicon on a Shoestring Budget" is interesting and funny to read. I had no idea it was possible to develop a CPU with such a small amount of resources.
Old 03-Oct-2012, 18:53   #12
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,553
Default

Quote:
Originally Posted by Vitaly Vidmirov View Post
So far the most detailed description is here:
http://www.adapteva.com/wp-content/u...apteva_mpr.pdf
Thanks, that was pretty thorough. Like I suspected, branch prediction is minimal. I wonder if the pipeline has interlocks at all, or if everything produces delay slots. I also wonder what the efficiency of this is like vs a multicore VLIW FP DSP like C67x. I guess the big power and area savings come from the lack of cache and the way the memory is laid out. They're actually specifically positioning this for DSP tasks - surely explicit parallelism of some form (be it SIMD or VLIW) is a win there? Does having a bunch of fully independent scalar cores really help you?

Actually, what I really wonder is how it compares with Ziilabs stemcell array SoCs, because on paper it looks pretty similar, although it appears to have a somewhat different core interconnect topology.
Old 07-Oct-2012, 18:49   #13
codedivine
Member
 
Join Date: Jan 2009
Posts: 215
Default

Epiphany architecture reference manuals released.
http://www.kickstarter.com/projects/...everyone/posts

I have backed the project. I am actually pretty excited about this.
Old 08-Oct-2012, 02:27   #14
3dcgi
Senior Member
 
Join Date: Feb 2002
Posts: 2,019
Default

Why are you excited about it?
Old 08-Oct-2012, 09:07   #15
Dade
Member
 
Join Date: Dec 2009
Posts: 171
Default

Quote:
Originally Posted by 3dcgi View Post
Why are you excited about it?
Because:

- it is a MIMD (and not a SIMD like all GPUs). It can solve some kinds of problems a lot better than GPUs even with a lower peak performance.

- it has an extremely good GFLOPS/Watt ratio.

- the die size is 2mm^2 (the guessed die size of an AMD 7970 is 354mm^2). There is a lot of room for improvement.

- it is usable in many embedded applications where GPUs aren't very practical because of their size/cooling/power requirements. Good luck installing a Fermi GPU in a small drone.

- they want to open source all the software.

- you can buy one for $99 and you get a complete PC (ethernet, usb, hdmi, etc.) capable of running Ubuntu, where you can test OpenCL, etc. applications. It has great educational value.
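
A toy model of the first point, i.e. how branchy work can favour independent cores. The assumptions are loud ones: a SIMD group pays for every branch that any of its lanes takes, a MIMD core pays only for its own items, both kinds of lane or core are equally fast per instruction, and the cycle costs are made up. Real hardware differs on all of these.

```python
def simd_time(branches, width, cost):
    """Lockstep lanes: a group pays for every branch any of its lanes takes."""
    total = 0
    for start in range(0, len(branches), width):
        group = branches[start:start + width]
        total += sum(cost[b] for b in set(group))
    return total


def mimd_time(branches, cores, cost):
    """Independent cores: each pays only for the branches of its own items."""
    per_core = [0] * cores
    for i, b in enumerate(branches):
        per_core[i % cores] += cost[b]  # round-robin assignment of items
    return max(per_core)


COST = {"cheap": 10, "expensive": 100}   # made-up cycle counts

uniform = ["cheap"] * 64
# One expensive item in every group of 16, each landing on a different core.
divergent = ["expensive" if i % 17 == 0 else "cheap" for i in range(64)]

print(simd_time(uniform, 16, COST), mimd_time(uniform, 16, COST))      # 40 40
print(simd_time(divergent, 16, COST), mimd_time(divergent, 16, COST))  # 440 130
```

On uniform work the two models tie; with one divergent item per group the lockstep model drags every lane through the expensive path, which is the kind of case where lower peak performance can still win.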
Old 08-Oct-2012, 09:47   #16
Simon F
Tea maker
 
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,379
Default

Quote:
Originally Posted by Dade View Post
Because:

- it is a MIMD (and not a SIMD like all GPUs)
Some GPUs are MIMD.
Old 08-Oct-2012, 21:07   #17
Dade
Member
 
Join Date: Dec 2009
Posts: 171
Default

Quote:
Originally Posted by Simon F View Post
Some GPUs are MIMD.
I'm not aware of any GPU recently manufactured by AMD/NVIDIA/Intel and based on a MIMD architecture. Is there something available for mobile platforms?
Old 08-Oct-2012, 23:40   #18
Davros
Darlek ******
 
Join Date: Jun 2004
Posts: 9,486
Default

If this is a supercomputer, why are they showing benchmarks comparing it to a Cortex-A9?
__________________
Guardian of the Most holy Two Terabytes of Gaming Goodness™
Old 08-Oct-2012, 23:53   #19
Gerry
Member
 
Join Date: Feb 2002
Posts: 586
Default

Quote:
Originally Posted by Dade View Post
I'm not aware of any GPU recently manufactured by AMD/NVIDIA/Intel and based on a MIMD architecture. Is there something available for mobile platforms?
Who else makes GPUs other than those companies? Someone Simon could be closely associated with?
Old 09-Oct-2012, 00:34   #20
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,016
Default

Quote:
Originally Posted by Dade View Post
I'm not aware of any GPU recently manufactured by AMD/NVIDIA/Intel and based on a MIMD architecture. Is there something available for mobile platforms?
Even a GCN CU is MIMD:



And Tahiti has 32 CUs running simultaneously. It just happens to be SIMD on top of it.

It's MIMerD: Multiple Instructions, Multipler Data
Old 09-Oct-2012, 02:23   #21
McHuj
Member
 
Join Date: Jul 2005
Location: Austin, Tx
Posts: 408
Default

I do like their way of addressing data. My previous experience with stuff like this required MPI and it was a nightmare to manage the data. Data partitioning and movement had the potential to be as hard, if not harder, to handle efficiently in software than the partitioning of the operations.
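
For readers who have not opened the reference manuals: as far as I understand them, every core's local memory sits in one flat 32-bit address space, with the core's mesh coordinates in the top bits and the local offset in the low bits, so reaching another core's data is an ordinary load or store and not an explicit message. The sketch below assumes a 6-bit row, 6-bit column and 20-bit offset split; treat those widths, the function names and the example coordinates as assumptions to check against the manual.

```python
ROW_BITS = 6      # assumed width of the mesh-row field
COL_BITS = 6      # assumed width of the mesh-column field
OFFSET_BITS = 20  # assumed width of the per-core local offset


def global_address(row: int, col: int, offset: int) -> int:
    """Pack (row, col, local offset) into one flat 32-bit address."""
    if not 0 <= row < (1 << ROW_BITS):
        raise ValueError("row out of range")
    if not 0 <= col < (1 << COL_BITS):
        raise ValueError("col out of range")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset out of range")
    return (row << (COL_BITS + OFFSET_BITS)) | (col << OFFSET_BITS) | offset


def split_address(addr: int):
    """Recover (row, col, local offset) from a flat address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    col = (addr >> OFFSET_BITS) & ((1 << COL_BITS) - 1)
    row = (addr >> (OFFSET_BITS + COL_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col, offset


addr = global_address(row=32, col=8, offset=0x100)
print(hex(addr))            # 0x80800100
print(split_address(addr))  # (32, 8, 256)
```

Compared with MPI, the data placement decision is still there, but it shows up as a choice of address, not as a send/receive pair.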

I do wonder how efficient the system is when all processors are loaded and you're hitting memory from all over the chip.
Old 20-Oct-2012, 18:14   #22
CouldntResist
Member
 
Join Date: Aug 2004
Posts: 244
Default

Quote:
Originally Posted by Simon F View Post
Some GPUs are MIMD.
In those MIMD GPUs, the "I" stands for vector instruction, and the "D" stands for vector data. So it's not the same MIMD I was taught in school.