Old 16-Sep-2012, 17:12   #101
entity279
Member
 
Join Date: May 2008
Location: Romania
Posts: 339
Default

Quote:
Originally Posted by gongo View Post
Could we be waiting for Haswell-E parts with the 8 cores sku..
Yes, we will be waiting for those.

Personally at least, I wouldn't want to waste my nice after-market cooler on a 77W processor.
entity279 is offline   Reply With Quote
Old 16-Sep-2012, 17:42   #102
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Location: Toulouse
Posts: 4,125
Default

You will have to wait a long time for Haswell-E.
Haswell will coexist with Ivy Bridge-E, and we don't know anything about a socket 2011 successor or any future CPU on that socket. Haswell might be skipped: Ivy Bridge-E will be the high end until 2014, then maybe Intel moves to DDR4 for its high end and servers, where DDR4 may be useful for fitting truckloads of memory.

I've read that memory chips use a ridiculously large amount of power in datacenters, by the way. Imagine racks upon racks of PCs loaded with 256GB of RAM, 10Gb networking, loads of VMs, sprawling databases, etc.
So, on the consumer side, Intel releases a stop-gap socket, the 1150, which still supports DDR3. But servers (and 2011 is a server socket, in addition to high-end desktop) need DDR4 sooner, and Intel might not bother with new DDR3 sockets.

Last edited by Blazkowicz; 16-Sep-2012 at 17:48.
Blazkowicz is offline   Reply With Quote
Old 16-Sep-2012, 21:07   #103
green.pixel
Senior Member
 
Join Date: Dec 2008
Location: Europe
Posts: 1,230
Default

Quote:
Originally Posted by gongo View Post
Could we be waiting for Haswell-E parts with the 8 cores sku..i dont understand why Intel dont want to go higher on the 95W Haswell quad desktop sku..
It will take quite a while before you see any significant use of an 8C CPU for gaming.
__________________
"A Revolutionary Age is an age of action; the present age is an age of advertisement, or an age of publicity: nothing ever happens, but there is immediate publicity everywhere." - Søren Kierkegaard
http://culture.vg/ | http://storyofstuff.com/ | http://www.chomsky.info/ | http://www.artrenewal.org/
green.pixel is offline   Reply With Quote
Old 16-Sep-2012, 21:28   #104
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Location: Toulouse
Posts: 4,125
Default

Intel doesn't want to sell you an 8-core CPU with an H61 chipset or its next-gen equivalent when they could milk you for an X79 instead.
Blazkowicz is offline   Reply With Quote
Old 27-Sep-2012, 22:22   #105
shiznit
Member
 
Join Date: Nov 2007
Location: NoVA
Posts: 213
Default

Found the links to the webcasts:

http://intelstudios.edgesuite.net/id...S001/index.htm

http://intelstudios.edgesuite.net/id...S001/index.htm
shiznit is offline   Reply With Quote
Old 05-Oct-2012, 13:05   #106
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,543
Default

Quote:
Originally Posted by Exophase View Post
I don't think I understand something: are the two of you arguing in favor of the mask loop approach or are you trying to say you think that's what Haswell is actually using? Because that is definitely not how gather is specified in the AVX2 documentation. For better or worse, this isn't what Intel is doing.

And do you also not think adding a branch on vector mask instruction is a pretty big shift?
Mea culpa. Assumption is the mother of all fuck-ups, and all that. I assumed Haswell used an implementation similar to Knights Corner.

It does indeed look like Haswell can be interrupted in the middle of a gather instruction, with the result of the partially completed gather stored in registers (data + mask). Which then raises the question of how they've done that.
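
To make the "data + mask" idea concrete, here is a toy Python model of a restartable masked gather. It is only a sketch of the architectural behaviour discussed above, not of how the hardware does it: the `memory` dict, the `PageFault` exception and the `gather_step` name are all made up for illustration, and real hardware is not required to process lanes strictly in order.

```python
from typing import Dict, List


class PageFault(Exception):
    """Raised by the toy memory model when an address is not mapped."""


def gather_step(memory: Dict[int, int], base: int, indices: List[int],
                dest: List[int], mask: List[bool]) -> None:
    """Gather one lane at a time, clearing each mask bit as its lane completes.

    If a lane faults, dest and mask keep the progress made so far, so
    calling this again with the same arguments finishes the remaining lanes.
    """
    for lane, idx in enumerate(indices):
        if not mask[lane]:
            continue  # lane already completed (or never requested)
        addr = base + idx
        if addr not in memory:
            raise PageFault(addr)  # partial state stays in dest/mask
        dest[lane] = memory[addr]
        mask[lane] = False  # record completion in the mask


# Hypothetical example: address 102 is unmapped on the first attempt.
memory = {100: 11, 101: 22, 103: 44}
indices = [0, 1, 2, 3]
dest = [0, 0, 0, 0]
mask = [True, True, True, True]

try:
    gather_step(memory, 100, indices, dest, mask)
except PageFault as fault:
    # Lanes 0 and 1 are done; lanes 2 and 3 are still flagged in the mask.
    memory[fault.args[0]] = 33  # the "OS" maps the missing address
    gather_step(memory, 100, indices, dest, mask)  # restart, skipping done lanes
```

The point of the sketch is that the mask doubles as a progress record: nothing outside the two registers is needed to resume after the fault.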

Cheers
__________________
I'm pink, therefore I'm spam
Gubbi is offline   Reply With Quote
Old 05-Oct-2012, 17:28   #107
Raqia
Member
 
Join Date: Oct 2003
Posts: 320
Default

Nice exposé:

http://www.anandtech.com/show/6355/i...l-architecture
Raqia is offline   Reply With Quote
Old 05-Oct-2012, 17:50   #108
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Location: Toulouse
Posts: 4,125
Default

Yes, Knights Corner is like Intel throwing the usual x86 SIMD extensions out the window and branching off to do something different. I don't know enough to tell what was thrown out (e.g. does it support SSE2 or not, or even x87; on the non-FP side, does it even meet i686?).
Blazkowicz is offline   Reply With Quote
Old 05-Oct-2012, 18:05   #109
liolio
Ohio frog
 
Join Date: Jun 2005
Location: Ohio, USA
Posts: 4,172
Default

Quote:
Originally Posted by Raqia View Post
Well, they could have cut the bullshit about Apple and the comparison of the A6 to Intel products, especially in an article about Haswell, whose performance ARM-based CPUs may never touch.
As for mobile, let's wait for the pain next year, with an OoO Atom shipping on a 22nm process. I would bet it is (finally) going to hurt.
EDIT
+1 to myself: the focus on Apple is really on the verge of F-Boyism in that one, crazy.
Comparing ultrabooks and laptops to tablets... that is nonsensical and short-sighted.
Next year Intel will have awesome, and I would bet unmatched, products to power Windows 8 (which in turn will have matured) on "serious" tablets (I mean ones akin to the high-end MSFT one, so a viable substitute for laptops), thanks to the new dual- and quad-core Atom configurations supporting 4GB of RAM and more, and so on.

Last edited by liolio; 05-Oct-2012 at 18:31.
liolio is offline   Reply With Quote
Old 08-Oct-2012, 10:17   #110
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,543
Default

Quote:
Originally Posted by liolio View Post
Well, they could have cut the bullshit about Apple and the comparison of the A6 to Intel products, especially in an article about Haswell, whose performance ARM-based CPUs may never touch.
EDIT
+1 to myself: the focus on Apple is really on the verge of F-Boyism in that one, crazy.
Comparing ultrabooks and laptops to tablets...
The A6 CPU is the first internally developed ARM microarchitecture from Apple, and quite an ambitious one. It appears to be faster on a clock-normalized basis than Cortex-A15, and it appears to have a much better memory subsystem than any other mobile SoC CPU.

Apple acquired significant CPU design know-how when they bought PA Semi and Intrinsity. Some of these guys know how to build high power processors.

Can the A6 compete against Haswell in desktops and high-power laptops? Hell no. But in a MacBook Air form factor it might just be competitive with Intel's offerings. Intel is threatened by this; notice how much of the Haswell material is about power savings rather than outright compute performance.

I can't wait to see the next iPad with an A6 in a larger power envelope.

Cheers
__________________
I'm pink, therefore I'm spam
Gubbi is offline   Reply With Quote
Old 08-Oct-2012, 11:35   #111
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,816
Default

Quote:
Originally Posted by Gubbi View Post
Can the A6 compete against Haswell in desktops and high-power laptops? Hell no. But in a MacBook Air form factor it might just be competitive with Intel's offerings. Intel is threatened by this; notice how much of the Haswell material is about power savings rather than outright compute performance.
Definitely. Intel actually decided to sacrifice some L3 access latency in Haswell by separating its clock domain from the CPU cores, with the sole goal of shaving off a few watts of power.
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
fellix is offline   Reply With Quote
Old 08-Oct-2012, 23:08   #112
ninelven
PM
 
Join Date: Dec 2002
Posts: 1,370
Default

Quote:
Originally Posted by Gubbi
It appears to be faster on a clock-normalized basis than Cortex-A15
Are there any A-15s out yet to compare it against?
__________________
//
ninelven is offline   Reply With Quote
Old 09-Oct-2012, 01:26   #113
tunafish
Member
 
Join Date: Aug 2011
Posts: 366
Default

Quote:
Originally Posted by Gubbi View Post
Apple acquired significant CPU design know-how when they bought PA Semi and Intrinsity. Some of these guys know how to build high power processors
It's interesting to note that there are more chip designers who worked on K8 working at Apple than there are working at AMD. PA Semi was one of the favourite destinations for chip designers when they fled AMD.
tunafish is offline   Reply With Quote
Old 09-Oct-2012, 05:13   #114
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070
Default

Quote:
Originally Posted by ninelven View Post
Are there any A-15s out yet to compare it against?
Krait seems to be a good proxy for A15.
__________________
The views presented here are my own and not my employer's.
Quote:
Originally Posted by Alexko View Post
So in a nutshell, model [BLANK] will have [BLANK], up to [BLANK], and even [BLANK] for a power consumption of just [BLANK]. Impressive.
rpg.314 is offline   Reply With Quote
Old 09-Oct-2012, 05:28   #115
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,553
Default

Quote:
Originally Posted by rpg.314 View Post
Krait seems to be a good proxy for A15.
Krait has nothing to do with Cortex-A15.
Exophase is offline   Reply With Quote
Old 09-Oct-2012, 05:41   #116
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070
Default

Quote:
Originally Posted by Exophase View Post
Krait has nothing to do with Cortex-A15.
Which is why I used the word proxy. I expect Krait and A15 will end up rather close.
rpg.314 is offline   Reply With Quote
Old 09-Oct-2012, 06:52   #117
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,553
Default

Quote:
Originally Posted by rpg.314 View Post
Which is why I used the word proxy. I expect Krait and A15 will end up rather close.
Based on what exactly? Why would you expect Krait to represent Cortex-A15 any better than A6? If anything the one released closer in time would be more likely to be representative wouldn't it?
Exophase is offline   Reply With Quote
Old 09-Oct-2012, 08:27   #118
tunafish
Member
 
Join Date: Aug 2011
Posts: 366
Default

A15 and Krait both have 3-wide decode and a 128-bit FPU, but that's pretty much where the similarities end. Krait has a shorter pipeline and a low-latency, really weird cache subsystem. In comparison, A15 will have higher latencies and higher clocks. Whether the power consumption will blow up when it's taken to those clocks is a whole other (and as yet unknown) issue.

I don't think that Krait is in any way a good proxy for A15 performance. In fact, I simply think that there isn't enough published data on A15 to make any sort of informed judgement yet.
tunafish is offline   Reply With Quote
Old 09-Oct-2012, 10:11   #119
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,543
Default

Quote:
Originally Posted by ninelven View Post
Are there any A-15s out yet to compare it against?
No exact science was used in my estimate, I was going by the claimed 40% IPC improvement of A15 vs A9.

The interesting point is of course how power consumption compares with Krait and A15 at a given performance level.

Cheers
__________________
I'm pink, therefore I'm spam
Gubbi is offline   Reply With Quote
Old 09-Oct-2012, 10:44   #120
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,543
Default

Quote:
Originally Posted by tunafish View Post
I don't think that Krait is in any way a good proxy for A15 performance. In fact, I simply think that there isn't enough published data on A15 to make any sort of informed judgement yet.
We know A15 is a 3-wide superscalar OOO design with a 40+ entry ROB, and we know NEON instructions are now tracked by the ROB (rename tables for both ARM and NEON registers); it also has a wider memory subsystem. It is going to be a fair bit faster on normal integer/FP code and a lot faster on NEON code.

Wiki says Krait is OOO, but I can't find that claim anywhere in any of the Qualcomm PR material.

The only immediate difference seems to be the length of the pipeline: 11 stages for Krait and 15 stages for A15. The long pipeline indicates a higher operating frequency target, and together with all the virtualization support ARM has added, it seems to me A15 is really targeted more at low-power servers than mobile SoCs.

I expect A15 to be faster than Krait, but to burn more power doing so.

Cheers
__________________
I'm pink, therefore I'm spam
Gubbi is offline   Reply With Quote
Old 09-Oct-2012, 19:04   #121
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,553
Default

Quote:
Originally Posted by tunafish View Post
A15 and Krait both have 3-wide decode and a 128-bit FPU, but that's pretty much where the similarities end. Krait has a shorter pipeline and a low-latency, really weird cache subsystem. In comparison, A15 will have higher latencies and higher clocks. Whether the power consumption will blow up when it's taken to those clocks is a whole other (and as yet unknown) issue.

I don't think that Krait is in any way a good proxy for A15 performance. In fact, I simply think that there isn't enough published data on A15 to make any sort of informed judgement yet.
There's a ton of published information on Cortex-A15; it's Krait that we know close to nothing about. Your information, taken from AnandTech, pretty much sums it up, where "short pipeline" and "low latency" are incredibly vague descriptions. In actuality we don't know what the fetch bandwidth is, we don't know what its integer execution unit resources are, we don't know if it can support simultaneous loads and stores, we don't know how deep its reordering capabilities are, and we don't know what its branch prediction is like. These are all things we have pretty good descriptions of for Cortex-A15. Sure, it may be established that they both have 128-bit NEON units, but what's the latency like? Are you going to take at face value that its SIMD is lower latency just because Anand says it has a shorter pipeline? There's way too much missing information, and there's definitely a lot of room where Cortex-A15 could outperform Krait; unless ARM's estimates of how it'll perform vs A9 are totally unrealistic, it will outperform Krait.

Note that a lot of Cortex-A15's long pipeline is in a frontend that can be partially bypassed if code is running from the loop buffer.

Quote:
Originally Posted by Gubbi
No exact science was used in my estimate, I was going by the claimed 40% IPC improvement of A15 vs A9.
That 40% number only applies to Dhrystone. ARM gave numbers of 50% improvement on integer code and 100% improvement on FP (presumably SIMD, maybe also including integer SIMD?) and memory bound stuff. They also cited that they had an internal goal to improve typical IPC by 50% over Cortex-A9, which they feel they've met.

You have to consider that, aside from the benchmark being abject garbage, there's just less room to grow with Dhrystone. It all fits in L1 cache, uses pretty predictable branches, and spends a lot of time in library functions that can be hand-optimized. So Cortex-A15's strengths aren't going to help it there as much as they will in real programs.
Exophase is offline   Reply With Quote
Old 10-Oct-2012, 03:46   #122
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070
Default

Quote:
Originally Posted by Exophase View Post
Based on what exactly? Why would you expect Krait to represent Cortex-A15 any better than A6? If anything the one released closer in time would be more likely to be representative wouldn't it?
Because Scorpion core came out about a year before A9 and it wasn't a bad proxy.

By proxy I mean <20% difference. YMMV with this metric.
rpg.314 is offline   Reply With Quote
Old 10-Oct-2012, 04:37   #123
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,553
Default

20% difference is good for a proxy? Seriously? Do you even have anything actually showing the Scorpion-to-A9 difference being typically < 20% at the same clock speed?

Scorpion is much closer to A8 than A9; making the latter comparison rather than the former seems totally disingenuous :/
Exophase is offline   Reply With Quote
Old 10-Oct-2012, 04:54   #124
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070
Default

Quote:
Originally Posted by Exophase View Post
20% difference is good for a proxy? Seriously?
Good enough for me. It's a pretty bad metric if you are doing an in-depth comparison, no doubt about that. But in terms of the user experience with actual apps, I think a difference this small is not perceptible.
rpg.314 is offline   Reply With Quote
Old 10-Oct-2012, 10:07   #125
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,543
Default

Quote:
Originally Posted by Exophase View Post
That 40% number only applies to Dhrystone. ARM gave numbers of 50% improvement on integer code and 100% improvement on FP (presumably SIMD, maybe also including integer SIMD?)
IMO, the 40% IPC increase in Dhrystone is the upper limit we will see for IPC improvements. Memory latency doesn't magically go away, so a real workload that busts out of cache is going to see less IPC improvement.

The 50% performance improvement is with frequency improvements AFAICT (at a fixed power consumption level)
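
As a rough back-of-the-envelope sketch of the two points above (not a model of either core): assume memory stalls simply add to the core's cycles per instruction and are unchanged between the two designs, and that performance is IPC times clock. The function names and every number below are hypothetical, picked only to show the arithmetic, and the additive stall model ignores any extra miss overlap a wider OOO core might get.

```python
def ipc(core_cpi: float, misses_per_instr: float, miss_penalty_cycles: float) -> float:
    """IPC when memory stalls simply add to the core's cycles per instruction."""
    return 1.0 / (core_cpi + misses_per_instr * miss_penalty_cycles)


def ipc_gain(old_core_cpi: float, new_core_cpi: float,
             misses_per_instr: float, miss_penalty_cycles: float) -> float:
    """Fractional IPC gain from a faster core when memory stall time is unchanged."""
    old = ipc(old_core_cpi, misses_per_instr, miss_penalty_cycles)
    new = ipc(new_core_cpi, misses_per_instr, miss_penalty_cycles)
    return new / old - 1.0


def clock_ratio(perf_ratio: float, ipc_ratio: float) -> float:
    """Clock-speed ratio implied by perf = IPC x frequency."""
    return perf_ratio / ipc_ratio


# Hypothetical numbers, chosen only to make the arithmetic visible.
OLD_CORE_CPI = 1.0
NEW_CORE_CPI = OLD_CORE_CPI / 1.4  # a core that is 40% faster when everything hits in cache

in_cache = ipc_gain(OLD_CORE_CPI, NEW_CORE_CPI, 0.0, 100.0)
out_of_cache = ipc_gain(OLD_CORE_CPI, NEW_CORE_CPI, 0.01, 100.0)

print(f"in-cache IPC gain:      {in_cache:.1%}")
print(f"cache-busting IPC gain: {out_of_cache:.1%}")
print(f"clock ratio for 1.5x perf at 1.4x IPC: {clock_ratio(1.5, 1.4):.3f}")
```

With one miss per hundred instructions at a 100-cycle penalty, the same 40% faster core only gains about 17% in IPC under these assumptions. Likewise, if the 50% figure did include a 40% IPC gain (which is an assumption here), the remaining clock increase would be only around 7%.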

The 100% FP figure is only for SIMD code. The A9 doesn't track data dependencies on NEON registers. Using NEON instructions thus effectively turns the A9 into an in-order processor. The A15 has two remap tables, one for ARM registers and one for NEON registers. That and the wider datapaths are going to improve SIMD code immensely, but regular FP code much less.

Cheers
__________________
I'm pink, therefore I'm spam
Gubbi is offline   Reply With Quote
