Old 10-Oct-2012, 16:15   #126
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,553
Default

Quote:
Originally Posted by Gubbi View Post
IMO, the 40% IPC increase in Dhrystone is the upper limit we will see for IPC improvements. Memory latency doesn't magically go away, so a real workload that busts out of cache is going to see less IPC improvement.
That's like saying that the performance difference between Cortex-A8 and Ivy Bridge is purely down to their cache and main memory latencies. There's a big continuum of performance opportunities based on how well you can a) extract parallelism and b) schedule to hide latency. Cortex-A15 makes big advances on both fronts. Without knowing the weaknesses of what you're starting with that's a pretty blind statement. Given that Cortex-A15 doesn't actually add much to the execution resources on the integer side, over Cortex-A8 and A9, I'd say it really is all about better management of said resources.

Besides that, Cortex-A9 implementations do tend to have relatively high L2 latency and relatively high main memory latency, so there's plenty of room for improvement; the former can actually be delivered by ARM since the L2 is tightly coupled with the CPUs again.

What really confuses me is how you can make this statement while simultaneously saying A6's CPU is higher performing - does only it get to magically make latency go away?

Quote:
Originally Posted by Gubbi View Post
The 50% performance improvement is with frequency improvements AFAICT (at a fixed power consumption level)
No it isn't. There's no ambiguity in what ARM said.

Quote:
Originally Posted by Gubbi View Post
The 100% FP is only for SIMD code. The A9 doesn't track data dependencies on NEON registers. Using NEON instructions thus effectively turns the A9 into an in-order processor. The A15 has two remap tables, one for ARM registers and one for NEON registers. That and the wider datapaths are going to improve SIMD code immensely, but much less for regular FP code.
The 100% number was NOT just given for SIMD.

Everything you said about OoO applies to scalar VFP in Cortex-A9 vs Cortex-A15 just as much as it applies to NEON. The word isn't back yet, but it's also possible that there are two "real work" VFP pipes (i.e., 2x scalar FMADDs).
Exophase is offline   Reply With Quote
Old 10-Oct-2012, 20:04   #127
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,543
Default

Quote:
Originally Posted by Exophase View Post
That's like saying that the performance difference between Cortex-A8 and Ivy Bridge is purely down to their cache and main memory latencies. There's a big continuum of performance opportunities based on how well you can a) extract parallelism and b) schedule to hide latency. Cortex-A15 makes big advances on both fronts.

Without knowing the weaknesses of what you're starting with that's a pretty blind statement. Given that Cortex-A15 doesn't actually add much to the execution resources on the integer side, over Cortex-A8 and A9, I'd say it really is all about better management of said resources.
The A9 has 2-wide instruction decode and retirement, a 24 entry reorder window and 4 dispatch ports.

The A15 has 3-wide decode and retirement and a 40+ entry reorder window. None of the material I've seen is more detailed than "40+" ROB entries and none says how many dispatch ports or execution units it has. The A15 is designed for higher operating frequency. Higher operating frequency generally increases latency to the memory subsystem as measured in cycles. If ARM combats this with a new cache architecture, fine, but it still means that the number of ROB entries per peak instruction throughput per cycle roughly stays the same, so don't expect more than a 50% IPC increase.

Quote:
Originally Posted by Exophase View Post
What really confuses me is how you can make this statement while simultaneously saying A6's CPU is higher performing - does only it get to magically make latency go away?
It's not my claim. The A6 enjoys a 60% IPC increase vs A9 as per Anandtech's tests. ARM claims a 40% (or 50%) IPC increase of A15 over A9. I merely tried to connect the dots.

We don't know anything about the A6 other than that it has a kickass memory subsystem. Where does the performance come from? Is it 4-wide? Does it have a multi-ported D$? How big is the reorder buffer? Does it have memory disambiguation?

Cheers
__________________
I'm pink, therefore I'm spam
Gubbi is offline   Reply With Quote
Old 10-Oct-2012, 21:21   #128
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,553
Default

Quote:
Originally Posted by Gubbi View Post
The A9 has 2-wide instruction decode and retirement, a 24 entry reorder window and 4 dispatch ports.
Where did you read that it has a 24 entry reorder window? Or 4 dispatch ports for that matter?

This is the best Cortex-A9 reference I've seen: http://www.docstoc.com/docs/73399229...roarchitecture

When they say "3+1" dispatch all diagrams would suggest that's either referring to the third port being capable of going to LS vs NEON/VFP, or the separate branch resolution. It's not a real quad dispatch either way.

There's no official documentation on the issue queue, but the diagram draws 6 squares, so the best guess will be that it's 6 wide. Everything else about it suggests a unified scheduler. Given that ARM themselves says that 8 scheduler slots were pushing the upper limit of feasibility in their design constraints for Cortex-A15 it'd be awfully strange if Cortex-A9 had 24, although I suppose it's possible given that they were designed by two totally different teams.

Quote:
Originally Posted by Gubbi View Post
The A15 has 3-wide decode and retirement and a 40+ entry reorder window. None of the material I've seen is more detailed than "40+" ROB entries and none says how many dispatch ports or execution units it has. The A15 is designed for higher operating frequency. Higher operating frequency generally increases latency to the memory subsystem as measured in cycles. If ARM combats this with a new cache architecture, fine, but it still means that the number of ROB entries per peak instruction throughput per cycle roughly stays the same, so don't expect more than a 50% IPC increase.
You're not looking very hard for information. http://www.arm.com/files/pdf/AT-Expl...Cortex-A15.pdf

A15 has 8 issue queues (to each execution pipeline) in 5 clusters, each with 8 slots. That's 64 entries total. It can dispatch to each of the 8 pipelines each cycle. The pipelines are 2x simple ALU, 1x branch, 1x MUL, 1x load, 1x store, and 2x NEON/VFP. Note that the ALUs bring back parallel shift + op execution, which was moved to separate stages in A9.

But there's way more to the comparison than just execution window, execution width, and latency to the memory subsystem. I don't think I really need to start listing things.

Quote:
Originally Posted by Gubbi View Post
It's not my claim. The A6 enjoys a 60% IPC increase vs A9 as per Anandtech's tests. ARM claims a 40% (or 50%) IPC increase of A15 over A9. I merely tried to connect the dots.
Are you aware that A6 runs at up to 1.3GHz and therefore was probably running at that clock speed during Anand's tests?
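
Why the clock matters: a per-clock (IPC) comparison divides measured performance by frequency, so assuming too low a clock inflates the result. A minimal sketch of the correction follows. The 1.0 GHz assumed clock is purely hypothetical (nothing in this thread says which clock the 60% figure was computed against), and `renormalized_ipc_gain` is just an illustrative name:

```python
def renormalized_ipc_gain(reported_gain: float, assumed_ghz: float, actual_ghz: float) -> float:
    """Rescale a per-clock gain that was computed against the wrong clock speed."""
    if assumed_ghz <= 0 or actual_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return reported_gain * assumed_ghz / actual_ghz

# 1.6 is the 60% figure from the thread; 1.3 GHz is the clock mentioned above.
# The 1.0 GHz assumed clock is a HYPOTHETICAL input, not a known fact.
corrected = renormalized_ipc_gain(reported_gain=1.6, assumed_ghz=1.0, actual_ghz=1.3)
print(round(corrected, 3))  # 1.231
```

Under that hypothetical input, a quoted 60% per-clock gain would shrink to roughly 23%. The real correction depends on the clocks actually used in the tests.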

Quote:
Originally Posted by Gubbi View Post
We don't know anything about the A6 other than that it has a kickass memory subsystem. Where does the performance come from? Is it 4-wide? Does it have a multi-ported D$? How big is the reorder buffer? Does it have memory disambiguation?
True, we don't know those things, but it seems you don't know a lot about Cortex-A15 either. 4-wide seems pretty outrageous for a phone chip.

Anyway, back to the original claim - regardless of what you think the maximum improvement Cortex-A15 can bring is, why would you think Dhrystone would be what's representative of the upper limit? Dhrystone is relatively static, predictable, small, and the test is designed so that you can spend a lot of the time in hand-tuned ASM. In other words, an easy problem. A lot of the hardware in Cortex-A15, quite possibly the majority of it, is designed for problems harder than Dhrystone.

Last edited by Exophase; 10-Oct-2012 at 21:27.
Exophase is offline   Reply With Quote
Old 11-Oct-2012, 10:22   #129
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,543
Default

Quote:
Originally Posted by Exophase View Post
Where did you read that it has a 24 entry reorder window? Or 4 dispatch ports for that matter?

This is the best Cortex-A9 reference I've seen: http://www.docstoc.com/docs/73399229...roarchitecture
Reorder window size inferred from 56 rename entries with 32 needed for architected state (int+fp).
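
The inference above is plain subtraction. A minimal sketch, taking the 56 and 32 figures from this post and treating the result as a bound on renamed results in flight rather than a confirmed window size (`inferred_window` is an illustrative name):

```python
def inferred_window(rename_entries: int, architected_regs: int) -> int:
    """Rename entries left over once the architected state has been mapped."""
    if architected_regs > rename_entries:
        raise ValueError("architected state cannot exceed rename entries")
    return rename_entries - architected_regs

# Both inputs are the figures quoted in the post (int + fp architected state).
a9_window = inferred_window(rename_entries=56, architected_regs=32)
print(a9_window)  # 24
```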

Dispatch: Page 6 here

Although the diagram is confusing, it does say up to FOUR dispatches per cycle.

Quote:
Originally Posted by Exophase View Post
You're not looking very hard for information. http://www.arm.com/files/pdf/AT-Expl...Cortex-A15.pdf

A15 has 8 issue queues (to each execution pipeline) in 5 clusters, each with 8 slots. That's 64 entries total. It can dispatch to each of the 8 pipelines each cycle. The pipelines are 2x simple ALU, 1x branch, 1x MUL, 1x load, 1x store, and 2x NEON/VFP. Note that the ALUs bring back parallel shift + op execution, which was moved to separate stages in A9.
The amount of instructions in issue queues doesn't say anything about the rename capacity and hence the size of reorder window.

When an instruction is renamed, it is allocated an entry in the commit queue. The only time I've seen the size of the commit queue mentioned was in comp.arch on usenet two years ago, where the number 40 was mentioned.

Quote:
Originally Posted by Exophase View Post
Are you aware that A6 runs at up to 1.3GHz and therefore was probably running at that clock speed during Anand's tests?
No, I wasn't aware of that. I'm surprised Apple doesn't market it as a 1.3GHz processor then.

Quote:
Originally Posted by Exophase View Post
True, we don't know those things, but it seems you don't know a lot about Cortex-A15 either. 4-wide seems pretty outrageous for a phone chip.
Nobody, outside of ARM, knows much about A15.

Quote:
Originally Posted by Exophase View Post
Anyway, back to the original claim - regardless of what you think the maximum improvement Cortex-A15 can bring is, why would you think Dhrystone would be what's representative of the upper limit? Dhrystone is relatively static, predictable, small, and the test is designed so that you can spend a lot of the time in hand-tuned ASM. In other words, an easy problem. A lot of the hardware in Cortex-A15, quite possibly the majority of it, is designed for problems harder than Dhrystone.
Dhrystone runs close to the maximum of what the execution core of the CPU is capable of. A real workload is not fully contained in D$ and you then have to contend with memory latencies.

The A15 can execute 50% more instructions per cycle. That also implies that the latency of a memory operation grows by 50% measured in instructions even if the number of cycles stays the same. In order to get a perfect 50% speedup you'd need to reduce main memory latency to 66% of its current value.

Can the A15 do that? Possibly. The tests I've seen of A9 show a 200ns main memory latency, so there is certainly room for improvement.
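
To make the 50% and 66% figures concrete, here is a minimal sketch using a simple additive CPI model (core cycles plus memory stall cycles per instruction). The workload numbers are hypothetical, and the model deliberately ignores overlapping of misses by the out-of-order engine, so treat it as an illustration of the arithmetic only:

```python
def speedup(core_cpi_old: float, stall_cpi_old: float,
            core_gain: float, stall_scale: float) -> float:
    """Per-clock speedup under an additive CPI model.

    core_gain:   factor by which core (non-stalled) throughput improves
    stall_scale: factor applied to memory stall cycles per instruction
    """
    old = core_cpi_old + stall_cpi_old
    new = core_cpi_old / core_gain + stall_cpi_old * stall_scale
    return old / new

# Hypothetical workload: 1.0 core CPI plus 0.5 stall CPI on the old core.
same_latency = speedup(1.0, 0.5, core_gain=1.5, stall_scale=1.0)
scaled_latency = speedup(1.0, 0.5, core_gain=1.5, stall_scale=1 / 1.5)

print(round(same_latency, 3))    # 1.286
print(round(scaled_latency, 3))  # 1.5
print(round(200 / 1.5, 1))       # 133.3 (ns), the 200 ns figure scaled the same way
```

With unchanged stall cycles, the 1.5x core only yields about 1.29x in this toy workload. The full 1.5x appears only when stall cycles shrink by the same factor, i.e. to about 66% of their old value.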

Also, datapaths are twice as wide, so that'll buy you a lot on throughput workloads (FP and media). The extra bandwidth can also be used for more aggressive prefetch, where you effectively trade bandwidth for lower latency.

Cheers
__________________
I'm pink, therefore I'm spam

Last edited by Gubbi; 11-Oct-2012 at 10:55.
Gubbi is offline   Reply With Quote
Old 11-Oct-2012, 11:25   #130
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Location: Toulouse
Posts: 4,129
Default

Dhrystone? Here's an extremely old benchmark that not only can be abused with compiler optimisations (thanks, Wikipedia) but will also typically fit entirely in L1. Nowadays mobile CPUs have become like the PCs of the past 15 years, with a hierarchy of L1, L2 and memory with a huge relative latency, so you're not testing real performance and don't even have an excuse for it.
Blazkowicz is online now   Reply With Quote
Old 11-Oct-2012, 11:43   #131
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,543
Default

Quote:
Originally Posted by Blazkowicz View Post
Dhrystone? Here's an extremely old benchmark that not only can be abused with compiler optimisations (thanks, Wikipedia) but will also typically fit entirely in L1. Nowadays mobile CPUs have become like the PCs of the past 15 years, with a hierarchy of L1, L2 and memory with a huge relative latency, so you're not testing real performance and don't even have an excuse for it.
Part of my point. ARM claiming a 50% IPC increase in Dhrystone tells you nothing.

Cheers
__________________
I'm pink, therefore I'm spam
Gubbi is offline   Reply With Quote
Old 11-Oct-2012, 16:09   #132
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,553
Default

Quote:
Originally Posted by Gubbi View Post
Reorder window size inferred from 56 rename entries with 32 needed for architected state (int+fp).

Dispatch: Page 6 here

Although the diagram is confusing, it does say up to FOUR dispatches per cycle.
Size of physical register file/rename capability is not the same as reordering capability. Sandy Bridge, for instance, has an instruction window based on the size of its ROB (168 entries), not its integer PRF (144 entries) or floating point PRF (160 entries). You could have zero register renaming whatsoever and still provide reordering.

The document I linked is much more detailed than yours, and makes it pretty clear it doesn't have any true capability to dispatch four things in one cycle. The comment is probably counting folded branch resolution as dispatch, which is fair in the sense that it correlates to an instruction that was decoded and issued, but still not what most would consider true dispatch. But this is really nit-picking over details.

Quote:
Originally Posted by Gubbi View Post
The amount of instructions in issue queues doesn't say anything about the rename capacity and hence the size of reorder window.
Sure it does. It's the issue queues that are scanned for instructions to dispatch each cycle. It is literally the pool from which eligible instructions are chosen and when it's full you can't add to the reordering capacity. Maybe you're confused by it being called a "queue." These queues are analogous to ROBs in other processors. ARM makes it very clear in the article I linked that the instruction window is dictated by the size and quantity of these queues.

Of course, since they don't have a unified scheduler, you generally won't come that close to actually utilizing the full reordering capacity, in general it'll probably be < 40 instructions.

Quote:
Originally Posted by Gubbi View Post
When an instruction is renamed, it is allocated an entry in the commit queue. The only time I've seen the size of the commit queue mentioned was in comp.arch on usenet two years ago, where the number 40 was mentioned.
You will see that the instructions are issued to the issue queues after renaming. The number 40 probably came from someone multiplying the 5 clusters by 8 instead of the 8 pipelines (the document I linked indicates that this is the partitioning of the queues)
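
The two totals being argued over can be reproduced directly from the figures in this thread (8 slots per queue, one queue per pipeline, 5 clusters). A minimal sketch; the variable names are illustrative:

```python
SLOTS_PER_QUEUE = 8  # slots per issue queue, as stated in the thread

# Pipelines as listed earlier in the thread; one issue queue per pipeline.
pipelines = {"simple ALU": 2, "branch": 1, "MUL": 1, "load": 1, "store": 1, "NEON/VFP": 2}
clusters = 5

per_pipeline_total = sum(pipelines.values()) * SLOTS_PER_QUEUE
per_cluster_total = clusters * SLOTS_PER_QUEUE
print(per_pipeline_total, per_cluster_total)  # 64 40
```

Multiplying by pipelines gives 64, while multiplying by clusters gives 40, which is one possible origin for the "40" figure.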

Quote:
Originally Posted by Gubbi View Post
No, I wasn't aware of that. I'm surprised Apple doesn't market it as a 1.3GHz processor then.
Since when has Apple ever marketed the MHz of anything?

Quote:
Originally Posted by Gubbi View Post
Nobody, outside of ARM, knows much about A15.
Did you even read the document I linked? It's far more detailed than any Cortex-A9 document out there! It's also more detailed than most descriptions Intel or AMD has given for their CPUs. You can find some more information in the publicly visible TRM (like various buffer/cache sizes/associativities).

Quote:
Originally Posted by Gubbi View Post
Dhrystone runs close to the maximum of what the execution core of the CPU is capable of. A real workload is not fully contained in D$ and you then have to contend with memory latencies.
Yes, we both agree on this.

Quote:
Originally Posted by Gubbi View Post
The A15 can execute 50% more instructions per cycle. That also implies that the latency of a memory operation grows by 50% measured in instructions even if the number of cycles stays the same. In order to get a perfect 50% speedup you'd need to reduce main memory latency to 66% of its current value.
It can decode 50% more instructions per cycle. It can fetch 100% more instructions per cycle. It can dispatch at least 100% more instructions per cycle. Its general branch misprediction penalty is larger but its mispredict rate is better. Its loop buffer lets it bypass fetch and most of decode stages, and is probably more capable than Cortex-A9's (larger, can handle two forward branches with unknown predict capability). It can execute loads and stores in parallel. It has wider reordering capability. It has better prefetchers. It can predict indirect branches better than by just using the last thing in the BTB. It can perform shifts and ALU operations in parallel. If I'm reading things right, the load-use latency is generally one cycle where Cortex-A9 is often two. Its L2 is more tightly coupled meaning lower latency in addition to twice the interface width. It has a bigger TLB hierarchy and new partitioning to include both load and store DTLBs.

Taking all that and putting it into a simplistic equation saying that it must need 66% lower MAIN memory latency to achieve 50% better perf/clock on average is a total farce. I don't know what you're doing here. You find out the performance by benchmarking it, but right now the best thing to go on is ARM's claim that it'll get 50% better performance.

Quote:
Originally Posted by Gubbi View Post
Can the A15 do that? Possibly. The tests I've seen of A9 show a 200ns main memory latency, so there is certainly room for improvement.
You will find that the numbers vary a lot based on which SoC we're talking about, which makes sense since the processor isn't responsible for the rest of the memory interface.

You'll also find that despite some SoCs having main memory latencies over 50% better than others, they don't usually get a huge boost in performance. Cortex-A15 is less sensitive to main memory latency than Cortex-A9 (I'm not claiming how much, but it's definitely less). Do I have to explain why?

Quote:
Originally Posted by Gubbi View Post
Part of my point. ARM claiming a 50% IPC increase in Dhrystone tells you nothing.
I feel like you're not listening to me. ARM claimed 40% higher Dhrystone scores at the same MHz. They claimed 50% higher integer performance in general, again at the same MHz. The latter was not about Dhrystone. They haven't explained it further but some other charts imply this number is from SPEC.
Exophase is offline   Reply With Quote
Old 11-Oct-2012, 20:31   #133
ninelven
PM
 
Join Date: Dec 2002
Posts: 1,370
Default

Eh, actual A-15 hardware will be out soon enough... I am content to wait for real world results.
__________________
//
ninelven is offline   Reply With Quote
Old 12-Oct-2012, 11:03   #134
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,543
Default

Quote:
Originally Posted by Exophase View Post
Size of physical register file/rename capability is not the same as reordering capability. Sandy Bridge, for instance, has an instruction window based on the size of its ROB (168 entries), not its integer PRF (144 entries) or floating point PRF (160 entries). You could have zero register renaming whatsoever and still provide reordering.
The number of rename entries determines how many results you can rename and thus how many instructions you can have in flight. I never claimed the physical register file size had anything to do with it, other than that you need rename entries to map to non-speculated state.

Quote:
Originally Posted by Exophase View Post
The document I linked is much more detailed than yours, and makes it pretty clear it doesn't have any true capability to dispatch four things in one cycle. The comment is probably counting folded branch resolution as dispatch, which is fair in the sense that it correlates to an instruction that was decoded and issued, but still not what most would consider true dispatch. But this is really nit-picking over details.
The document you linked clearly states, on page 7, four instructions can be dispatched per cycle, the diagram clearly shows 4 arrows to exec pipes: Two integer, one LS and one FP/NEON. On page 14 the diagram shows three arrows and one to the branch unit, so you may very well be right. To me, it isn't clear at all.

Quote:
Originally Posted by Exophase View Post
Sure it does. It's the issue queues that are scanned for instructions to dispatch each cycle. It is literally the pool from which eligible instructions are chosen and when it's full you can't add to the reordering capacity. Maybe you're confused by it being called a "queue." These queues are analogous to ROBs in other processors. ARM makes it very clear in the article I linked that the instruction window is dictated by the size and quantity of these queues.
That would make the issue queues equivalent to reservation stations/local ROBs like we know from OOO x86 CPUs.

Without a global scheduler the OOO capabilities are much more limited than an equivalent x86 implementation. A simple integer-rich workload with a few loads missing D$ sprinkled in could effectively limit the number of instructions in flight to the size of the int issue queues: 16 entries.

AFAICT, if you're right, the only way to get anywhere near the maximum number of instructions in flight is FP/NEON code. There is always a surprising amount of integer chores in FP codes and that way most of the issue queues could be filled (or at least see any action).
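
A toy model of that limit, as a sketch only: instructions are dispatched in program order into per-type queues, nothing issues (everything is assumed to be waiting on a missed load), and dispatch stops as soon as the target queue is full. Queue sizes follow the 8-slots-per-pipeline figures from this thread. The unified case and both instruction streams are hypothetical comparisons:

```python
from collections import Counter

def in_flight_before_stall(stream, per_queue, total):
    """Count instructions dispatched in order before dispatch stalls.

    Assumes no instruction issues, so queue entries are never freed.
    per_queue maps instruction kind to queue capacity; total caps all entries.
    """
    used = Counter()
    for n, kind in enumerate(stream):
        if used[kind] >= per_queue[kind] or n >= total:
            return n  # in-order dispatch stalls here
        used[kind] += 1
    return len(stream)

# Split queues: 2 ALU queues x 8 and 2 NEON/VFP queues x 8, others 8 each.
split = {"int": 16, "branch": 8, "mul": 8, "load": 8, "store": 8, "fp": 16}
# Hypothetical unified scheduler with the same 64 entries shared by all kinds.
unified = {kind: 64 for kind in split}

int_only = ["int"] * 100
mixed = ["int", "fp"] * 50

print(in_flight_before_stall(int_only, split, total=64))    # 16
print(in_flight_before_stall(int_only, unified, total=64))  # 64
print(in_flight_before_stall(mixed, split, total=64))       # 32
```

In this model an integer-only stream stalls at 16 entries with split queues, and mixing in FP work is what lets more of the 64 entries fill up.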

Quote:
Originally Posted by Exophase View Post
You will see that the instructions are issued to the issue queues after renaming. The number 40 probably came from someone multiplying the 5 clusters by 8 instead of the 8 pipelines (the document I linked indicates that this is the partitioning of the queues)
Since all instructions except branches and nops produce a result (branches do too in ARM, since the PC is a general register, but I expect it to be special-cased), the number of instructions in flight is limited by the number of entries in the commit queue, where results sit until speculated state is resolved (branches). That queue has 40 entries (read in an ARM document, linked to in a usenet post in November 2010; the ARM document is now nowhere to be found).

Quote:
Originally Posted by Exophase View Post
Did you even read the document I linked? It's far more detailed than any Cortex-A9 document out there! It's also more detailed than most descriptions Intel or AMD has given for their CPUs. You can find some more information in the publicly visible TRM (like various buffer/cache sizes/associativities).
I did. It is not only far more detailed than any Cortex-A9 document, it is also much more confusing than any document detailing micro architecture I've ever seen from AMD or Intel.

The commit queue looks like a data-full ROB, but it claims to be a PRF OOO implementation. The OOO capabilities look to be ample, except they are limited by the issue queue sizes.

BTW, this is off topic for this thread. Move it?

Cheers
__________________
I'm pink, therefore I'm spam

Last edited by Gubbi; 12-Oct-2012 at 13:23.
Gubbi is offline   Reply With Quote
Old 10-Nov-2012, 20:33   #135
UniversalTruth
Member
 
Join Date: Sep 2010
Posts: 996
Default

Intel to Merge Xeon and Itanium in 2015-2017

Ivy Bridge (Core i3/i5/i7) debuted in 2012
Haswell (Core i3/i5/i7) will debut in early 2013
Ivy Bridge-EP (Xeon E3/E5) should arrive in mid-2013
Ivy Bridge-E (Core i7) debuts in late 2013
Ivy Bridge-EX for critical servers (Xeon E7) debuts in late 2013
Broadwell (Core i3/i5/i7) should ship in early 2014
Haswell-EP (Xeon E3, E5) should ship by mid 2014
Haswell-E (Core i7) debuts in late 2014
Haswell-EX (Xeon E7) is planned for late 2014
Broadwell-EP (Xeon E3 / E5) is planned for mid 2015
Broadwell-E (Core i7) arrives in late 2015
Broadwell-EX (Xeon E7) is planned for late 2016


The new socket could be the one you already know - according to some sources, Intel plans to re-wire the LGA-2011 for Haswell/Broadwell, making it incompatible with Sandy Bridge/Ivy Bridge-based products. The rewiring isn't being done to support new architectures, but rather to provide more power - according to documents we saw, Intel plans to introduce 150W and up to 180W parts when Haswell and Broadwell architectures enter the cutthroat server business.

Hmm, sounds very nice. 180 W CPU, I want for my desktop machine.
UniversalTruth is offline   Reply With Quote
Old 10-Nov-2012, 22:11   #136
I.S.T.
Senior Member
 
Join Date: Feb 2004
Posts: 2,439
Default

Merging them is highly inaccurate. Merging the support system (socket, perhaps chipset, etc.) is accurate. We won't be seeing Itaniums on our PCs, and for good reason.
I.S.T. is offline   Reply With Quote
Old 11-Nov-2012, 03:14   #137
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Location: Toulouse
Posts: 4,129
Default

It could allow an x86 in one socket and an Itanium in another, assuming you would want to do that.
Blazkowicz is online now   Reply With Quote
Old 11-Nov-2012, 06:49   #138
Grall
Invisible Member
 
Join Date: Apr 2002
Location: La-la land
Posts: 4,982
Default

Quote:
Originally Posted by I.S.T. View Post
Merging them is highly inaccurate. Merging the support system (socket, perhaps chipset, etc.) is accurate. We won't be seeing Itaniums on our PCs, and for good reason.
I haven't read the linked article (yet), but I assume this would be preparation for a move to (relatively) painlessly kill off Itanium, since that product is dead anyway.

So, the day Intel finally pulls the plug on Itanium, customers could drop in x86 chips there instead.
__________________
"If I were a science teacher and a student said the Universe is 6000 years old, I would mark that answer as wrong (why? Because it is)."
-Phil Plait
Grall is offline   Reply With Quote
Old 12-Nov-2012, 03:39   #139
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,433
Default

Frankly I don't know why it took Intel so long. Back in 2006 roadmaps suggested that Xeons and Itanics would use the same chipsets in the future and ultimately boards could support both chips (I dunno what happened with the "same chipsets", but up to now at least the sockets obviously ended up different). Remember QuickPath was initially known as CSI ("Common System Interface").
mczak is offline   Reply With Quote
Old 12-Nov-2012, 04:29   #140
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Location: Toulouse
Posts: 4,129
Default

Intel has always done the bare minimum regarding socket compatibility. They had three generations of Socket 370 and four of Socket 775; each time the motherboards were backwards compatible but never forward compatible (millions of computers are stuck with a Pentium 4 and can't get a Core 2 Celeron).
Or there's Socket 1156 and 1155, where everyone has already forgotten what the new socket brought to the table.

Intel is opportunistic; they won't care about breaking compatibility if that means the CPU will use 1% less power or something. They are also good at pushing a new platform in the distribution channels. They care more about deadlines and such.
Blazkowicz is online now   Reply With Quote
Old 12-Nov-2012, 11:02   #141
Grall
Invisible Member
 
Join Date: Apr 2002
Location: La-la land
Posts: 4,982
Default

Quote:
Originally Posted by Blazkowicz View Post
Intel is opportunistic; they won't care about breaking compatibility
They only don't care because they don't have to. If they hadn't had a virtual monopoly on PC processors there's no way they could so casually shut out their entire existing market with repeated new platforms/sockets that bring only minimal (or even no) improvements.

This may change in the future as stationary computers are being increasingly encroached upon by mobile platforms. CPU sockets may in fact not even survive the end of this decade.
__________________
"If I were a science teacher and a student said the Universe is 6000 years old, I would mark that answer as wrong (why? Because it is)."
-Phil Plait
Grall is offline   Reply With Quote
Old 12-Nov-2012, 17:36   #142
I.S.T.
Senior Member
 
Join Date: Feb 2004
Posts: 2,439
Default

http://techreport.com/news/23885/lea...hipset-details

Interesting...
I.S.T. is offline   Reply With Quote
Old 13-Nov-2012, 08:51   #143
HMBR
Member
 
Join Date: Mar 2009
Posts: 160
Default

Quote:
Originally Posted by I.S.T. View Post
It's a shame that, again, apart from the PEG, the other PCIe ports are still 2.0...

but, correct me if I'm wrong, they are saying that what is today the "PCH" is going to be on the same package as the CPU?
HMBR is offline   Reply With Quote
Old 16-Nov-2012, 02:00   #144
tunafish
Member
 
Join Date: Aug 2011
Posts: 366
Default

Quote:
Originally Posted by HMBR View Post
but, correct me if I'm wrong, they are saying that what is today the "PCH" is going to be on the same package as the CPU?
Yes. But so far, there is information only about the Lynx Point LP, or low power, model. I'd expect this to be in laptops.
tunafish is online now   Reply With Quote
Old 16-Nov-2012, 06:37   #145
Lux_
Member
 
Join Date: Sep 2005
Posts: 206
Default

"Intel’s Haswell CPU Microarchitecture" by David Kanter

Intel has indeed pushed the "mass-market state of the art" forward on many fronts at once. It would be truly sad if it turns out that the mass-market needs peak with dual-core consumption devices, which rules out future Haswell-like big jumps.
Lux_ is offline   Reply With Quote
Old 16-Nov-2012, 09:13   #146
Grall
Invisible Member
 
Join Date: Apr 2002
Location: La-la land
Posts: 4,982
Default

Quote:
Originally Posted by Lux_ View Post
It would be truly sad if it turns out that the mass-market needs peak with dual-core consumption devices, which rules out future Haswell-like big jumps.
Let's be realistic - heavy computing capability in a CPU is only necessary for those who actually do heavy computing. It's like expecting everyone to buy cars that can pull off competitive times at a drag racing strip - unrealistic! Not all that sad, really. It's simply reality.
__________________
"If I were a science teacher and a student said the Universe is 6000 years old, I would mark that answer as wrong (why? Because it is)."
-Phil Plait
Grall is offline   Reply With Quote
Old 16-Nov-2012, 13:38   #147
Lux_
Member
 
Join Date: Sep 2005
Posts: 206
Default

Quote:
Originally Posted by Grall View Post
Let's be realistic - heavy computing capability in a CPU is only necessary for those who actually do heavy computing. It's like expecting everyone to buy cars that can pull off competitive times at a drag racing strip - unrealistic! Not all that sad, really. It's simply reality.
Starting from the Pentium era, up until the tablet/smartphone revolution, the CPUs for the regular mass market have also served the heavy computing market (from the Intel/AMD standpoint). Most x86 users are indeed currently riding the "almost dragsters".
Lux_ is offline   Reply With Quote
Old 16-Nov-2012, 15:30   #148
Grall
Invisible Member
 
Join Date: Apr 2002
Location: La-la land
Posts: 4,982
Default

Yeah, because it made sense from many perspectives to have it work this way, but with Moore's law finally starting to hit the ceiling things are changing - and x86 CPUs are so much more powerful than what the average guy needs anyway it's silly.

When shrinking nodes don't bring any appreciable savings in cost per transistor anymore there's little room to improve performance anyway.
__________________
"If I were a science teacher and a student said the Universe is 6000 years old, I would mark that answer as wrong (why? Because it is)."
-Phil Plait
Grall is offline   Reply With Quote
Old 16-Nov-2012, 18:23   #149
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Location: Toulouse
Posts: 4,129
Default

Quote:
Originally Posted by Lux_ View Post
Starting from the Pentium era, up until the tablet/smartphone revolution, the CPUs for the regular mass market have also served the heavy computing market (from the Intel/AMD standpoint). Most x86 users are indeed currently riding the "almost dragsters".
I'd put it at the Pentium Pro, which was indeed the fastest CPU on earth at its launch, on par with Alpha. Then the design scaled up to Pentium 3 1GHz and higher, and Athlon matched it.
Nowadays there are only SPARC supercomputers, POWER7 mini-computers and Z mainframes competing with the desktop PC.
Blazkowicz is online now   Reply With Quote
Old 27-Dec-2012, 16:57   #150
DSC
Member
 
Join Date: Jul 2003
Posts: 323
Default

http://www.xbitlabs.com/news/cpu/dis...Regulator.html

Quote:
Intel Corp.’s next-generation code-named Haswell microprocessors will not only improve performance and feature some tricks to lower power consumption, but will also feature a secret weapon: integrated voltage regulator module (VRM). The latter will allow to improve granularity of power supply to central processing units and thus further cut power consumption without compromising performance.
DSC is offline   Reply With Quote
