AMD's Steamroller Detailed: 3rd Generation Bulldozer Core
by Anand Lal Shimpi on August 28, 2012 4:39 PM EST
Today at the annual Hot Chips conference, AMD’s new CTO Mark Papermaster unveiled the first details about the Steamroller x86 CPU core.
Steamroller is the third instantiation of AMD’s Bulldozer architecture, first conceived in the mid-2000s and finally brought to market in late 2011. Committed to this architecture for at least one more design after Steamroller, AMD has settled on roughly yearly updates to the architecture. For 2012 we have the introduction of Piledriver, the optimized Bulldozer derivative that formed the CPU foundation for AMD’s Trinity APU. By the end of the year we’ll also see a high-end desktop CPU without processor graphics based on Piledriver.
Piledriver saw a switch to hard-edge flip-flops, which allowed for a considerable decrease in power consumption at the expense of careful design and validation work. Performance didn’t change, but AMD saw a 10% to 20% reduction in active power. Piledriver also brought some scheduling efficiency improvements, while prefetching and branch prediction were its two other major design improvements.
Steamroller is designed to keep the ball rolling. It takes fundamentals from the Bulldozer/Piledriver architectures and offers a healthy set of evolutionary improvements on top of them. In Intel speak, Steamroller wouldn’t be a tick as it isn’t accompanied by a significant process change (28nm bulk is pretty close to 32nm SOI), but it’s not a tock either, as the architecture is enhanced but largely unchanged. Steamroller fits somewhere in between those two extremes when it comes to changes.
Front End Improvements
One of the biggest issues with the front end of Bulldozer and Piledriver is the shared fetch and decode hardware. This table from our original Bulldozer review helps illustrate the problem:
Front End Comparison

| | AMD Phenom II | AMD FX | Intel Core i7 |
|---|---|---|---|
| Instruction Decode Width | 3-wide | 4-wide | 4-wide |
| Single Core Peak Decode Rate | 3 instructions | 4 instructions | 4 instructions |
| Dual Core Peak Decode Rate | 6 instructions | 4 instructions | 8 instructions |
| Quad Core Peak Decode Rate | 12 instructions | 8 instructions | 16 instructions |
| Six/Eight Core Peak Decode Rate | 18 instructions (6C) | 16 instructions | 24 instructions (6C) |
Steamroller addresses this by duplicating the decode hardware in each module. Now each core has its own 4-wide instruction decoder, and both decoders can operate in parallel rather than alternating every other cycle. Don’t expect a doubling of performance since it’s rare that a 4-issue front end sees anywhere near full utilization, but this is easily the single largest performance improvement from all of the changes in Steamroller.
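The arithmetic behind the decode rates can be sketched in a few lines. This is a minimal model, assuming two cores per module and a 4-wide decoder; the function names are hypothetical, and the per-core figures simply extrapolate from "each core has its own 4-wide instruction decoder" rather than from any numbers AMD published.

```python
import math

# Decoder width in instructions per cycle, taken from the AMD FX column
# of the front end comparison.
DECODE_WIDTH = 4


def shared_decoder_peak(cores: int, cores_per_module: int = 2) -> int:
    """Peak decode rate when the cores in a module share one decoder
    (the Bulldozer/Piledriver arrangement)."""
    modules = math.ceil(cores / cores_per_module)
    return modules * DECODE_WIDTH


def per_core_decoder_peak(cores: int) -> int:
    """Peak decode rate when every core has its own decoder
    (the Steamroller arrangement, as an extrapolation)."""
    return cores * DECODE_WIDTH


if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        print(cores, shared_decoder_peak(cores), per_core_decoder_peak(cores))
```

The shared-decoder function reproduces the AMD FX column (4, 4, 8 and 16 instructions for one, two, four and eight cores). These are peak rates only, which is why real-world gains should land well short of 2x.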
The penalties are pretty obvious: area goes up, as does power consumption. However, the tradeoff is likely worth it, and both of these downsides can be offset in other areas of the design as you’ll soon see.
Steamroller inherits the perceptron branch predictor from Piledriver, but in an improved form for better performance (mostly in server workloads). The branch target buffer is also larger, which contributes to a reduction of up to 20% in mispredicted branches.
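For readers unfamiliar with the idea, here is a minimal sketch of a generic, textbook-style perceptron branch predictor. It is not AMD's implementation, whose details aren't public; the class name, table size, history length and training threshold are all illustrative assumptions.

```python
class PerceptronPredictor:
    """Textbook-style perceptron branch predictor (illustrative only)."""

    def __init__(self, history_len=8, table_size=64, threshold=16):
        self.table_size = table_size
        # Assumed training threshold: keep training while |output| is small.
        self.threshold = threshold
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history, oldest first: +1 = taken, -1 = not taken.
        self.history = [-1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history, start=1):
                w[i] += t * hi
        self.history = self.history[1:] + [t]


if __name__ == "__main__":
    predictor = PerceptronPredictor()
    hits = 0
    for i in range(400):
        taken = i % 2 == 0  # a strictly alternating branch
        hits += predictor.predict(0x40) == taken
        predictor.update(0x40, taken)
    print(f"correct predictions: {hits} of 400")
```

Each weight tracks how strongly one bit of branch history correlates with the outcome, so a pattern such as an alternating branch is learned after a handful of updates.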
AMD streamlined the large, shared floating point unit in each Steamroller module. There’s no change in the execution capabilities of the FPU, but there’s a reduction in overall area. The MMX unit now shares some hardware with the 128-bit FMAC pipes. AMD wouldn’t offer many specifics, saying only that the shared hardware applies to mutually exclusive MMX/FMA/FP operations and thus wouldn’t result in a performance penalty.
The reduction of pipeline resources is supposed to deliver the same throughput at lower power and area, basically a smarter implementation of the Bulldozer/Piledriver FPU.
There’s no change to the integer execution units themselves, but there are other changes that improve integer performance.
The integer and floating point register files are bigger in Steamroller, although AMD isn’t being specific about how much they’ve grown. Load operations (two operands) are also compressed so that they only take a single entry in the physical register file, which helps increase the effective size of each RF.
The scheduling windows also increased in size, which should enable greater utilization of existing execution resources.
Store to load forwarding sees an improvement. Steamroller is better than its predecessors at detecting interlocks, cancelling the load and getting the data from the store.
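As a rough illustration of the concept, the sketch below models store to load forwarding in software. It is a simplified teaching model, not a description of Steamroller's load/store unit: the `Store` type and `load` function are hypothetical, and it ignores access sizes, partial overlaps and speculation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Store:
    """A store that is still in flight (not yet written to memory)."""
    addr: int
    data: int


def load(addr: int, store_queue: List[Store], memory: Dict[int, int]) -> Tuple[int, bool]:
    """Return (value, forwarded) for a load from addr.

    store_queue is ordered oldest to youngest. If any pending store
    matches the address, the load takes its data from the youngest
    such store instead of reading memory.
    """
    for store in reversed(store_queue):
        if store.addr == addr:
            return store.data, True
    # No pending store matches: read memory (unwritten locations read as 0).
    return memory.get(addr, 0), False


if __name__ == "__main__":
    memory = {0x100: 1}
    pending = [Store(0x100, 5), Store(0x200, 7), Store(0x100, 9)]
    print(load(0x100, pending, memory))  # data comes from the youngest matching store
    print(load(0x300, pending, memory))  # no match, falls back to memory
```

The faster and more reliably the hardware spots a matching store, the less often a dependent load has to wait for the store to reach the cache.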
flgt - Tuesday, August 28, 2012
I doubt it was AMD's master plan to give up the juicy profit margins in high performance and enterprise applications. I'm guessing that AMD would kill to have the revenue that Intel is pulling in from those small number of processors. AMD just can't compete hence the need to fall back to the low margin value business.
CeriseCogburn - Wednesday, August 29, 2012
I had several friends predict the crap that bulldozer is long before it arrived simply by perusing leaks of the architecture.
If my less than certified yet sentient friends can read the writing on the wall concerning the architecture choices long before they actually arrive...what the heck is wrong with amd's design teams?
"Can't compete" is usually a phrase I toss out for its overtly exaggerated usage, but in this case I make an exception.
Somehow amd found some light in the GPU arena concerning the same thing, then their drivers fall flat on their face far too often, ruining the core work.
I certainly hope their new hires can straighten out the mess, but hope for change has not been a well placed bet of late.
brucek2 - Wednesday, August 29, 2012
What forum do you think you're on? Yes, if you want to debate the likely impact to AMD's volume sales, overall adoption of this family of chips, etc. there are many factors that are a lot more significant than what its peak performance is like (even though you left most of those out, and hint, individual consumer preference has a lot less to do with it than it should.)
But this is not motleyfool or another stock discussion site, nor one that really is much interested in "mainstream consumers" in general. This is a site for hardware enthusiasts, and the big questions most of us are going to have are, a) is this a chip we might be interested in having in one of our systems, and b) what technologies does it bring to the table that might be interesting as far as overall technical evolution of computing?
In short, the article is correct, the big question for this forum and this audience is how will it stack up against Haswell.
CeriseCogburn - Wednesday, August 29, 2012
The answer is the same in both cases, so you complained, then agreed, unwittingly.
r3fug3 - Wednesday, August 29, 2012
Ivy's OC issues are not from the die shrinkage... It's from the method used to attach the heat shield.
HighTech4US - Tuesday, August 28, 2012
> "Is it cheap, will it do Ebay and can my daughter play the Sims on it?"
> Thats all the criteria needed in most cases.
In that case just get a $200 tablet.
The Nexus 7 would do just fine with those criteria.
jabber - Tuesday, August 28, 2012
And there you have the decline of the desktop PC.
Get used to it.
You will all be part of a smaller and smaller club.
swaaye - Tuesday, August 28, 2012
Most people have always bought low end hardware so not much has changed. There are some more options now in tablets but those aren't really a replacement for a notebook/desktop because they have many constraints. My impression is they are used alongside normal computers.
swaaye - Tuesday, August 28, 2012
I should say - alongside or as a supplementary media consumption toy.
CeriseCogburn - Wednesday, August 29, 2012
Yes, and PC sales are rising - with the population.
There is a point though, as overall percentage is of course, and has been, of course, not rising, as more gadgets tending toward mobile use are developed, and that has been occurring for some time now.
Unless the world population becomes completely nomadic 24/7/365, PCs are not going away.