K10: Barcelona, Shanghai, Quad-Core Opteron, Phenom

Bulldozer Clarifications

Postby JF-AMD » Mon Dec 07, 2009 11:37 pm

There have been some questions about the Bulldozer architecture so this should help clear up any confusion.

First, Bulldozer is based on a modular architecture where two integer cores are teamed up with an extra-large FPU to create what we call a Bulldozer module. Bulldozer modules are the basis of all of the designs that will be coming from this architecture, and its modular nature not only allows us to build processors with different core counts but also provides flexibility for future designs that could allow other modular components like GPUs to be added. The Bulldozer module is a concept and part of an architectural design; it is not something that the user will come in contact with. For instance, when an Interlagos system boots up, the hardware will see 16 integer cores, not 8 modules. When the OS loads, it will see 16 integer cores, not 8 modules, and the applications will see 16 cores as well. Because the whole system consistently sees integer cores (and not modules), it is only natural that Interlagos will be marketed as a 16-core processor. It would actually be more confusing to call it an 8-core processor, because there is no point where a customer would see 8 cores.

Secondly, there was a question about the amount of die space that is consumed by having 2 integer cores in a module versus just one. Bulldozer was designed to be a modular architecture where 2 integer cores are able to share certain resources where it makes sense (in order to reduce power consumption) yet still retain discrete components in order to ensure great performance and no bottlenecks. It was never designed as a single integer core in each module, so dissecting the module components becomes a bit trickier. Some have compared this to SMT and made statements that SMT customers could see a modest increase in performance for only a fraction of die space. We believe that our Bulldozer architecture will provide far greater performance gains than SMT, with up to 80% greater expected throughput when running 2 threads simultaneously compared to a single thread running on a single integer core. Our engineers estimate that the amount of discrete circuitry that is added to each Bulldozer module in order to allow for a second integer thread to run is ~12% at the core level, but because the integer cores are only a portion of the overall die space, the addition of the second integer core in each module only adds ~5% of circuitry to the total die. We believe this is an excellent balance of greater performance with a very small silicon cost.
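
To put those estimates side by side, here is a back-of-the-envelope sketch in Python. It uses only the three figures quoted above (~12%, ~5%, up to 80%). The function names and the derived ratios are illustrative assumptions, not AMD data, and the 80% figure is treated as a best case.

```python
# Figures quoted in the post above (estimates, not measurements).
MODULE_AREA_INCREASE = 0.12  # extra circuitry for the second integer core, per module
DIE_AREA_INCREASE = 0.05     # the same circuitry as a share of the total die
THROUGHPUT_GAIN = 0.80       # "up to" 80% more throughput with 2 threads vs. 1


def throughput_per_area(throughput_gain: float, area_increase: float) -> float:
    """Relative throughput divided by relative area (1.0 = one-core baseline)."""
    return (1.0 + throughput_gain) / (1.0 + area_increase)


def implied_module_share(die_increase: float, module_increase: float) -> float:
    """Share of the baseline die occupied by modules, implied by the two figures.

    If the added circuitry is x of a module and y of the die, the modules
    must take up roughly y / x of the die. This is an inference, not a
    number stated in the post.
    """
    return die_increase / module_increase


per_module = throughput_per_area(THROUGHPUT_GAIN, MODULE_AREA_INCREASE)
per_die = throughput_per_area(THROUGHPUT_GAIN, DIE_AREA_INCREASE)
share = implied_module_share(DIE_AREA_INCREASE, MODULE_AREA_INCREASE)

print(f"throughput per unit of module area: {per_module:.2f}x")
print(f"throughput per unit of die area:    {per_die:.2f}x")
print(f"implied module share of the die:    {share:.0%}")
```

With the quoted numbers this prints roughly 1.61x, 1.71x and 42%, which is the sense in which the second integer core is "cheap": best-case throughput grows much faster than area.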

Finally, there are those that have suggested that the two integer cores in the Bulldozer module could potentially be merged together into a single core. This is not true. Perhaps they are confusing the functionality of the FPU, which is flexible enough to be split between the two cores in the module, giving each a 128-bit FMAC simultaneously, OR can be combined into a 256-bit FMAC for one integer core to use exclusively if the second integer core does not need any FPU commands in that cycle.
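
The FPU sharing described above can be pictured with a toy model. The sketch below is only an illustration of the "two 128-bit FMACs, or one combined 256-bit FMAC" behaviour as stated in the post; the function name and the per-cycle boolean inputs are assumptions, and the real scheduling logic is not described in this thread.

```python
from typing import Tuple

FMAC_WIDTH_BITS = 128  # width of each of the two FMAC units, per the post


def fpu_width_per_core(core0_needs_fpu: bool, core1_needs_fpu: bool) -> Tuple[int, int]:
    """Return the FMAC width in bits available to (core0, core1) in one cycle.

    Toy model: if both integer cores issue FPU work, each gets one 128-bit
    FMAC; if only one does, it may use both units as a single 256-bit FMAC.
    """
    combined = 2 * FMAC_WIDTH_BITS
    if core0_needs_fpu and core1_needs_fpu:
        return (FMAC_WIDTH_BITS, FMAC_WIDTH_BITS)
    if core0_needs_fpu:
        return (combined, 0)
    if core1_needs_fpu:
        return (0, combined)
    return (0, 0)


for demand in [(True, True), (True, False), (False, True), (False, False)]:
    print(demand, "->", fpu_width_per_core(*demand))
```

Note that nothing in this model merges the integer cores themselves; only the FPU resources move between them.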

We hope this clarifies the questions that seem to be most prevalent.
While I work for AMD, my posts are my own opinions.

http://blogs.amd.com/work/author/jfruehe/

Follow AMD Opteron on Twitter: @JF_AMD
JF-AMD
XIP
 
Posts: 1832
Joined: Thu Apr 23, 2009 7:27 am

Re: Bulldozer Clarifications

Postby abinstein » Mon Dec 07, 2009 11:44 pm

Thanks a lot! I believe it's the clearest explanation I've seen. :)
Anandtech -- a site which every visit makes me regret my time spent! It is a living testimony of Einstein's quote:"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former."
abinstein
K8 Opteron (SledgeHammer) Moderator
 
Posts: 7171
Joined: Sat Oct 30, 2004 9:49 pm

Re: Bulldozer Clarifications

Postby JF-AMD » Tue Dec 08, 2009 12:02 am

Since it is the holiday season I'll paraphrase from "A Christmas Story". The clarification was so good that I am exempt from ever having to do another one.

Now where is my Daisy Red Ryder BB gun?
JF-AMD
XIP
 
Posts: 1832
Joined: Thu Apr 23, 2009 7:27 am

Re: Bulldozer Clarifications

Postby brutis » Tue Dec 08, 2009 12:09 am

JF-AMD wrote:Since it is the holiday season I'll paraphrase from "A Christmas Story". The clarification was so good that I am exempt from ever having to do another one.

Now where is my Daisy Red Ryder BB gun?

You'll shoot your eye out with it :wink:
brutis
K8 Opteron (SledgeHammer) Moderator
 
Posts: 6019
Joined: Sat Mar 06, 2004 2:36 pm
Location: so there I was...

Re: Bulldozer Clarifications

Postby MKruer » Tue Dec 08, 2009 12:26 am

brutis wrote:
JF-AMD wrote:Since it is the holiday season I'll paraphrase from "A Christmas Story". The clarification was so good that I am exempt from ever having to do another one.

Now where is my Daisy Red Ryder BB gun?

You'll shoot your eye out with it :wink:


That was so cool I TRIPLE-dog-dare you to stick your tongue on the pole. :mrgreen:
Lian-Li PC-V2000 Plus Aluminum Case; Seasonic S12 Energy+ 550 PSU; Asus M4A785TD-V EVO; Phenom II X4 965 Black Edition C3 @ 4.0Ghz ; Thermalright Ultra-120 eXtreme Rev.C; 8GB OCZ AMD Black Edition @ 1333Mhz; Sapphire Radeon HD 7870
MKruer
K10 Opteron (Barcelona) Administrator
 
Posts: 2439
Joined: Mon Mar 01, 2004 4:21 am
Location: I am not paid to do this, I don't even like to do this, I wonder why am I still doing this?

Re: Bulldozer Clarifications

Postby Polonium210 » Tue Dec 08, 2009 1:16 am

JF-AMD wrote:the addition of the second integer core in each module only adds ~5% of circuitry to the total die.


Now all you have to do is use all of your powers of persuasion to convince Anand that 5% < 50% :!: This is your mission, Mr Phelps, should you decide to accept it.
Polonium210
K7 Athlon (Argon) Junior Boarder
 
Posts: 387
Joined: Tue Jan 30, 2007 4:21 am

Re: Bulldozer Clarifications

Postby piesquared » Tue Dec 08, 2009 5:24 am

Good info, thanks! I'm not sure there is as much 'real' confusion about Bulldozer as it seems, though. Most of it appears to be manufactured, intentional confusion meant to take as much of the positive press as possible away from what was revealed. There are a lot of Intel employees out there, in one capacity or another; I'm guessing that's where it's coming from. I can't imagine how so many people, many of whom are supposed to be experienced enthusiasts with years on different hardware forums around the web, can be that confused. It really is pretty simple to understand.
piesquared
K5 Fresh Boarder
 
Posts: 128
Joined: Sun Sep 24, 2006 5:47 pm

Re: Bulldozer Clarifications

Postby Smartidiot89 » Tue Dec 08, 2009 6:23 am

Great info John!
Smartidiot89
 

Re: Bulldozer Clarifications

Postby superrugal » Tue Dec 08, 2009 9:28 am

Thanks JF, this message is very helpful! Maybe the "50% additional area" statement is still unofficial, or not right at all? :?:
superrugal
K5 Fresh Boarder
 
Posts: 103
Joined: Tue Nov 17, 2009 9:39 am

Re: Bulldozer Clarifications

Postby vsary6968 » Tue Dec 08, 2009 11:20 am

JF-AMD wrote:There have been some questions about the Bulldozer architecture so this should help clear up any confusion.

First, Bulldozer is based on a modular architecture where two integer cores are teamed up with an extra-large FPU to create what we call a Bulldozer module. Bulldozer modules are the basis of all of the designs that will be coming from this architecture, and its modular nature not only allows us to build processors with different core counts but also provides flexibility for future designs that could allow other modular components like GPUs to be added. The Bulldozer module is a concept and part of an architectural design; it is not something that the user will come in contact with. For instance, when an Interlagos system boots up, the hardware will see 16 integer cores, not 8 modules. When the OS loads, it will see 16 integer cores, not 8 modules, and the applications will see 16 cores as well. Because the whole system consistently sees integer cores (and not modules), it is only natural that Interlagos will be marketed as a 16-core processor. It would actually be more confusing to call it an 8-core processor, because there is no point where a customer would see 8 cores.

Secondly, there was a question about the amount of die space that is consumed by having 2 integer cores in a module versus just one. Bulldozer was designed to be a modular architecture where 2 integer cores are able to share certain resources where it makes sense (in order to reduce power consumption) yet still retain discrete components in order to ensure great performance and no bottlenecks. It was never designed as a single integer core in each module, so dissecting the module components becomes a bit trickier. Some have compared this to SMT and made statements that SMT customers could see a modest increase in performance for only a fraction of die space. We believe that our Bulldozer architecture will provide far greater performance gains than SMT, with up to 80% greater expected throughput when running 2 threads simultaneously compared to a single thread running on a single integer core. Our engineers estimate that the amount of discrete circuitry that is added to each Bulldozer module in order to allow for a second integer thread to run is ~12% at the core level, but because the integer cores are only a portion of the overall die space, the addition of the second integer core in each module only adds ~5% of circuitry to the total die. We believe this is an excellent balance of greater performance with a very small silicon cost.

Finally, there are those that have suggested that the two integer cores in the Bulldozer module could potentially be merged together into a single core. This is not true. Perhaps they are confusing the functionality of the FPU, which is flexible enough to be split between the two cores in the module, giving each a 128-bit FMAC simultaneously, OR can be combined into a 256-bit FMAC for one integer core to use exclusively if the second integer core does not need any FPU commands in that cycle.

We hope this clarifies the questions that seem to be most prevalent.



Thank you so much for the clarification. It is very clear this time.
Will Bulldozer arrive before 2011? It would be nice if it arrived in 2H2010.
vsary6968
K6-III Fresh Boarder
 
Posts: 264
Joined: Thu Dec 06, 2007 12:34 pm
Location: Everett,WA 98204

Re: Bulldozer Clarifications

Postby JF-AMD » Tue Dec 08, 2009 12:38 pm

superrugal wrote:Thanks JF, this message is very helpful! Maybe the "50% additional area" statement is still unofficial, or not right at all? :?:


I understand how the 50% message got out there, and it was a misunderstanding. The number is not correct.
JF-AMD
XIP
 
Posts: 1832
Joined: Thu Apr 23, 2009 7:27 am

Re: Bulldozer Clarifications

Postby JF-AMD » Tue Dec 08, 2009 12:39 pm

vsary6968 wrote:Thank you so much for the clarification. It is very clear this time.
Will Bulldozer arrive before 2011? It would be nice if it arrived in 2H2010.


No
JF-AMD
XIP
 
Posts: 1832
Joined: Thu Apr 23, 2009 7:27 am

Re: Bulldozer Clarifications

Postby gruffi » Tue Dec 08, 2009 6:59 pm

superrugal wrote:Maybe the "50% additional area" statement is still unofficial, or not right at all? :?:

I think it came from Chuck Moore's presentation some years ago. So it was official. But probably not quite exact or not valid anymore.

@JF-AMD
You spoke about 12% "discrete circuitry" at the core level. Am I right that this doesn't include the circuitry needed to widen existing units, e.g. the decoder, to keep two integer cores busy? If so, are there estimates that include this circuitry?
Tradition is not holding the ashes but passing the flame.
gruffi
K6-III Fresh Boarder
 
Posts: 252
Joined: Sun Jun 21, 2009 12:39 pm

Re: Bulldozer Clarifications

Postby JF-AMD » Tue Dec 08, 2009 7:10 pm

Now we are starting to get to the point of diminishing returns. This is only about taking one integer unit out, not changing anything else.
JF-AMD
XIP
 
Posts: 1832
Joined: Thu Apr 23, 2009 7:27 am

Re: Bulldozer Clarifications

Postby wuttz » Tue Dec 08, 2009 7:17 pm

JF-AMD wrote:Our engineers estimate that the amount of discrete circuitry that is added to each Bulldozer module in order to allow for a second integer thread to run is ~12% at the core level, but because the integer cores are only a portion of the overall die space, the addition of the second integer core in each module only adds ~5% of circuitry to the total die.


As I understand it, that 50% figure [by AnandTech] is wrong.
wuttz
K8 Athlon 64 X2 (Manchester) Elite Boarder
 
Posts: 2990
Joined: Sat Aug 08, 2009 6:48 pm
Location: Pearland, Texas

Re: Bulldozer Clarifications

Postby JF-AMD » Tue Dec 08, 2009 7:21 pm

Yes, it was out of context. 12% at the module level, 5% at the die level.
JF-AMD
XIP
 
Posts: 1832
Joined: Thu Apr 23, 2009 7:27 am

Re: Bulldozer Clarifications

Postby wuttz » Tue Dec 08, 2009 7:25 pm

Better edit the first post, JF: the 12% is at the module level, not at the "core" level. Just to make it consistent.
wuttz
K8 Athlon 64 X2 (Manchester) Elite Boarder
 
Posts: 2990
Joined: Sat Aug 08, 2009 6:48 pm
Location: Pearland, Texas

Re: Bulldozer Clarifications

Postby piesquared » Tue Dec 08, 2009 7:37 pm

So JF, I guess that means Interlagos and Valencia are Bulldozer parts with 8 and 4 modules on the same die (not package), connected by HT? That still shouldn't affect yield, correct? Maybe you can't answer that, though.
piesquared
K5 Fresh Boarder
 
Posts: 128
Joined: Sun Sep 24, 2006 5:47 pm

Re: Bulldozer Clarifications

Postby abinstein » Tue Dec 08, 2009 8:04 pm

piesquared wrote:So JF, I guess that means Interlagos and Valencia are Bulldozer parts with 8 and 4 modules on the same die (not package), connected by HT? That still shouldn't affect yield, correct? Maybe you can't answer that, though.


...

My guess is Interlagos is two 8-core dies connected in the same way as Magny-Cours. Valencia should be an 8-core die by itself.

Of course, this allows better yield than making a single die of 16 cores.
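
A rough way to see the yield argument is the textbook Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. The sketch below is only an illustration: the die area and defect density are made-up numbers, not AMD or foundry figures, and real yield models are more involved.

```python
import math


def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under a simple Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


# Hypothetical inputs, chosen only to make the comparison visible.
SMALL_DIE_AREA_CM2 = 3.0   # one 8-core die (assumed area)
DEFECTS_PER_CM2 = 0.2      # assumed defect density

small_die_yield = poisson_yield(SMALL_DIE_AREA_CM2, DEFECTS_PER_CM2)
big_die_yield = poisson_yield(2 * SMALL_DIE_AREA_CM2, DEFECTS_PER_CM2)

# Good small dies can be paired freely in a package, so the usable share of
# silicon is the small-die yield rather than the (lower) large-die yield.
print(f"8-core die yield:            {small_die_yield:.1%}")
print(f"monolithic 16-core die yield: {big_die_yield:.1%}")
```

Under this model the large-die yield is the square of the small-die yield, so it is always lower for any nonzero defect density.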
abinstein
K8 Opteron (SledgeHammer) Moderator
 
Posts: 7171
Joined: Sat Oct 30, 2004 9:49 pm

Re: Bulldozer Clarifications

Postby piesquared » Tue Dec 08, 2009 10:01 pm

abinstein wrote:
piesquared wrote:So JF, I guess that means Interlagos and Valencia are Bulldozer parts with 8 and 4 modules on the same die (not package), connected by HT? That still shouldn't affect yield, correct? Maybe you can't answer that, though.


...

My guess is Interlagos is two 8-core dies connected in the same way as Magny-Cours. Valencia should be an 8-core die by itself.

Of course, this allows better yield than making a single die of 16 cores.



OK, yeah, that's what I was wondering: whether they would be connected the same way as Magny-Cours. I guess it does make sense. I haven't seen a pic of MC yet, but I assume it's going to be 2 dies on the same substrate? That's sort of what got me wondering, since JF's terminology is usually Bulldozer modules on the same die, so it seemed a little different, and I thought maybe they had developed some technique to cut 2 dies together off of the wafer and connect them via HT that way. Yeah, probably overanalyzing it way too much!
piesquared
K5 Fresh Boarder
 
Posts: 128
Joined: Sun Sep 24, 2006 5:47 pm

Re: Bulldozer Clarifications

Postby CarlosTex » Wed Dec 09, 2009 12:06 am

JF, your explanation was the most clarifying one about Bulldozer so far. Personally, I think I've pretty much understood the idea and the concept of Bulldozer: it will be a multi-threading monster for sure while saving A LOT of die space, and performance per watt will be dramatically improved.

But I have a big concern. I'm not an engineer, but every time I look at a Bulldozer module and think about it, I get nervous. I know that the future is multi-threading, but I'm kind of afraid that single-thread integer performance won't be amazing. OK, I know that integer execution will be improved over K10.5, but IMHO the logic in these integer cores doesn't look like it will be enough to rival Intel's Sandy Bridge. I mean, Nehalem already has an edge over K10.5 in integer execution, and I don't know if these improvements in Bulldozer will be enough to counter Sandy Bridge. Is everybody sure that 4 pipelines per integer core will be enough? The beauty of this micro-architecture is that it's easy to throw more hardware at the design without having to redesign the whole thing. Maybe when Bulldozer gets a 22nm shrink they can think of putting at least 6 pipelines in each integer core and doubling the size of the L1 caches, both instruction and data, increasing the load/store capabilities to keep those pipelines working, and adding bigger L2 and L3. I don't think this is so difficult to do because a redesign isn't necessary, and at least AMD could keep improving its architectures with each shrink. For Intel this is a "tock" to a "tick".

Another small concern is clock speed and how well performance will scale with clock increases. However, I really, really hope those lonely 4 integer pipelines will be enough. When I look at them in the module diagrams that are out there, I always think of the first superscalar architectures; I mean, that looks too simple to hope for the performance increase we are expecting.

Remember that I'm talking about single-threaded integer performance; I'm not talking about the FPU.

Oh, and it makes sense that those two integer cores in a module can't work together because they are separate, but it feels like we're wasting that circuitry when doing single-thread work. Maybe if the L1 data cache were shared they could work together, I don't know, but I feel that it's a waste.

Anyway, whether you totally disagree with my concern or share it the way I do, please comment!

Thanks
CarlosTex
K7 Athlon (Argon) Junior Boarder
 
Posts: 332
Joined: Wed Nov 18, 2009 6:44 pm

Re: Bulldozer Clarifications

Postby JF-AMD » Wed Dec 09, 2009 12:14 am

abinstein wrote:
piesquared wrote:So JF, I guess that means Interlagos and Valencia are Bulldozer parts with 8 and 4 modules on the same die (not package), connected by HT? That still shouldn't affect yield, correct? Maybe you can't answer that, though.


...

My guess is Interlagos is two 8-core dies connected in the same way as Magny-Cours. Valencia should be an 8-core die by itself.

Of course, this allows better yield than making a single die of 16 cores.


Correct. Valencia is a single die, 4 modules, 8 total cores. Interlagos is two valencias connected via HT in a single package.
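
The die, module and core counts above can be written down as a tiny model. This is just bookkeeping for the numbers stated in this thread (2 integer cores per module, 4 modules per Valencia die, two such dies in Interlagos); the `Processor` class and its field names are made up for illustration.

```python
from dataclasses import dataclass

CORES_PER_MODULE = 2  # two integer cores per Bulldozer module, per the thread


@dataclass(frozen=True)
class Processor:
    """Hypothetical record of how a package is built up from dies and modules."""
    name: str
    dies: int
    modules_per_die: int

    @property
    def modules(self) -> int:
        return self.dies * self.modules_per_die

    @property
    def cores(self) -> int:
        # This is the number the hardware, OS and applications would see.
        return self.modules * CORES_PER_MODULE


valencia = Processor("Valencia", dies=1, modules_per_die=4)
interlagos = Processor("Interlagos", dies=2, modules_per_die=4)

for cpu in (valencia, interlagos):
    print(f"{cpu.name}: {cpu.dies} die(s), {cpu.modules} modules, {cpu.cores} cores")
```

This prints 4 modules and 8 cores for Valencia, and 8 modules and 16 cores for Interlagos.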
JF-AMD
XIP
 
Posts: 1832
Joined: Thu Apr 23, 2009 7:27 am

Re: Bulldozer Clarifications

Postby JF-AMD » Wed Dec 09, 2009 12:25 am

CarlosTex wrote:JF, your explanation was the most clarifying one about Bulldozer so far. Personally, I think I've pretty much understood the idea and the concept of Bulldozer: it will be a multi-threading monster for sure while saving A LOT of die space, and performance per watt will be dramatically improved.

But I have a big concern. I'm not an engineer, but every time I look at a Bulldozer module and think about it, I get nervous. I know that the future is multi-threading, but I'm kind of afraid that single-thread integer performance won't be amazing. OK, I know that integer execution will be improved over K10.5, but IMHO the logic in these integer cores doesn't look like it will be enough to rival Intel's Sandy Bridge. I mean, Nehalem already has an edge over K10.5 in integer execution, and I don't know if these improvements in Bulldozer will be enough to counter Sandy Bridge. Is everybody sure that 4 pipelines per integer core will be enough? The beauty of this micro-architecture is that it's easy to throw more hardware at the design without having to redesign the whole thing. Maybe when Bulldozer gets a 22nm shrink they can think of putting at least 6 pipelines in each integer core and doubling the size of the L1 caches, both instruction and data, increasing the load/store capabilities to keep those pipelines working, and adding bigger L2 and L3. I don't think this is so difficult to do because a redesign isn't necessary, and at least AMD could keep improving its architectures with each shrink. For Intel this is a "tock" to a "tick".

Another small concern is clock speed and how well performance will scale with clock increases. However, I really, really hope those lonely 4 integer pipelines will be enough. When I look at them in the module diagrams that are out there, I always think of the first superscalar architectures; I mean, that looks too simple to hope for the performance increase we are expecting.

Remember that I'm talking about single-threaded integer performance; I'm not talking about the FPU.

Oh, and it makes sense that those two integer cores in a module can't work together because they are separate, but it feels like we're wasting that circuitry when doing single-thread work. Maybe if the L1 data cache were shared they could work together, I don't know, but I feel that it's a waste.

Anyway, whether you totally disagree with my concern or share it the way I do, please comment!

Thanks


Worrying about single-threaded performance in 2011-2013 is going to be like worrying about what to do with your cassette player in 1992 after the world had switched to CDs. Somewhere there was a guy saying "but there are tons of cassette players out there, if you don't release your music on cassettes you are missing out."

Go open up task manager and see how many processes your system is running right now with just a web browser open. Do you think this is going to get better? Do you expect that number to go down? What people don't realize, even in a desktop environment, is that threading matters. How well does your processor execute a single thread when there are 87 processes running?

Of course you don't have to believe me. I will have more cores, so obviously I have a vested interest. Now, if there was a different company out there with processors that have fewer cores and higher clock speed, you'd probably see them telling everyone that single thread speed matters.

A Ferrari is really fast on an open road. It is really slow when it is stuck on a surface street at rush hour surrounded by Hondas and Hyundais. Single-thread speed is really interesting for benchmarks, but in 2 years it will be the cassette tape of our time. And I can guarantee you that someone will be really concerned about it. And you can still buy vinyl records if you want them.
JF-AMD
XIP
 
Posts: 1832
Joined: Thu Apr 23, 2009 7:27 am

Re: Bulldozer Clarifications

Postby kaa » Wed Dec 09, 2009 3:14 am

Amdahl's Law, JF-AMD. Amdahl's Law.

SINGLE THREAD PERFORMANCE, ESPECIALLY FOR INTEGER WORKLOADS, WILL =ALWAYS= MATTER.
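
For anyone who wants numbers, Amdahl's Law says the speedup on N cores is 1 / ((1 - p) + p / N), where p is the fraction of the work that can run in parallel. A minimal sketch follows; the parallel fractions are arbitrary examples, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


CORES = 16  # e.g. the core count quoted for Interlagos in this thread

for p in (0.50, 0.90, 0.99):
    print(f"{p:.0%} parallel on {CORES} cores: {amdahl_speedup(p, CORES):.2f}x")
```

Even with 90% of the work parallel, 16 cores give only a 6.4x speedup, because the serial 10% still runs at single-thread speed.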
kaa
K8 Athlon 64 (San Diego) Expert Boarder
 
Posts: 1925
Joined: Sun Mar 07, 2004 9:49 am

Re: Bulldozer Clarifications

Postby Edyros » Wed Dec 09, 2009 7:08 am

At this point, knowing that Bulldozer is AMD's brand new architecture, it is naive to believe that they did not improve single-threaded performance, and it would be a HUGE mistake if that were the case.
Edyros
K8 Athlon 64 (Orleans) Expert Boarder
 
Posts: 2257
Joined: Wed Nov 10, 2004 10:38 am
Location: Anywhere in the World
