Agner's CPU Blog: The Instruction Set War

Postby Mechromancer » Thu Dec 10, 2009 5:03 am

This is a REALLY good read for AMD users especially. It opened my eyes: http://www.agner.org/optimize/blog/read.php?i=25. AMD CPUs are screwed on two fronts, the instruction sets and the compilers. This needs to change.

An excerpt:
There is an almost invisible war going on between Intel and AMD. It's the game of who is defining the new additions to the x86 instruction set. This war has been going on behind the scenes for years without being noticed by the majority of IT professionals. Most programmers don't care what is going on at the machine code level, so they can't see all the ridiculous consequences that this war has. Those working with virtualization may have noticed that Intel and AMD processors are incompatible when it comes to virtualization software, but this is only one of the more visible consequences of the conflict.
Some important battles

Traditionally, Intel has been the market leader, defining the instruction set for each new generation of microprocessors: 8086, 80186, 80286, 80386, etc. Each new instruction set is a superset of the previous one so that the backwards compatibility is maintained.

Intel's main competitor, AMD, has tried several times to gain the lead by defining their own extensions to the x86 instruction set. In 1998, AMD was the first to introduce Single-Instruction-Multiple-Data (SIMD) instructions in their so-called 3DNow instruction set. Intel never supported the 3DNow instructions. Instead, they introduced the SSE instruction set a few years later. SSE does essentially the same thing as 3DNow, but with a larger register size. Clearly, Intel had won and AMD had to support SSE because it was better than 3DNow.

In 2001, Intel launched their first 64-bit processor named Itanium with a new parallel instruction set. Instead of accepting the new Itanium instruction set, AMD developed their own 64-bit instruction set which - unlike the Itanium - was backwards compatible with the x86 instruction set. The market favored the backwards compatibility so AMD won this time and Intel had to support the AMD64, or x86-64, instruction set in their next processor.

The next important battle is going on right now. It's about instructions with more than two operands. The industry has recognized a need for fused multiply-and-add instructions (e.g.: D=A*B+C) and several other instructions with more than two operands. The current coding scheme supports only instructions with two operands, so a new coding scheme has to be invented in order to support instructions with more than two operands. AMD came first with a proposal. In August 2007, AMD announced a future instruction set called SSE5 with a new coding scheme. The early disclosure of AMD's intentions was a break with the previous policy where both companies had kept their intentions secret as long as possible. Intel's reply came in April 2008 with an early (probably premature) disclosure of their planned AVX instruction set. Intel's AVX coding scheme was much more flexible and future-oriented than AMD's SSE5 scheme, as I argued in a public discussion forum. Most importantly, the AVX scheme has room for future extensions of the size of the SIMD vector registers, while the SSE5 scheme has little room for any future extensions. It was pretty obvious that Intel had won this time, and thanks to the early disclosure of Intel's AVX instructions, it was not too late for AMD to change their plans. In May 2009, AMD published a revision of their plans where they modified the coding scheme for better compatibility with AVX. In addition to a full support of AVX, the revised AMD plan contains most of the original SSE5 instructions under the new name XOP and with the new coding scheme. Unfortunately, Intel had changed their plans in the meantime! In December 2008, Intel published a revision of their plans which involved a change of the coding of the fused multiply-and-add (FMA) instructions. Now it was too late for AMD to change their design once more, so the first AMD processors with FMA will follow the premature Intel specification rather than Intel's later revision. 
It is difficult to obtain compatibility when you are following a moving target.
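
For anyone unfamiliar with the D=A*B+C operation mentioned in the excerpt: a fused multiply-add rounds only once, after the addition, while a separate multiply followed by an add rounds twice. The Python sketch below emulates the single rounding with exact rational arithmetic. The helper names are made up for illustration, nothing here comes from the blog, and no real FMA instruction is executed.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b+c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding to the nearest double


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary two-step version: the product is rounded before the add."""
    product = a * b      # first rounding
    return product + c   # second rounding


# Inputs picked so that the exact product 1 - 2**-54 is not a double.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

fused = fused_multiply_add(a, b, c)
unfused = separate_multiply_add(a, b, c)
print(fused, unfused)
```

With these inputs the two-step version loses the low-order term when the product is rounded, while the fused version keeps it. That accuracy difference is separate from the operand-count question the thread argues about.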
Mechromancer
 
Posts: 78
Joined: Mon Aug 31, 2009 1:00 am

Re: Agner's CPU Blog: The Instruction Set War

Postby wuttz » Thu Dec 10, 2009 5:18 am

just one of the benefits of having majority market share, being able to dictate the instruction set specs. i wonder if amd can pull their "fusion" products to launch earlier, so as to be able to dictate the standard instruction sets for cgpu processors.
wuttz
K8 Athlon 64 X2 (Toledo) Elite Boarder
 
Posts: 3027
Joined: Sat Aug 08, 2009 6:48 pm
Location: Pearland, Texas

Re: Agner's CPU Blog: The Instruction Set War

Postby scientia » Thu Dec 10, 2009 6:09 am

Mechromancer wrote:This is a REALLY good read for AMD users especially. It opened my eyes: http://www.agner.org/optimize/blog/read.php?i=25. AMD CPUs are screwed on two fronts, the instruction sets and the compilers. This needs to change.

AMD's Bulldozer will end up more advanced than whatever Intel has. I don't know how this would make AMD screwed.
scientia
K8 Opteron (SledgeHammer) Moderator
 
Posts: 3986
Joined: Thu Mar 25, 2004 5:42 am
Location: Indiana USA

Re: Agner's CPU Blog: The Instruction Set War

Postby BaronMatrix » Thu Dec 10, 2009 6:33 am

He's just like the rest of the unwashed masses. It's all AMDs fault. Even x64
MSI 790GX X4 940 - @ 3GHz - AC64 HSF - HD 4870 512MB - 8GB DDR2 - 1TB HDD - BluRay - TX650 W - Antec 300 - 2X ACER 23"
HP DV7-3065 - M600 - 17" LCD - BD-Rom - HDMI\eSATA - .5TB HDD - HD4200
BaronMatrix
K8 Athlon 64 (San Diego) Expert Boarder
 
Posts: 1949
Joined: Mon Jan 29, 2007 3:13 am

Re: Agner's CPU Blog: The Instruction Set War

Postby AussieFX » Thu Dec 10, 2009 7:02 am

BaronMatrix wrote:He's just like the rest of the unwashed masses. It's all AMDs fault. Even x64

Huh?

Agner is the guy who exposed intels cpuid compiler flags and then got stuck into Francois, he's a good guy.

I believe he's right when he pushes for a standard with instruction sets, it's getting crazy as I wouldn't have a clue what SSE4a does. :roll:
AMD and intel are equally to blame in this mess.
Sent from my flippy phone thingy using TAPATALK HD_2016.1


Nikon D7000 / Nikon D5000
AussieFX
K8 Opteron (SledgeHammer) Moderator
 
Posts: 8116
Joined: Fri May 11, 2007 1:50 pm
Location: I wish I knew...

Re: Agner's CPU Blog: The Instruction Set War

Postby abinstein » Thu Dec 10, 2009 7:06 am

It is a waste of time to read his blog.

1. It's way too long and unstructured.

2. He is wrong about SSE5 vs AVX anyway. He has this weird idea that AVX is more flexible. The fact is, AVX is nothing but a recoding of original SSE's plus FMA, while SSE5 includes not just FMA but also XOP and CVT instructions. SSE5 is actually a cleaner and more flexible design than AVX.

3. He calls AMD's FMA4 "premature" simply because Intel is NOT capable of implementing their original 4-argument design, and had to fall back to FMA3. So he thinks it's AMD's fault that Intel failed to stick to their official promises. How twisted is this mind?

The fact is, even as of this date, we don't know what types of FMA3 Intel will implement in their next generation processors (see the "Extensible architecture -- more feature..." line in the slide below). According to the slide below (IDF 2009), not even Sandy Bridge will have FMA.

[Attachment: FMA_SandyBridge.jpg]


And this guy Agner is claiming that AMD is screwed because Bulldozer will have FMA4 ("premature") while Intel will have none?
abinstein
K8 Opteron (SledgeHammer) Moderator
 
Posts: 7177
Joined: Sat Oct 30, 2004 9:49 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby abinstein » Thu Dec 10, 2009 7:30 am

AussieFX wrote:I believe he's right when he pushes for a standard with instruction sets, it's getting crazy as I wouldn't have a clue what SSE4a does. :roll:

As long as Intel abuse their monopoly power to deny credits to AMD's innovation (AMD64 being the prominent example; also SSE4a and SSE5), there is no way these two can have "standard" instruction sets without AMD tracing Intel's monopoly tails.

SSE4a consists of 4 instructions which are very useful for server workloads. Two of them are combined memory writes; two are bit manipulation. I hate to see people criticize it without the slightest idea of what it does.
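
The bit manipulation half of that description boils down to pulling a bit field out of a register and dropping one back in. Here is a generic Python sketch of field extract and insert on a 64-bit value. It only illustrates the kind of operation; the function names are invented here, and it does not reproduce the exact operand encoding or corner cases of the SSE4a instructions.

```python
MASK64 = (1 << 64) - 1


def extract_field(value: int, index: int, length: int) -> int:
    """Return `length` bits of `value`, starting at bit `index`."""
    field_mask = (1 << length) - 1
    return (value >> index) & field_mask


def insert_field(dest: int, field: int, index: int, length: int) -> int:
    """Overwrite `length` bits of `dest` at bit `index` with `field`."""
    field_mask = (1 << length) - 1
    cleared = dest & ~(field_mask << index) & MASK64
    return cleared | ((field & field_mask) << index)


word = 0x1234_5678_9ABC_DEF0
middle = extract_field(word, 16, 16)           # bits 16..31
patched = insert_field(word, 0xFFFF, 16, 16)   # replace the same bits
print(hex(middle), hex(patched))
```

Without a dedicated instruction, each of these takes a shift plus a mask (and an extra OR for the insert), which is presumably why a single-instruction form is attractive for code that packs and unpacks fields all day.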
http://forums.amd.com/devblog/blogpost. ... adid=88051

And how many really know what SSE4.1 and SSE4.2 do? Please go take a look and see if you can get any sense out of those instructions, other than features designed specifically for media benchmark(et)ing. In fact, most of Intel's SSE4 functions can be performed by just one or two SSE5 instructions, with better flexibility. And media codecs run faster on GPUs anyway...
abinstein
K8 Opteron (SledgeHammer) Moderator
 
Posts: 7177
Joined: Sat Oct 30, 2004 9:49 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby AussieFX » Thu Dec 10, 2009 8:04 am

abinstein wrote:
And how many really know what SSE4.1 and SSE4.2 do? Please go take a look and see if you can get any sense out of those instructions, other than features designed specifically for media benchmark(et)ing. In fact, most of Intel's SSE4 functions can be performed by just one or two SSE5 instructions, with better flexibility. And media codecs run faster on GPUs anyway...

That's why it's so crazy because it's all about benchmarketing as far as intel is concerned and this is why they have monopoly powers. (even if the regulators don't think so)
As it stands AMD don't have a chance with any instruction sets they propose because when 80% of the market is intel the programmers will naturally code for the majority. :roll:

The only reason AMD64 ever got through is because the community could see what was going to happen with Itanic being pushed towards the desktop and people got angry.
Then there is the whole Microsoft thing but I don't think they wield that sort of power anymore.

Surely some standard (like jedec) is needed, ok intel have hijacked jedec too but at least AMD have a voice and intels benchmarketing would be curbed because AMD would know in advance of intels proposals.

I read Agners proposals on a thread at Aces, it is better explained there where others have had some input. I haven't read Agners blog because I usually find it way over my head.

EDIT: There is also the new patent sharing arrangement but are instruction sets included in that?
Sent from my flippy phone thingy using TAPATALK HD_2016.1


Nikon D7000 / Nikon D5000
AussieFX
K8 Opteron (SledgeHammer) Moderator
 
Posts: 8116
Joined: Fri May 11, 2007 1:50 pm
Location: I wish I knew...

Re: Agner's CPU Blog: The Instruction Set War

Postby gruffi » Thu Dec 10, 2009 8:43 am

BaronMatrix wrote:He's just like the rest of the unwashed masses. It's all AMDs fault. Even x64

I think you misunderstand Agner Fog. He is a very knowledgeable guy with integrity, not known for bashing AMD like the typical Intel trolls (George Ou etc). I agree with Agner in every aspect. A standardization committee for x86 is really desirable.

abinstein wrote:He has this weird idea that AVX is more flexible.

He speaks about the coding scheme, not the functionality. And he is right. E.g. AVX has more space for future extensions.

abinstein wrote:The fact is, AVX is nothing but a recoding of original SSE's plus FMA, while SSE5 includes not just FMA but also XOP and CVT instructions.

I don't know if Bulldozer will have CVT16 at all. It's not really that important. Half precision is rarely used. Common programming languages like C or C++ don't even have fundamental half precision data types. And yes, SSE5 may have more functionality, but it's still 128-bit. It's a little bit like 3DNow! and SSE. 3DNow! was the first and important specification to get the ball rolling. Finally SSE was the stronger and more future-oriented specification.

abinstein wrote:He calls AMD's FMA4 "premature" simply because Intel is NOT capable of implementing their original 4-argument design, and had to fall back to FMA3. So he thinks it's AMD's fault that Intel failed to stick to their official promises.

No. He never called AMD's FMA4 premature. He said Intel's first AVX specification was premature. He just wanted to point out the fact that AMD will not have FMA3 in Bulldozer, only FMA4. OTOH Intel will not have any FMA functionality in Sandy Bridge. And software developers mostly avoid vendor-specific stuff. It's a pragmatic statement, not blaming AMD. This pain of incompatibility is what the article is about.
Tradition is not holding the ashes but passing the flame.
gruffi
K6-III Fresh Boarder
 
Posts: 252
Joined: Sun Jun 21, 2009 12:39 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby abinstein » Thu Dec 10, 2009 9:05 am

gruffi wrote:I think you misunderstand Agner Fog. He is a very knowledgeable guy with integrity, not known for bashing AMD like the typical Intel trolls (George Ou etc). I agree with Agner in every aspect. A standardization committee for x86 is really desirable.

I respect your decision but you have to know that whoever he is, a BS comment is a BS comment from anyone. Even if that person is someone that you previously admired.

gruffi wrote:He speaks about the coding scheme, not the functionality. And he is right. E.g. AVX has more space for future extensions.

Where is his proof that SSE5 has less space for future extension?
There is none, because it is not true. Coding scheme is nothing but a convention. AVX simply replaces 66h/f2h/f3h with a single unused byte. Is that something "better"? No, it's just that Intel has that controlling power to break compatibility and to do whatever they want with x86.

The backward compatible "AVX" coding is very ugly and inefficient compared to SSE5. The "short" AVX coding is not backward compatible. Please take a look at Intel's AVX documentation. It's all there.

gruffi wrote:And yes, SSE5 may have more functionality, but it's still 128-bit. It's a little bit like 3DNow! and SSE. 3DNow! was the first and important specification to get the ball rolling. Finally SSE was the stronger and more future-oriented specification.

I can easily expand SSE5 registers into 256 bits. Do you care if I do so? I bet you don't, because I am not Intel. In terms of instruction encoding, whether 128-bit or 256-bit registers are used makes more marketing than technical difference.

I also disagree with the comparison of SSE5 with 3DNow. If you take a look at these two, you can easily find that SSE5 is a LOT more revolutionary than 3DNow in every aspect. Not just the way instructions are encoded, but also the scope and the quantity of the instructions.

In contrast, what is AVX other than a 256-bit recoding of existing SSE's? Oh yeah!? It's so great!? On what? A shorter prefix? A special field for 256-bit registers? Big deal! A CENG grad student can do that.

But I assure you few could design the set of instruction extensions as neat and elegant as SSE5. It's just sad that all those criticizing SSE5 out there simply don't understand this extension.

gruffi wrote:No. He never called AMD's FMA4 premature. He said Intel's first AVX specification was premature. He just wanted to point out the fact that AMD will not have FMA3 in Bulldozer, only FMA4. OTOH Intel will not have any FMA functionality in Sandy Bridge. And software developers mostly avoid vendor-specific stuff. It's a pragmatic statement, not blaming AMD. This pain of incompatibility is what the article is about.

He is barking up the wrong tree.

Blame Intel for its incompetence in sticking with its original FMA4 promises. Blame Intel for its regression of not implementing FMA in Sandy Bridge. How can something that Intel proposes be implemented by AMD first? Simply put, Intel has its a** kicked by AMD in terms of ISA innovation. The only way for Intel to save face is to purposely downgrade its own technology to create this "incompatibility" which induces pain to software developers. This Intel company is really regressive, and it is stalling the innovation that itself promised to make, isn't it?

"Complaining about ISA incompatibility" is totally inappropriate in this case. AMD took the pain to make their ISA extension compatible with Intel's original official document. It is Intel that failed to execute. The entire incompatibility is created by this evil empire which knows nothing but benchmarketing. When it comes to real innovation, it lags behind and uses its army of viral marketers to blame others (alright, not blaming others, but making "pragmatic statement" of "pain"). :roll:
abinstein
K8 Opteron (SledgeHammer) Moderator
 
Posts: 7177
Joined: Sat Oct 30, 2004 9:49 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby gruffi » Thu Dec 10, 2009 10:48 am

abinstein wrote:Where is his proof that SSE5 has less space for future extension?

Look into the SSE5 and AVX specifications. You will see that AVX has more space for future extensions. SSE5 has one opcode byte (actually only 5 bits), AVX has one full opcode byte plus several bits in the VEX prefix. The VEX prefix gives more flexibility. Something that doesn't exist in SSE5.
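
To make the "several bits in the VEX prefix" concrete, here is a small Python sketch that splits a three-byte VEX prefix into its fields. The field layout is written down as I understand it from Intel's AVX documentation, so treat it as an assumption and check it against the manual before relying on it. The names `Vex3` and `decode_vex3` are invented for this post, and the example bytes are chosen purely for illustration.

```python
from typing import NamedTuple, Optional

# The 2-bit pp field stands in for a legacy prefix byte.
IMPLIED_PREFIX = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}


class Vex3(NamedTuple):
    r: int                         # register-extension bits
    x: int
    b: int
    opcode_map: int                # 5-bit map select (mmmmm)
    w: int
    vvvv: int                      # extra source register operand
    vector_bits: int               # 128 or 256, from the L bit
    implied_prefix: Optional[int]  # 66h/F3h/F2h folded into pp


def decode_vex3(prefix: bytes) -> Vex3:
    """Split a three-byte VEX prefix (C4h, byte 1, byte 2) into fields."""
    if len(prefix) != 3 or prefix[0] != 0xC4:
        raise ValueError("expected a three-byte VEX prefix starting with C4h")
    byte1, byte2 = prefix[1], prefix[2]
    return Vex3(
        r=((byte1 >> 7) & 1) ^ 1,      # R, X, B are stored inverted
        x=((byte1 >> 6) & 1) ^ 1,
        b=((byte1 >> 5) & 1) ^ 1,
        opcode_map=byte1 & 0x1F,
        w=(byte2 >> 7) & 1,
        vvvv=(~(byte2 >> 3)) & 0xF,    # vvvv is stored inverted too
        vector_bits=256 if (byte2 >> 2) & 1 else 128,
        implied_prefix=IMPLIED_PREFIX[byte2 & 0b11],
    )


fields = decode_vex3(bytes([0xC4, 0xE1, 0x74]))
print(fields)
```

If that layout is right, the 5-bit map field alone could select up to 32 opcode maps of 256 opcodes each, on top of the vector-length bit and the extra register operand, which is the sort of headroom being argued about here.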

abinstein wrote:I can easily expand SSE5 registers into 256 bits.

Do you live in the subjunctive? SSE5 registers are 128-bit. End of story.

abinstein wrote:I also disagree with the comparison of SSE5 with 3DNow. If you take a look at these two, you can easily find that SSE5 is a LOT more revolutionary than 3DNow in every aspect. Not just the way instructions are encoded, but also the scope and the quantity of the instructions.

I didn't speak about contents, but market significance.

abinstein wrote:But I assure you few could design the set of instruction extensions as neat and elegant as SSE5. It's just sad that all those criticizing SSE5 out there simply don't understand this extension.

You don't understand. No one criticizes SSE5.


There is really no reason to rant against Agner or someone else. Believe what you want, but he is not the person you think he is. His statements are objective and not blaming AMD in any way. I think he even appreciates AMD's effort to release SSE5 for new x86 innovations and to adopt AVX for compatibility despite Intel's weird specification strategy. And I do so as well. His point is that the ISA incompatibilities hurt the market, software developers and customers alike. And we must find a way to overcome this.
Tradition is not holding the ashes but passing the flame.
gruffi
K6-III Fresh Boarder
 
Posts: 252
Joined: Sun Jun 21, 2009 12:39 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby superrugal » Thu Dec 10, 2009 12:58 pm

You guys are in a heated discussion; I can hardly keep up with your statements and tell them all apart...
I'll just describe my understanding: AVX is a structure, SSE5 is a group of instructions.

The XOP instructions come from the part of SSE5 that SSE4 and SSSE3 don't have. SSE5 in BD will be rewritten as 256-bit instructions to fit the AVX structure...

If I'm wrong, please correct me...

Here is a picture explaining what SSE5 is:

Image
superrugal
K5 Fresh Boarder
 
Posts: 103
Joined: Tue Nov 17, 2009 9:39 am

Re: Agner's CPU Blog: The Instruction Set War

Postby Boundless » Thu Dec 10, 2009 4:46 pm

superrugal: > You guys are in heated discussion...

It's pretty heated wherever that blog is discussed, although
a few flamers seem to have some vague awareness of
what their PC would be if AMD hadn't extended x86-32.

> I can hardly catch up with your statements and tell all apart...

It's pretty simple:
AMD wants the XIU instructions.
spIntel wants the HNI instructions.

The problem, like orbital space junk, is real. x86 CISC has
now sailed right through HISC (Humongous Instruction Set Computing)
and is headed to LISC (Ludicrous Instruction Set Computing).

I could see a time when a new die is mostly non-legacy, with
one legacy core for running x86-32 code (as long as the
performance wasn't Itanic-slow).
______
XIU: eXtend Isa Usefully.
HNI: Halt if Not Intel (because it's now too late for Plan A,
where IA-64 is dominant, and x86 is crippled to make it so).
Boundless
K8 Athlon 64 (Orleans) Expert Boarder
 
Posts: 2023
Joined: Mon Mar 01, 2004 4:40 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby gruffi » Thu Dec 10, 2009 5:22 pm

abinstein wrote:Now tell me, should Agner complain about his pain, or should he really question Intel's motives?

You can philosophize about guilt as long as you want. It's not purposeful.

abinstein wrote:While I agree it is important for the industry to have a compatible standard, I feel Agner's comments on this matter to be very hypocritical. The source of this problem is clearly Intel. Yet he completely ignores it.

You don't understand him. The article is not about SSE5 or AVX. These are just two examples.

abinstein wrote:He's been praising Intel and criticizing AMD on this matter like some yelper bowing to the bully.

Sry, but I see nothing of both.

abinstein wrote:However you like Agner, it doesn't change the fact that he completely miss the source of the problem.

No, he doesn't miss it. It's just irrelevant for his statements.
Tradition is not holding the ashes but passing the flame.
gruffi
K6-III Fresh Boarder
 
Posts: 252
Joined: Sun Jun 21, 2009 12:39 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby scientia » Thu Dec 10, 2009 6:44 pm

I don't understand this thread at all. Most of the comments make no sense.

SSE5 versus AVX is a false choice. AMD was trying to fit SSE5 into the existing one byte extension that they had included with AMD64. When Intel gave the specs for AVX which included additional prefix bytes AMD was happy. They simply took the AVX format and remapped most of the SSE5 instructions into the larger opcode space. No problem. Secondly, AMD picked up the more robust FMA4 which is better than AMD's original SSE5 format. FMA3 is worse than the SSE5 format so there would be no reason for AMD to follow it. Using FMA4 is the right choice.

So, where does all the conflict come from? Apparently Agner would prefer that the x86 instruction set were not forked. The problem is that someone has to introduce an instruction set first, so any change at all is a fork until it is followed. Would it be better if new x86 instructions were agreed upon and released at the same time as happens with HyperTransport and PCIe? Of course it would. And is AMD half to blame for this? Unlikely. It's hard to imagine that AMD would resist a cooperative extension process. Intel on the other hand has a long history of going its own way and getting its own way. How can I be sure that AMD isn't to blame? Because Microsoft, compiler makers, and Linux developers would all be in favor of standards; it is not conceivable that AMD could stand against that alone, but Intel could.

Intel's decision to drop FMA4 in favor of FMA3 is irrational at best. The only reason I can think of why they would do it is because someone ran a simulation and realized that their substructure can't handle the additional register load. Specifically, this would be a problem if the execution units ended up with a one cycle delay because of having to add more operands. So why doesn't it affect AMD? It is possible that AMD started working on this back in 2005 and already had the architecture figured out before the SSE5 announcement in 2007. If Intel had not done any design work and was simply releasing a preliminary spec for AVX, it is possible that they discovered that it conflicted with design work already in progress. They may have been unwilling to push a release date back to gain the benefit of a more robust instruction set.

I think this is just part of the cowboy mentality of Intel that if they go their own way they can ambush AMD and squeeze out a little more profit. Standards would help AMD so that tends to go against the grain of Intel thinking. However, sooner or later Intel should realize that their options are getting narrower every day. The grand adventure of Itanium and the dream of a platform all to itself is dying. The dream of a low power processor that could open up new sources of revenue was an illusion. The dream of a video processor that would sweep the market born from the far flung ambitions of the TeraFlop project has turned into a bitter reality of second rate performance. The days when Intel had complete platforms while AMD was left waiting on third party support are also coming to a close; with desktop and mobile in hand, the server platform will be the final piece of the set. Sooner or later, Intel is going to have to realize that x86 is pretty much its only asset and that cooperation with AMD would be better than damaging that asset.
scientia
K8 Opteron (SledgeHammer) Moderator
 
Posts: 3986
Joined: Thu Mar 25, 2004 5:42 am
Location: Indiana USA

Re: Agner's CPU Blog: The Instruction Set War

Postby superrugal » Thu Dec 10, 2009 7:14 pm

scientia wrote:
Intel's decision to drop FMA4 in favor of FMA3 is irrational at best. The only reason I can think of why they would do it is because someone ran a simulation and realized that their substructure can't handle the additional register load. Specifically, this would be a problem if the execution units ended up with a one cycle delay because of having to add more operands. So why doesn't it affect AMD? It is possible that AMD started working on this back in 2005 and already had the architecture figured out before the SSE5 announcement in 2007. If Intel had not done any design work and was simply releasing a preliminary spec for AVX, it is possible that they discovered that it conflicted with design work already in progress. They may have been unwilling to push a release date back to gain the benefit of a more robust instruction set.


About FMA3 and FMA4, I want to quote some information on them.

FMA combines a multiplication and an addition: it multiplies the first and second numbers, then adds the third. It looks like (A * B) + C. The difference is the number of operands: three (FMA3) or four (FMA4).

For example, take three registers A, B and C. FMA3 computes (A * B) + C and writes the result back into the C register. AMD's version, FMA4, writes the result into a fourth register D: (A * B) + C = D. In other words, at the end of an FMA3 calculation the original data in the C register has been overwritten.

If you ask AMD engineers, they will tell you that FMA4 saves you the final copy operation. Intel's engineers will tell you that FMA3 needs fewer registers to complete the operation. So if you need to do a lot of computations of the form (A * B) + C, AMD's FMA4 will save you clock cycles but costs more transistors. In other words, each has its advantages and disadvantages.
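
A toy register model may make the copy argument easier to follow. This is only a sketch of the description above: the registers are a plain Python dict, `fma3` and `fma4` are made-up helper names, and the arithmetic is ordinary Python floating point rather than a real fused operation.

```python
def fma3(regs, dst_and_addend, a, b):
    """Three-operand form as described above: the result overwrites the addend."""
    regs[dst_and_addend] = regs[a] * regs[b] + regs[dst_and_addend]


def fma4(regs, dst, a, b, c):
    """Four-operand form: the destination is separate from all three sources."""
    regs[dst] = regs[a] * regs[b] + regs[c]


# FMA3: to keep the old value of C, copy it into D first, then accumulate into D.
regs3 = {"A": 2.0, "B": 3.0, "C": 4.0, "D": 0.0}
regs3["D"] = regs3["C"]          # the extra register-to-register copy
fma3(regs3, "D", "A", "B")       # D = A * B + D

# FMA4: one operation, C is left untouched.
regs4 = {"A": 2.0, "B": 3.0, "C": 4.0, "D": 0.0}
fma4(regs4, "D", "A", "B", "C")  # D = A * B + C

print(regs3, regs4)
```

Both paths end in the same register state. The three-operand path just needs one extra move whenever the old C value is still wanted afterwards.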
Last edited by superrugal on Thu Dec 10, 2009 7:41 pm, edited 1 time in total.
superrugal
K5 Fresh Boarder
 
Posts: 103
Joined: Tue Nov 17, 2009 9:39 am

Re: Agner's CPU Blog: The Instruction Set War

Postby maduroutmb » Thu Dec 10, 2009 8:28 pm

The real problem is that people depend upon closed-source, precompiled binaries. It then follows that for AMD to compete against Intel's games with the instruction set it must not only develop its own compiler or be closely involved with an open source compiler (Open64 or a fork/extensions for GCC) BUT ALSO depend upon developers to provide binaries and support for Intel and AMD.
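
The binary-dispatch worry can be sketched in a few lines of Python. This is a hypothetical illustration, not a description of any particular compiler's dispatcher: the kernel names are invented, the feature names are passed in by hand rather than read from CPUID, and only the two vendor strings are the real CPUID identifiers.

```python
def pick_kernel_by_features(features):
    """Choose a code path from the advertised CPU features only."""
    if "fma4" in features:
        return "fma4_kernel"
    if "fma" in features:        # three-operand FMA
        return "fma3_kernel"
    if "avx" in features:
        return "avx_kernel"
    if "sse2" in features:
        return "sse2_kernel"
    return "generic_kernel"


def pick_kernel_by_vendor(vendor, features):
    """Hypothetical vendor-gated dispatcher: other vendors get the slow path."""
    if vendor != "GenuineIntel":
        return "generic_kernel"
    return pick_kernel_by_features(features)


amd_cpu = ("AuthenticAMD", {"sse2", "avx", "fma4"})
print(pick_kernel_by_features(amd_cpu[1]))   # uses what the chip supports
print(pick_kernel_by_vendor(*amd_cpu))       # ignores what the chip supports
```

A precompiled binary built around the second kind of check cannot be fixed by the end user, which is why the closed-source point matters here.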
Last edited by maduroutmb on Thu Dec 10, 2009 8:34 pm, edited 1 time in total.
Phenom 9950 BE @2.52 GHz/1.2125V (Nirvana NV120), 790GX, 8 GB DDR2-800 @4-4-4-12/2.2V, 2*4850 1GB @0.625 GHz
Fedora 14 KDE 64/Windows 7 64
maduroutmb
K7 Athlon (Argon) Junior Boarder
 
Posts: 361
Joined: Mon May 05, 2008 3:03 am
Location: Galveston, TX

Re: Agner's CPU Blog: The Instruction Set War

Postby abinstein » Thu Dec 10, 2009 8:28 pm

superrugal wrote:If you ask AMD engineers , they will tell you , FMA4 can help you save the last step of copy operation. And Intel's engineers will tell you , FMA3 can use fewer registers to complete the operation. So one thing we need to figure out is , if you need to do a lot of computing tasks , such as (A * B) + C , AMD's FMA4 will help you save more clock cycles , but more transistors consumption . In other words, these two have their advantages and disadvantages.

You don't need to use more registers for FMA4. You can always issue an FMA4 command as:

Code:
vfmaddpd C, A, B, C

which will do C += (A * B)
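
The same aliasing trick, modeled in Python for anyone who wants to play with it: a dot product accumulated with a four-operand multiply-add whose destination is also its addend. The `fma4` helper and the dict of registers are a toy model invented for this post, using plain Python arithmetic rather than real vector instructions.

```python
def fma4(regs, dst, a, b, c):
    """Four-operand multiply-add on a toy register file: dst = a * b + c."""
    regs[dst] = regs[a] * regs[b] + regs[c]


def dot_product(xs, ys):
    """Accumulate sum(x * y) using only three registers."""
    regs = {"A": 0.0, "B": 0.0, "C": 0.0}
    for x, y in zip(xs, ys):
        regs["A"], regs["B"] = x, y
        fma4(regs, "C", "A", "B", "C")   # C += A * B, no fourth register
    return regs["C"]


total = dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(total)
```

So the four-operand form covers the accumulate case without spending an extra register, and the separate destination is only used when the old addend has to survive.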
abinstein
K8 Opteron (SledgeHammer) Moderator
 
Posts: 7177
Joined: Sat Oct 30, 2004 9:49 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby hyc » Thu Dec 10, 2009 10:33 pm

maduroutmb wrote:The real problem is that people depend upon closed-source, precompiled binaries. It then follows that for AMD to compete against Intel's games with the instruction set it must not only develop its own compiler or be closely involved with an open source compiler (Open64 or a fork/extensions for GCC) BUT ALSO depend upon developers to provide binaries and support for Intel and AMD.


Closed-source is an anomaly that will soon be irrelevant. With LLVM progressing on its disassembly tools, you'll be able to decompile any binary into IR and then recompile it optimized for any other target.
hyc
K8 Athlon 64 (Winchester) Expert Boarder
 
Posts: 1349
Joined: Mon Jan 16, 2006 4:38 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby Mechromancer » Fri Dec 11, 2009 1:07 pm

HOW MANY OF YOU READ THE ENTIRE BLOG BEFORE POSTING?!

Reading some of these premature comments really pissed me off. Agner is trying to get the point across that Intel repeatedly screws AMD and VIA by monopolizing instruction sets and compilers to favor whatever crap they come up with. Yes, AMD even decided to go the route of Intel's AVX to make sure future CPUs were properly compatible with Intel optimized software, BUT Intel changes the AVX spec very late just to make sure AMD gets properly screwed. Software developers are forced to pick a side and they rightfully side with the older, larger company, Intel. There really does need to be an instruction set specification group to make a STANDARD that everybody follows. Instruction sets are like APIs in the GPU industry. We should only have one or two that all manufacturers fully support. If you want to add your own (like Glide) well that's up to you, but DX and OpenGL support won't be questioned. This makes it a lot easier for software developers to optimize code. Imagine if ATI, Nvidia, and VIA had their own 3D APIs (again) :shock: .

Abinstein, take a few steps back and reread the blog. Not everybody is out to get AMD (only Intel and Nvidia are). Read the ENTIRE blog before you go jihad again.
Mechromancer
 
Posts: 78
Joined: Mon Aug 31, 2009 1:00 am

Re: Agner's CPU Blog: The Instruction Set War

Postby gruffi » Fri Dec 11, 2009 1:34 pm

Thanks Mechromancer. I couldn't have said it better.
Tradition is not holding the ashes but passing the flame.
gruffi
K6-III Fresh Boarder
 
Posts: 252
Joined: Sun Jun 21, 2009 12:39 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby Эльбрус » Fri Dec 11, 2009 6:56 pm

abinstein wrote:Having a teacher there won't change anything.

Well ... depends if you have a powerful teacher, or a powerless one ...

If the teacher can sue Intel to pay a hefty fine, then I think a "teacher" would be a good idea and intel would be a nice student :mrgreen:

Maybe Agner should write to the European Commission and not in his blog :wink:
Эльбрус
K7 Athlon XP (Palomino) Junior Boarder
 
Posts: 463
Joined: Sat May 02, 2009 7:13 pm

Re: Agner's CPU Blog: The Instruction Set War

Postby wuttz » Fri Dec 11, 2009 7:33 pm

abinstein wrote:The situation is exactly as I said. If you can refute it in any way, please do. If you will call what George Washington did as "jihad," then I see no reason why not?


GW was a secret muslim too??! :mrgreen: :mrgreen: :mrgreen:

abinstein wrote: Intel doesn't even respect its own spec.


they backed down to FMA3, they aimed too high with AVX-FMA4.

abinstein wrote: What we need is to bring Intel's market share below 67%. Period.


this is why i think amd settled for pennies with regards to the antitrust complaints: to have access to the instruction sets of the dominant player (intel) for the next few years.
what's not clear to me is the strategy after those five years of access are over. i do think an open-to-deliberation standards "body" for instruction sets (like ISO or ICANN) would definitely help bring order, conformity and compatibility.
wuttz
K8 Athlon 64 X2 (Toledo) Elite Boarder
 
Posts: 3027
Joined: Sat Aug 08, 2009 6:48 pm
Location: Pearland, Texas

Re: Agner's CPU Blog: The Instruction Set War

Postby Montaray Jack » Fri Dec 11, 2009 10:29 pm

Why are you guys even arguing about this???
SSE5 using the AVX FMA4 encoding is a done deal.
AMD's engineers thought the FMA4 encoding was a good idea too, otherwise they wouldn't have changed it.
No.6: “The whole earth as. . . `The Village'?”
No.2: “That is my hope. What's yours?”
No.6: “I'd like to be the first man on the moon!”
--Chimes of Big Ben
Montaray Jack
K8 Athlon 64 (Winchester) Expert Boarder
 
Posts: 1372
Joined: Sat May 30, 2009 11:29 pm
Location: The Village

Re: Agner's CPU Blog: The Instruction Set War

Postby agner » Sat Dec 12, 2009 9:17 pm

A friendly person has drawn my attention to this forum.

I am happy that you are discussing my proposal, but I am less happy to see that I have been misquoted beyond belief and my motives turned upside down by somebody who doesn't even understand my blog post.

Let me clarify some of the main issues of my blog post:

First the difference between the SSE5 and AVX coding schemes:
* The AVX scheme includes extra bits for specifying future extensions to register sizes of 256, 512, 1024 or more bits. The SSE5 scheme has no unused bits for this purpose.
* The AVX scheme extends the opcode space to make room for many thousands of new instructions. The SSE5 scheme only has space for 256 instructions with 3 operands + 256 instructions with 3 operands and an immediate byte operand.
* The extra bits in the AVX scheme are obtained by replacing a lot of old escape bytes and prefix bytes with single bits in the AVX prefix.
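
To make the three points above concrete, here is a rough sketch in C of how the bit fields of the three-byte AVX prefix can be unpacked. The field layout is my reading of Intel's published AVX encoding, not something stated in this post, and the names vex3 and decode_vex3 are made up for illustration. It assumes 64-bit mode, where a leading C4 byte is always the prefix. The 5-bit map field selects one of up to 32 opcode maps of 256 opcodes each, which is where the room for thousands of instructions comes from, and the 2-bit pp field and the map field stand in for the old 66/F3/F2 prefixes and 0F/0F 38/0F 3A escape bytes.

```c
#include <stdint.h>

/* Fields of the three-byte AVX (VEX) prefix: byte C4 plus two payload bytes.
   Layout assumed from the published AVX encoding; names are illustrative. */
struct vex3 {
    unsigned r, x, b;  /* register-extension bits, stored inverted, un-inverted here */
    unsigned map;      /* 5-bit opcode map: 1 = 0F, 2 = 0F 38, 3 = 0F 3A */
    unsigned w;        /* operand-size / opcode-extension bit */
    unsigned vvvv;     /* extra source register, stored inverted, un-inverted here */
    unsigned l;        /* vector length: 0 = 128 bits, 1 = 256 bits */
    unsigned pp;       /* implied old prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
};

/* Returns 1 and fills *out if p[0..2] starts with a three-byte VEX prefix,
   otherwise returns 0 and leaves *out untouched. */
static int decode_vex3(const uint8_t *p, struct vex3 *out)
{
    if (p[0] != 0xC4)
        return 0;
    out->r    = ((p[1] >> 7) & 1u) ^ 1u;
    out->x    = ((p[1] >> 6) & 1u) ^ 1u;
    out->b    = ((p[1] >> 5) & 1u) ^ 1u;
    out->map  = p[1] & 0x1Fu;
    out->w    = (p[2] >> 7) & 1u;
    out->vvvv = (~(unsigned)(p[2] >> 3)) & 0xFu;
    out->l    = (p[2] >> 2) & 1u;
    out->pp   = p[2] & 3u;
    return 1;
}
```

For example, the bytes C4 E1 74 (the prefix of the three-byte form of VADDPS ymm0, ymm1, ymm2) decode to map 1, a 256-bit vector length, no implied prefix, and ymm1 as the extra source register.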

After AMD had published their SSE5 and Intel had published their AVX specifications, I wrote in various forums that the AVX scheme was likely to win over the SSE5 scheme for technical reasons. I am not talking about which instructions are more useful, only the way they are coded. I also recommended that AMD should change their coding scheme before it was too late. Now I am praising AMD for actually doing so. I have no idea whether my comments had any influence on AMD's decision or they would have changed their coding scheme anyway.

I am criticising Intel for changing their FMA4 to FMA3 after AMD had copied their FMA4 specification. Now it's too late for AMD to change once again. I have no idea why Intel are making this change.

I am speculating why AMD are not using any part of the huge AVX opcode space, but instead they are introducing their own XOP scheme, which is as close as you can get to the AVX scheme without permission from Intel. Are Intel not allowing AMD access to a fair share of the opcode space and thus forcing AMD to do weird tricks to find space for their new instructions?

I am deploring the forking of instruction sets for several reasons:
1. All additions to the instruction set are irreversible. The CPU vendors keep supporting obsolete instructions at a significant cost in terms of silicon space, performance and power consumption.
2. The development of new processors takes years. A company that has to copy the instruction innovations of its competitor will therefore always lag years behind. This is unfair competition.
3. The software industry is not very willing to support multiple incompatible instruction sets. It is possible to make software that branches to support multiple instruction sets by using so-called CPU dispatching, but this is so costly in terms of development, testing and maintenance that it is hardly ever done. I have never seen a program or function library with a CPU dispatcher that fully utilizes the capabilities of different incompatible x86 instruction sets.
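
For readers who have not seen CPU dispatching, a minimal sketch in C follows. It is only an illustration under stated assumptions: it relies on the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports and the target("avx") function attribute, and it falls back to the generic path on other compilers or non-x86 targets. The function names (sum_generic, sum_avx, select_sum, sum_dispatch) are hypothetical. Note that the check is on the feature flag, not on the vendor string, and that even this toy case needs two code paths that both have to be tested.

```c
#include <stddef.h>

typedef float (*sum_fn)(const float *, size_t);

/* Baseline version: runs on any CPU the program was compiled for. */
static float sum_generic(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same loop, but the compiler is allowed to emit AVX instructions here. */
__attribute__((target("avx")))
static float sum_avx(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Pick an implementation based on what the running CPU reports. */
static sum_fn select_sum(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sum_avx;
#endif
    return sum_generic;
}

/* Public entry point: resolves the implementation on first call.
   (Not thread-safe as written; a real dispatcher would guard this.) */
float sum_dispatch(const float *v, size_t n)
{
    static sum_fn impl = NULL;
    if (impl == NULL)
        impl = select_sum();
    return impl(v, n);
}
```

A caller just uses sum_dispatch(data, count) and never sees which branch was taken. Every additional instruction set (SSE5/XOP, FMA3, FMA4, ...) would add another branch and another version to develop, test and maintain, which is the cost described in point 3.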

Now, please make sure you understand my blog post before you criticize me. The URL is http://agner.org/optimize/blog/read.php?i=25
agner
 
Posts: 22
Joined: Fri Dec 11, 2009 8:47 am
