bootstrap wrote:

The question is this. Why is there no 256-bit (ymm register) version of the dot product for f64 variables? There is a 256-bit (ymm register) dot product for f32 variables. Seems very strange. I recall no other case where that difference exists. Just for fun I tried putting these 4 lines in a test function and sure enough, the assembler generated an error for only the last of these 4 lines:

vdpps $0x77, %xmm2, %xmm1, %xmm0 # 128-bit xmm register dot product for f32 variables

vdpps $0x77, %ymm2, %ymm1, %ymm0 # 256-bit ymm register dot product for f32 variables

vdppd $0x77, %xmm2, %xmm1, %xmm0 # 128-bit xmm register dot product for f64 variables

vdppd $0x77, %ymm2, %ymm1, %ymm0 # 256-bit ymm register dot product for f64 variables

Intel's documentation does note that there is no 256-bit (ymm) form of vdppd in AVX; only the 128-bit (xmm) form is encoded. http://software.intel.com/file/41604 (pg 538)

Looking at the documentation for vdpps http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_avx_dp_ps.htm, the 256-bit AVX form does not compute a single 8-float dot product. It computes a separate 4-float dot product in each 128-bit lane, so each dot product covers 4 floats in both the 128-bit and 256-bit forms for f32.
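To make the per-lane behavior concrete, here is a rough scalar model in plain C of what the 256-bit vdpps does, as I read the instruction description. It is only a sketch: the function names dpps_lane and vdpps_ymm_model are made up for illustration, it uses no intrinsics, and it ignores exceptions and NaN handling.

```c
#include <assert.h>

/* Scalar model of one 128-bit lane of (v)dpps: 4 floats in, 4 floats out.
   imm8 bits 7:4 pick which products enter the sum,
   imm8 bits 3:0 pick which result slots receive the sum (the rest get 0).
   Hypothetical helper, not a real intrinsic. */
static void dpps_lane(const float *a, const float *b, float *dst, unsigned imm8)
{
    float t[4];
    for (int i = 0; i < 4; i++)
        t[i] = (imm8 & (0x10u << i)) ? a[i] * b[i] : 0.0f;

    /* pairwise sum, in the order given in the instruction pseudocode */
    float sum = (t[0] + t[1]) + (t[2] + t[3]);

    for (int i = 0; i < 4; i++)
        dst[i] = (imm8 & (1u << i)) ? sum : 0.0f;
}

/* Model of the ymm form: the same 4-float operation is applied to the low
   and the high 128-bit lane independently. Nothing crosses the lanes. */
static void vdpps_ymm_model(const float a[8], const float b[8],
                            float dst[8], unsigned imm8)
{
    assert(a && b && dst);
    dpps_lane(a,     b,     dst,     imm8);  /* low lane:  elements 0..3 */
    dpps_lane(a + 4, b + 4, dst + 4, imm8);  /* high lane: elements 4..7 */
}
```

With the $0x77 immediate from the test lines above, only the first three products in each lane are summed and the fourth result slot in each lane is zeroed, so the ymm form behaves like two 3-element dot products side by side.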

I am throwing darts in the dark here:

So for f64, the choice may have been between keeping the same number of elements per dot product (2 doubles) in both the 128-bit and 256-bit forms, or changing the f64 behavior so that the 256-bit form operates on 4 doubles. It looks like they did not choose either.
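Just to spell out the two options I am guessing at, here is a scalar sketch in plain C. Neither 256-bit function corresponds to a real instruction; vdppd_ymm_per_lane and vdppd_ymm_full are hypothetical names, and the 8-bit mask layout in the second one is my own assumption, borrowed from dpps.

```c
#include <assert.h>

/* Scalar model of the existing 128-bit (v)dppd: 2 doubles in, 2 doubles out.
   imm8 bits 5:4 pick the products, bits 1:0 pick the result slots. */
static void dppd_lane(const double *a, const double *b, double *dst, unsigned imm8)
{
    double t0 = (imm8 & 0x10u) ? a[0] * b[0] : 0.0;
    double t1 = (imm8 & 0x20u) ? a[1] * b[1] : 0.0;
    double sum = t0 + t1;
    dst[0] = (imm8 & 0x01u) ? sum : 0.0;
    dst[1] = (imm8 & 0x02u) ? sum : 0.0;
}

/* HYPOTHETICAL option 1: per-lane, the way 256-bit vdpps works.
   Two independent 2-double dot products. */
static void vdppd_ymm_per_lane(const double a[4], const double b[4],
                               double dst[4], unsigned imm8)
{
    assert(a && b && dst);
    dppd_lane(a,     b,     dst,     imm8);
    dppd_lane(a + 2, b + 2, dst + 2, imm8);
}

/* HYPOTHETICAL option 2: one 4-double dot product across the whole register.
   Assumed mask layout: bits 7:4 pick products, bits 3:0 pick result slots. */
static void vdppd_ymm_full(const double a[4], const double b[4],
                           double dst[4], unsigned imm8)
{
    assert(a && b && dst);
    double sum = 0.0;
    for (int i = 0; i < 4; i++)
        if (imm8 & (0x10u << i))
            sum += a[i] * b[i];
    for (int i = 0; i < 4; i++)
        dst[i] = (imm8 & (1u << i)) ? sum : 0.0;
}
```

Option 1 would only have given two 2-element dot products per instruction, and option 2 would have needed a sum that crosses the 128-bit lane boundary, which the 256-bit vdpps does not do either.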