Discussion:
Possible VC7.1 C++ Optimizer Error with intrinsic _byteswap_uint64 when compiler enters register contention (0/1)
(too old to reply)
Ivan S.Warren
2005-07-05 22:16:40 UTC
Permalink
Raw Message
Ladies & Gentlemen,

I have this issue with the VC7.1 compiler.

My application needs to store big endian 64 bit data into memory
arrays.

Unfortunatelly, it seems that when a certain number of complex
operations are performed before the _byteswap_uint64() function is
evaluated (leading to a possible contention in register usage), the
intrinsic generation of the _byteswap_uint64 gets confused as to what
register is actually holding the results.

I somehow managed to generate a testcase which may help tracking down
this issue.

In the attached testcase (which hopefully ought to allow reproducing
the problem), I noted in the generated assembly listing the following

(comments are my own) :

<DEMO>
13 : res = _byteswap_uint64(a & b);

00030 8b 0d 00 00 00
00 mov ecx, DWORD PTR _b
; Get value for "b" Low 32 bits
00036 23 c8 and ecx, eax
; Logical AND with value for "a" low 32 bits
; ECX is "res" low 32 bits
00038 a3 00 00 00 00 mov DWORD PTR _a, eax
; Save from stroui64 in a low 32 bits
0003d a1 04 00 00 00 mov eax, DWORD PTR _b+4
; Get value of b high 32 bits
00042 23 c2 and eax, edx
; Logical AND with high "a' value 32 bits
; EAX is "res" high 32 bits
00044 89 15 04 00 00
00 mov DWORD PTR _a+4, edx
; Save in "a" high 32 bits
0004a 0f c9 bswap ecx
; bswap "res" low 32 bits
0004c 0f c8 bswap eax
; bswap "res" high 32 bits
0004e 83 c4 18 add esp, 24 ;
00000018H
; Bump stack (not related to issue)

; 14 : memcpy((unsigned char *)z, &res, 8);

00051 89 15 00 00 00
00 mov DWORD PTR _z, edx
;
; Now.. What is EDX ???? This should be EAX !!!
00057 89 0d 04 00 00
00 mov DWORD PTR _z+4, ecx
; Save "res" big endiannised low 32 bits in
; z big endian low (tail portion)
</DEMO>

Usually moving things around a bit solves the issue.. But this makes
using _byteswap_uint64() non deterministic.. and, wel.. I don't like
that !

Questions
- Is this a bug or am I missing the point ?
- Is this a known issue (I tried looking for something similar to no
avail). ?
- Should I report this (and how? since this is a VC7.1 issue)
- Optionally, is there any known deterministic protection against
running into something like this ?

Thanks,

--Ivan
Ivan S.Warren
2005-07-05 22:24:39 UTC
Permalink
Raw Message
Addedum :

I put an attachment, but since it is a non-binary newsgroup, it seems
it has been stripped off (which is possible quite normal).

Therefore, here is the sample test case..

Compile with Full optimisation and with assembly listing..

<SAMPLE fn="testcase1.c">
#include <stdio.h>
#include <memory.h>
#include <stdlib.h>
static unsigned __int64 a,b;
volatile unsigned char z[8];
int main(int ac,char **av)
{
char *ptr;
int i;
unsigned __int64 res;
a=_strtoui64(av[1],&ptr,0);
a=_strtoui64(av[2],&ptr,0);
res = _byteswap_uint64(a & b);
memcpy((unsigned char *)z, &res, 8);
for(i=0;i<8;i++)
{
printf("%2.2X",z[i]);
}
printf("\n");
}
</SAMPLE>

--Ivan

On Wed, 06 Jul 2005 00:16:40 +0200, Ivan S.Warren
Post by Ivan S.Warren
Ladies & Gentlemen,
I have this issue with the VC7.1 compiler.
My application needs to store big endian 64 bit data into memory
arrays.
Unfortunatelly, it seems that when a certain number of complex
operations are performed before the _byteswap_uint64() function is
evaluated (leading to a possible contention in register usage), the
intrinsic generation of the _byteswap_uint64 gets confused as to what
register is actually holding the results.
I somehow managed to generate a testcase which may help tracking down
this issue.
In the attached testcase (which hopefully ought to allow reproducing
the problem), I noted in the generated assembly listing the following
<DEMO>
13 : res = _byteswap_uint64(a & b);
00030 8b 0d 00 00 00
00 mov ecx, DWORD PTR _b
; Get value for "b" Low 32 bits
00036 23 c8 and ecx, eax
; Logical AND with value for "a" low 32 bits
; ECX is "res" low 32 bits
00038 a3 00 00 00 00 mov DWORD PTR _a, eax
; Save from stroui64 in a low 32 bits
0003d a1 04 00 00 00 mov eax, DWORD PTR _b+4
; Get value of b high 32 bits
00042 23 c2 and eax, edx
; Logical AND with high "a' value 32 bits
; EAX is "res" high 32 bits
00044 89 15 04 00 00
00 mov DWORD PTR _a+4, edx
; Save in "a" high 32 bits
0004a 0f c9 bswap ecx
; bswap "res" low 32 bits
0004c 0f c8 bswap eax
; bswap "res" high 32 bits
0004e 83 c4 18 add esp, 24 ;
00000018H
; Bump stack (not related to issue)
; 14 : memcpy((unsigned char *)z, &res, 8);
00051 89 15 00 00 00
00 mov DWORD PTR _z, edx
;
; Now.. What is EDX ???? This should be EAX !!!
00057 89 0d 04 00 00
00 mov DWORD PTR _z+4, ecx
; Save "res" big endiannised low 32 bits in
; z big endian low (tail portion)
</DEMO>
Usually moving things around a bit solves the issue.. But this makes
using _byteswap_uint64() non deterministic.. and, wel.. I don't like
that !
Questions
- Is this a bug or am I missing the point ?
- Is this a known issue (I tried looking for something similar to no
avail). ?
- Should I report this (and how? since this is a VC7.1 issue)
- Optionally, is there any known deterministic protection against
running into something like this ?
Thanks,
--Ivan
Heinz Ozwirk
2005-07-06 06:48:10 UTC
Permalink
Raw Message
Post by Ivan S.Warren
a=_strtoui64(av[1],&ptr,0);
a=_strtoui64(av[2],&ptr,0);
res = _byteswap_uint64(a & b);
Do you really assign to a twice?

Heinz
Carl Daniel [VC++ MVP]
2005-07-06 13:29:06 UTC
Permalink
Raw Message
Post by Ivan S.Warren
I put an attachment, but since it is a non-binary newsgroup, it seems
it has been stripped off (which is possible quite normal).
Therefore, here is the sample test case..
[snip]

While it won't help you with VC7.1, I can tell you that your code is
complied correctly by VC8 beta 2 and by the latest builds.

-cd
Gary Chang[MSFT]
2005-07-06 10:07:27 UTC
Permalink
Raw Message
Hi Ivan,

I have already downloaded your sample project to our local machine, and
perform some tests(using "b=_strtoui64(av[2],&ptr,0");). The problem is the
oupput of the release version app(optimized) will always be invalid--zero,
meanwhile the debug version app works well. Please let me know if my test
results is different from yours.
Post by Ivan S.Warren
Unfortunatelly, it seems that when a certain number of complex
operations are performed before the _byteswap_uint64() function is
evaluated (leading to a possible contention in register usage), the
intrinsic generation of the _byteswap_uint64 gets confused as to what
register is actually holding the results.
I agree with you that the problem is caused by the incorrect optimization
to the program: in the release version the _byteswap_uint64 operation code
will be inlined to the caller program's code, and just as your sample
assembly code illustrated, the inline code has the error.

Based on my repeated test, I found the root cause of the misoptimization
appears to be av[1]/av[2], which the program arguments you used directly in
the _strtoui64 function call. If use a const integer or a character buffer
instead, then the release version program would work as well as the debug
version one, such as:

char bufA[8], bufB[8];
memcpy(bufA, av[1], 8);
memcpy(bufB, av[2], 8);

a=_strtoui64(bufA,&ptr,0);
b=_strtoui64(bufB,&ptr,0);
//res1 = a & b;
res = _byteswap_uint64(a & b);
..

Thanks!

Best regards,

Gary Chang
Microsoft Community Support
--------------------
Get Secure! ¡§C www.microsoft.com/security
Register to Access MSDN Managed Newsgroups!
http://support.microsoft.com/default.aspx?scid=/servicedesks/msdn/nospam.asp
&SD=msdn

This posting is provided "AS IS" with no warranties, and confers no rights.
imbecil
2011-08-28 00:32:15 UTC
Permalink
Raw Message
Ivan S.Warren wrote on 07/05/2005 18:16 ET
Post by Ivan S.Warren
Ladies & Gentlemen
I have this issue with the VC7.1 compiler
My application needs to store big endian 64 bit data into memor
arrays
Unfortunatelly, it seems that when a certain number of comple
operations are performed before the _byteswap_uint64() function i
evaluated (leading to a possible contention in register usage), th
intrinsic generation of the _byteswap_uint64 gets confused as to wha
register is actually holding the results
I somehow managed to generate a testcase which may help tracking dow
this issue
In the attached testcase (which hopefully ought to allow reproducin
the problem), I noted in the generated assembly listing the followin
(comments are my own)
<DEMO
13 : res = _byteswap_uint64(a & b)
00030 8b 0d 00 00 0
00 mov ecx, DWORD PTR _
; Get value for "b" Low 32 bit
00036 23 c8 and ecx, ea
; Logical AND with value for "a" low 32 bit
; ECX is "res" low 32 bit
00038 a3 00 00 00 00 mov DWORD PTR _a, ea
; Save from stroui64 in a low 32 bit
0003d a1 04 00 00 00 mov eax, DWORD PTR _b+
; Get value of b high 32 bit
00042 23 c2 and eax, ed
; Logical AND with high "a' value 32 bit
; EAX is "res" high 32 bit
00044 89 15 04 00 0
00 mov DWORD PTR _a+4, ed
; Save in "a" high 32 bit
0004a 0f c9 bswap ec
; bswap "res" low 32 bit
0004c 0f c8 bswap ea
; bswap "res" high 32 bit
0004e 83 c4 18 add esp, 24
00000018
; Bump stack (not related to issue
; 14 : memcpy((unsigned char *)z, &res, 8)
00051 89 15 00 00 0
00 mov DWORD PTR _z, ed
; Now.. What is EDX ???? This should be EAX !!
00057 89 0d 04 00 0
00 mov DWORD PTR _z+4, ec
; Save "res" big endiannised low 32 bits i
; z big endian low (tail portion
</DEMO
Usually moving things around a bit solves the issue.. But this make
using _byteswap_uint64() non deterministic.. and, wel.. I don't lik
that
Question
- Is this a bug or am I missing the point
- Is this a known issue (I tried looking for something similar to n
avail).
- Should I report this (and how? since this is a VC7.1 issue
- Optionally, is there any known deterministic protection agains
running into something like this
Thanks
Here is another solution that has worked (for Visual C++ 7.1, i.e. 2003)

extern "C" void _ReadWriteBarrier (); // Yes, I compiled as C++ ! ^
#pragma intrinsic (_ReadWriteBarrier
..
{ const unsigned __int64 t = a & b; _ReadWriteBarrier (); res
_byteswap_uint64 (t);

I hope this helps you, 6 years from your original posting. ^


-
Yours truly
Imbecil

Loading...