Discussion:
Memory access completely optimized away
g***@hotmail.com
2009-10-29 13:55:57 UTC
Hello all,

When I wrote a little test application, I noticed that the memory
access was completely removed in release builds (both Visual Studio 2003
and 2008):

void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
while (*pContinue)
{ }
}

In disassembly:
void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
00981270 mov eax,dword ptr [esp+4]
00981274 mov eax,dword ptr [eax]
while (*pContinue)
00981276 test eax,eax
00981278 jne TestIntlIppiImplFlood+6 (981276h)
{
}
}

Notice that the value of *pContinue is loaded into eax only once, and
the loop then tests that register. Even if *pContinue is modified by
another thread, the function never returns.

This is quite an optimization, but I think it is too aggressive. Of
course I can make the pointer a volatile long*, but in effect this means
that every variable shared between threads must get the volatile
keyword. I am aware that one should use boost::mutex or similar to
prevent data races, but this was just a simple test in which the
variable was changed atomically (through InterlockedIncrement) in one
thread and read in another thread.

Can anyone shed some light on this?

Thanks
John Keenan
2009-10-29 16:09:17 UTC
Post by g***@hotmail.com
This is quite an optimization, but I think it is too agressive.
Sometimes you can use a do-nothing function to stop this optimization (you
must test with each compiler). For example:

void doNothing( long* pContinue )
{
return;
}

Then add a call to doNothing to your original function:

void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
while( *pContinue ){
doNothing( pContinue );
}
}

While a compiler could optimize this back to your original assembly
code, my experience is that today's compilers do not.

John
Igor Tandetnik
2009-10-29 15:24:43 UTC
Post by g***@hotmail.com
When I wrote a little test application, I noticed that the memory
access was completely removed in release builds (both Visual Studio 2003 and 2008):
void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
while (*pContinue)
{ }
}
void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
00981270 mov eax,dword ptr [esp+4]
00981274 mov eax,dword ptr [eax]
while (*pContinue)
00981276 test eax,eax
00981278 jne TestIntlIppiImplFlood+6 (981276h)
{
}
}
One can notice that only the value of *pContinue is stored in eax, and
this is tested. Even if pContinue is modified in another thread, the
function never ends.
You should use the Interlocked* family of functions to access variables shared between threads. Alternatively, use proper synchronization primitives such as critical sections.
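A minimal sketch of the critical-section approach follows. It is portable C++11 and swaps std::mutex in for the Win32 CRITICAL_SECTION the reply refers to; the names LockedFlag, WaitWhileSet and RunLockedFlagTest are hypothetical. The point it illustrates is that both the writer and the reader go through the same lock, so the reader cannot keep testing a stale copy of the flag.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper: every access to the flag takes the same mutex,
// standing in for EnterCriticalSection/LeaveCriticalSection.
class LockedFlag
{
public:
    explicit LockedFlag(long initial) : value_(initial) {}

    long Get() const
    {
        std::lock_guard<std::mutex> lock(mutex_);  // locked for this scope
        return value_;
    }

    void Set(long v)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = v;
    }

private:
    mutable std::mutex mutex_;
    long value_;
};

// Spins until another thread clears the flag. Each iteration locks the
// mutex again, so the value is re-read rather than cached in a register.
void WaitWhileSet(const LockedFlag* flag)
{
    while (flag->Get())
    {
    }
}

long RunLockedFlagTest()
{
    LockedFlag flag(1);
    std::thread reader(WaitWhileSet, &flag);
    flag.Set(0);     // the writer uses the lock as well
    reader.join();   // returns only after the reader has seen 0
    long finalValue = flag.Get();
    assert(finalValue == 0);
    return finalValue;
}
```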
Post by g***@hotmail.com
This is quite an optimization, but I think it is too aggressive. Of
course I can make the pointer a volatile long*, but in effect this means
that every variable shared between threads must get the volatile
keyword. I am aware that one should use boost::mutex or similar to
prevent data races, but this was just a simple test in which the
variable was changed atomically (through InterlockedIncrement) in one
thread and read in another thread.
Synchronizing access to shared data only works when all threads do it. It's pointless to do it in some places but not in others. Use InterlockedCompareExchange to atomically read your variable - like this:

while (InterlockedCompareExchange(pContinue, 0, 0)) {...}
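InterlockedCompareExchange(pContinue, 0, 0) works as an atomic read because it only ever writes 0 over a value that is already 0, and it always returns the value it found. The sketch below reproduces the same trick with C++11 std::atomic, swapped in here for the Win32 call; the helper name AtomicReadViaCas is hypothetical. In C++11 code a plain load() does the same job without the read-modify-write.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper mirroring InterlockedCompareExchange(p, 0, 0):
// "if v == 0, store 0" never changes the value, but the operation
// reports what was in memory at that moment.
long AtomicReadViaCas(std::atomic<long>& v)
{
    long expected = 0;
    // If v was 0, the exchange succeeds and expected stays 0.
    // Otherwise it fails and the current value of v is copied into
    // expected. Either way, expected now holds the value that was read.
    v.compare_exchange_strong(expected, 0);
    return expected;
}
```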
--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925
g***@hotmail.com
2009-10-29 17:06:05 UTC
Thanks.

Yes, we can fool the optimizer by using dummy functions.

I am aware of the threading issues and that one should lock or
atomically exchange values. Even if the read isn't atomic, one might
expect that the atomic write at least flushes the value to memory. I am
not sure whether this guarantees a correct read (I am not sure whether a
memory write updates the caches of all processors; maybe multicore
machines behave differently here than multiprocessor machines).

Still, the compiler optimized the read away completely, so I was
wondering whether this is always correct. If I put any dummy object in
the call, the compiler produces code that does access the memory, so I
was wondering why in this simple case the compiler decided to optimize
the memory access away completely, and whether this is correct in all
cases.
Igor Tandetnik
2009-10-29 17:25:55 UTC
Post by g***@hotmail.com
I am aware of the threading issues and that one should lock or
atomically exchange values. Even if the read isn't atomic, one might
expect that the atomic write at least flushes it to memory.
... but that doesn't mean that a different CPU reads it from memory and not, say, from its own cache.
Post by g***@hotmail.com
I am not
sure if this guarantees a correct read
On many modern multicore architectures, it doesn't. See also http://en.wikipedia.org/wiki/Memory_barrier
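On such machines, the usual remedy is to pair the writer's store with a release barrier and the reader's load with an acquire barrier. The portable sketch below uses C++11 memory orders, which postdate this thread (on Windows, the Interlocked* functions are documented as full barriers); PublishAndConsume is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: one thread fills in 'payload' and then raises
// 'ready'; the other waits for 'ready' and only then reads 'payload'.
int PublishAndConsume()
{
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready(false);

    std::thread writer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });

    // The acquire load pairs with the release store: once it reads true,
    // the write to 'payload' made before that store is visible here.
    while (!ready.load(std::memory_order_acquire))
    {
    }
    int seen = payload;

    writer.join();
    assert(seen == 42);
    return seen;
}
```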
Post by g***@hotmail.com
Still the compiler has completely optimized away the read, so I was
wondering if this is always correct.
Yes. It's your responsibility to be careful with shared data, and use appropriate access patterns. You don't want the compiler to automatically penalize access to all variables in the program, just in case some of them are shared. That would effectively disable most optimizations.
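For a stop flag like the one in the original post, one appropriate access pattern is an atomic type. The sketch below rewrites the test with C++11 std::atomic, swapped in here for the Win32 Interlocked* calls; RunFloodTest is a hypothetical driver. Each load() is a real memory access, which compilers in practice do not hoist out of the loop, and the standard asks implementations to make the other thread's store visible to such loads within a reasonable amount of time.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Same loop as in the original post, but the flag is a std::atomic<long>,
// so each iteration performs a real load instead of testing a cached register.
void TestIntlIppiImplFlood(const std::atomic<long>* pContinue)
{
    while (pContinue->load())  // re-read on every iteration
    {
    }
}

// Hypothetical driver: spins in one thread and clears the flag from another.
long RunFloodTest()
{
    std::atomic<long> keepRunning(1);
    std::thread reader(TestIntlIppiImplFlood, &keepRunning);

    // The original post changed the flag with InterlockedIncrement; storing 0
    // is what actually ends a while (*pContinue) loop.
    keepRunning.store(0);

    reader.join();  // returns only once the reader has observed the 0
    long finalValue = keepRunning.load();
    assert(finalValue == 0);
    return finalValue;
}
```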
Post by g***@hotmail.com
If I put any dummy object in the
call, the compiler already produces code in which the memory gets
accessed
I'm not sure what you mean by "dummy object". My guess is that you are putting into the loop a call to a function whose source code the compiler doesn't see at this point. Now, even in a single-threaded program, it's possible to do this:

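// first source file
void DoSomething(); // defined elsewhere; body not visible here

void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
while (*pContinue)
{
DoSomething();
}
}

// in a different source file

void TestIntlIppiImplFlood(long* pContinue);

long global_continue = 1;
void DoSomething() {
global_continue = 0;
}

int main() {
TestIntlIppiImplFlood(&global_continue); // returns after one iteration
}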

This effect is called "aliasing" ( http://en.wikipedia.org/wiki/Aliasing_(computing) ). The compiler has to assume the presence of aliasing unless proven otherwise (e.g. local variables whose address is never given out can't be aliased), and optimize accordingly.
g***@hotmail.com
2009-11-02 10:41:21 UTC
Thanks. I think that should be the guideline.

In my test project I was sparing with it because I didn't want to
induce processor stalls.
