Discussion:
volatile keyword and memory barriers
Leigh Johnston
2010-01-07 23:57:43 UTC
I am confused: the Wikipedia article
http://en.wikipedia.org/wiki/Double-checked_locking claims that the VC++
volatile keyword includes a memory barrier. However, if I compile the
following program:

volatile int n1;
volatile int n2;

int main()
{
++n1;
++n2;
}

I get the following output:

_main PROC ; COMDAT

; 19 : ++n1;

00000 b8 01 00 00 00 mov eax, 1
00005 01 05 00 00 00
00 add DWORD PTR ?n1@@3HC, eax ; n1

; 20 : ++n2;

0000b 01 05 00 00 00
00 add DWORD PTR ?n2@@3HC, eax ; n2

; 21 : }

00011 33 c0 xor eax, eax
00013 c3 ret 0
_main ENDP

I cannot see any memory barrier instructions here, unless I am being stupid.
So my question is: does the VC++ volatile keyword provide a memory barrier or
not? I am using VS2008.

/Leigh
Igor Tandetnik
2010-01-08 01:03:47 UTC
Post by Leigh Johnston
I am confused, the Wikipedia article
http://en.wikipedia.org/wiki/Double-checked_locking claims that VC++
volatile keyword includes a memory barrier however if I compile the
00000 b8 01 00 00 00 mov eax, 1
00005 01 05 00 00 00
I cannot see any memory barrier instructions
x86 CPUs don't have memory barrier instructions and, architecturally, don't need them. You'd need to compile for IA64 to see them in action.
--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925
Leigh Johnston
2010-01-08 12:49:57 UTC
Post by Igor Tandetnik
Post by Leigh Johnston
I am confused, the Wikipedia article
http://en.wikipedia.org/wiki/Double-checked_locking claims that VC++
volatile keyword includes a memory barrier however if I compile the
00000 b8 01 00 00 00 mov eax, 1
00005 01 05 00 00 00
I cannot see any memory barrier instructions
x86 CPUs don't have memory barrier instructions and, architecturally,
don't need them. You'd need to compile for IA64 to see them in action.
Not true: LFENCE, SFENCE, MFENCE and LOCK all exist on x86 and are useful
in multithreaded programs.

/Leigh
Igor Tandetnik
2010-01-08 13:35:55 UTC
Post by Leigh Johnston
Post by Igor Tandetnik
Post by Leigh Johnston
I am confused, the Wikipedia article
http://en.wikipedia.org/wiki/Double-checked_locking claims that VC++
volatile keyword includes a memory barrier however if I compile the
00000 b8 01 00 00 00 mov eax, 1
00005 01 05 00 00 00
I cannot see any memory barrier instructions
x86 CPUs don't have memory barrier instructions and, architecturally,
don't need them. You'd need to compile for IA64 to see them in action.
not true, LFENCE, SFENCE, MFENCE and LOCK all exist for x86 and are useful
in multi-threaded programs.
http://www.linuxjournal.com/article/8211

x86 CPU provides processor consistency, where writes by one CPU are observed in order by all other CPUs. For this reason, it doesn't need explicit memory barrier instructions.
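
A sketch of what this ordering means for C++ code, assuming C++11 atomics (standardized after this thread); the names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are hypothetical. A producer writes data and then sets a flag, and a consumer that sees the flag may safely read the data. The release store and acquire load are what stop the compiler from reordering the two accesses; on x86 they typically compile to ordinary MOV instructions with no fence, which fits the point above.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // publication flag

void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish it
}

int consumer() {
    // Spin until the flag is observed. The acquire load pairs with the
    // release store above, so the write to payload is guaranteed visible.
    while (!ready.load(std::memory_order_acquire)) {
    }
    return payload;
}

// Runs one consumer and one producer concurrently; returns what the consumer read.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```
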

LFENCE, SFENCE and MFENCE are SSE instructions, apparently needed because certain other SSE instructions are asynchronous. I must admit I'm not very familiar with SSE, but your example doesn't issue SSE instructions anyway, so this is moot.

LOCK is not an instruction by itself, but a prefix to other instructions that renders them atomic (e.g. instructions like ADD which need to read, modify and write a memory location). Note that "volatile" doesn't promise or guarantee atomicity: ++n1 is still not atomic even though n1 is declared volatile.
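
To illustrate the atomicity point, here is a sketch assuming C++11 `std::atomic` (which did not exist when this was written); `count_with_two_threads` is a hypothetical name. `fetch_add` is a single indivisible read-modify-write, which x86 compilers typically implement with a LOCK-prefixed instruction such as LOCK ADD or LOCK XADD, so no increment can be lost.

```cpp
#include <atomic>
#include <thread>

// Adds `per_thread` to a shared counter from each of two threads and returns
// the final total. fetch_add is an atomic read-modify-write, so no update is
// lost; relaxed ordering suffices because only the final count matters.
int count_with_two_threads(int per_thread) {
    std::atomic<int> counter{0};
    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.load();
}
```

Swapping the counter for a `volatile int` updated with `counter = counter + 1` turns this into a data race (undefined behavior in C++11), and on a multi-core machine the total can come out below 2 * per_thread.
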
--
With best wishes,
Igor Tandetnik

Leigh Johnston
2010-01-08 14:03:04 UTC
Post by Igor Tandetnik
http://www.linuxjournal.com/article/8211
x86 CPU provides processor consistency, where writes by one CPU are observed
in order by all other CPUs. For this reason, it doesn't need explicit
memory barrier instructions.
LFENCE, SFENCE and MFENCE are SSE instructions, apparently needed because
certain other SSE instructions are asynchronous. I must admit I'm not very
familiar with SSE, but your example doesn't issue SSE instructions anyway,
so this is moot.
The FENCE instructions are not "SSE" instructions; it seems they are required
for the following cases (from
http://www.intel.com/Assets/PDF/manual/253668.pdf):
Writes to memory are not reordered with other writes, with the following
exceptions:
- writes executed with the CLFLUSH instruction;
- streaming stores (writes) executed with the non-temporal move instructions
(MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
- string operations (see Section 8.2.4.1).

But yeah, for my simple ADD example it looks like you are correct: no fence
is required.
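
A sketch of the streaming-store exception in practice, assuming an x86-64 target with SSE2 and the usual Intel intrinsics header (`_mm_stream_si32` issues MOVNTI, `_mm_sfence` issues SFENCE); `buffer`, `buffer_ready` and `fill_and_publish` are hypothetical. Because non-temporal stores are weakly ordered, an SFENCE is placed after them so that they are ordered before the store that publishes the buffer. On other targets the sketch falls back to ordinary stores.

```cpp
#include <atomic>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE)
#define HAVE_SSE2_STREAMING 1
#endif

int buffer[1024];
std::atomic<bool> buffer_ready{false};

// Fills the buffer with non-temporal (cache-bypassing) stores, then publishes it.
void fill_and_publish(int value) {
    for (std::size_t i = 0; i < sizeof buffer / sizeof buffer[0]; ++i) {
#ifdef HAVE_SSE2_STREAMING
        _mm_stream_si32(&buffer[i], value);  // weakly ordered MOVNTI store
#else
        buffer[i] = value;                   // portable fallback
#endif
    }
#ifdef HAVE_SSE2_STREAMING
    // On x86 a release store is a plain MOV, which does not order earlier
    // non-temporal stores, so fence them explicitly before the flag store.
    _mm_sfence();
#endif
    buffer_ready.store(true, std::memory_order_release);
}
```
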
Post by Igor Tandetnik
LOCK is not an instruction by itself, but a prefix to other instructions
that renders them atomic (e.g. instructions like ADD which need to read,
modify and write a memory location). Note that "volatile" doesn't promise
or guarantee atomicity: ++n1 is still not atomic even though n1 is
declared volatile.
I am aware that LOCK is a prefix, and I have read elsewhere that it *also*
acts as a memory barrier when used in conjunction with a compatible
instruction. I am also well aware that volatile does not promise atomicity;
I never said that it does.

/Leigh
Igor Tandetnik
2010-01-08 14:19:19 UTC
Post by Leigh Johnston
The FENCE instructions are not "SSE" instructions they are required for the
following cases it seems
Writes to memory are not reordered with other writes, with the following
- writes executed with the CLFLUSH instruction;
- streaming stores (writes) executed with the non-temporal move instructions
(MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
- string operations (see Section 8.2.4.1).
All of these are in fact from the SSE[2] instruction set:

http://en.wikipedia.org/wiki/X86_instruction_listings
--
With best wishes,
Igor Tandetnik

Leigh Johnston
2010-01-08 14:20:10 UTC
Post by Igor Tandetnik
x86 CPU provides processor consistency, where writes by one CPU are observed
in order by all other CPUs. For this reason, it doesn't need explicit
memory barrier instructions.
What about store forwarding? I think MFENCE may help there (from
http://www.intel.com/Assets/PDF/manual/253668.pdf):

The memory-ordering model allows concurrent stores by two processors to be
seen in different orders by those two processors; specifically, each
processor may perceive its own store occurring before that of the other.
This is illustrated by the following example:

Example 8-5. Intra-Processor Forwarding is Allowed

Processor 0          Processor 1
mov [ _x], 1         mov [ _y], 1
mov r1, [ _x]        mov r3, [ _y]
mov r2, [ _y]        mov r4, [ _x]
Initially x == y == 0
r2 == 0 and r4 == 0 is allowed

The memory-ordering model imposes no constraints on the order in which the
two stores appear to execute by the two processors. This fact allows
processor 0 to see its store before seeing processor 1's, while processor 1
sees its store before seeing processor 0's. (Each processor is self
consistent.) This allows r2 == 0 and r4 == 0.

In practice, the reordering in this example can arise as a result of
store-buffer forwarding. While a store is temporarily held in a processor's
store buffer, it can satisfy the processor's own loads but is not visible to
(and cannot satisfy) loads by other processors.
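
Example 8-5 can be reproduced as a C++ litmus test. This is a sketch assuming C++11 atomics; `store_buffer_both_zero` is a hypothetical name, and the loads of each processor's own variable (r1, r3) are omitted. With only relaxed operations, the r2 == 0 and r4 == 0 outcome is permitted, as the manual says. Putting a sequentially consistent fence between each thread's store and its load forbids it; on x86, compilers typically emit MFENCE (or an equivalent locked instruction) for `std::atomic_thread_fence(std::memory_order_seq_cst)`.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};

// Runs the Example 8-5 pattern `trials` times and returns how many runs ended
// with both loads reading 0 (r2 == 0 && r4 == 0).
int store_buffer_both_zero(int trials) {
    int both_zero = 0;
    for (int t = 0; t < trials; ++t) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        int r2 = -1, r4 = -1;
        std::thread p0([&] {
            x.store(1, std::memory_order_relaxed);                // mov [_x], 1
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
            r2 = y.load(std::memory_order_relaxed);               // mov r2, [_y]
        });
        std::thread p1([&] {
            y.store(1, std::memory_order_relaxed);                // mov [_y], 1
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
            r4 = x.load(std::memory_order_relaxed);               // mov r4, [_x]
        });
        p0.join();
        p1.join();
        if (r2 == 0 && r4 == 0) ++both_zero;
    }
    return both_zero;
}
```

Removing the two fences makes r2 == 0 and r4 == 0 a legal result again, and on real x86 hardware it can show up over enough trials.
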
Igor Tandetnik
2010-01-08 14:34:31 UTC
Post by Leigh Johnston
Post by Igor Tandetnik
x86 CPU provides processor consistency, where writes by one CPU are observed
in order by all other CPUs. For this reason, it doesn't need explicit
memory barrier instructions.
What about store forwarding?
I must admit you are digging deeper than my understanding extends. Hopefully, someone more knowledgeable will chime in.
--
With best wishes,
Igor Tandetnik

Leigh Johnston
2010-01-08 14:44:53 UTC
Post by Igor Tandetnik
Post by Leigh Johnston
What about store forwarding?
I must admit you are digging deeper than my understanding extends.
Hopefully, someone more knowledgeable will chime in.
Read
http://bartoszmilewski.wordpress.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
Leigh Johnston
2010-01-08 14:58:40 UTC
I guess the use cases for MFENCE on x86 are rare, so Microsoft decided that
its overhead could not be justified just to have volatile use it.

/Leigh
Leigh Johnston
2010-01-08 15:28:35 UTC
Or rather the LOCK prefix, which is used by Enter/LeaveCriticalSection.
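
A minimal spinlock sketch, assuming C++11 atomics, shows a lock built on a single locked read-modify-write. It is not how EnterCriticalSection is actually implemented (the real CRITICAL_SECTION also blocks in the kernel under contention); `SpinLock` and `locked_count` are hypothetical. On x86, `std::atomic<bool>::exchange` is typically compiled to XCHG, which carries an implicit LOCK and also acts as a full barrier.

```cpp
#include <atomic>
#include <thread>

class SpinLock {
public:
    void lock() {
        // Atomic exchange: only one thread can flip the flag from false to
        // true. acquire keeps the critical section's accesses after the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
    }
    void unlock() {
        // release makes the critical section's writes visible to the next owner.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Two threads increment an ordinary int under the lock; returns the total.
int locked_count(int per_thread) {
    SpinLock lock;
    int counter = 0;  // plain int: protected by the lock, not by volatile
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}
```
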

/Leigh
Bo Persson
2010-01-08 22:42:35 UTC
Post by Leigh Johnston
I guess the use-cases for MFENCE on x86 are rare so Microsoft
decided that its overhead cannot be justified to have volatile use
it.
/Leigh
Because volatile has nothing to do with threads, just with memory-mapped
hardware?
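
A sketch of that traditional use: polling a device register. The register block `FakeUartRegs`, its bit layout and `uart_send` are hypothetical; on real hardware the pointer would refer to a fixed address from the device's datasheet, but here an ordinary struct stands in so the code can run. Because the accesses go through a pointer to volatile, the compiler must re-read `status` on every iteration and emit every store to `data`. Nothing here orders these accesses with respect to other threads.

```cpp
#include <cstdint>

// Hypothetical UART register block (simulated; layout is an assumption).
struct FakeUartRegs {
    std::uint32_t status;  // bit 0 = "transmitter ready"
    std::uint32_t data;
};

constexpr std::uint32_t kTxReady = 0x1;

// Sends a NUL-terminated string one byte at a time, waiting for the device.
// Without volatile, the status load could be hoisted out of the loop and the
// stores to data collapsed into a single store of the last byte.
void uart_send(volatile FakeUartRegs* regs, const char* msg) {
    for (; *msg != '\0'; ++msg) {
        while ((regs->status & kTxReady) == 0) {
            // spin until the (simulated) device reports it is ready
        }
        regs->data = static_cast<std::uint8_t>(*msg);
    }
}
```
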



Bo Persson
Igor Tandetnik
2010-01-08 22:49:35 UTC
Post by Bo Persson
Post by Leigh Johnston
I guess the use-cases for MFENCE on x86 are rare so Microsoft
decided that its overhead cannot be justified to have volatile use
it.
/Leigh
Because volatile has nothing to do with threads, just with memory
mapped hardware?
That would be the case for most C++ compilers, yes. But MS made volatile have something to do with threads as of VC8 (VS 2005), by claiming that reads and writes of volatile variables now have acquire/release semantics à la Java, and that the compiler would emit barrier instructions as necessary.
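
For the double-checked locking pattern from the Wikipedia article, acquire/release is exactly the guarantee that is needed. Below is a sketch of the pattern written with C++11 atomics instead of VC8 volatile; `Widget`, `g_widget` and `get_widget` are hypothetical. Note that acquire/release alone does not forbid the store-buffer outcome of Example 8-5; that case still needs a full fence.

```cpp
#include <atomic>
#include <mutex>

struct Widget {
    int value = 7;
};

std::atomic<Widget*> g_widget{nullptr};
std::mutex g_widget_mutex;

Widget* get_widget() {
    // First check without the lock: acquire pairs with the release below, so
    // a non-null pointer implies the Widget's constructor has finished.
    Widget* w = g_widget.load(std::memory_order_acquire);
    if (w == nullptr) {
        std::lock_guard<std::mutex> guard(g_widget_mutex);
        // Second check under the lock: another thread may have won the race.
        w = g_widget.load(std::memory_order_relaxed);
        if (w == nullptr) {
            w = new Widget;  // never deleted: process-lifetime singleton
            g_widget.store(w, std::memory_order_release);
        }
    }
    return w;
}
```

In C++11 and later, a function-local `static Widget w;` provides the same thread-safe lazy initialization without hand-written locking.
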
--
With best wishes,
Igor Tandetnik

Leigh Johnston
2010-01-08 13:06:15 UTC
Targeting x64 makes no difference; there are still no memory barrier
instructions in the output.
Igor Tandetnik
2010-01-08 13:38:46 UTC
Post by Leigh Johnston
Targeting x64 makes no difference, still no memory barrier instructions
output.
x64 provides the same strong consistency model as x86. That's why I said you need to compile for IA64 (aka Itanium): as far as I know, it's the only CPU supported by the MSVC compiler that has a weak consistency model and actually needs memory barriers.
--
With best wishes,
Igor Tandetnik
