Discussion:
Speed of logical tests
(too old to reply)
Vincent Fatica
2010-06-11 18:23:01 UTC
Permalink
In a ReadFile() loop I have this code for counting newlines.

for ( BYTE* p=inBuf; p<pEnd; p += dwCharSize )
if ( (bUnicode && *((WCHAR*)p) == L'\n') || (!bUnicode && *((CHAR*)p) ==
'\n') )
dwLineCount += 1;

In my performance tests, I use a non-Unicode file with lines of about 100
characters. In each test, three BOOLs are evaluated ... bUnicode will be FALSE,
!bUnicode will be TRUE, and then the character will be tested.

I would expect the following criterion to be faster

( (bUnicode && *((WCHAR*)p) == L'\n') || (*((CHAR*)p) == '\n' && !bUnicode) )

because in the typical case (character not newline) only two evaluations should
be done ... bUnicode will be FALSE and the character won't be a newline. But in
fact this alternative (the only change) increases the time by 50%.

This shakes my confidence a bit because I have always ordered logical tests in a
way which (I thought) would get them done the fastest. ... any ideas what's
going on here?

Thanks.
--
- Vince
Vincent Fatica
2010-06-11 18:39:51 UTC
Permalink
On 11 Jun 2010 14:23:01 -0400, Vincent Fatica <***@blackholespam.net> wrote:

|In a ReadFile() loop I have this code for counting newlines.
|
|for ( BYTE* p=inBuf; p<pEnd; p += dwCharSize )
| if ( (bUnicode && *((WCHAR*)p) == L'\n') || (!bUnicode && *((CHAR*)p) ==
|'\n') )
| dwLineCount += 1;
|
|In my performance tests, I use a non-Unicode file with lines of about 100
|characters. In each test, three BOOLs are evaluated ... bUnicode will be FALSE,
|!bUnicode will be TRUE, and then the character will be tested.
|
|I would expect the following criterion to be faster
|
|( (bUnicode && *((WCHAR*)p) == L'\n') || (*((CHAR*)p) == '\n' && !bUnicode) )
|
|because in the typical case (character not newline) only two evaluations should
|be done ... bUnicode will be FALSE and the character won't be a newline. But in
|fact this alternative (the only change) increases the time by 50%.
|
|This shakes my confidence a bit because I have always ordered logical tests in a
|way which (I thought) would get them done the fastest. ... any ideas what's
|going on here?

In internal tests of only the loop, the alternative (testing the character
first) is a bit faster (as expected). With that logical condition the only
change, the ASM listing for my function is drastically different (so much so
that I can't make sense of it).
--
- Vince
Alex Blekhman
2010-06-12 02:03:08 UTC
Permalink
Post by Vincent Fatica
This shakes my confidence a bit because I have always ordered logical tests in a
way which (I thought) would get them done the fastest. ... any ideas what's
going on here?
Hi Vincent,

I tried your code with VC2010. It seems that reordering the checks in
`if' statement confuses the optimizier a bit. Here is what I got:

if ( (bUnicode && *((WCHAR*)p) == L'\n') || (!bUnicode &&
*((CHAR*)p) == '\n') )
01351051 test dl,dl
01351053 je wmain+5Dh (135105Dh)
01351055 cmp word ptr [eax],0Ah
01351059 je wmain+62h (1351062h)
0135105B jmp wmain+63h (1351063h)
0135105D cmp byte ptr [eax],0Ah
01351060 jne wmain+63h (1351063h)
dwLineCount += 1;
01351062 inc edi

As you can see, if bUnicode is false, then second test isn't performed
at all. The character is checked right away. So, optimizer was smart
enough to figure this out by itself.

Here is the result for rearranged statement:

if ( (bUnicode && *((WCHAR*)p) == L'\n') || (*((CHAR*)p) ==
'\n' && !bUnicode) )
00F31051 test cl,cl
00F31053 je wmain+5Bh (0F3105Bh)
00F31055 cmp word ptr [eax],0Ah
00F31059 je wmain+64h (0F31064h)
00F3105B cmp byte ptr [eax],0Ah
00F3105E jne wmain+65h (0F31065h)
00F31060 test cl,cl
00F31062 jne wmain+65h (0F31065h)
dwLineCount += 1;
00F31064 inc edi

This time the optimizer missed the point and checks bUnicode twice
(notice the second "test cl,cl" command).

So, for the rearranged statement you have additional "test" command for
non-Unicode stream of bytes. I think it explains the time increase.

HTH
Alex
Vincent Fatica
2010-06-12 04:56:21 UTC
Permalink
On Sat, 12 Jun 2010 12:03:08 +1000, Alex Blekhman <***@yahoo.com> wrote:

|Here is the result for rearranged statement:
|
| if ( (bUnicode && *((WCHAR*)p) == L'\n') || (*((CHAR*)p) ==
|'\n' && !bUnicode) )
|00F31051 test cl,cl
|00F31053 je wmain+5Bh (0F3105Bh)
|00F31055 cmp word ptr [eax],0Ah
|00F31059 je wmain+64h (0F31064h)
|00F3105B cmp byte ptr [eax],0Ah
|00F3105E jne wmain+65h (0F31065h)
|00F31060 test cl,cl
|00F31062 jne wmain+65h (0F31065h)
| dwLineCount += 1;
|00F31064 inc edi
|
|This time the optimizer missed the point and checks bUnicode twice
|(notice the second "test cl,cl" command).

Thanks for looking into it Alex. I didn't build a stripped-down version and the
ASM I had was nightmarish and the two versions were radically different. The
question is moot now, though since this (below) is a lot faster than either of
the others, and even makes the text segment smaller (not to mention that it's a
little more readable)! I shouldn't have tried to be so cute in the first place.

if ( !bUnicode )
{
for ( CHAR* p = (CHAR*) inBuf; p < (CHAR*) pEnd; p += 1 )
if ( *p == '\n' )
dwLineCount += 1;
}
else
{
for ( WCHAR* p = (WCHAR*) inBuf; p < (WCHAR*) pEnd; p += 1 )
if ( *p == L'\n')
dwLineCount += 1;
}
--
- Vince
Alex Blekhman
2010-06-12 07:09:42 UTC
Permalink
Post by Vincent Fatica
Thanks for looking into it Alex.
You're welcome. :)
Thomas J. Gritzan
2010-06-12 13:48:13 UTC
Permalink
Post by Vincent Fatica
|
| if ( (bUnicode && *((WCHAR*)p) == L'\n') || (*((CHAR*)p) ==
|'\n' && !bUnicode) )
|00F31051 test cl,cl
|00F31053 je wmain+5Bh (0F3105Bh)
|00F31055 cmp word ptr [eax],0Ah
|00F31059 je wmain+64h (0F31064h)
|00F3105B cmp byte ptr [eax],0Ah
|00F3105E jne wmain+65h (0F31065h)
|00F31060 test cl,cl
|00F31062 jne wmain+65h (0F31065h)
| dwLineCount += 1;
|00F31064 inc edi
|
|This time the optimizer missed the point and checks bUnicode twice
|(notice the second "test cl,cl" command).
Thanks for looking into it Alex. I didn't build a stripped-down version and the
ASM I had was nightmarish and the two versions were radically different. The
question is moot now, though since this (below) is a lot faster than either of
the others, and even makes the text segment smaller (not to mention that it's a
little more readable)! I shouldn't have tried to be so cute in the first place.
if ( !bUnicode )
{
for ( CHAR* p = (CHAR*) inBuf; p < (CHAR*) pEnd; p += 1 )
if ( *p == '\n' )
dwLineCount += 1;
}
else
{
for ( WCHAR* p = (WCHAR*) inBuf; p < (WCHAR*) pEnd; p += 1 )
if ( *p == L'\n')
dwLineCount += 1;
}
Even shorter in text would be a named function to do this. If you use
C++, it even contains one in its standard library:

#include <algorithm> // std::count

if (bUnicode)
dwLineCount += std::count( (WCHAR*)inBuf, (WCHAR*)pEnd, L'\n' );
else
dwLineCount += std::count( (CHAR*)inBuf, (CHAR*)pEnd, '\n' );
/* untested */

Another way to write the original 'if' would be:

if (bUnicode ? *((WCHAR*)p) == L'\n' : *((CHAR*)p) == '\n')
{ /*...*/ }
--
Thomas
Vincent Fatica
2010-06-12 16:34:37 UTC
Permalink
On Sat, 12 Jun 2010 15:48:13 +0200, "Thomas J. Gritzan" <***@gmx.de>
wrote:

|> if ( !bUnicode )
|> {
|> for ( CHAR* p = (CHAR*) inBuf; p < (CHAR*) pEnd; p += 1 )
|> if ( *p == '\n' )
|> dwLineCount += 1;
|> }
|> else
|> {
|> for ( WCHAR* p = (WCHAR*) inBuf; p < (WCHAR*) pEnd; p += 1 )
|> if ( *p == L'\n')
|> dwLineCount += 1;
|> }
|
|Even shorter in text would be a named function to do this. If you use
|C++, it even contains one in its standard library:
|
|#include <algorithm> // std::count
|
|if (bUnicode)
| dwLineCount += std::count( (WCHAR*)inBuf, (WCHAR*)pEnd, L'\n' );
|else
| dwLineCount += std::count( (CHAR*)inBuf, (CHAR*)pEnd, '\n' );
|/* untested */

I don't use the STL much (know fairly little about it). It's interesting to
note that using std::count() left the size of the TEXT segment unchanged and I
failed to observe any difference in speed.
--
- Vince
Alex Blekhman
2010-06-13 02:23:19 UTC
Permalink
Post by Vincent Fatica
I don't use the STL much (know fairly little about it). It's interesting to
note that using std::count() left the size of the TEXT segment unchanged and I
failed to observe any difference in speed.
I couldn't praise STL enough. It's true blessing that the language has
such standard library. When you learn it a bit, then suddenly you
realize that tons of mundane tedious code you've written in the past
could be easily replaced with neat and succinct calls, which are
debugged and optimized by the best people in the industry. Also, you get
a lot of diagnostics completely for free, so you can catch many bug early.

In this particular case `std::count' got inlined by the compiler, so you
get a code, which is pretty similar to handcrafted if's and loops. AND
free debugging checks for debug builds.

Alex
Vincent Fatica
2010-06-12 04:57:53 UTC
Permalink
P.S. I finally had to change my news server to get this group. Glad you're
still here, Alex.
--
- Vince
Alex Blekhman
2010-06-12 07:17:24 UTC
Permalink
Post by Vincent Fatica
P.S. I finally had to change my news server to get this group. Glad you're
still here, Alex.
I am using news.sunsite.dk server for a long time already. It's free
(though requires registration) and provides almost all non-binary groups
that can be found on Usenet. I was wondering what would happen with
microsoft.public.* hierarchy after MS shuts down its news server. But it
seems like nothing changed. This group is still working. So, I'll hang
out here as long as the news server retains the group.

Alex
aslan
2010-06-15 05:41:39 UTC
Permalink
Post by Alex Blekhman
Post by Vincent Fatica
P.S. I finally had to change my news server to get this group. Glad you're
still here, Alex.
I am using news.sunsite.dk server for a long time already. It's free
(though requires registration) and provides almost all non-binary groups
that can be found on Usenet. I was wondering what would happen with
microsoft.public.* hierarchy after MS shuts down its news server. But it
seems like nothing changed. This group is still working. So, I'll hang out
here as long as the news server retains the group.
Alex
I have changed to news.albasani.net which too is free but requires
registration.

Loading...