times by waiter.
Post by Felix Brack
First of all, many thanks to every one of you for your tips and ideas. If
I did not respond to some posts, that does not mean the information
therein did not help me; I just did not know what to respond.
In fact my problem had to do with an event that was not used correctly,
i.e. I used that event in more than one place; on a multicore machine
'more than one place' refers to both time (when the code is running) and
space (which core or cores are running the code).
The event is part of an overlapped I/O operation used for serial
communication; it signals the end of a transmission ('WriteFile') operation.
The bad code did not take into account that this overlapped operation
could easily run more than once at the same time on different cores,
and that made my application crash. Even with one core the application
did not work on a fast machine, since the event was used for more than one
overlapped I/O operation running at a time and the code was not designed
to handle this.
If this does not make any sense to you I am sorry, but it is quite
complicated to describe and posting the code would not help at all.
Anyway, here are my conclusions; maybe they will help one or another reader:
1. Overlapped operations take time until they finish.
I know this is something of a platitude, but I sometimes tend to forget
it. When using overlapped I/O one should write code that respects the
current status of the overlapped I/O operation at any time, for as long as
the operation lasts. I dare say that code which uses overlapped operations
but has no explicit checks for the status of the overlapped operation (this
might be a call to 'GetOverlappedResult', for example) is bad code, and it
will fail sooner or later.
2. Use auto-reset events sparingly.
I think it is better to use manual-reset events than auto-reset events,
which reset automatically. The reason for this is quite simple: I stay in
better control of the status of the event, and I have to take care
myself that the event gets reset at the correct location in space and
time. I really prefer my software not to work at all because of an event
that only fires once (wrong reset) than to have it run only every now and
then, depending on the machine it is running on (the code with the bug
described here ran for more than 8 years on many different machines
without ever failing!).
Not really much, you might say, and you are right, but believe it or not,
it took me about 4 weeks to find it (and just a few minutes to fix it).
In the end it's the result that counts, and that was worth the time: the
code runs like a charm.
Again many thanks to all of you, Felix