Discussion:
Avoiding _memset?
Vincent Fatica
2009-09-06 16:08:50 UTC
(VC9) I am trying to avoid the runtime library in a tiny app (something I do
regularly). When I try to zero-fill a STARTUPINFO struct with a for-loop, the
compiler turns my for-loop into a call to _memset.

; 13 : STARTUPINFO si;
; 14 : si.cb = sizeof(si);
; 15 : for (BYTE *p = (BYTE*) &si + sizeof(si.cb); p < (BYTE*) &si + sizeof(si); p++)
; 16 : *p=0;

push 64 ; 00000040H
lea edx, DWORD PTR _si$[esp+104]
push 0
push edx
add esi, 2
mov DWORD PTR _si$[esp+108], 68 ; 00000044H
call _memset
add esp, 12 ; 0000000cH

How do I avoid that (elegantly)? Is it some kind of optimization I can simply
turn off? I can trick the compiler with the likes of

; 16 : *p = p ? 0 : 1; // in the loop

That avoids the _memset, but seems particularly kludgy.

Thanks.
--
- Vince
xiaosi
2009-09-06 16:56:43 UTC
You can use the __stosb intrinsic in place of memset:
#include <intrin.h>
STARTUPINFO si;
__stosb((unsigned char*)&si, 0, sizeof(si));
si.cb = sizeof(si);
Vincent Fatica
2009-09-06 17:27:57 UTC
On Mon, 7 Sep 2009 00:56:43 +0800, "xiaosi" <***@cn99.com> wrote:

|You can use the __stosb intrinsic in place of memset:
|#include <intrin.h>
| STARTUPINFO si;
| __stosb((unsigned char*)&si, 0, sizeof(si));
| si.cb = sizeof(si);

Thanks. That's interesting. There's something I wonder about. __stosb is
listed in the documentation's "x64 Intrinsics"; it is not listed in "x86
Intrinsics". What's up with that?
--
- Vince
Alexander Grigoriev
2009-09-06 18:16:51 UTC
1. #pragma intrinsic(memset)

or:
2. Write your own memset.
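
For option 2, one way to write a fill routine that the optimizer should leave alone is to store through a volatile-qualified pointer, since each volatile store has to be emitted as written. A minimal sketch, assuming the helper name zero_bytes (made up for this example, not from any header):

```c
#include <stddef.h>

/* Hypothetical helper (the name is not from the thread): zero `count` bytes
   starting at `dest`. Every store goes through a volatile-qualified pointer,
   so the compiler has to emit the stores itself and should not fold the
   loop into a call to the library memset. */
void zero_bytes(void *dest, size_t count)
{
    volatile unsigned char *p = (volatile unsigned char *)dest;

    while (count--)
        *p++ = 0;
}
```

For the STARTUPINFO case this would be zero_bytes(&si, sizeof(si)); followed by si.cb = sizeof(si);. Whether VC9 generates code as compact as rep stosb for this loop has not been checked here.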
Vincent Fatica
2009-09-06 18:47:37 UTC
On Sun, 6 Sep 2009 11:16:51 -0700, "Alexander Grigoriev" <***@earthlink.net>
wrote:

|1. #pragma intrinsic(memset)

I tried that. With "/MT" I get "warning LNK4210: .CRT section exists; there may
be unhandled static initializers or terminators". With "/MD" I get

R6034

An application has made an attempt to load the C runtime library incorrectly.
Please contact the application's support team for more information.
--
- Vince
xiaosi
2009-09-07 02:58:23 UTC
The MSDN titles are somewhat misleading; intrin.h is more accurate:
** __MACHINEI : Intel (32 bit x86) and X64
__MACHINEI(void __stosb(unsigned char *, unsigned char, size_t))
Vincent Fatica
2009-09-07 04:08:01 UTC
On Mon, 7 Sep 2009 10:58:23 +0800, "xiaosi" <***@cn99.com> wrote:

|The MSDN titles are somewhat misleading; intrin.h is more accurate:
|** __MACHINEI : Intel (32 bit x86) and X64
|__MACHINEI(void __stosb(unsigned char *, unsigned char, size_t))

memset() is in there too (__MACHINE, for all compilers). But I can find **no**
evidence that it's implemented as an intrinsic by VC9. No matter what I do (/Oi,
#pragma intrinsic(memset), ...) I always get push, push, push, call _memset.
--
- Vince
xiaosi
2009-09-07 04:51:39 UTC
#pragma intrinsic(memset) does not force memset to be inline:

"The intrinsic pragma tells the compiler that a function has known behavior. The compiler may call the function and not replace the
function call with inline instructions, if it will result in better performance."[1]

"I tried to look at the library implementation of memset, to understand why the compiler is refusing to generate an intrinsic
version memset (even if I use #pragma intrinsic(memset)!). Turns out the library memset checks to see if your CPU supports SSE2, and
if it does, it calls _VEC_memzero to perform this huge unrolled loop, doing eight 128-bit aligned stores in each iteration. I guess
they profiled this code and found it's faster than using old-fashioned "rep stosd", but since it's too complicated to inline several
different variations, they decided it's better to call the library function."[2]

If you have the DDK[3] or WDK[4] installed, you can link against the memset in ntdll.lib instead of the one in the CRT lib:
#pragma comment(lib, "F:\\WINDDK\\3790.1830\\lib\\wxp\\i386\\ntdll.lib")
STARTUPINFO si = {sizeof(si)};

00401052 6a40 push 40h
00401054 8d4c2424 lea ecx,[esp+24h]
00401058 6a00 push 0
0040105a 51 push ecx
0040105b c744242844000000 mov dword ptr [esp+28h],44h
00401063 e834000000 call test!memset (0040109c)
test!memset:
0040109c ff2518204000 jmp dword ptr [test!_imp__memset (00402018)] ds:0023:00402018={ntdll!memset (7c922435)}
ntdll!memset:
7c922435 8b54240c mov edx,dword ptr [esp+0Ch] ss:0023:0012ff60=00000040
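
Why does the initializer form still end up calling memset? In C and C++, members of an aggregate that are not named in the initializer are zero-initialized, so the compiler has to clear the rest of the struct somehow. A small sketch with a stand-in struct (startup_like and its fields are invented for illustration; the real STARTUPINFO is larger):

```c
#include <stddef.h>

/* Stand-in for STARTUPINFO (hypothetical and much smaller): a size field
   followed by a few other members. */
struct startup_like {
    unsigned long cb;
    void *reserved;
    unsigned long flags;
    unsigned short show;
};

/* Returns 1 when every member after cb is zero. */
int rest_is_zero(const struct startup_like *s)
{
    return s->reserved == NULL && s->flags == 0 && s->show == 0;
}

struct startup_like make_startup_like(void)
{
    /* Only cb gets an explicit value; the language zero-initializes
       the remaining members. */
    struct startup_like s = { sizeof(s) };
    return s;
}
```

So the initializer is a tidy way to write the zero fill, but it does not by itself keep the compiler from choosing memset to do the clearing.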

[1] http://msdn.microsoft.com/en-us/library/tzkfha43.aspx
[2] http://www.codeguru.com/forum/showthread.php?t=371491&page=2
[3] http://www.microsoft.com/whdc/devtools/ddk/default.mspx
[4] http://www.microsoft.com/whdc/DevTools/WDK/WDKpkg.mspx
Alex Blekhman
2009-09-06 19:02:13 UTC
Post by Vincent Fatica
(VC9) I am trying to avoid the runtime library in a tiny app
(something I do regularly). When I try to zero-fill a
STARTUPINFO struct with a for-loop, the compiler turns my
for-loop into a call to _memset.
[...]
How do I avoid that (elegantly)? Is it some kind of
optimization I can simply turn off?
I noticed that optimizing for size (/O1) triggers the compiler to
embed calls to memset. #pragma intrinsic(memset) has no effect
on this behavior. You can still leave optimize for speed (/O2) on,
though. Also, do not set full optimization (/Ox).

HTH
Alex
Vincent Fatica
2009-09-06 19:18:09 UTC
On Sun, 6 Sep 2009 22:02:13 +0300, "Alex Blekhman" <***@yahoo.com>
wrote:

|"Vincent Fatica" wrote:
|> (VC9) I am trying to avoid the runtime library in a tiny app
|> (something I do regularly). When I try to zero-fill a
|> STARTUPINFO struct with a for-loop, the compiler turns my
|> for-loop into a call to _memset.
|> [...]
|> How do I avoid that (elegantly)? Is it some kind of
|> optimization I can simply turn off?
|
|I noticed that optimizing for size (/O1) triggers the compiler to
|embed calls to memset. #pragma intrinsic(memset) has no effect
|on this behavior. You can still leave optimize for speed (/O2) on,
|though. Also, do not set full optimization (/Ox).

That's what I'm doing. But I get "warning LNK4210: .CRT section exists;"
--
- Vince
Alex Blekhman
2009-09-06 19:34:27 UTC
Post by Vincent Fatica
But I get "warning LNK4210: .CRT section exists;"
According to MSDN, you have some static/global code that requires
the CRT while it isn't available. Also, security checks (/GS) are
tightly integrated with the CRT, so you need to ensure that /GS is
not set for your project. I assume you already found this article:

KB814472 - "You receive linker warnings when you build Managed
Extensions for C++ DLL projects"
http://support.microsoft.com/kb/814472

It talks about managed code mostly, but can give you some insight
about what goes on.

HTH
Alex
Vincent Fatica
2009-09-06 20:00:10 UTC
On Sun, 6 Sep 2009 22:34:27 +0300, "Alex Blekhman" <***@yahoo.com>
wrote:

|KB814472 - "You receive linker warnings when you build Managed
|Extensions for C++ DLL projects"
|http://support.microsoft.com/kb/814472

I've seen that before. Mine is an utterly simple program (below). Without
going out of my way at all, I apparently get the intrinsic wcslen() (no library,
no warning, no .CRT section). memset() is another story!

#include <windows.h>
#include <intrin.h>

INT MyWinMain(VOID)
{
    WCHAR *pCmdLine = GetCommandLine();
    INT argc;
    WCHAR **argv = CommandLineToArgvW(pCmdLine, &argc);
    pCmdLine += wcslen(argv[0]);
    while ( *pCmdLine && *pCmdLine != L' ' )
        pCmdLine += 1;
    pCmdLine += 1;
    STARTUPINFO si;
    __stosb((UCHAR*) &si, 0, sizeof(si));
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi;
    CreateProcess(NULL, pCmdLine, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi);
    LocalFree(argv);
    return 0;
}
--
- Vince
xiaosi
2009-09-07 05:56:46 UTC
Post by Vincent Fatica
while ( *pCmdLine && *pCmdLine != L' ' )
pCmdLine += 1;
pCmdLine += 1;
The code above works only when there is exactly one space between argv[0] and argv[1].
If there may be more than one space, or a tab, change it to:
    while ( *pCmdLine && *pCmdLine <= L' ' )
        pCmdLine += 1;
Vincent Fatica
2009-09-07 14:50:56 UTC
On Mon, 7 Sep 2009 13:56:46 +0800, "xiaosi" <***@cn99.com> wrote:

|"Vincent Fatica" <***@blackholespam.net> wrote:
|> while ( *pCmdLine && *pCmdLine != L' ' )
|> pCmdLine += 1;
|> pCmdLine += 1;
|
|The code above works only when there is exactly one space between argv[0] and argv[1].
|If there may be more than one space, or a tab, change it to:
| while ( *pCmdLine && *pCmdLine <= L' ' )
| pCmdLine += 1;

Thanks for pointing that out!

Whenever I want to get a pointer to the tail (essentially from argv[1] on) of a
command line I find myself writing some kludgy little routine like the one
above. Is there an easier/canned/customary way (with/without the RTL)?
--
- Vince
xiaosi
2009-09-07 15:14:15 UTC
To avoid the CRT, I once wrote my_get_command_line, which is borrowed from the CRT source code (__tmainCRTStartup in
VC\crt\src\crtexe.c):

LPWSTR __stdcall my_get_command_line() {
    LPWSTR p = GetCommandLine();
    BOOL inDoubleQuote = 0;
    while (*p > L' ' || (*p && inDoubleQuote)) {
        if (*p == L'\"')
            inDoubleQuote = !inDoubleQuote;
        p++;
    }
    while (*p && (*p <= L' ')) {
        p++;
    }
    return p;
}

It is no more concise than your code.
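
To try the quote handling without launching a process, the same logic can take the command line as a parameter. A sketch under that assumption; skip_program_name is a made-up name, and it mirrors my_get_command_line rather than anything in the Windows headers:

```c
#include <stddef.h>

/* Hypothetical, testable variant of my_get_command_line: instead of calling
   GetCommandLine(), it takes the raw command line as a parameter. Returns a
   pointer to the first character after the program name and the whitespace
   that follows it (or to the terminating NUL if there are no arguments). */
const wchar_t *skip_program_name(const wchar_t *p)
{
    int in_quotes = 0;

    /* Consume the program name; whitespace inside quotes belongs to it. */
    while (*p > L' ' || (*p && in_quotes)) {
        if (*p == L'"')
            in_quotes = !in_quotes;
        p++;
    }
    /* Skip any run of spaces, tabs or other control characters. */
    while (*p && *p <= L' ')
        p++;
    return p;
}
```

Because the second loop stops at the terminating NUL, a command line with no arguments yields an empty tail instead of a pointer past the end of the string.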
Vincent Fatica
2009-09-07 16:47:12 UTC
On Mon, 7 Sep 2009 23:14:15 +0800, "xiaosi" <***@cn99.com> wrote:

|To avoid the CRT, I once wrote my_get_command_line, which is borrowed from the CRT source code (__tmainCRTStartup in
|VC\crt\src\crtexe.c):

Thanks for that. After grep-ing through an archive of projects for "BOOL
bInQuotes" I'm embarrassed to admit how many times I have done something
similar. :-)

Have you got any other tips on avoiding the RTL?

wsprintf() is invaluable as are the lstr* functions (now I've learned that some
of the wcs* functions have intrinsic/inline versions).

I have often wanted something to turn a string into a number. If I insist on
avoiding the RTL, I use StrToIntEx and feel a bit guilty dragging in shlwapi.dll
for only that reason.
--
- Vince
xiaosi
2009-09-08 05:32:29 UTC
One day I ran dumpbin /headers /exports ntdll.dll > exports_ntdll.txt and found that ntdll.dll exports many "built-in" CRT functions.
When the system loads an app, it always maps ntdll.dll into the app's process first. That means every app already has these "built-in"
CRT functions, so why bother importing them from the CRT lib (except to get inline versions for speed)? wcstoul/wcstol/strtoul/strtol
are also "built-in" in ntdll.dll.

1180 49B 0000E5C6 _CIcos = __CIcos
1181 49C 0000E682 _CIlog = __CIlog
1182 49D 0000E002 _CIpow = __CIpow
1183 49E 000012D1 _CIsin = __CIsin
1184 49F 0000137F _CIsqrt = __CIsqrt
1185 4A0 0002C8A2 __isascii = ___isascii
1186 4A1 0006F64B __iscsym = ___iscsym
1187 4A2 0006F605 __iscsymf = ___iscsymf
1188 4A3 0006F5F3 __toascii = ___toascii
1189 4A4 0000143B _alldiv = __alldiv
1190 4A5 000014E5 _alldvrm = __alldvrm
1191 4A6 000015C4 _allmul = __allmul
1192 4A7 000015F8 _alloca_probe = __alloca_probe
1193 4A8 00001635 _allrem = __allrem
1194 4A9 000016E9 _allshl = __allshl
1195 4AA 00001708 _allshr = __allshr
1196 4AB 0006F691 _atoi64 = __atoi64
1197 4AC 00001729 _aulldiv = __aulldiv
1198 4AD 00001791 _aulldvrm = __aulldvrm
1199 4AE 00001826 _aullrem = __aullrem
1200 4AF 0000189B _aullshr = __aullshr
1201 4B0 000015F8 _chkstk = __alloca_probe
1202 4B1 0007B048 _fltused = __fltused
1203 4B2 000018BA _ftol = __ftol
1204 4B3 0006F80D _i64toa = __i64toa
1205 4B4 0006F92F _i64tow = __i64tow
1206 4B5 0002E964 _itoa = __itoa
1207 4B6 0002DC81 _itow = __itow
1208 4B7 0006F989 _lfind = __lfind
1209 4B8 0006F74E _ltoa = __ltoa
1210 4B9 0006F867 _ltow = __ltow
1211 4BA 000018E1 _memccpy = __memccpy
1212 4BB 0006F9C2 _memicmp = __memicmp
1213 4BC 0006F9D2 _snprintf = __snprintf
1214 4BD 0001BBCA _snwprintf = __snwprintf
1215 4BE 0006FA30 _splitpath = __splitpath
1216 4BF 00012E44 _strcmpi = __stricmp
1217 4C0 00012E44 _stricmp = __stricmp
1218 4C1 0006FB78 _strlwr = __strlwr
1219 4C2 0001987D _strnicmp = __strnicmp
1220 4C3 0006FBA5 _strupr = __strupr
1221 4C4 0006FBD2 _tolower = __tolower
1222 4C5 0006FC1F _toupper = __toupper
1223 4C6 0006F845 _ui64toa = __ui64toa
1224 4C7 0006F967 _ui64tow = __ui64tow
1225 4C8 0006F77A _ultoa = __ultoa
1226 4C9 0006F893 _ultow = __ultow
1227 4CA 0002FB67 _vsnprintf = __vsnprintf
1228 4CB 0006FC31 _vsnwprintf = __vsnwprintf
1229 4CC 00013358 _wcsicmp = __wcsicmp
1230 4CD 00024849 _wcslwr = __wcslwr
1231 4CE 000181CD _wcsnicmp = __wcsnicmp
1232 4CF 0006FCA7 _wcsupr = __wcsupr
1233 4D0 0006FCDD _wtoi = __wtoi
1234 4D1 0006FCED _wtoi64 = __wtoi64
1235 4D2 0003684A _wtol = __wtol
1236 4D3 0006FD8A abs = _labs
1237 4D4 00001934 atan = _atan
1238 4D5 00024889 atoi = _atoi
1239 4D6 00024896 atol = _atol
1240 4D7 000151D3 bsearch = _bsearch
1241 4D8 000019D7 ceil = _ceil
1242 4D9 0000E5DA cos = _cos
1243 4DA 0006FD9F fabs = _fabs
1244 4DB 00001B18 floor = _floor
1245 4DC 0006F518 isalnum = _isalnum
1246 4DD 0006F3DC isalpha = _isalpha
1247 4DE 0006F5C0 iscntrl = _iscntrl
1248 4DF 0002C879 isdigit = _isdigit
1249 4E0 0006F588 isgraph = _isgraph
1250 4E1 0006F447 islower = _islower
1251 4E2 0006F550 isprint = _isprint
1252 4E3 0006F4E5 ispunct = _ispunct
1253 4E4 0006F4B2 isspace = _isspace
1254 4E5 0006F414 isupper = _isupper
1255 4E6 0006FE57 iswalpha = _iswalpha
1256 4E7 000269D1 iswctype = _iswctype
1257 4E8 00026A75 iswdigit = _iswdigit
1258 4E9 0006FE72 iswlower = _iswlower
1259 4EA 0006FEA5 iswspace = _iswspace
1260 4EB 0006FE8A iswxdigit = _iswxdigit
1261 4EC 0006F47A isxdigit = _isxdigit
1262 4ED 0006FD8A labs = _labs
1263 4EE 0000E67E log = _log
1264 4EF 0002490C mbstowcs = _mbstowcs
1265 4F0 00001C60 memchr = _memchr
1266 4F1 00001D07 memcmp = _memcmp
1267 4F2 00001DB3 memcpy = _memcpy
1268 4F3 000020F5 memmove = _memmove
1269 4F4 00002435 memset = _memset
1270 4F5 0000DFFD pow = _pow
1271 4F6 000203B8 qsort = _qsort
1272 4F7 000012E5 sin = _sin
1273 4F8 00025BA4 sprintf = _sprintf
1274 4F9 00001393 sqrt = _sqrt
1275 4FA 0006FEBD sscanf = _sscanf
1276 4FB 0000249D strcat = _strcat
1277 4FC 0000E7ED strchr = _strchr
1278 4FD 00002583 strcmp = _strcmp
1279 4FE 0000248D strcpy = _strcpy
1280 4FF 00002608 strcspn = _strcspn
1281 500 00002645 strlen = _strlen
1282 501 000026C0 strncat = _strncat
1283 502 000027E5 strncmp = _strncmp
1284 503 0000281D strncpy = _strncpy
1285 504 0000291D strpbrk = _strpbrk
1286 505 00002956 strrchr = _strrchr
1287 506 0000297D strspn = _strspn
1288 507 0000E75E strstr = _strstr
1289 508 000700B2 strtol = _strtol
1290 509 000700D1 strtoul = _strtoul
1291 50A 000184BB swprintf = _swprintf
1292 50B 000029CE tan = _tan
1293 50C 0006FBE4 tolower = _tolower
1294 50D 00023D13 toupper = _toupper
1295 50E 0002A826 towlower = _towlower
1296 50F 000700F0 towupper = _towupper
1297 510 00020304 vDbgPrintEx = _vDbgPrintEx@16
1298 511 0001EA5B vDbgPrintExWithPrefix = _vDbgPrintExWithPrefix@20
1299 512 00070104 vsprintf = _vsprintf
1300 513 00018112 wcscat = _wcscat
1301 514 00014962 wcschr = _wcschr
1302 515 00035424 wcscmp = _wcscmp
1303 516 00012F40 wcscpy = _wcscpy
1304 517 000356EE wcscspn = _wcscspn
1305 518 0000FE2A wcslen = _wcslen
1306 519 00018B24 wcsncat = _wcsncat
1307 51A 0001E40F wcsncmp = _wcsncmp
1308 51B 0001055F wcsncpy = _wcsncpy
1309 51C 00070162 wcspbrk = _wcspbrk
1310 51D 00014671 wcsrchr = _wcsrchr
1311 51E 000701AB wcsspn = _wcsspn
1312 51F 0002380F wcsstr = _wcsstr
1313 520 00029F03 wcstol = _wcstol
1314 521 000701F9 wcstombs = _wcstombs
1315 522 00034D91 wcstoul = _wcstoul
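
If linking against ntdll.dll's exports is not an option, a decimal parser is small enough to write by hand. A rough sketch, assuming a made-up helper name parse_long; it handles an optional sign and decimal digits only, and it does not detect overflow:

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): parse an optionally signed
   decimal integer from a wide string without any library call. Parsing
   stops at the first non-digit; that position is stored in *end when end
   is not NULL. Overflow is not detected, to keep the sketch short. */
long parse_long(const wchar_t *s, const wchar_t **end)
{
    unsigned long value = 0;
    int negative = 0;

    while (*s == L' ' || *s == L'\t')   /* skip leading blanks */
        s++;
    if (*s == L'+' || *s == L'-') {
        negative = (*s == L'-');
        s++;
    }
    while (*s >= L'0' && *s <= L'9') {
        value = value * 10 + (unsigned long)(*s - L'0');
        s++;
    }
    if (end)
        *end = s;
    return negative ? -(long)value : (long)value;
}
```

The end pointer plays the same role as the second argument of wcstol, so a caller can tell "0" apart from a string with no digits at all.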
Vincent Fatica
2009-09-08 06:22:30 UTC
How do I ensure my app gets them from ntdll.dll and doesn't try to get them from the
CRT lib?

--
- Vince
Alex Blekhman
2009-09-08 08:31:30 UTC
Post by Vincent Fatica
How do I ensure my app gets them from ntdll.dll and doesn't try to
get them from the CRT lib?
You can re-create an import lib for ntdll.dll and link with it:

KB131313 - "How To Create 32-bit Import Libraries Without .OBJs or
Source"
http://support.microsoft.com/kb/131313

HTH
Alex
xiaosi
2009-09-08 15:06:07 UTC
When I write:
#pragma comment(lib, "F:\\WINDDK\\3790.1830\\lib\\wxp\\i386\\ntdll.lib")
VC always imports memset from ntdll.lib instead of the CRT lib.

To make sure nothing is imported from the CRT lib, I write:
#pragma comment(linker, "/nodefaultlib")
But then you have to list every lib you use:
#pragma comment(lib, "kernel32")
#pragma comment(lib, "user32")
#pragma comment(lib, "gdi32")
#pragma comment(lib, "comctl32")
xiaosi
2009-09-08 16:05:31 UTC
#pragma comment(linker, "/nodefaultlib:libcmt.lib")
is enough.
#pragma comment(linker, "/nodefaultlib")
will prevent memset from being linked from ntdll.lib.
Vincent Fatica
2009-09-08 18:15:36 UTC
On Tue, 8 Sep 2009 23:06:07 +0800, "xiaosi" <***@cn99.com> wrote:

|When I write:
|#pragma comment(lib, "F:\\WINDDK\\3790.1830\\lib\\wxp\\i386\\ntdll.lib")
|VC always imports memset from ntdll.lib instead of the CRT lib.
|
|To make sure nothing is imported from the CRT lib, I write:
|#pragma comment(linker, "/nodefaultlib")
|But then you have to list every lib you use:
|#pragma comment(lib, "kernel32")
|#pragma comment(lib, "user32")
|#pragma comment(lib, "gdi32")
|#pragma comment(lib, "comctl32")

I downloaded the WDK so I suppose after installing it I'll have an ntdll.h and
ntdll.lib (right?).

Can't I just ignore, specifically, libcmt.lib?
--
- Vince
Vincent Fatica
2009-09-08 23:44:44 UTC
On Tue, 8 Sep 2009 13:32:29 +0800, "xiaosi" <***@cn99.com> wrote:

|in ntdll.dll.
|
| 1180 49B 0000E5C6 _CIcos = __CIcos
| 1181 49C 0000E682 _CIlog = __CIlog

I tried linking an old, big project with ntdll.lib. That project also uses
libcmt.lib (/MT) and it uses many functions which are available in both places.
Even so, I get only one error:

LIBCMT.lib(_wctype.obj) : error LNK2005: _iswdigit already defined in ntdll.lib(ntdll.dll)

Is there a way to fix that? In general, is there a way to take what's available
in ntdll.dll from ntdll.dll and anything not available there from libcmt.lib?
--
- Vince
Vincent Fatica
2009-09-09 00:04:00 UTC
On 8 Sep 2009 19:44:44 -0400, Vincent Fatica <***@blackholespam.net> wrote:

|On Tue, 8 Sep 2009 13:32:29 +0800, "xiaosi" <***@cn99.com> wrote:
|
||in ntdll.dll.
||
|| 1180 49B 0000E5C6 _CIcos = __CIcos
|| 1181 49C 0000E682 _CIlog = __CIlog
|
|I tried linking an old, big project with ntdll.lib. That project also uses
|libcmt.lib (/MT) and it uses many functions which are available in both places.
|Even so, I get only one error:
|
|>LIBCMT.lib(_wctype.obj) : error LNK2005: _iswdigit already defined in ntdll.lib(ntdll.dll)
|
|Is there a way to fix that? In general, is there a way to take what's available
|in ntdll.dll from ntdll.dll and anything not available there from libcmt.lib?

In fact, that project gets all of these (below) from ntdll.lib (when I imagine
they're also in libcmt.lib). Why is it choking on _iswdigit?

ntdll.dll
1000C278 Import Address Table
100121C0 Import Name Table
0 time date stamp
0 Index of first forwarder reference

4F2 memcpy
4F6 qsort
4D2 _wtol
4D0 _wtoi
522 wcstoul
51D wcsrchr
4F4 memset
514 wcschr
352 RtlUnwind
4D1 _wtoi64
4AE _aullrem
4AC _aulldiv
50E towlower
520 wcstol
4CE _wcsnicmp
51C wcspbrk
--
- Vince
xiaosi
2009-09-09 07:25:55 UTC
With /FORCE:MULTIPLE added to the linker options, I get:
LIBCMT.lib(_wctype.obj) : warning LNK4006: _iswlower already defined in ntdll.lib(ntdll.dll); second definition ignored
LIBCMT.lib(_wctype.obj) : warning LNK4006: _iswdigit already defined in ntdll.lib(ntdll.dll); second definition ignored
Vincent Fatica
2009-09-09 14:00:36 UTC
On Wed, 9 Sep 2009 15:25:55 +0800, "xiaosi" <***@cn99.com> wrote:

|With /FORCE:MULTIPLE added to the linker options, I get:
|LIBCMT.lib(_wctype.obj) : warning LNK4006: _iswlower already defined in ntdll.lib(ntdll.dll); second definition ignored
|LIBCMT.lib(_wctype.obj) : warning LNK4006: _iswdigit already defined in ntdll.lib(ntdll.dll); second definition ignored

Yes, I discovered that and tried it. It worked but the resulting scary warnings
made it seem rather heavy-handed.

I read about the purpose of ntdll.dll (as an interface to the native API). Then
I looked at the win7 version of ntdll.lib in the WDK. It contains *none* of the
CRT stuff (str*, wcs*, ...) that's in the winxp version. Do you suppose it's
incomplete ... or perhaps, has the philosophy changed?
--
- Vince
xiaosi
2009-09-09 14:19:38 UTC
Those CRT functions in ntdll.dll seem to be an undocumented feature of Windows. I had never heard of them before the day I
"dumpbined" it. Since they are undocumented, there is no promise that Windows will keep supporting them in the future.
Alex Blekhman
2009-09-07 18:44:17 UTC
Permalink
Mine is an utterly simple program (below). Without going out of
my way at all, I apparently get the intrinsic wcslen() (no
library, no warning, no .CRT section). memset() is another
story!
#include <windows.h>
#include <intrin.h>
INT MyWinMain(VOID)
{
[...]
Even if you use a custom entry point, it must still conform to
the expected signature of [w]WinMain or [w]main. Also, the calling
convention must be __stdcall for the Windows subsystem and __cdecl for
the console subsystem.

Alex
Vincent Fatica
2009-09-07 19:11:00 UTC
Permalink
On Mon, 7 Sep 2009 21:44:17 +0300, "Alex Blekhman" <***@yahoo.com>
wrote:

|Even if you use a custom entry point, it must still conform to
|the expected signature of [w]WinMain or [w]main. Also, the calling
|convention must be __stdcall for the Windows subsystem and __cdecl for
|the console subsystem.

To avoid the RTL I often use a custom entry point and I often violate the
conventions you mentioned. I haven't gotten into trouble yet. Please
elaborate.
--
- Vince
Alex Blekhman
2009-09-07 20:58:19 UTC
Permalink
Post by Vincent Fatica
To avoid the RTL I often use a custom entry point and I often
violate the conventions you mentioned. I haven't gotten into
trouble yet. Please elaborate.
The linker requires the entry point to have a certain
signature. It means that the code it generates around the call
expects a predefined number of parameters and a predefined calling
convention. I think that the call stack gets corrupted upon exit
from the entry point. I'm not sure why there is no diagnostic
message issued by the system, though. Maybe it is because there
is no CRT to report it in a convenient way, or the process is
already on its way down when it happens, so there is no point in
reporting it, or whatever else. But it seems unclean to me to violate
this requirement while the linker's documentation explicitly insists on
it.

Alex
Vincent Fatica
2009-09-07 22:01:32 UTC
Permalink
On Mon, 7 Sep 2009 23:58:19 +0300, "Alex Blekhman" <***@yahoo.com>
wrote:

|"Vincent Fatica" wrote:
|> To avoid the RTL I often use a custom entry point and I often
|> violate the conventions you mentioned. I haven't gotten into
|> trouble yet. Please elaborate.
|
|The linker requires the entry point to have a certain
|signature. It means that the code it generates around the call
|expects a predefined number of parameters and a predefined calling
|convention. I think that the call stack gets corrupted upon exit
|from the entry point. I'm not sure why there is no diagnostic
|message issued by the system, though. Maybe it is because there
|is no CRT to report it in a convenient way, or the process is
|already on its way down when it happens, so there is no point in
|reporting it, or whatever else. But it seems unclean to me to violate
|this requirement while the linker's documentation explicitly insists on
|it.

Can you point me to that linker documentation?

If you look at the CRT startup routines (the typical entry points, crtexe.c) for
EXEs, you see that they are __cdecl, taking no args and returning an int. They
examine a command line, do their thing, and eventually call main, wmain, or
WinMain. So the signatures of main, wmain, and WinMain are very important. But
if you're not using the CRT startup routines and you are specifying an entry
point, it would seem that int __cdecl ...(void) is appropriate.
--
- Vince
Alex Blekhman
2009-09-08 08:18:04 UTC
Permalink
Post by Vincent Fatica
If you look at the CRT startup routines (the typical entry
points, crtexe.c) for EXEs, you see that they are __cdecl,
taking no args and returning an int. They examine a command
line, do their thing, and eventually call main, wmain, or
WinMain. So the signatures of main, wmain, and WinMain are very
important. But if you're not using the CRT startup routines and
you are specifying an entry point, it would seem that int
__cdecl ...(void) is appropriate.
I found the requirements here:

"/ENTRY (Entry-Point Symbol)"
http://msdn.microsoft.com/en-us/library/f9t8842e.aspx

<quote>
The function must be defined with the __stdcall calling
convention. The parameters and return value must be defined as
documented in the Win32 API for WinMain (for an .exe file) or
DllEntryPoint (for a DLL).
</quote>

However, after reading your answer I looked into the crtexe.c file and
discovered that the CRT startup routines don't respect this
requirement, exactly as you noticed. Moreover, the BaseProcessStart
routine from kernel32.dll cares about neither the parameters
nor the calling convention. The call looks like this:

call dword ptr [ebp+8]
push eax
call ***@4

Where "dword ptr [ebp+8]" is the address of the process entry
point. The process entry point address is passed as a parameter to
BaseProcessStart, as well.

So, actually the linker documentation is incorrect and one must
use an int __cdecl(*)(void) function as a custom entry point. It is
only by sheer luck that BaseProcessStart and the subsequent code don't
change anything on the stack, so there is no access violation
exception.

I would be very interested in hearing from MSFT people about the
issue.

Alex
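
For readers following along, the conclusion above can be sketched as a CRT-free entry point. This is only an illustration under assumptions: my_entry and run_app are hypothetical names, the non-Windows stand-ins exist solely so the snippet compiles elsewhere, and the explicit ExitProcess call follows the advice given by other posters in this thread rather than anything the linker documentation states.

```c
/* Sketch of a CRT-free entry point of type int __cdecl (void).
   my_entry and run_app are hypothetical names, not from the thread. */
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#define ENTRY_CC __cdecl
#else
/* Stand-ins so the sketch compiles off Windows (illustration only). */
#include <stdlib.h>
typedef unsigned int UINT;
#define ENTRY_CC
static void ExitProcess(UINT code) { exit((int)code); }
#endif

#ifdef _MSC_VER
#pragma comment(linker, "/entry:my_entry")
#pragma comment(linker, "/subsystem:console")
#endif

/* Hypothetical application body; returns the process exit code. */
int run_app(void)
{
    return 0;
}

/* No parameters, so __cdecl and __stdcall leave the stack identical. */
int ENTRY_CC my_entry(void)
{
    ExitProcess((UINT)run_app()); /* ends the process even if other threads remain */
    return 0;                     /* not reached */
}
```

Because the function takes no arguments, nothing has to be popped by either the caller or the callee, which is why the choice between the two calling conventions makes no practical difference here.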
xiaosi
2009-09-08 14:53:42 UTC
Permalink
Yes, the MSDN documentation is incorrect about /ENTRY; it confused me long ago.
An EXE entry point should be int __cdecl(*)(void) or int __stdcall(*)(void).
A DLL entry point must be defined like DllMain.

00 hookdll!DllMain [j:\test\hookdll\hookdll.cpp @ 95]
01 hookdll!__DllMainCRTStartup+0x7a [f:\dd\vctools\crt_bld\self_x86\crt\src\crtdll.c @ 546]
02 hookdll!_DllMainCRTStartup+0x1e [f:\dd\vctools\crt_bld\self_x86\crt\src\crtdll.c @ 510]
03 ntdll!LdrpCallInitRoutine+0x14
04 ntdll!LdrpRunInitializeRoutines+0x344
05 ntdll!LdrpInitializeProcess+0x1131
06 ntdll!_LdrpInitialize+0x183
07 ntdll!KiUserApcDispatcher+0x7

BOOL WINAPI
_DllMainCRTStartup(
HANDLE hDllHandle,
DWORD dwReason,
LPVOID lpreserved
)
xiaosi
2009-09-08 05:29:35 UTC
Permalink
Besides the calling convention, ExitProcess should be used instead of return, because both tmainCRTStartup and tWinMainCRTStartup call
kernel32!ExitProcess(status) after tmain or tWinMain returns status.

If you use a Common Dialog Box in your app and exit with return (instead of ExitProcess), the process will not exit (the Common
Dialog Box thread still exists).

#pragma comment(linker, "/subsystem:windows")
#pragma comment(linker, "/entry:mywinmain")
int __stdcall mywinmain() {
....
ExitProcess(msg.wParam);
}

#pragma comment(linker, "/subsystem:console")
#pragma comment(linker, "/entry:mymain")
int __cdecl mymain() {
....
ExitProcess(status);
}
Vincent Fatica
2009-09-08 06:20:47 UTC
Permalink
Thanks for the ExitProcess tip.

__stdcall/WINAPI is not mentioned in crtexe.c. Aren't they all __cdecl?

static
int
__tmainCRTStartup(
void
);

#ifdef _WINMAIN_

#ifdef WPRFLAG
int wWinMainCRTStartup(
#else /* WPRFLAG */
int WinMainCRTStartup(
#endif /* WPRFLAG */

#else /* _WINMAIN_ */

#ifdef WPRFLAG
int wmainCRTStartup(
#else /* WPRFLAG */
int mainCRTStartup(
#endif /* WPRFLAG */

#endif /* _WINMAIN_ */
void
)
{

--
- Vince
xiaosi
2009-09-08 15:47:38 UTC
Permalink
int __stdcall mywinmain() is unnecessary; int __cdecl mywinmain() works fine (since it takes no parameters). I got accustomed to
__stdcall under the influence of the incorrect MSDN documentation :-)

It's strange that __tmainCRTStartup is not defined explicitly as __cdecl as check_managed_app is:
static int __cdecl check_managed_app (
void
)

*Purpose:
* For an unmanaged app, they call exit and
* never return.
*
*Exit:
* Unmanaged app: never return.
*
*******************************************************************************/
static
int
__tmainCRTStartup(
void
);
Alex Blekhman
2009-09-08 08:22:33 UTC
Permalink
Post by xiaosi
Besides call convention, ExitProcess should be used instead of
return, because both tmainCRTStartup and tWinMainCRTStartup call
kernel32!ExitProcess(status) after tmain or tWinMain return
status.
This is incorrect. tmainCRTStartup and tWinMainCRTStartup call
ExitProcess only if an exception is thrown from
main/WinMain. Otherwise, both CRT startup routines just return.
The kernel32!BaseProcessStart routine calls ExitThread, which in
turn calls ExitProcess.

Alex
xiaosi
2009-09-08 13:29:48 UTC
Permalink
Mark Lucovsky remarked about BaseProcessStart(PPROCESS_START_ROUTINE lpStartAddress):
"lpStartAddress - Supplies the starting address of the new thread.
The address is logically a procedure that never returns."

Yes, on my 32-bit Windows XP SP3, __tmainCRTStartup (tmainCRTStartup or tWinMainCRTStartup) never returns to BaseProcessStart.
Without any exception thrown, __tmainCRTStartup calls exit(), doexit(), __crtExitProcess(), ExitProcess(), _ExitProcess(),
NtTerminateProcess(), and never returns.

00 ntdll!NtTerminateProcess
01 kernel32!_ExitProcess+0x37
02 kernel32!ExitProcess+0x14
03 MSVCR90!__crtExitProcess+0x17 [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0dat.c @ 731]
04 MSVCR90!doexit+0x10a [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0dat.c @ 644]
05 MSVCR90!exit+0x11 [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0dat.c @ 412]
06 winconsol!__tmainCRTStartup+0x125 [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 597]
07 kernel32!BaseProcessStart+0x23

int
__tmainCRTStartup(
void
)
#ifdef WPRFLAG
mainret = wWinMain(
#else /* WPRFLAG */
mainret = WinMain(
#endif /* WPRFLAG */
(HINSTANCE)&__ImageBase,
NULL,
lpszCommandLine,
StartupInfo.dwFlags & STARTF_USESHOWWINDOW
? StartupInfo.wShowWindow
: SW_SHOWDEFAULT
);
#else /* _WINMAIN_ */

#ifdef WPRFLAG
__winitenv = envp;
mainret = wmain(argc, argv, envp);
#else /* WPRFLAG */
__initenv = envp;
mainret = main(argc, argv, envp);
#endif /* WPRFLAG */

#endif /* _WINMAIN_ */

/*
* Note that if the exe is managed app, we don't really need to
* call exit or _c_exit. .cctor should be able to take care of
* this.
*/
if ( !managedapp )
exit(mainret);
Alex Blekhman
2009-09-08 15:25:11 UTC
Permalink
Post by xiaosi
Yes, on my 32-bit Windows XP SP3, __tmainCRTStartup
(tmainCRTStartup or tWinMainCRTStartup) never returns to
BaseProcessStart. Without any exception thrown,
__tmainCRTStartup calls exit(), doexit(), __crtExitProcess(),
ExitProcess(), _ExitProcess(), NtTerminateProcess(), and never
returns.
Yes, you're right. I overlooked this code. However, even without
explicitly calling ExitProcess the BaseProcessStart routine will
call it anyway. Here's the stack of CRT-less program after main
returns:

...
ntdll.dll!***@0()
ntdll.dll!***@8() + 0xc bytes
kernel32.dll!***@4() + 0x37 bytes
kernel32.dll!7c81cb26()
kernel32.dll!***@4() + 0x63 bytes
kernel32.dll!***@4() + 0x29 bytes


Alex
xiaosi
2009-09-08 17:21:33 UTC
Permalink
"ExitThread Function: If the thread is the last thread in the process when this function is called, the thread's process is also
terminated."[1]

windows xp sp3:
00 ntdll!NtTerminateProcess
01 kernel32!_ExitProcess+0x37
02 kernel32!ExitProcess+0x14
03 kernel32!ExitThread+0x92
04 kernel32!BaseProcessStart+0x28

When I open the GetOpenFileName Dialog Box[2], the process adds three threads (one ntdll.dll!RtlpTimerThread + two
ntdll.dll!RtlpWorkerThread). When I close the GetOpenFileName Dialog Box, the nocrt.exe!main thread exits, but the other three
threads remain. After several minutes, the two ntdll.dll!RtlpWorkerThread exit but the ntdll.dll!RtlpTimerThread remains. The
process is not terminated!

[1] http://msdn.microsoft.com/en-us/library/ms682659.aspx
[2]
#pragma comment(linker, "/entry:main")
#pragma comment(linker, "/subsystem:console")
#pragma comment(linker, "/manifestdependency:\"type='win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df'\"")
#pragma comment(lib, "kernel32")
#pragma comment(lib, "comdlg32")

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <commdlg.h> //GetOpenFileName
#include <intrin.h> //__stosd

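int __cdecl main() {
OPENFILENAME op;
char file[MAX_PATH*2];
__stosd((unsigned long *)&op, 0, sizeof(op)/4); // 88/4=22 dwords, zeroes the whole struct
file[0] = 0; // lpstrFile must start with NUL when no initial file name is supplied
op.lStructSize = sizeof(op);
op.lpstrFile = file;
op.nMaxFile = sizeof(file);
GetOpenFileName(&op);
return 0; // returning (instead of calling ExitProcess) is what leaves the extra threads alive
}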
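
A note on the __stosd call in the listing: it passes sizeof(op)/4 as the count, which covers the whole structure only when its size is a multiple of 4 (88 bytes here). A hedged sketch of a helper that also clears any leftover tail bytes follows. zero_dwords is a hypothetical name, the volatile loops are a portable stand-in for the intrinsic, and the buffer is assumed to be 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero `size` bytes, a dword at a time, then the tail.
   Assumes `dst` is 4-byte aligned. The volatile stores keep the compiler
   from dropping or merging the writes. */
void zero_dwords(void *dst, size_t size)
{
    volatile uint32_t *d = (volatile uint32_t *)dst;
    size_t dwords = size / 4;   /* e.g. 88 / 4 = 22 */
    size_t tail = size % 4;     /* bytes a plain size/4 count would miss */
    volatile unsigned char *b;

    while (dwords--)
        *d++ = 0;

    b = (volatile unsigned char *)d;
    while (tail--)
        *b++ = 0;
}
```

For a structure such as OPENFILENAME whose size is already a multiple of 4, the tail loop simply does nothing.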
Alex Blekhman
2009-09-09 07:42:35 UTC
Permalink
Post by xiaosi
When I open the GetOpenFileName Dialog Box[2], the process adds
three threads (one ntdll.dll!RtlpTimerThread + two
ntdll.dll!RtlpWorkerThread). When I close the GetOpenFileName
Dialog Box, the nocrt.exe!main thread exits, but the other three
threads remain. After several minutes, the two
ntdll.dll!RtlpWorkerThread exit but the
ntdll.dll!RtlpTimerThread remains. The process is not
terminated!
I see. I always thought that it is the operating system that
closes all threads in the process when the main thread exits. It
appears that this logic is implemented by the CRT.

Thanks
Alex
Ulrich Eckhardt
2009-09-07 07:26:25 UTC
Permalink
Post by Vincent Fatica
(VC9) I am trying to avoid the runtime library in a tiny app (something I do
regularly). When I try to zero-fill a STARTUPINFO struct with a for-loop,
the compiler turns my for-loop into a call to _memset.
; 13 : STARTUPINFO si;
; 14 : si.cb = sizeof(si);
; 15 : for (BYTE *p = (BYTE*) &si + sizeof(si.cb);
p < (BYTE*) &si + sizeof(si); p++)
; 16 : *p=0;
I'd call this pretty asinine, how about a portable (yeah, as if it mattered
to win32 code...) and straight-forward

STARTUPINFO si = {0};

and leaving the initialisation to the compiler then?
Post by Vincent Fatica
push 64 ; 00000040H
lea edx, DWORD PTR _si$[esp+104]
push 0
push edx
add esi, 2
mov DWORD PTR _si$[esp+108], 68 ; 00000044H
call _memset
add esp, 12 ; 0000000cH
How do I avoid that (elegantly)? Is it some kind of optimization I can simply
turn off? I can trick the compiler with the likes of
; 16 : *p = p ? 0 : 1; // in the loop
That avoids the _memset, but seems particularly kludgy.
How about this:

char simem[sizeof (STARTUPINFO)] = {0};
STARTUPINFO si = (STARTUPINFO*)simem;

or maybe even a union?


Just wondering: Why do you care?

Uli
--
C++ FAQ: http://parashift.com/c++-faq-lite

Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
xiaosi
2009-09-07 08:01:28 UTC
Permalink
Vincent had written "avoid the runtime library in a tiny app".

If you use memset from the VC runtime library, the app pulls in a lot of unnecessary runtime library code, and the app size
will be larger than 40 KB (/MT), or 7 KB (/MD) plus the 640 KB MSVCR90.DLL.

By excluding the VC runtime library, this app is only 3 KB.
Vincent Fatica
2009-09-07 14:41:59 UTC
Permalink
On Mon, 07 Sep 2009 09:26:25 +0200, Ulrich Eckhardt <***@satorlaser.com>
wrote:

|How about this:
|
| char simem[sizeof (STARTUPINFO)] = {0};
| STARTUPINFO si = (STARTUPINFO*)simem;

Did you mean

char simem[sizeof (STARTUPINFO)] = {0};
STARTUPINFO *psi = (STARTUPINFO*)simem;

That invokes memset, gives the LNK4210 warning, makes a .CRT section, and
increases the .text segment from 0x9F bytes to 0x9AA bytes (159 to 2474).

SecureZeroMemory and __stosb avoid memset. So does

STARTUPINFO si = {sizeof(si),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};

but that costs about 64 bytes later, moving 0 into each member.
--
- Vince
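
As a rough sketch of the two memset-free options mentioned in the post above: the wrapper below uses the __stosb intrinsic when built with MSVC for x86/x64 and otherwise falls back to a volatile byte loop in the spirit of SecureZeroMemory. The name zero_fill is hypothetical, and whether a given compiler version leaves either form alone is an assumption worth checking in the generated assembly.

```c
#include <stddef.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   /* __stosb */
#endif

/* Hypothetical helper: zero `size` bytes at `dst` without calling memset. */
void zero_fill(void *dst, size_t size)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    __stosb((unsigned char *)dst, 0, size);   /* expands inline to rep stosb */
#else
    /* Portable fallback: volatile stores are normally not turned into a
       memset call by the optimizer. */
    volatile unsigned char *p = (volatile unsigned char *)dst;
    while (size--)
        *p++ = 0;
#endif
}
```

Usage would look like zero_fill(&si, sizeof(si)); followed by si.cb = sizeof(si);, which mirrors the __stosb suggestion made earlier in the thread.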
Ulrich Eckhardt
2009-09-08 08:32:27 UTC
Permalink
Post by Vincent Fatica
On Mon, 07 Sep 2009 09:26:25 +0200, Ulrich Eckhardt
|
| char simem[sizeof (STARTUPINFO)] = {0};
| STARTUPINFO si = (STARTUPINFO*)simem;
Did you mean
char simem[sizeof (STARTUPINFO)] = {0};
STARTUPINFO *psi = (STARTUPINFO*)simem;
Of course, yes.
Post by Vincent Fatica
That invokes memset, gives the LNK4210 warning, makes a .CRT section, and
increases the .text segment from 0x9F bytes to 0x9AA bytes (159 to 2474).
SecureZeroMemory and __stosb avoid memset. So does
STARTUPINFO si = {sizeof(si),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
but that costs about 64 bytes later, moving 0 into each member.
Honestly: I would ignore that and start getting some real work done. ;)

*shrug*

Good luck anyway!

Uli
--
C++ FAQ: http://parashift.com/c++-faq-lite

Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
pm
2009-09-08 11:37:29 UTC
Permalink
http://social.msdn.microsoft.com/forums/en-US/Vsexpressvc/thread/a51fe950-cd74-4133-8d84-1bc07b353bc2/
http://social.msdn.microsoft.com/Forums/en-US/vclanguage/thread/dd0ccf2c-ec28-4b1a-a609-bd962a17febf/

PM-
Ben Voigt [C++ MVP]
2009-09-08 20:01:40 UTC
Permalink
Post by Vincent Fatica
(VC9) I am trying to avoid the runtime library in a tiny app (something I do
regularly). When I try to zero-fill a STARTUPINFO struct with a for-loop, the
compiler turns my for-loop into a call to _memset.
; 13 : STARTUPINFO si;
; 14 : si.cb = sizeof(si);
; 15 : for (BYTE *p = (BYTE*) &si + sizeof(si.cb); p < (BYTE*) &si +
sizeof(si); p++)
; 16 : *p=0;
push 64 ; 00000040H
lea edx, DWORD PTR _si$[esp+104]
push 0
push edx
add esi, 2
mov DWORD PTR _si$[esp+108], 68 ; 00000044H
call _memset
add esp, 12 ; 0000000cH
How do I avoid that (elegantly)? Is it some kind of optimization I can simply
turn off? I can trick the compiler with the likes of
; 16 : *p = p ? 0 : 1; // in the loop
That avoids the _memset, but seems particularly kludgy.
You can use the ZeroMemory macro, which results in a call to ntdll.dll's
RtlZeroMemory routine instead of pulling in the CRT.
Post by Vincent Fatica
Thanks.
--
- Vince
Vincent Fatica
2009-09-08 20:13:37 UTC
Permalink
On Tue, 8 Sep 2009 15:01:40 -0500, "Ben Voigt [C++ MVP]"
<***@newsgroup.nospam> wrote:

|
|
|"Vincent Fatica" <***@blackholespam.net> wrote in message
|news:4aa3de92$***@news.vefatica.net...
|> (VC9) I am trying to avoid the runtime library in a tiny app (something I
|> do
|> regularly). When I try to zero-fill a STARTUPINFO struct with a for-loop,
|> the
|> compiler turns my for-loop into a call to _memset.
|>
|> ; 13 : STARTUPINFO si;
|> ; 14 : si.cb = sizeof(si);
|> ; 15 : for (BYTE *p = (BYTE*) &si + sizeof(si.cb); p < (BYTE*) &si +
|> sizeof(si); p++)
|> ; 16 : *p=0;
|>
|> push 64 ; 00000040H
|> lea edx, DWORD PTR _si$[esp+104]
|> push 0
|> push edx
|> add esi, 2
|> mov DWORD PTR _si$[esp+108], 68 ; 00000044H
|> call _memset
|> add esp, 12 ; 0000000cH
|>
|> How do I avoid that (elegantly)? Is it some kind of optimization I can
|> simply
|> turn off? I can trick the compiler with the likes of
|>
|> ; 16 : *p = p ? 0 : 1; // in the loop
|>
|> That avoids the _memset, but seems particularly kludgy.
|
|You can use the ZeroMemory macro, which results in a call to ntdll.dll's
|RtlZeroMemory routine instead of pulling in the CRT.

Tried that ... results in a call to _memset.
--
- Vince
Giovanni Dicanio
2009-09-08 22:41:41 UTC
Permalink
Post by Vincent Fatica
|You can use the ZeroMemory macro, which results in a call to ntdll.dll's
|RtlZeroMemory routine instead of pulling in the CRT.
Tried that ... results in a call to _memset.
In WinNT.h I read:

#define RtlZeroMemory(Destination,Length) memset((Destination),0,(Length))


Giovanni
Ben Voigt [C++ MVP]
2009-09-08 23:22:17 UTC
Permalink
Post by Giovanni Dicanio
Post by Vincent Fatica
|You can use the ZeroMemory macro, which results in a call to ntdll.dll's
|RtlZeroMemory routine instead of pulling in the CRT.
Tried that ... results in a call to _memset.
#define RtlZeroMemory(Destination,Length) memset((Destination),0,(Length))
Yes, but I think there is a real exported function in ntdll.dll

So maybe RtlZeroMemory needs to be #undef-d
Post by Giovanni Dicanio
Giovanni
xiaosi
2009-09-09 05:37:22 UTC
Permalink
RtlZeroMemory is really exported from ntdll.dll:
878 36D 00002C64 RtlZeroMemory = ***@8
And is also exported (forwarded to ntdll.dll) from kernel32.dll:
713 2C8 RtlZeroMemory (forwarded to NTDLL.RtlZeroMemory)

#undef RtlZeroMemory
extern "C" __declspec(dllimport) void __stdcall RtlZeroMemory(void*, size_t);
RtlZeroMemory(xx, xxx); will use ntdll!RtlZeroMemory:

0040100a ff1500204000 call dword ptr [test!_imp__RtlZeroMemory (00402000)] ds:0023:00402000={ntdll!RtlZeroMemory (7c922c64)}
Giovanni Dicanio
2009-09-09 07:57:45 UTC
Permalink
Post by Ben Voigt [C++ MVP]
Post by Giovanni Dicanio
#define RtlZeroMemory(Destination,Length) memset((Destination),0,(Length))
Yes, but I think there is a real exported function in ntdll.dll
Ben: you are right.

But I wonder why they #define'd RtlZeroMemory as an alias to memset in
WinNT.h ...

Giovanni
Tim Roberts
2009-09-10 03:07:57 UTC
Permalink
Post by Giovanni Dicanio
But I wonder why they #define'd RtlZeroMemory as an alias to memset in
WinNT.h ...
Because "memset" is a compiler intrinsic that can be inlined to a "rep
stosb".
--
Tim Roberts, ***@probo.com
Providenza & Boekelheide, Inc.
Vincent Fatica
2009-09-10 04:39:23 UTC
Permalink
On Wed, 09 Sep 2009 20:07:57 -0700, Tim Roberts <***@probo.com> wrote:

|Giovanni Dicanio <***@REMOVEMEgmail.com> wrote:
|>
|>But I wonder why they #define'd RtlZeroMemory as an alias to memset in
|>WinNT.h ...
|
|Because "memset" is a compiler intrinsic that can be inlined to a "rep
|stosb".

But it's not inlined to "rep stosb" (at least by VC9).
--
- Vince
Bo Persson
2009-09-10 20:22:41 UTC
Permalink
Post by Vincent Fatica
Post by Tim Roberts
Post by Giovanni Dicanio
But I wonder why they #define'd RtlZeroMemory as an alias to
memset in WinNT.h ...
Because "memset" is a compiler intrinsic that can be inlined to a
"rep stosb".
But it's not inlined to "rep stosb" (at least by VC9).
Because "rep stosb" was fast once-upon-a-time (early 1980's, or so),
but the weird design of current processors actually makes them run
faster if you spell it all out explicitly. A short sequence of simple
instructions might run faster than a single specialized instruction.
Honest!


Bo Persson
Tim Roberts
2009-09-12 03:26:20 UTC
Permalink
Post by Bo Persson
Post by Vincent Fatica
But it's not inlined to "rep stosb" (at least by VC9).
Because "rep stosb" was fast once-upon-a-time (early 1980's, or so),
but the weird design of current processors actually makes them run
faster if you spell it all out explicitly. A short sequence of simple
instructions might run faster than a single specialized instruction.
Honest!
Not true. If you're doing less than 7 or 8 iterations, you're right.
Beyond that, "rep stosd" wins. It does one dword per cycle, and it's hard
to beat that, without getting into the more obscure instruction sets.
--
Tim Roberts, ***@probo.com
Providenza & Boekelheide, Inc.
Alex Blekhman
2009-09-09 07:48:28 UTC
Permalink
Post by Vincent Fatica
|You can use the ZeroMemory macro, which results in a call to
ntdll.dll's RtlZeroMemory routine instead of pulling in the CRT.
Tried that ... results in a call to _memset.
I found in the WinNT.h header the SecureZeroMemory macro, which
expands to the RtlSecureZeroMemory function. It is implemented
inline in the header file, so it doesn't have any external
dependencies. Basically, it's a straightforward implementation of
memset.

HTH
Alex
xiaosi
2009-09-09 09:07:18 UTC
Permalink
It's strange that a loop is used instead of __stosb on non-_M_AMD64 CPUs.
On my AMD32 CPU, __stosb (114 clocks) is faster than this loop (195 clocks).

FORCEINLINE
PVOID
RtlSecureZeroMemory(
__in_bcount(cnt) PVOID ptr,
__in SIZE_T cnt
)
{
volatile char *vptr = (volatile char *)ptr;
#if defined(_M_AMD64)
__stosb((PBYTE )((DWORD64)vptr), 0, cnt);
#else
while (cnt) {
*vptr = 0;
vptr++;
cnt--;
}
#endif
return ptr;
}
Vincent Fatica
2009-09-09 14:12:52 UTC
Permalink
On Wed, 9 Sep 2009 17:07:18 +0800, "xiaosi" <***@cn99.com> wrote:

|It's strange that a loop is used instead of __stosb on non-_M_AMD64 CPUs.
|On my AMD32 CPU, __stosb (114 clocks) is faster than this loop (195 clocks).

How do you time such things?
--
- Vince
xiaosi
2009-09-09 14:47:34 UTC
Permalink
I use __rdtsc() [1]. The results include the cost of rdtsc itself plus the call, ret and mov instructions (17 clocks on my AMD32 CPU when everything hits the cache).

[1]
#define WIN32_LEAN_AND_MEAN
#include <windows.h> //SecureZeroMemory
#include <stdio.h>
#include <intrin.h>

__declspec(noinline) void __stdcall func0() {
volatile int i = 0;
}

__declspec(noinline) void __stdcall func1() {
long si[88/4];
SecureZeroMemory(si, sizeof(si));
}

__declspec(noinline) void __stdcall func2() {
long si[88/4];
__stosb((unsigned char*)si, 0, sizeof(si));
}

#pragma optimize("gt", on)
#define icount 10
int __cdecl main() {
int i; unsigned long long t, t0[icount], t1[icount], t2[icount];

for (i = 0; i < icount; i++) {
t = __rdtsc();
func0();
t0[i] = __rdtsc() - t;
}

for (i = 0; i < icount; i++) {
t = __rdtsc();
func1();
t1[i] = __rdtsc() - t;
}

for (i = 0; i < icount; i++) {
t = __rdtsc();
func2();
t2[i] = __rdtsc() - t;
}

printf("func0\tfunc1\tfunc2\n");
for (i = 0; i < icount; i ++) {
printf("%I64u\t%I64u\t%I64u\n", t0[i], t1[i], t2[i]);
}
return 0;
}
#pragma optimize("", on)
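
Since each sample includes the fixed overhead measured by func0, one way to compare the numbers is to take the smallest sample per function and subtract the smallest func0 sample. The helpers below are a sketch of that arithmetic only; min_sample and net_clocks are hypothetical names and are not part of the program above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest of n samples (n must be at least 1). */
uint64_t min_sample(const uint64_t *samples, size_t n)
{
    uint64_t best = samples[0];
    size_t i;
    for (i = 1; i < n; i++)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

/* Hypothetical helper: remove the fixed measurement overhead, clamping at 0. */
uint64_t net_clocks(uint64_t measured, uint64_t overhead)
{
    return measured > overhead ? measured - overhead : 0;
}
```

With the figures quoted in this thread (17 clocks of overhead, 114 for __stosb, 195 for the loop), net_clocks gives 97 and 178 respectively.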
Vincent Fatica
2009-09-09 14:12:04 UTC
Permalink
On Wed, 9 Sep 2009 10:48:28 +0300, "Alex Blekhman" <***@yahoo.com>
wrote:

|"Vincent Fatica" wrote:
|> |You can use the ZeroMemory macro, which results in a call to
|> ntdll.dll's RtlZeroMemory routine instead of pulling in the CRT.
|>
|> Tried that ... results in a call to _memset.
|
|I found in the WinNT.h header the SecureZeroMemory macro, which
|expands to the RtlSecureZeroMemory function. It is implemented
|inline in the header file, so it doesn't have any external
|dependencies. Basically, it's a straightforward implementation of
|memset.

Yes, I discovered that early on. It works, using 7 bytes more text than stosd
(if I recall correctly).
--
- Vince