g***@hotmail.com
2010-03-11 16:42:59 UTC
Hello all,
This may be a difficult problem to explain, and I need some assembly
to show the difference. In a DLL we export some STL containers to
minimize code bloat, like this:
template class __declspec(dllexport) std::vector<int>;
typedef std::vector<int> int_vector;
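For anyone who wants to reproduce this, here is a minimal sketch of how the export line above might be wrapped so the same header serves both the DLL and its clients. The macro names MYLIB_EXPORTS, MYLIB_IMPORTS, MYLIB_API and MYLIB_EXTERN are hypothetical and not from our code base; the extern-on-import pattern is the one Microsoft describes for exporting STL instantiations, and this is only an assumption about how the header could be organized.

```cpp
#include <vector>

// MYLIB_EXPORTS, MYLIB_IMPORTS, MYLIB_API and MYLIB_EXTERN are hypothetical
// names. Define MYLIB_EXPORTS when building the DLL and MYLIB_IMPORTS when
// building a client; with neither defined, the header is plain C++.
#if defined(_MSC_VER) && defined(MYLIB_EXPORTS)
#  define MYLIB_API __declspec(dllexport)
#  define MYLIB_EXTERN
#elif defined(_MSC_VER) && defined(MYLIB_IMPORTS)
#  define MYLIB_API __declspec(dllimport)
#  define MYLIB_EXTERN extern
#else
#  define MYLIB_API
#  define MYLIB_EXTERN
#endif

// Explicit instantiation: exported from the DLL, imported by clients.
MYLIB_EXTERN template class MYLIB_API std::vector<int>;
typedef std::vector<int> int_vector;
```

Building the test program once with MYLIB_IMPORTS and once with neither macro defined should give the two variants compared in this post.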
In a simple test program I now see a huge difference in performance.
The C++ function is as follows (it does the same as std::fill, but this
is just an example):
void PrfMemoryIterator(int_vector* pVector, int nValue, size_t nLoop)
{
    for (size_t n = 0; n != nLoop; ++n)
    {
        const int_vector::iterator itEnd = pVector->end();
        for (int_vector::iterator it = pVector->begin(); it != itEnd; ++it)
        {
            *it = nValue;
        }
    }
}
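To make the comparison reproducible, a timing harness along these lines could be used. This is a sketch only: PrfMemoryFill and TimeSeconds are hypothetical names that are not part of the test program, and std::clock is used because it is available on old compilers, although its resolution is coarse.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

typedef std::vector<int> int_vector;

// The loop from the post, repeated here so the sketch is self-contained.
void PrfMemoryIterator(int_vector* pVector, int nValue, std::size_t nLoop)
{
    assert(pVector != 0);
    for (std::size_t n = 0; n != nLoop; ++n)
    {
        const int_vector::iterator itEnd = pVector->end();
        for (int_vector::iterator it = pVector->begin(); it != itEnd; ++it)
        {
            *it = nValue;
        }
    }
}

// Hypothetical reference version: the same work expressed with std::fill.
void PrfMemoryFill(int_vector* pVector, int nValue, std::size_t nLoop)
{
    assert(pVector != 0);
    for (std::size_t n = 0; n != nLoop; ++n)
    {
        std::fill(pVector->begin(), pVector->end(), nValue);
    }
}

// Hypothetical helper: CPU time in seconds consumed by one call to fn.
template <typename Fn>
double TimeSeconds(Fn fn, int_vector* pVector, int nValue, std::size_t nLoop)
{
    const std::clock_t start = std::clock();
    fn(pVector, nValue, nLoop);
    return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Calling TimeSeconds(PrfMemoryIterator, &v, 1, nLoop) in both builds (exported and not exported), with a vector and loop count large enough to run for a second or more, should show the difference without needing a profiler.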
In the assembly code, exception handling has somehow been added, and
the iterator is reloaded from and stored back to the stack on every
pass through the inner loop, which is a major performance issue (see
the lines marked '//! <- difference'):
void PrfMemoryIterator(int_vector* pVector, int nValue, size_t nLoop)
{
00401D30 push 0FFFFFFFFh
00401D32 push offset __ehhandler$?PrfMemoryIterator@@YAXPAV?$***@HV?$***@H@std@@@std@@***@Z (403718h)
00401D37 mov eax,dword ptr fs:[00000000h]
00401D3D push eax
00401D3E mov dword ptr fs:[0],esp
00401D45 sub esp,4Ch
00401D48 mov eax,dword ptr [___security_cookie (406270h)]
00401D4D xor eax,esp
00401D4F push edi
00401D50 mov edi,ecx
<snip>
for (int_vector::iterator it = pVector->begin(); it != itEnd; ++it)
00401D7D lea ecx,[esp+4]
00401D81 push ecx
00401D82 mov ecx,ebx
00401D84 call dword ptr [__imp_std::vector<int,std::allocator<int> >::begin (404004h)]
00401D8A mov eax,dword ptr [esp+4]
00401D8E cmp eax,dword ptr [esp+8]
00401D92 je PrfMemoryIterator+79h (401DA9h)
{
*it = nValue;
00401D94 mov dword ptr [eax],esi
00401D96 mov eax,dword ptr [esp+4] //! <- difference
00401D9A mov ecx,dword ptr [esp+8] //! <- difference
00401D9E add eax,4
00401DA1 cmp eax,ecx
00401DA3 mov dword ptr [esp+4],eax //! <- difference
00401DA7 jne PrfMemoryIterator+64h (401D94h)
However, if we do not export the STL containers, the generated code is
different:
void PrfMemoryIterator(int_vector* pVector, int nValue, size_t nLoop)
{
00401F60 sub esp,44h
00401F63 mov eax,dword ptr [___security_cookie (406290h)]
00401F68 xor eax,esp
00401F6A push edi
00401F6B mov edi,ecx
<snip>
for (int_vector::iterator it = pVector->begin(); it != itEnd; ++it)
00401F86 mov eax,dword ptr [ebx+4]
00401F89 cmp eax,ecx
00401F8B je PrfMemoryIterator+39h (401F99h)
00401F8D lea ecx,[ecx]
{
*it = nValue;
00401F90 mov dword ptr [eax],esi
00401F92 add eax,4
00401F95 cmp eax,ecx
00401F97 jne PrfMemoryIterator+30h (401F90h)
I am using Visual Studio 2003 here, but I noticed something similar with
the _SECURE_SCL option in Visual Studio 2008, which also makes a
difference in performance.
Can anyone help? It is probably related to exception handling, but why
would it make a difference whether the classes are exported or not?
Thx in advance.
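One experiment that might narrow this down: in the exported build, begin() is called through the import table (the "call dword ptr [__imp_...begin" line) instead of being inlined, and the iterator then lives on the stack. Writing the same loop over raw pointers takes the iterator object out of the inner loop entirely. PrfMemoryPointer is a hypothetical name, and this is a diagnostic sketch rather than a confirmed fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int> int_vector;

// Hypothetical variant of PrfMemoryIterator that fetches the element range
// once, as raw pointers, so no vector member function is called in the loops.
void PrfMemoryPointer(int_vector* pVector, int nValue, std::size_t nLoop)
{
    assert(pVector != 0);
    if (pVector->empty())
    {
        return; // &(*pVector)[0] is only valid for a non-empty vector
    }
    int* const pBegin = &(*pVector)[0];
    int* const pEnd = pBegin + pVector->size();
    for (std::size_t n = 0; n != nLoop; ++n)
    {
        for (int* p = pBegin; p != pEnd; ++p)
        {
            *p = nValue;
        }
    }
}
```

If this version produces the same tight inner loop in both builds, that would point at the non-inlined begin()/end() calls rather than at the exception handling prologue itself.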