Discussion:
Efficient way of Searching in Array..
(too old to reply)
Sachin
2009-10-05 04:24:01 UTC
Permalink
Hi i have a requirement of searching into Character array for a character
with ASCII value greater than 255 and replacing the character by another
value..

to be more specific what i am doing is escaping characters in char buffer by
its equivalent escape sequence to form an HTML
so i will be escaping all characters with value greater than 255 with
&<intvalue>;
so that the resultant HTML would render it properly .

now in average case my char array will contain more than 50000 characters
the very priliminary algorithm would look like

void CHTML::EscapeUnicodeCharacters(wstring& value)
{
wstring newbuff;
const WCHAR* buff = value.c_str();
int len = value.length();

for (int i = 0; i<len ; i++)
{
if( value[i] >127)
{
WCHAR* wchr = new WCHAR[10];
memset(wchr, 0, 10);
swprintf_s(wchr, 10, L"&%ld;",value[i]);
newbuff.append(wchr);
delete[] wchr;
}
else
newbuff.insert(newbuff.length(), 1, value[i]);
}

value.clear();
value.assign(newbuff);
}

to sumup what i have done is search linearly in input array and side by side
prepare second array , if character value is less than 127 put it as it is in
output array .. else
insert escape sequence in output array.


case we optimise the alogorithm from linear to logarithmic ?
Wayne A. King
2009-10-05 07:44:47 UTC
Permalink
On Sun, 4 Oct 2009 21:24:01 -0700, Sachin
Post by Sachin
case we optimise the alogorithm from linear to logarithmic ?
If you have to examine each character to determine it's value, there's
no alternative to linear processing of the array.

One optimization you should use is to get rid of the new[]/delete[] -
they add unnecessary overhead. A simple static array on the stack will
be faster, and should be declared before the loop.

Also consider giving newbuff an initial reserve to minimize the
overhead associated with expansion, which may require frequent
new allocations, copying and deleting.

- Wayne

- Wayne A. King
***@rogers.com
Doug Harrison [MVP]
2009-10-05 21:47:21 UTC
Permalink
On Sun, 4 Oct 2009 21:24:01 -0700, Sachin
Post by Sachin
Hi i have a requirement of searching into Character array for a character
with ASCII value greater than 255 and replacing the character by another
value..
to be more specific what i am doing is escaping characters in char buffer by
its equivalent escape sequence to form an HTML
so i will be escaping all characters with value greater than 255 with
&<intvalue>;
so that the resultant HTML would render it properly .
now in average case my char array will contain more than 50000 characters
the very priliminary algorithm would look like
void CHTML::EscapeUnicodeCharacters(wstring& value)
{
wstring newbuff;
const WCHAR* buff = value.c_str();
int len = value.length();
for (int i = 0; i<len ; i++)
{
if( value[i] >127)
{
WCHAR* wchr = new WCHAR[10];
memset(wchr, 0, 10);
swprintf_s(wchr, 10, L"&%ld;",value[i]);
newbuff.append(wchr);
delete[] wchr;
Get rid of new/delete and use a local buffer.
Post by Sachin
}
else
newbuff.insert(newbuff.length(), 1, value[i]);
You should use push_back here.
Post by Sachin
}
value.clear();
value.assign(newbuff);
No need to call clear(); it doesn't accomplish anything.
Post by Sachin
}
to sumup what i have done is search linearly in input array and side by side
prepare second array , if character value is less than 127 put it as it is in
output array .. else
insert escape sequence in output array.
case we optimise the alogorithm from linear to logarithmic ?
No, but you can decrease the constant factor by minimizing the number of
append/push_back operations. Instead of copying ASCII character by
character, append everything from a start position up to the non-ASCII
character that stopped the inner scanning loop. Then escape the non-ASCII
character, set the start pos just beyond it, and resume the scan for a
non-ASCII character. You can further optimize by avoiding copying the last
ASCII fragment to newbuff; if this fragment is the whole length of the
source string, there were no non-ASCII characters, and so you're done, but
if not, you could shift the remaining characters over to make room for
newbuff, which you would then copy into the front of the source string.
Finally, to minimize the number of reallocations, you might consider using
reserve() on newbuff when you find the first non-ASCII character.
--
Doug Harrison
Visual C++ MVP
Continue reading on narkive:
Loading...