Post by Jack
For simplicity, now pretend I make the assumption of all of them are made
up of 16-bit characters, What is the general method of converting the
stream of BYTEs into foreign characters?
Question up front: what is a foreign character? Anything but Latin letters
and Arabic numerals? Hebrew is much older than Latin, so isn't Latin the
foreign one then?
Anyway, back from philosophy to programming. Without knowing the encoding of
the byte data, there is no way to convert it to anything. Further, and that
is something you might have missed, Unicode (not UNICODE) is not a file
format or anything like that. Rather, it is a standard that also defines
several encodings for files and data transfer (UTF = Unicode Transformation
Format). Now, MS Windows internally uses WCHAR, a 16-bit type whose values
encode Unicode codepoints using UTF-16.
If your file happens to be UTF-16 encoded, too (in the little-endian variant
that MS Windows uses), you can just memcpy() the file content into a WCHAR
array and you're done, apart from a possible byte order mark at the start.
If it happens to be the big-endian variant of UTF-16, you will further have
to swap the two bytes of every 16-bit unit. Also, if you want your code to
run on anything other than MS Windows or on a big-endian machine (which
MS Windows doesn't run on anyway), you will have to perform further
conversions.
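
To make the byte swapping concrete, here is a minimal sketch that does not
depend on the endianness of the machine it runs on. It assumes the data
really is UTF-16. The names decode_utf16 and ByteOrder are made up for this
example, and it uses the standard char16_t instead of WCHAR so that it
compiles outside MS Windows, too. It does not validate surrogate pairs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Byte order of the 16-bit code units in the input (hypothetical helper type).
enum class ByteOrder { Little, Big };

// Decode raw bytes into UTF-16 code units, independent of host endianness.
// A leading byte order mark (FF FE or FE FF) overrides `fallback` and is
// dropped from the result. A trailing odd byte is ignored.
std::u16string decode_utf16(const std::vector<std::uint8_t>& bytes,
                            ByteOrder fallback = ByteOrder::Little)
{
    ByteOrder order = fallback;
    std::size_t pos = 0;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            order = ByteOrder::Little;
            pos = 2;
        } else if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            order = ByteOrder::Big;
            pos = 2;
        }
    }

    std::u16string out;
    out.reserve((bytes.size() - pos) / 2);
    for (; pos + 1 < bytes.size(); pos += 2) {
        const unsigned first = bytes[pos];
        const unsigned second = bytes[pos + 1];
        // Assemble the code unit arithmetically instead of reinterpreting
        // memory, so the result is the same on any host.
        const unsigned unit = (order == ByteOrder::Little)
            ? (first | (second << 8))
            : ((first << 8) | second);
        out.push_back(static_cast<char16_t>(unit));
    }
    return out;
}
```

On MS Windows, where wchar_t is 16 bits wide, the resulting code units could
be copied one by one into a WCHAR buffer. On systems with a 32-bit wchar_t a
further conversion step would be needed.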
BTW: It would help if you showed a short snippet of a hexdump of the file in
question including the text it should represent. It should then be easy to
figure out if it is one of the common encodings.
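
In case you have no hexdump tool at hand, something like the following
would do for the first few bytes. This is only a sketch, and hex_prefix is a
name invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Format up to `limit` leading bytes as two-digit lowercase hex values
// separated by single spaces.
std::string hex_prefix(const std::vector<std::uint8_t>& bytes,
                       std::size_t limit = 16)
{
    static const char digits[] = "0123456789abcdef";
    const std::size_t n = bytes.size() < limit ? bytes.size() : limit;

    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        if (i != 0)
            out += ' ';
        out += digits[bytes[i] >> 4];    // high nibble
        out += digits[bytes[i] & 0x0F];  // low nibble
    }
    return out;
}
```

If the output starts with "ff fe", the file most likely is little-endian
UTF-16, "fe ff" suggests big-endian UTF-16 and "ef bb bf" suggests UTF-8.
Without such a byte order mark you have to compare the bytes with the text
you expect.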
Good luck!
Uli
--
C++ FAQ: http://parashift.com/c++-faq-lite
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932