There is 1 entry for the tag UTF-8

Beware the Unicode Byte Order Mark when merging files

What is the Unicode Byte Order Mark (BOM)? A Unicode text stream can start with a special character, U+FEFF, the "zero-width no-break space"; from its actual byte representation, a text processor can work out the endianness of the characters that follow the BOM. For example, in UTF-16, if the first character of a string is encoded as FE FF, the bytes in the string are in Big Endian order, whereas if the string used Little Endian order, the first character would be encoded as FF FE. This rule applies only to UTF-16 and UTF-32 encoded strings, and not to UTF-8 (since there is just...
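
To make the FE FF versus FF FE distinction concrete, here is a minimal Python sketch that inspects the first two bytes of a buffer. The function name `utf16_byte_order` is made up for this illustration, and the sketch assumes the input is already known to be UTF-16 (a UTF-32 Little Endian BOM also begins with FF FE, so this check alone cannot tell the two apart).

```python
import codecs


def utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of UTF-16 data from a leading BOM.

    Hypothetical helper for illustration: returns "big", "little",
    or "unknown" when no UTF-16 BOM is present.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    return "unknown"


# The same text, prefixed with U+FEFF, encoded in both byte orders.
sample = "\ufeffhi"
big = sample.encode("utf-16-be")
little = sample.encode("utf-16-le")

print(big[:2].hex(), utf16_byte_order(big))
print(little[:2].hex(), utf16_byte_order(little))
```

The explicit `utf-16-be` and `utf-16-le` codecs do not add a BOM on their own, which is why the sample string carries U+FEFF as its first character.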