See Also: UnicodeEncoding Members
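System.Text.UnicodeEncoding encodes Unicode characters in UTF-16. Each character in the Basic Multilingual Plane is stored as one 16-bit code unit (two consecutive bytes); characters outside that plane are stored as a pair of code units (four bytes). Both little-endian and big-endian byte orders are supported.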
On little-endian platforms such as Intel machines, it is generally more efficient to store Unicode characters in little-endian order. Many other platforms, however, store them in big-endian order. The byte order of a Unicode file can be determined from the byte order mark (U+FEFF) at its start, which is written as 0xfe 0xff in big-endian order or as 0xff 0xfe in little-endian order.
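As an illustration of the two byte orders, the following minimal Python sketch encodes the same character both ways and prints the two serialized forms of the byte order mark. It uses Python's standard codecs only to show the byte layout; it is not the UnicodeEncoding API, and the variable names are chosen for this example.

```python
import codecs

text = "A"  # U+0041

# Same 16-bit code unit, two different byte layouts.
little = text.encode("utf-16-le")  # low-order byte first
big = text.encode("utf-16-be")     # high-order byte first

print(little.hex())  # 4100
print(big.hex())     # 0041

# U+FEFF serialized in each byte order gives the two possible marks.
print(codecs.BOM_UTF16_LE.hex())  # fffe
print(codecs.BOM_UTF16_BE.hex())  # feff
```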
This System.Text.Encoding implementation can detect a byte order mark automatically and switch byte orders, based on a parameter specified in the constructor.
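The idea of switching byte order when a mark is found can be sketched in Python as follows. The function decode_utf16 and its default_big_endian parameter are hypothetical names invented for this example; the sketch shows the general technique only and makes no claim about how UnicodeEncoding is implemented.

```python
import codecs

def decode_utf16(data: bytes, default_big_endian: bool = False) -> str:
    """Decode UTF-16 bytes, honoring a leading byte order mark if present.

    Hypothetical helper for illustration; not part of any library.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        # Mark says big-endian: skip the two mark bytes and decode.
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # Mark says little-endian.
        return data[2:].decode("utf-16-le")
    # No mark: fall back to the caller's chosen byte order.
    codec = "utf-16-be" if default_big_endian else "utf-16-le"
    return data.decode(codec)

print(decode_utf16(b"\xfe\xff\x00A"))  # big-endian input with mark
print(decode_utf16(b"\xff\xfeA\x00"))  # little-endian input with mark
```

Both calls print the letter A, because the mark overrides the default byte order.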
ISO/IEC 10646 defines UCS-2 and UCS-4. UCS-4 is a four-byte (32-bit) encoding containing 2^31 code positions, divided into 128 groups of 256 planes. Each plane contains 2^16 code positions. UCS-2 is a two-byte (16-bit) encoding containing the 2^16 code positions of UCS-4 for which the upper two bytes are zero, known as Plane Zero or the Basic Multilingual Plane (BMP). For example, the code position for LATIN CAPITAL LETTER A in UCS-4 is 0x00000041 whereas in UCS-2 it is 0x0041.
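The arithmetic and the LATIN CAPITAL LETTER A example can be checked with a short Python sketch. It assumes that Python's utf-32-be and utf-16-be codecs are acceptable stand-ins for the UCS-4 and UCS-2 byte forms of a BMP character; the variable names are illustrative.

```python
# 128 groups x 256 planes x 2^16 positions per plane = 2^31 positions.
groups, planes_per_group, positions_per_plane = 128, 256, 2**16
total_positions = groups * planes_per_group * positions_per_plane
print(total_positions == 2**31)  # True

ch = "A"  # LATIN CAPITAL LETTER A, U+0041

ucs4_bytes = ch.encode("utf-32-be")  # four bytes per code position
ucs2_bytes = ch.encode("utf-16-be")  # two bytes per code position

print(ucs4_bytes.hex())  # 00000041
print(ucs2_bytes.hex())  # 0041

# A code position is in the BMP when its upper two bytes are zero.
in_bmp = ucs4_bytes[:2] == b"\x00\x00"
print(in_bmp)  # True
```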
ISO/IEC 10646 also defines UTF-16, which stands for "UCS Transformation Format for 16 Planes of Group 00". UTF-16 uses two-byte code units together with an extension mechanism to represent code positions within a 21-bit range: Plane Zero plus Planes 1 through 16, or 17 x 2^16 = 1,114,112 positions in all. UTF-16 represents code positions in Plane Zero by their UCS-2 code values and code positions in Planes 1 through 16 by a pair of special code values, called surrogates. The code positions reachable through UTF-16 are the same ones covered by the Unicode Standard. For a detailed description of UTF-16 and surrogates, see "The Unicode Standard Version 3.0" Appendix C.
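The surrogate mechanism can be sketched in a few lines of Python. The function to_surrogates is a hypothetical helper written for this example; it applies the standard UTF-16 calculation (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves) and is compared against Python's own utf-16-be codec.

```python
def to_surrogates(code_point: int) -> tuple:
    """Return the (high, low) surrogate pair for a code point in Planes 1-16.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in Planes 1 through 16")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

# U+1D11E MUSICAL SYMBOL G CLEF lies in Plane 1.
high, low = to_surrogates(0x1D11E)
print(f"{high:04x} {low:04x}")  # d834 dd1e

# Python's built-in codec produces the same two code units.
print("\U0001D11E".encode("utf-16-be").hex())  # d834dd1e
```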