UnicodeEncoding Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Represents a UTF-16 encoding of Unicode characters.
public ref class UnicodeEncoding : System::Text::Encoding
public class UnicodeEncoding : System.Text.Encoding
[System.Serializable]
public class UnicodeEncoding : System.Text.Encoding
[System.Serializable]
[System.Runtime.InteropServices.ComVisible(true)]
public class UnicodeEncoding : System.Text.Encoding
type UnicodeEncoding = class
inherit Encoding
[<System.Serializable>]
type UnicodeEncoding = class
inherit Encoding
[<System.Serializable>]
[<System.Runtime.InteropServices.ComVisible(true)>]
type UnicodeEncoding = class
inherit Encoding
Public Class UnicodeEncoding
Inherits Encoding
- Inheritance
- Attributes
Examples
The following example demonstrates how to encode a string of Unicode characters into a byte array by using a UnicodeEncoding object. The byte array is decoded into a string to demonstrate that there is no loss of data.
using System;
using System.Text;
class UnicodeEncodingExample {
public static void Main() {
// The encoding.
UnicodeEncoding unicode = new UnicodeEncoding();
// Create a string that contains Unicode characters.
String unicodeString =
"This Unicode string contains two characters " +
"with codes outside the traditional ASCII code range, " +
"Pi (\u03a0) and Sigma (\u03a3).";
Console.WriteLine("Original string:");
Console.WriteLine(unicodeString);
// Encode the string.
Byte[] encodedBytes = unicode.GetBytes(unicodeString);
Console.WriteLine();
Console.WriteLine("Encoded bytes:");
foreach (Byte b in encodedBytes) {
Console.Write("[{0}]", b);
}
Console.WriteLine();
// Decode bytes back to string.
// Notice Pi and Sigma characters are still present.
String decodedString = unicode.GetString(encodedBytes);
Console.WriteLine();
Console.WriteLine("Decoded bytes:");
Console.WriteLine(decodedString);
}
}
Imports System.Text
Imports Microsoft.VisualBasic.Strings
Class UnicodeEncodingExample
Public Shared Sub Main()
' The encoding.
Dim uni As New UnicodeEncoding()
' Create a string that contains Unicode characters.
Dim unicodeString As String = _
"This Unicode string contains two characters " & _
"with codes outside the traditional ASCII code range, " & _
"Pi (" & ChrW(928) & ") and Sigma (" & ChrW(931) & ")."
Console.WriteLine("Original string:")
Console.WriteLine(unicodeString)
' Encode the string.
Dim encodedBytes As Byte() = uni.GetBytes(unicodeString)
Console.WriteLine()
Console.WriteLine("Encoded bytes:")
Dim b As Byte
For Each b In encodedBytes
Console.Write("[{0}]", b)
Next b
Console.WriteLine()
' Decode bytes back to string.
' Notice Pi and Sigma characters are still present.
Dim decodedString As String = uni.GetString(encodedBytes)
Console.WriteLine()
Console.WriteLine("Decoded bytes:")
Console.WriteLine(decodedString)
End Sub
End Class
The following example uses the same string as the previous one, except that it writes the encoded bytes to a file and prefixes the byte stream with a byte order mark (BOM). It then reads the file in two different ways: as a text file by using a StreamReader object; and as a binary file. As you would expect, neither newly-read string includes the BOM.
using System;
using System.IO;
using System.Text;
public class Example
{
public static void Main()
{
// Create a UTF-16 encoding that supports a BOM.
Encoding unicode = new UnicodeEncoding();
// A Unicode string with two characters outside an 8-bit code range.
String unicodeString =
"This Unicode string has 2 characters outside the " +
"ASCII range: \n" +
"Pi (\u03A0)), and Sigma (\u03A3).";
Console.WriteLine("Original string:");
Console.WriteLine(unicodeString);
Console.WriteLine();
// Encode the string.
Byte[] encodedBytes = unicode.GetBytes(unicodeString);
Console.WriteLine("The encoded string has {0} bytes.\n",
encodedBytes.Length);
// Write the bytes to a file with a BOM.
var fs = new FileStream(@".\UTF8Encoding.txt", FileMode.Create);
Byte[] bom = unicode.GetPreamble();
fs.Write(bom, 0, bom.Length);
fs.Write(encodedBytes, 0, encodedBytes.Length);
Console.WriteLine("Wrote {0} bytes to the file.\n", fs.Length);
fs.Close();
// Open the file using StreamReader.
var sr = new StreamReader(@".\UTF8Encoding.txt");
String newString = sr.ReadToEnd();
sr.Close();
Console.WriteLine("String read using StreamReader:");
Console.WriteLine(newString);
Console.WriteLine();
// Open the file as a binary file and decode the bytes back to a string.
fs = new FileStream(@".\UTF8Encoding.txt", FileMode.Open);
Byte[] bytes = new Byte[fs.Length];
fs.Read(bytes, 0, (int)fs.Length);
fs.Close();
String decodedString = unicode.GetString(bytes);
Console.WriteLine("Decoded bytes:");
Console.WriteLine(decodedString);
}
}
// The example displays the following output:
// Original string:
// This Unicode string has 2 characters outside the ASCII range:
// Pi (π), and Sigma (Σ).
//
// The encoded string has 172 bytes.
//
// Wrote 174 bytes to the file.
//
// String read using StreamReader:
// This Unicode string has 2 characters outside the ASCII range:
// Pi (π), and Sigma (Σ).
//
// Decoded bytes:
// This Unicode string has 2 characters outside the ASCII range:
// Pi (π), and Sigma (Σ).
Imports System.IO
Imports System.Text
Class Example
Public Shared Sub Main()
' Create a UTF-16 encoding that supports a BOM.
Dim unicode As New UnicodeEncoding()
' A Unicode string with two characters outside an 8-bit code range.
Dim unicodeString As String = _
"This Unicode string has 2 characters outside the " &
"ASCII range: " & vbCrLf &
"Pi (" & ChrW(&h03A0) & "), and Sigma (" & ChrW(&h03A3) & ")."
Console.WriteLine("Original string:")
Console.WriteLine(unicodeString)
Console.WriteLine()
' Encode the string.
Dim encodedBytes As Byte() = unicode.GetBytes(unicodeString)
Console.WriteLine("The encoded string has {0} bytes.",
encodedBytes.Length)
Console.WriteLine()
' Write the bytes to a file with a BOM.
Dim fs As New FileStream(".\UnicodeEncoding.txt", FileMode.Create)
Dim bom() As Byte = unicode.GetPreamble()
fs.Write(bom, 0, bom.Length)
fs.Write(encodedBytes, 0, encodedBytes.Length)
Console.WriteLine("Wrote {0} bytes to the file.", fs.Length)
fs.Close()
Console.WriteLine()
' Open the file using StreamReader.
Dim sr As New StreamReader(".\UnicodeEncoding.txt")
Dim newString As String = sr.ReadToEnd()
sr.Close()
Console.WriteLine("String read using StreamReader:")
Console.WriteLine(newString)
Console.WriteLine()
' Open the file as a binary file and decode the bytes back to a string.
fs = new FileStream(".\UnicodeEncoding.txt", FileMode.Open)
Dim bytes(fs.Length - 1) As Byte
fs.Read(bytes, 0, fs.Length)
fs.Close()
Dim decodedString As String = unicode.GetString(bytes)
Console.WriteLine("Decoded bytes:")
Console.WriteLine(decodedString)
End Sub
End Class
' The example displays the following output:
' Original string:
' This Unicode string has 2 characters outside the ASCII range:
' Pi (π), and Sigma (Σ).
'
' The encoded string has 172 bytes.
'
' Wrote 174 bytes to the file.
'
' String read using StreamReader:
' This Unicode string has 2 characters outside the ASCII range:
' Pi (π), and Sigma (Σ).
'
' Decoded bytes:
' This Unicode string has 2 characters outside the ASCII range:
' Pi (π), and Sigma (Σ).
Remarks
Encoding is the process of transforming a set of Unicode characters into a sequence of bytes. Decoding is the process of transforming a sequence of encoded bytes into a set of Unicode characters.
The Unicode Standard assigns a code point (a number) to each character in every supported script. A Unicode Transformation Format (UTF) is a way to encode that code point. The Unicode Standard uses the following UTFs:
UTF-8, which represents each code point as a sequence of one to four bytes.
UTF-16, which represents each code point as a sequence of one to two 16-bit integers.
UTF-32, which represents each code point as a 32-bit integer.
For more information about the UTFs and other encodings supported by System.Text, see Character Encoding in the .NET Framework.
The UnicodeEncoding class represents a UTF-16 encoding. The encoder can use either big endian byte order (most significant byte first) or little endian byte order (least significant byte first). For example, the Latin Capital Letter A (code point U+0041) is serialized as follows (in hexadecimal):
Big endian byte order: 00 00 00 41
Little endian byte order: 41 00 00 00
It is generally more efficient to store Unicode characters using the native byte order of a particular platform. For example, it is better to use the little endian byte order on little endian platforms, such as Intel computers. The UnicodeEncoding class corresponds to the Windows code pages 1200 (little endian byte order) and 1201 (big endian byte order). You can determine the "endianness" of a particular architecture by calling the BitConverter.IsLittleEndian method.
Optionally, the UnicodeEncoding object provides a byte order mark (BOM), which is an array of bytes that can be prefixed to the sequence of bytes resulting from the encoding process. If the preamble contains a byte order mark (BOM), it helps the decoder determine the byte order and the transformation format or UTF.
If the UnicodeEncoding instance is configured to provide a BOM, you can retrieve it by calling the GetPreamble method; otherwise, the method returns an empty array. Note that, even if a UnicodeEncoding object is configured for BOM support, you must include the BOM at the beginning of the encoded byte stream as appropriate; the encoding methods of the UnicodeEncoding class do not do this automatically.
Caution
To enable error detection and to make the class instance more secure, you should instantiate a UnicodeEncoding object by calling the UnicodeEncoding(Boolean, Boolean, Boolean) constructor and setting its throwOnInvalidBytes argument to true. With error detection, a method that detects an invalid sequence of characters or bytes throws a ArgumentException. Without error detection, no exception is thrown, and the invalid sequence is generally ignored.
You can instantiate a UnicodeEncoding object in a number of ways, depending on whether you want to it to provide a byte order mark (BOM), whether you want big-endian or little-endian encoding, and whether you want to enable error detection. The following table lists the UnicodeEncoding constructors and the Encoding properties that return a UnicodeEncoding object.
| Member | Endianness | BOM | Error detection |
|---|---|---|---|
| BigEndianUnicode | Big-endian | Yes | No (Replacement fallback) |
| Encoding.Unicode | Little-endian | Yes | No (Replacement fallback) |
| UnicodeEncoding.UnicodeEncoding() | Little-endian | Yes | No (Replacement fallback) |
| UnicodeEncoding(Boolean, Boolean) | Configurable | Configurable | No (Replacement fallback) |