This is the Perl 5.22.0 version of this documentation.

NAME

Encode::Unicode -- Various Unicode Transformation Formats

SYNOPSIS

use Encode qw/encode decode/;
$ucs2 = encode("UCS-2BE", $utf8);
$utf8 = decode("UCS-2BE", $ucs2);
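A self-contained variant of the synopsis, as a minimal sketch: the variable names and the sample string are illustrative assumptions. It shows that encode() takes a Perl character string and returns octets, and that decode() reverses the step.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A four-character Perl string; U+00E9 is within the UCS-2 range.
my $string = "caf\x{e9}";

# UCS-2BE uses two octets per character, most significant octet first.
my $ucs2 = encode("UCS-2BE", $string);

# Decoding the octets yields the original character string again.
my $back = decode("UCS-2BE", $ucs2);

printf "%d characters -> %d octets (%s)\n",
    length($string), length($ucs2), unpack("H*", $ucs2);
print $back eq $string ? "round trip ok\n" : "round trip failed\n";
```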

ABSTRACT

This module implements all Character Encoding Schemes of Unicode that are officially documented by the Unicode Consortium (except, of course, for UTF-8, which is Perl's native format).

http://www.unicode.org/glossary/ says:

Character Encoding Scheme

    A character encoding form plus byte serialization. There are seven character encoding schemes in Unicode: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32 (UCS-4), UTF-32BE (UCS-4BE) and UTF-32LE (UCS-4LE), and UTF-7.

Since UTF-7 is a 7-bit (re)encoded version of UTF-16BE, it is not counted among Unicode's Character Encoding Schemes. It is implemented separately in Encode::Unicode::UTF7; see that module for details.
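As a rough illustration of the UTF-7 case, the sketch below assumes that the encoding name "UTF-7" is resolved on demand through the regular Encode interface, so no explicit `use Encode::Unicode::UTF7` is written. The sample string is an arbitrary choice.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# ASCII text followed by one non-ASCII character (U+263A).
my $text = "Hi \x{263A}";

# UTF-7 output is meant to contain only 7-bit octets.
my $utf7 = encode("UTF-7", $text);

# Decoding should give back the original character string.
my $back = decode("UTF-7", $utf7);

print "UTF-7 octets: $utf7\n";
print $back eq $text ? "round trip ok\n" : "round trip failed\n";
```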

Quick Reference

In the table below, S.P stands for surrogate pair.

              Decodes from ord(N)           Encodes chr(N) to...
     octet/char BOM S.P d800-dfff  ord > 0xffff     \x{1abcd} ==
---------------+-----------------+------------------------------
UCS-2BE       2   N   N  is bogus                  Not Available
UCS-2LE       2   N   N     bogus                  Not Available
UTF-16      2/4   Y   Y  is   S.P           S.P            BE/LE
UTF-16BE    2/4   N   Y       S.P           S.P    0xd82a,0xdfcd
UTF-16LE    2/4   N   Y       S.P           S.P    0x2ad8,0xcddf
UTF-32        4   Y   -  is bogus         As is            BE/LE
UTF-32BE      4   N   -     bogus         As is       0x0001abcd
UTF-32LE      4   N   -     bogus         As is       0xcdab0100
UTF-8       1-4   -   -     bogus   >= 4 octets \xf0\x9a\xaf\x8d
---------------+-----------------+------------------------------
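The last column of the table can be checked with a short script. This is a minimal sketch: `hex_octets` is a hypothetical helper introduced only for this example, not part of Encode. It encodes U+1ABCD into each fixed-endianness format and prints the resulting octets as hex. The final step assumes that passing Encode::FB_CROAK makes the UCS-2 encoder die on a code point above 0xFFFF instead of substituting a replacement character.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode one code point and return its octets as hex.
sub hex_octets {
    my ($name, $code_point) = @_;
    return unpack("H*", encode($name, chr($code_point)));
}

# U+1ABCD is the sample code point used in the table's last column.
for my $name (qw(UTF-16BE UTF-16LE UTF-32BE UTF-32LE UTF-8)) {
    printf "%-8s %s\n", $name, hex_octets($name, 0x1ABCD);
}

# UCS-2 cannot represent code points above 0xFFFF ("Not Available").
# With FB_CROAK the encoder is expected to die rather than substitute.
my $copy = chr(0x1ABCD);    # encode() may modify its argument when a check is set
my $ok   = eval { encode("UCS-2BE", $copy, Encode::FB_CROAK); 1 };
print "UCS-2BE  ", ($ok ? "encoded" : "not available"), "\n";
```

The UTF-16 rows follow from the surrogate pair arithmetic: 0x1ABCD - 0x10000 = 0xABCD, the high surrogate is 0xD800 + (0xABCD >> 10) = 0xD82A, and the low surrogate is 0xDC00 + (0xABCD & 0x3FF) = 0xDFCD.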