Encode - character encodings in Perl
use Encode qw(decode encode);
$characters = decode('UTF-8', $octets, Encode::FB_CROAK);
$octets = encode('UTF-8', $characters, Encode::FB_CROAK);
Encode consists of a collection of modules whose details are too extensive to fit in one document. This one itself explains the top-level APIs and general topics at a glance. For other topics and more details, see the documentation for these modules:
The Encode
module provides the interface between Perl strings and the rest of the system. Perl strings are sequences of characters.
The repertoire of characters that Perl can represent is a superset of those defined by the Unicode Consortium. On most platforms the ordinal values of a character as returned by ord(S)
is the Unicode codepoint for that character. The exceptions are platforms where the legacy encoding is some variant of EBCDIC rather than a superset of ASCII; see perlebcdic.
During recent history, data is moved around a computer in 8-bit chunks, often called "bytes" but also known as "octets" in standards documents. Perl is widely used to manipulate data of many types: not only strings of characters representing human or computer languages, but also "binary" data, being the machine's representation of numbers, pixels in an image, or just about anything.
When Perl is processing "binary data", the programmer wants Perl to process "sequences of bytes". This is not a problem for Perl: because a byte has 256 possible values, it easily fits in Perl's much larger "logical character".
This document mostly explains the how. perlunitut and perlunifaq explain the why.
A character in the range 0 .. 2**32-1 (or more); what Perl's strings are made of.
A character in the range 0..255; a special case of a Perl character.