=head1 NAME

perlunicode - Unicode support in Perl

=head1 DESCRIPTION

=head2 Important Caveats

Unicode support is an extensive requirement. While Perl does not
implement the Unicode standard or the accompanying technical reports
from cover to cover, Perl does support many Unicode features.

=over 4

=item Input and Output Layers

Perl knows when a filehandle uses Perl's internal Unicode encodings
(UTF-8, or UTF-EBCDIC on EBCDIC platforms) if the filehandle is opened
with the ":utf8" layer. Other encodings can be converted to Perl's
encoding on input, or from Perl's encoding on output, with the
":encoding(...)" layer. See L<open>.

To indicate that Perl source itself is using a particular encoding,
see L<encoding>.

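As an illustration of the two layers described above, the following
minimal sketch writes the same four-character string through ":utf8"
and through ":encoding(iso-8859-1)", then counts the bytes that reached
the file. The helper C<round_trip> and the temporary files are
assumptions made for this example only; they are not part of Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through $layer, then report how many
# raw bytes landed on disk and what reading back through $layer yields.
sub round_trip {
    my ($layer, $text) = @_;

    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    # Characters are encoded by the layer on output.
    open(my $out, ">$layer", $path) or die "open for write: $!";
    print {$out} $text;
    close $out or die "close: $!";

    # Read the file as raw bytes: no decoding happens here.
    open(my $raw, '<:raw', $path) or die "open raw: $!";
    my $bytes = do { local $/; <$raw> };
    close $raw;

    # Read the file through the same layer: bytes become characters again.
    open(my $in, "<$layer", $path) or die "open for read: $!";
    my $chars = do { local $/; <$in> };
    close $in;

    return (length($bytes), $chars);
}

my $text = "caf\x{e9}";    # four characters, the last one is U+00E9

my ($utf8_len,   $utf8_back)   = round_trip(':utf8', $text);
my ($latin1_len, $latin1_back) = round_trip(':encoding(iso-8859-1)', $text);

print ":utf8 wrote $utf8_len bytes\n";
print ":encoding(iso-8859-1) wrote $latin1_len bytes\n";
```

Under these assumptions the ":utf8" file holds 5 bytes (U+00E9 takes
two bytes in UTF-8) and the ISO 8859-1 file holds 4, while both reads
return the original four characters.
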
=item Regular Expressions

The regular expression compiler produces polymorphic opcodes. That is,
the pattern adapts to the data and automatically switches to the Unicode
character scheme when presented with Unicode data, or instead uses
a traditional byte scheme when presented with byte data.

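The observable effect of this adaptation can be sketched as follows. The
example does not inspect the opcodes themselves; it only applies one
compiled pattern to a byte string and to the decoded character string,
and counts how many times C<.> matches. The variable names are chosen
for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of U+263A, held as three separate bytes.
my $bytes = "\xE2\x98\xBA";

# The same data decoded into a single Unicode character.
my $chars = decode('UTF-8', $bytes);

# One compiled pattern, used unchanged against both strings.
my $dot = qr/./s;

# Count matches: assigning to an empty list in scalar context yields
# the number of elements the match returned.
my $byte_matches = () = $bytes =~ /$dot/g;
my $char_matches = () = $chars =~ /$dot/g;

print "byte string:      $byte_matches matches\n";
print "character string: $char_matches matches\n";
```

The byte string yields three matches and the character string yields
one, although the pattern is the same in both cases.
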
=item C<use utf8> still needed to enable UTF-8/UTF-EBCDIC in scripts

As a compatibility measure, the C<use utf8> pragma must be explicitly
included to enable recognition of UTF-8 in the Perl scripts themselves
(in string or regular expression literals, or in identifier names) on
ASCII-based machines or to recognize UTF-EBCDIC on EBCDIC-based
machines. B<These are the only times when an explicit C<use utf8>
is needed.> See L<utf8>.

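A minimal sketch of the difference the pragma makes to a string literal
follows. It assumes the script file is saved as UTF-8 on an ASCII-based
machine; the variable names are illustrative. Because C<use utf8> is
lexically scoped, the same literal can be measured with and without it
in one file.

```perl
use strict;
use warnings;

# Without the pragma the literal is read as raw bytes, so the two-byte
# UTF-8 sequence for U+00E9 counts as two characters.
my $without_pragma = length("é");

my $with_pragma;
{
    # Within this block the source is recognized as UTF-8, so the same
    # literal is one character.
    use utf8;
    $with_pragma = length("é");
}

print "without use utf8: $without_pragma\n";
print "with use utf8:    $with_pragma\n";
```

Under the stated assumption this reports a length of 2 without the
pragma and 1 with it.
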