You are viewing the version of this documentation from Perl 5.20.2.

NAME

charnames - access to Unicode character names and named character sequences; also define character names

SYNOPSIS

use charnames ':full';
print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n";
print "\N{LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW}",
      " is an officially named sequence of two Unicode characters\n";

use charnames ':loose';
print "\N{Greek small-letter  sigma}",
      " can be used to ignore case, underscores, most blanks,",
      " and when you aren't sure if the official name has hyphens\n";

use charnames ':short';
print "\N{greek:Sigma} is an upper-case sigma.\n";

use charnames qw(cyrillic greek);
print "\N{sigma} is Greek sigma, and \N{be} is Cyrillic b.\n";

use utf8;
use charnames ":full", ":alias" => {
  e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
  mychar => 0xE8000,  # Private use area
  "自転車に乗る人" => "BICYCLIST"
};
print "\N{e_ACUTE} is a small letter e with an acute.\n";
print "\N{mychar} allows me to name private use characters.\n";
print "And I can create synonyms in other languages,",
      " such as \N{自転車に乗る人} for BICYCLIST (U+1F6B4)\n";

use charnames ();
print charnames::viacode(0x1234); # prints "ETHIOPIC SYLLABLE SEE"
printf "%04X", charnames::vianame("GOTHIC LETTER AHSA"); # prints
                                                         # "10330"
print charnames::vianame("LATIN CAPITAL LETTER A"); # prints 65 on
                                                    # ASCII platforms;
                                                    # 193 on EBCDIC
print charnames::string_vianame("LATIN CAPITAL LETTER A"); # prints "A"

DESCRIPTION

The use charnames pragma gives access to the names of Unicode characters and named character sequences, and lets you define your own character and character sequence names.

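All forms of the pragma enable use of the following 3 functions:

charnames::string_vianame(name) for run-time lookup of either a character name or a named character sequence, returning its string representation
charnames::vianame(name) for run-time lookup of a character name to get its ordinal value (code point)
charnames::viacode(code) for run-time lookup of a code point to get its Unicode name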

Starting in Perl v5.16, any occurrence of \N{CHARNAME} sequences in a double-quotish string automatically loads this module with arguments :full and :short (described below) if it hasn't already been loaded with different arguments, in order to compile the named Unicode character into position in the string. Prior to v5.16, an explicit use charnames was required to enable this usage. (However, prior to v5.16, the form "use charnames ();" did not enable \N{CHARNAME}.)
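As a minimal sketch of the automatic loading described above, the following script contains no use charnames line at all. It assumes Perl v5.16 or later (made explicit by the use v5.16 line); the variable names are illustrative only.

```perl
use v5.16;      # assumption: the implicit load only happens on v5.16 or later
use warnings;

# No "use charnames" here: the \N{...} sequences below load the module
# implicitly with :full and :short.
my $full  = "\N{GREEK SMALL LETTER SIGMA}";   # resolved through :full
my $short = "\N{greek:sigma}";                # resolved through :short (SCRIPT:CNAME)

printf "full:  U+%04X\n", ord $full;    # U+03C3
printf "short: U+%04X\n", ord $short;   # U+03C3
```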

Note that \N{U+...}, where the ... is a hexadecimal number, also inserts a character into a string. The character it inserts is the one whose Unicode code point (ordinal value) is equal to the number. For example, "\N{U+263a}" is the Unicode (white background, black foreground) smiley face, equivalent to "\N{WHITE SMILING FACE}". Also note that \N{...} can mean a regex quantifier instead of a character name when the ... is a number or a comma-separated pair of numbers (see "QUANTIFIERS" in perlreref); that usage is not related to this pragma.
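The equivalence between the code point form and the name form can be checked directly. This is a small sketch; the variable names are illustrative only.

```perl
use strict;
use warnings;
use charnames ':full';

# Both forms are resolved at compile time to the same one-character string.
my $by_code = "\N{U+263a}";
my $by_name = "\N{WHITE SMILING FACE}";

printf "same character: %s\n", $by_code eq $by_name ? "yes" : "no";   # yes
printf "code point: U+%04X\n", ord $by_code;                          # U+263A
```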

The charnames pragma supports arguments :full, :loose, :short, script names and customized aliases.

If :full is present, for expansion of \N{CHARNAME}, the string CHARNAME is first looked up in the list of standard Unicode character names.

:loose is a variant of :full which allows CHARNAME to be less precisely specified. Details are in "LOOSE MATCHES".

If :short is present, and CHARNAME has the form SCRIPT:CNAME, then CNAME is looked up as a letter in script SCRIPT, as described in the next paragraph. Or, if use charnames is used with script name arguments, then for \N{CHARNAME} the name CHARNAME is looked up as a letter in the given scripts (in the specified order). Customized aliases can override these, and are explained in "CUSTOM ALIASES".

For lookup of CHARNAME inside a given script SCRIPTNAME, this pragma looks in the table of standard Unicode names for the names

SCRIPTNAME CAPITAL LETTER CHARNAME
SCRIPTNAME SMALL LETTER CHARNAME
SCRIPTNAME LETTER CHARNAME

If CHARNAME is all lowercase, the CAPITAL variant is ignored; otherwise the SMALL variant is ignored. Both CHARNAME and SCRIPTNAME are converted to all uppercase for look-up. Other than that, both of them follow loose rules if :loose is also specified, and strict rules otherwise.
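The script lookup rules can be illustrated with the two scripts used in the synopsis. This is a sketch only; the variable names are illustrative, and the code points are printed in hex to avoid emitting wide characters.

```perl
use strict;
use warnings;
use charnames qw(cyrillic greek);   # scripts are searched in this order

my $small   = "\N{sigma}";   # all lowercase: the CAPITAL LETTER form is not tried
my $capital = "\N{Sigma}";   # contains uppercase: the SMALL LETTER form is not tried
my $be      = "\N{be}";      # found in the first script, Cyrillic

printf "sigma => U+%04X\n", ord $small;     # U+03C3 GREEK SMALL LETTER SIGMA
printf "Sigma => U+%04X\n", ord $capital;   # U+03A3 GREEK CAPITAL LETTER SIGMA
printf "be    => U+%04X\n", ord $be;        # U+0431 CYRILLIC SMALL LETTER BE
```

Neither sigma nor Sigma names a Cyrillic letter, so both fall through to the second script in the list.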

Note that \N{...} is compile-time; it's a special form of string constant used inside double-quotish strings; this means that you cannot use variables inside the \N{...}. If you want similar run-time functionality, use charnames::string_vianame().
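When the name is only known at run time, a lookup along these lines may serve instead. This is a minimal sketch; $name stands in for any value computed or read at run time.

```perl
use strict;
use warnings;
use charnames ();

# The name is held in a variable, so "\N{$name}" would not work here.
my $name = "GREEK SMALL LETTER SIGMA";
my $char = charnames::string_vianame($name);

printf "%s is U+%04X\n", $name, ord $char;   # GREEK SMALL LETTER SIGMA is U+03C3
```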

Note, starting in Perl 5.18, the name BELL refers to the Unicode character U+1F514, instead of the traditional U+0007. For the latter, use ALERT or BEL.
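The difference between the three names can be seen by printing their code points. The sketch below assumes Perl 5.18 or later; on earlier versions BELL resolves differently.

```perl
use strict;
use warnings;
use charnames ':full';

# Expected values assume Perl 5.18 or later.
printf "ALERT => U+%04X\n", ord "\N{ALERT}";   # U+0007
printf "BEL   => U+%04X\n", ord "\N{BEL}";     # U+0007
printf "BELL  => U+%04X\n", ord "\N{BELL}";    # U+1F514
```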

It is a syntax error to use \N{NAME} where NAME is unknown.
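Because the error is raised at compile time, one way to observe it without aborting the whole program is to compile the offending string inside a string eval. This is a sketch only, assuming Perl v5.16 or later; the name NO SUCH CHARACTER NAME is deliberately made up.

```perl
use strict;
use warnings;
use charnames ':full';

# A string eval defers compilation to run time, so the error can be trapped.
my $ok = eval q{ my $s = "\N{NO SUCH CHARACTER NAME}"; 1 };

print $ok ? "compiled\n" : "compile failed: $@";
```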

For \N{NAME}, it is a fatal error if use bytes is in effect and the input name is that of a character that won't fit into a byte (i.e., whose ordinal is above 255).

Otherwise, any string that includes a \N{charname} or \N{U+code point} will automatically have Unicode rules (see