charnames - access to Unicode character names and named character sequences; also define character names
use charnames ':full';
print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n";
print "\N{LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW}",
" is an officially named sequence of two Unicode characters\n";
use charnames ':loose';
print "\N{Greek small-letter sigma}",
" can be used to ignore case, underscores, most blanks,",
" and when you aren't sure if the official name has hyphens\n";
use charnames ':short';
print "\N{greek:Sigma} is an upper-case sigma.\n";
use charnames qw(cyrillic greek);
print "\N{sigma} is Greek sigma, and \N{be} is Cyrillic b.\n";
use utf8;
use charnames ":full", ":alias" => {
e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
mychar => 0xF0000, # Private use area
"自転車に乗る人" => "BICYCLIST"
};
print "\N{e_ACUTE} is a small letter e with an acute.\n";
print "\N{mychar} allows me to name private use characters.\n";
print "And I can create synonyms in other languages,",
" such as \N{自転車に乗る人} for BICYCLIST (U+1F6B4)\n";
use charnames ();
print charnames::viacode(0x1234); # prints "ETHIOPIC SYLLABLE SEE"
printf "%04X", charnames::vianame("GOTHIC LETTER AHSA"); # prints
# "10330"
print charnames::vianame("LATIN CAPITAL LETTER A"); # prints 65 on
# ASCII platforms;
# 193 on EBCDIC
print charnames::string_vianame("LATIN CAPITAL LETTER A"); # prints "A"
The pragma "use charnames" is used to gain access to the names of the Unicode characters and named character sequences, and to allow you to define your own character and character sequence names.
All forms of the pragma enable use of the following 3 functions:
"charnames::string_vianame(name)" for run-time lookup of either a character name or a named character sequence, returning its string representation
"charnames::vianame(name)" for run-time lookup of a character name (but not a named character sequence) to get its ordinal value (code point)
"charnames::viacode(code)" for run-time lookup of a code point to get its Unicode name.
Starting in Perl v5.16, any occurrence of a \N{CHARNAME} sequence in a double-quotish string automatically loads this module with arguments :full and :short (described below) if it hasn't already been loaded with different arguments, in order to compile the named Unicode character into position in the string. Prior to v5.16, an explicit "use charnames" was required to enable this usage. (However, prior to v5.16, the form "use charnames ();" did not enable \N{CHARNAME}.)
Note that \N{U+...}, where the ... is a hexadecimal number, also inserts a character into a string. The character it inserts is the one whose Unicode code point (ordinal value) is equal to the number. For example, "\N{U+263a}" is the Unicode (white background, black foreground) smiley face, equivalent to "\N{WHITE SMILING FACE}". Also note that \N{...} can mean a regex quantifier instead of a character name when the ... is a number (or a comma-separated pair of numbers; see "QUANTIFIERS" in perlreref). That usage is not related to this pragma.
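The following is a minimal sketch of the two meanings described above, assuming an ASCII platform and a Perl recent enough to support both forms. The variable names are illustrative only.

```perl
use strict;
use warnings;
use charnames ':full';

# The same character written by code point and by name.
my $by_code = "\N{U+263A}";
my $by_name = "\N{WHITE SMILING FACE}";

print $by_code eq $by_name ? "same character\n" : "different characters\n";
printf "U+%04X\n", ord $by_code;

# Inside a pattern, \N{3} is a quantified \N (a non-newline character,
# three times); it has nothing to do with character names.
print "abc" =~ /\A\N{3}\z/ ? "matched three characters\n" : "no match\n";
```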
The charnames pragma supports arguments :full, :loose, :short, script names and customized aliases.
If :full is present, for expansion of \N{CHARNAME}, the string CHARNAME is first looked up in the list of standard Unicode character names.
:loose is a variant of :full which allows CHARNAME to be less precisely specified. Details are in "LOOSE MATCHES".
If :short is present, and CHARNAME has the form SCRIPT:CNAME, then CNAME is looked up as a letter in script SCRIPT, as described in the next paragraph. Or, if "use charnames" is used with script name arguments, then for \N{CHARNAME} the name CHARNAME is looked up as a letter in the given scripts (in the specified order). Customized aliases can override these, and are explained in "CUSTOM ALIASES".
For lookup of CHARNAME inside a given script SCRIPTNAME, this pragma looks in the table of standard Unicode names for the names
SCRIPTNAME CAPITAL LETTER CHARNAME
SCRIPTNAME SMALL LETTER CHARNAME
SCRIPTNAME LETTER CHARNAME
If CHARNAME is all lowercase, then the CAPITAL variant is ignored, otherwise the SMALL variant is ignored, and both CHARNAME and SCRIPTNAME are converted to all uppercase for look-up. Other than that, both of them follow loose rules if :loose is also specified; strict otherwise.
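As a rough illustration of the script lookup rules above, the sketch below combines :short with a script name argument and prints the resulting code points. The variable names are illustrative, and the values in the comments assume an ASCII platform.

```perl
use strict;
use warnings;
use charnames qw(:short greek);

# All-lowercase name: only the SMALL LETTER and LETTER variants are tried.
my $small   = "\N{sigma}";        # GREEK SMALL LETTER SIGMA

# Not all lowercase: only the CAPITAL LETTER and LETTER variants are tried.
my $capital = "\N{Sigma}";        # GREEK CAPITAL LETTER SIGMA

# SCRIPT:CNAME form, enabled by :short, for a script not listed above.
my $be      = "\N{cyrillic:be}";  # CYRILLIC SMALL LETTER BE

printf "%04X %04X %04X\n", ord $small, ord $capital, ord $be;
```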
Note that \N{...} is compile-time; it's a special form of string constant used inside double-quotish strings; this means that you cannot use variables inside the \N{...}. If you want similar run-time functionality, use charnames::string_vianame().
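Because \N{...} cannot interpolate, a name that is only known at run time has to go through charnames::string_vianame(). The sketch below builds names in a loop; the %code_point_for hash and the check for an undefined result are illustrative assumptions, not something the text above prescribes.

```perl
use strict;
use warnings;
use charnames ();

my %code_point_for;    # hypothetical bookkeeping for this example

for my $letter (qw(alpha beta gamma)) {
    # The name is assembled at run time, so \N{...} could not be used here.
    my $name = "GREEK SMALL LETTER " . uc $letter;
    my $char = charnames::string_vianame($name);

    # Defensive: skip any name that does not resolve to a character.
    next unless defined $char;

    $code_point_for{$letter} = ord $char;
    printf "%-27s U+%04X\n", $name, ord $char;
}
```

The code point can be mapped back to its name with charnames::viacode() if a round trip is needed.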
Note, starting in Perl 5.18, the name BELL refers to the Unicode character U+1F514, instead of the traditional U+0007. For the latter, use ALERT or BEL.
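One way to check which character each of these names resolves to on a given installation is a run-time lookup. This is a sketch that assumes Perl 5.18 or later and an ASCII platform; according to the text above, BELL should report U+1F514 and the other two U+0007.

```perl
use v5.18;
use warnings;
use charnames ();

# Look up each name at run time and show the code point it maps to.
for my $name (qw(BELL ALERT BEL)) {
    printf "%-5s = U+%04X\n", $name, charnames::vianame($name);
}
```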
It is a syntax error to use \N{NAME} where NAME is unknown.
For \N{NAME}, it is a fatal error if "use bytes" is in effect and the input name is that of a character that won't fit into a byte (i.e., whose ordinal is above 255).
Otherwise, any string that includes a \N{charname} or \N{U+code point} will automatically have Unicode rules (see