You are viewing the version of this documentation from Perl 5.30.3.

NAME

Unicode::Normalize - Unicode Normalization Forms

SYNOPSIS

(1) using function names exported by default:

use Unicode::Normalize;

$NFD_string  = NFD($string);  # Normalization Form D
$NFC_string  = NFC($string);  # Normalization Form C
$NFKD_string = NFKD($string); # Normalization Form KD
$NFKC_string = NFKC($string); # Normalization Form KC

(2) using function names exported on request:

use Unicode::Normalize 'normalize';

$NFD_string  = normalize('D',  $string);  # Normalization Form D
$NFC_string  = normalize('C',  $string);  # Normalization Form C
$NFKD_string = normalize('KD', $string);  # Normalization Form KD
$NFKC_string = normalize('KC', $string);  # Normalization Form KC

DESCRIPTION

Parameters:

$string is used as a string under character semantics (see perlunicode).

$code_point should be an unsigned integer representing a Unicode code point.

Note: The XSUB and pure Perl implementations interpret $code_point differently when it is given as a decimal number. XSUB converts $code_point to an unsigned integer, but pure Perl does not. Do not pass a floating-point number or a negative number as $code_point.

Normalization Forms

$NFD_string = NFD($string)

It returns the Normalization Form D (formed by canonical decomposition).

$NFC_string = NFC($string)

It returns the Normalization Form C (formed by canonical decomposition followed by canonical composition).

$NFKD_string = NFKD($string)

It returns the Normalization Form KD (formed by compatibility decomposition).

$NFKC_string = NFKC($string)

It returns the Normalization Form KC (formed by compatibility decomposition followed by canonical composition).
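
The four forms above can be compared on a pair of small inputs. The following is a minimal sketch, not part of the module's own documentation; the helper code_points() and the sample strings are illustrative assumptions.

```perl
use strict;
use warnings;
use Unicode::Normalize;   # NFD, NFC, NFKD and NFKC are exported by default

# Hypothetical helper, for display only: list the code points of a string.
sub code_points {
    return join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $_[0]);
}

my $precomposed = "\x{E9}";     # LATIN SMALL LETTER E WITH ACUTE
my $ligature    = "\x{FB01}";   # LATIN SMALL LIGATURE FI

my $nfd      = NFD($precomposed);   # canonical decomposition
my $nfc      = NFC($nfd);           # decomposition, then canonical composition
my $nfd_lig  = NFD($ligature);      # the ligature has no canonical decomposition
my $nfkd_lig = NFKD($ligature);     # compatibility decomposition
my $nfkc_lig = NFKC($ligature);     # compatibility decomposition, then composition

print "NFD(e-acute)  = ", code_points($nfd),      "\n";
print "NFC(NFD)      = ", code_points($nfc),      "\n";
print "NFD(fi lig.)  = ", code_points($nfd_lig),  "\n";
print "NFKD(fi lig.) = ", code_points($nfkd_lig), "\n";
print "NFKC(fi lig.) = ", code_points($nfkc_lig), "\n";
```

The canonical forms (NFD, NFC) leave the ligature untouched, while the compatibility forms (NFKD, NFKC) replace it with the two letters "f" and "i".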

$FCD_string = FCD($string)

If the given string is in FCD ("Fast C or D" form; cf. UTN #5), it returns the string without modification; otherwise it returns an FCD string.

Note: FCD is not always unique; several different FCD strings may be canonically equivalent to each other. FCD() will return one of these equivalent forms.

$FCC_string = FCC($string)

It returns the FCC form ("Fast C Contiguous"; cf. UTN #5).

Note: FCC is unique, as are the four normalization forms (NF*).
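
As a hedged illustration of FCD() and FCC(), the sketch below checks only what the descriptions above state: a string that is already FCD comes back unchanged, and the results stay canonically equivalent to the input. The sample strings are assumptions chosen for the example, and the exact string returned for the non-FCD input is deliberately not assumed.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD FCD FCC);   # FCD and FCC are exported on request

my $already_fcd = "\x{E9}";               # precomposed e-acute, already FCD
my $not_fcd     = "a\x{0301}\x{0323}";    # acute (class 230) before dot below (class 220)

my $same = FCD($already_fcd);   # returned without modification
my $fcd  = FCD($not_fcd);       # one of the equivalent FCD strings
my $fcc  = FCC($not_fcd);       # the unique FCC form

# Canonically equivalent strings have the same NFD.
for my $result ($fcd, $fcc) {
    print NFD($result) eq NFD($not_fcd) ? "canonically equivalent\n" : "differs\n";
}
```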

$normalized_string = normalize($form_name, $string)

It returns $string converted to the normalization form named by $form_name.

$form_name must be one of the following names.

'C'  or 'NFC'  for Normalization Form C  (UAX #15)
'D'  or 'NFD'  for Normalization Form D  (UAX #15)
'KC' or 'NFKC' for Normalization Form KC (UAX #15)
'KD' or 'NFKD' for Normalization Form KD (UAX #15)

'FCD'          for "Fast C or D" Form  (UTN #5)
'FCC'          for "Fast C Contiguous" (UTN #5)
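
The short and long form names listed above can be exercised in a loop. This is a minimal sketch with an assumed sample string; it stores each result in a hash (%result, a name introduced here for illustration) and prints the code points.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(normalize);   # exported on request

# e, COMBINING ACUTE ACCENT, LATIN SMALL LIGATURE FI
my $string = "e\x{301}\x{FB01}";

my %result;
for my $form (qw(C NFC D NFD KC NFKC KD NFKD)) {
    $result{$form} = normalize($form, $string);
    printf "%-4s %s\n", $form,
        join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $result{$form});
}
```

Each short name ('C', 'D', 'KC', 'KD') gives the same result as its long counterpart ('NFC', 'NFD', 'NFKC', 'NFKD').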

Decomposition and Composition

$decomposed_string = decompose($string [, $useCompatMapping])

It returns the concatenation of the decomposition of each character in the string.

If the second parameter (a boolean) is omitted or false, canonical decomposition is used; if it is true, compatibility decomposition is used.
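
The effect of the second parameter can be seen on a character that has only a compatibility decomposition. The following is a minimal sketch with assumed sample characters.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(decompose);   # exported on request

my $ligature = "\x{FB01}";   # LATIN SMALL LIGATURE FI
my $e_acute  = "\x{E9}";     # LATIN SMALL LETTER E WITH ACUTE

my $lig_canon  = decompose($ligature);      # canonical: ligature is left as-is
my $lig_compat = decompose($ligature, 1);   # compatibility: becomes "fi"
my $e_canon    = decompose($e_acute);       # e + COMBINING ACUTE ACCENT
my $e_compat   = decompose($e_acute, 1);    # compatibility includes canonical mappings

print "canonical keeps the ligature\n"  if $lig_canon  eq $ligature;
print "compatibility yields 'fi'\n"     if $lig_compat eq 'fi';
```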

The string returned is not always in NFD/NFKD. Reordering may be required.

$NFD_string  = reorder(decompose($string));    # equivalent to NFD($string)
$NFKD_string = reorder(decompose($string, 1)); # equivalent to NFKD($string)

$reordered_string = reorder($string)

It returns the result of reordering the combining characters according to Canonical Ordering Behavior.
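
To make the reordering step concrete, the sketch below decomposes a string whose combining marks end up out of canonical order and then repairs it with reorder(). The sample string is an assumption chosen so that decompose() alone does not yield NFD.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD decompose reorder);

# LATIN SMALL LETTER A WITH ACUTE followed by COMBINING DOT BELOW
my $string = "\x{E1}\x{0323}";

# decompose() yields a, acute (class 230), dot below (class 220):
# the marks are not in canonical order, so this is not NFD.
my $decomposed = decompose($string);

# reorder() sorts the marks by combining class: a, dot below, acute.
my $reordered = reorder($decomposed);

print "decompose alone is not NFD\n"    if $decomposed ne NFD($string);
print "reorder(decompose) matches NFD\n" if $reordered  eq NFD($string);
```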

For example, when you have a list of NFD/NFKD strings, you can get the concatenated NFD/NFKD string from them, by saying

$concat_NFD  = reorder(join '', @NFD_strings);
$concat_NFKD = reorder(join '', @NFKD_strings);
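
A hedged sketch of why the reorder() call matters when concatenating: each piece below is in NFD on its own, but the plain concatenation is not, because a combining mark at the start of one piece follows a mark of a higher combining class at the end of the previous piece. The sample pieces are assumptions for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD reorder);

# Each element is already in NFD when taken by itself.
my @NFD_strings = ("a\x{0301}", "\x{0323}");   # a + acute; dot below

my $joined     = join '', @NFD_strings;   # a, acute, dot below: not NFD
my $concat_NFD = reorder($joined);        # a, dot below, acute: NFD

print "plain join is not NFD\n"    if $joined     ne NFD($joined);
print "reordered join is NFD\n"    if $concat_NFD eq NFD($joined);
```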