perldata - Perl data types
Perl has three built-in data types: scalars, arrays of scalars, and associative arrays of scalars, known as "hashes". A scalar is a single string (of any size, limited only by the available memory), number, or a reference to something (which will be discussed in perlref). Normal arrays are ordered lists of scalars indexed by number, starting with 0. Hashes are unordered collections of scalar values indexed by their associated string key.
Values are usually referred to by name, or through a named reference. The first character of the name tells you to what sort of data structure it refers. The rest of the name tells you the particular value to which it refers. Usually this name is a single identifier, that is, a string beginning with a letter or underscore, and containing letters, underscores, and digits. In some cases, it may be a chain of identifiers, separated by ::
(or by the slightly archaic '
); all but the last are interpreted as names of packages, to locate the namespace in which to look up the final identifier (see "Packages" in perlmod for details). It's possible to substitute for a simple identifier, an expression that produces a reference to the value at runtime. This is described in more detail below and in perlref.
Perl also has its own built-in variables whose names don't follow these rules. They have strange names so they don't accidentally collide with one of your normal variables. Strings that match parenthesized parts of a regular expression are saved under names containing only digits after the $
(see perlop and perlre). In addition, several special variables that provide windows into the inner working of Perl have names containing punctuation characters and control characters. These are documented in perlvar.
Scalar values are always named with '$', even when referring to a scalar that is part of an array or a hash. The '$' symbol works semantically like the English word "the" in that it indicates a single value is expected.
$days # the simple scalar value "days"
$days[28] # the 29th element of array @days
$days{'Feb'} # the 'Feb' value from hash %days
$#days # the last index of array @days
Entire arrays (and slices of arrays and hashes) are denoted by '@', which works much like the word "these" or "those" does in English, in that it indicates multiple values are expected.
@days # ($days[0], $days[1],... $days[n])
@days[3,4,5] # same as ($days[3],$days[4],$days[5])
@days{'a','c'} # same as ($days{'a'},$days{'c'})
Entire hashes are denoted by '%':
%days # (key1, val1, key2, val2 ...)
In addition, subroutines are named with an initial '&', though this is optional when unambiguous, just as the word "do" is often redundant in English. Symbol table entries can be named with an initial '*', but you don't really care about that yet (if ever :-).
Every variable type has its own namespace, as do several non-variable identifiers. This means that you can, without fear of conflict, use the same name for a scalar variable, an array, or a hash--or, for that matter, for a filehandle, a directory handle, a subroutine name, a format name, or a label. This means that $foo and @foo are two different variables. It also means that $foo[1]
is a part of @foo, not a part of $foo. This may seem a bit weird, but that's okay, because it is weird.
Because variable references always start with '$', '@', or '%', the "reserved" words aren't in fact reserved with respect to variable names. They are reserved with respect to labels and filehandles, however, which don't have an initial special character. You can't have a filehandle named "log", for instance. Hint: you could say open(LOG,'logfile')
rather than open(log,'logfile')
. Using uppercase filehandles also improves readability and protects you from conflict with future reserved words. Case is significant--"FOO", "Foo", and "foo" are all different names. Names that start with a letter or underscore may also contain digits and underscores.
It is possible to replace such an alphanumeric name with an expression that returns a reference to the appropriate type. For a description of this, see perlref.
Names that start with a digit may contain only more digits. Names that do not start with a letter, underscore, digit or a caret (i.e. a control character) are limited to one character, e.g., $%
or $$
. (Most of these one character names have a predefined significance to Perl. For instance, $$
is the current process id.)
The interpretation of operations and values in Perl sometimes depends on the requirements of the context around the operation or value. There are two major contexts: list and scalar. Certain operations return list values in contexts wanting a list, and scalar values otherwise. If this is true of an operation it will be mentioned in the documentation for that operation. In other words, Perl overloads certain operations based on whether the expected return value is singular or plural. Some words in English work this way, like "fish" and "sheep".
In a reciprocal fashion, an operation provides either a scalar or a list context to each of its arguments. For example, if you say
int( <STDIN> )
the integer operation provides scalar context for the <> operator, which responds by reading one line from STDIN and passing it back to the integer operation, which will then find the integer value of that line and return that. If, on the other hand, you say
sort( <STDIN> )
then the sort operation provides list context for <>, which will proceed to read every line available up to the end of file, and pass that list of lines back to the sort routine, which will then sort those lines and return them as a list to whatever the context of the sort was.
Assignment is a little bit special in that it uses its left argument to determine the context for the right argument. Assignment to a scalar evaluates the right-hand side in scalar context, while assignment to an array or hash evaluates the righthand side in list context. Assignment to a list (or slice, which is just a list anyway) also evaluates the righthand side in list context.
When you use the use warnings
pragma or Perl's -w command-line option, you may see warnings about useless uses of constants or functions in "void context". Void context just means the value has been discarded, such as a statement containing only "fred";
or getpwuid(0);
. It still counts as scalar context for functions that care whether or not they're being called in list context.
User-defined subroutines may choose to care whether they are being called in a void, scalar, or list context. Most subroutines do not need to bother, though. That's because both scalars and lists are automatically interpolated into lists. See "wantarray" in perlfunc for how you would dynamically discern your function's calling context.