You are viewing the version of this documentation from Perl 5.12.0. View the latest version

CONTENTS

NAME

perltooc - Tom's OO Tutorial for Class Data in Perl

DESCRIPTION

When designing an object class, you are sometimes faced with the situation of wanting common state shared by all objects of that class. Such class attributes act somewhat like global variables for the entire class, but unlike program-wide globals, class attributes have meaning only to the class itself.

Here are a few examples where class attributes might come in handy:

Unlike a true global, class attributes should not be accessed directly. Instead, their state should be inspected, and perhaps altered, only through the mediated access of class methods. These class attributes accessor methods are similar in spirit and function to accessors used to manipulate the state of instance attributes on an object. They provide a clear firewall between interface and implementation.

You should allow access to class attributes through either the class name or any object of that class. If we assume that $an_object is of type Some_Class, and the &Some_Class::population_count method accesses class attributes, then these two invocations should both be possible, and almost certainly equivalent.

Some_Class->population_count()
$an_object->population_count()

The question is, where do you store the state which that method accesses? Unlike more restrictive languages like C++, where these are called static data members, Perl provides no syntactic mechanism to declare class attributes, any more than it provides a syntactic mechanism to declare instance attributes. Perl provides the developer with a broad set of powerful but flexible features that can be uniquely crafted to the particular demands of the situation.

A class in Perl is typically implemented in a module. A module consists of two complementary feature sets: a package for interfacing with the outside world, and a lexical file scope for privacy. Either of these two mechanisms can be used to implement class attributes. That means you get to decide whether to put your class attributes in package variables or to put them in lexical variables.

And those aren't the only decisions to make. If you choose to use package variables, you can make your class attribute accessor methods either ignorant of inheritance or sensitive to it. If you choose lexical variables, you can elect to permit access to them from anywhere in the entire file scope, or you can limit direct data access exclusively to the methods implementing those attributes.

Class Data in a Can

One of the easiest ways to solve a hard problem is to let someone else do it for you! In this case, Class::Data::Inheritable (available on a CPAN near you) offers a canned solution to the class data problem using closures. So before you wade into this document, consider having a look at that module.

Class Data as Package Variables

Because a class in Perl is really just a package, using package variables to hold class attributes is the most natural choice. This makes it simple for each class to have its own class attributes. Let's say you have a class called Some_Class that needs a couple of different attributes that you'd like to be global to the entire class. The simplest thing to do is to use package variables like $Some_Class::CData1 and $Some_Class::CData2 to hold these attributes. But we certainly don't want to encourage outsiders to touch those data directly, so we provide methods to mediate access.

In the accessor methods below, we'll for now just ignore the first argument--that part to the left of the arrow on method invocation, which is either a class name or an object reference.

    package Some_Class;
    sub CData1 {
	shift;	# XXX: ignore calling class/object
	$Some_Class::CData1 = shift if @_;
	return $Some_Class::CData1;
    } 
    sub CData2 {
	shift;	# XXX: ignore calling class/object
	$Some_Class::CData2 = shift if @_;
	return $Some_Class::CData2;
    } 

This technique is highly legible and should be completely straightforward to even the novice Perl programmer. By fully qualifying the package variables, they stand out clearly when reading the code. Unfortunately, if you misspell one of these, you've introduced an error that's hard to catch. It's also somewhat disconcerting to see the class name itself hard-coded in so many places.

Both these problems can be easily fixed. Just add the use strict pragma, then pre-declare your package variables. (The our operator will be new in 5.6, and will work for package globals just like my works for scoped lexicals.)

    package Some_Class;
    use strict;
    our($CData1, $CData2);   	# our() is new to perl5.6
    sub CData1 {
	shift;	# XXX: ignore calling class/object
	$CData1 = shift if @_;
	return $CData1;
    } 
    sub CData2 {
	shift;	# XXX: ignore calling class/object
	$CData2 = shift if @_;
	return $CData2;
    } 

As with any other global variable, some programmers prefer to start their package variables with capital letters. This helps clarity somewhat, but by no longer fully qualifying the package variables, their significance can be lost when reading the code. You can fix this easily enough by choosing better names than were used here.

Putting All Your Eggs in One Basket

Just as the mindless enumeration of accessor methods for instance attributes grows tedious after the first few (see perltoot), so too does the repetition begin to grate when listing out accessor methods for class data. Repetition runs counter to the primary virtue of a programmer: Laziness, here manifesting as that innate urge every programmer feels to factor out duplicate code whenever possible.

Here's what to do. First, make just one hash to hold all class attributes.

    package Some_Class;
    use strict;
    our %ClassData = (   	# our() is new to perl5.6
	CData1 => "",
	CData2 => "",
    );

Using closures (see perlref) and direct access to the package symbol table (see perlmod), now clone an accessor method for each key in the %ClassData hash. Each of these methods is used to fetch or store values to the specific, named class attribute.

    for my $datum (keys %ClassData) {
	no strict "refs";	# to register new methods in package
	*$datum = sub {
	    shift;	# XXX: ignore calling class/object
	    $ClassData{$datum} = shift if @_;
	    return $ClassData{$datum};
	} 
    } 

It's true that you could work out a solution employing an &AUTOLOAD method, but this approach is unlikely to prove satisfactory. Your function would have to distinguish between class attributes and object attributes; it could interfere with inheritance; and it would have to careful about DESTROY. Such complexity is uncalled for in most cases, and certainly in this one.

You may wonder why we're rescinding strict refs for the loop. We're manipulating the package's symbol table to introduce new function names using symbolic references (indirect naming), which the strict pragma would otherwise forbid. Normally, symbolic references are a dodgy notion at best. This isn't just because they can be used accidentally when you aren't meaning to. It's also because for most uses to which beginning Perl programmers attempt to put symbolic references, we have much better approaches, like nested hashes or hashes of arrays. But there's nothing wrong with using symbolic references to manipulate something that is meaningful only from the perspective of the package symbol table, like method names or package variables. In other words, when you want to refer to the symbol table, use symbol references.

Clustering all the class attributes in one place has several advantages. They're easy to spot, initialize, and change. The aggregation also makes them convenient to access externally, such as from a debugger or a persistence package. The only possible problem is that we don't automatically know the name of each class's class object, should it have one. This issue is addressed below in "The Eponymous Meta-Object".

Inheritance Concerns

Suppose you have an instance of a derived class, and you access class data using an inherited method call. Should that end up referring to the base class's attributes, or to those in the derived class? How would it work in the earlier examples? The derived class inherits all the base class's methods, including those that access class attributes. But what package are the class attributes stored in?

The answer is that, as written, class attributes are stored in the package into which those methods were compiled. When you invoke the &CData1 method on the name of the derived class or on one of that class's objects, the version shown above is still run, so you'll access $Some_Class::CData1--or in the method cloning version, $Some_Class::ClassData{CData1}.

Think of these class methods as executing in the context of their base class, not in that of their derived class. Sometimes this is exactly what you want. If Feline subclasses Carnivore, then the population of Carnivores in the world should go up when a new Feline is born. But what if you wanted to figure out how many Felines you have apart from Carnivores? The current approach doesn't support that.

You'll have to decide on a case-by-case basis whether it makes any sense for class attributes to be package-relative. If you want it to be so, then stop ignoring the first argument to the function. Either it will be a package name if the method was invoked directly on a class name, or else it will be an object reference if the method was invoked on an object reference. In the latter case, the ref() function provides the class of that object.

    package Some_Class;
    sub CData1 {
	my $obclass = shift;	
	my $class   = ref($obclass) || $obclass;
	my $varname = $class . "::CData1";
	no strict "refs"; 	# to access package data symbolically
	$$varname = shift if @_;
	return $$varname;
    } 

And then do likewise for all other class attributes (such as CData2, etc.) that you wish to access as package variables in the invoking package instead of the compiling package as we had previously.

Once again we temporarily disable the strict references ban, because otherwise we couldn't use the fully-qualified symbolic name for the package global. This is perfectly reasonable: since all package variables by definition live in a package, there's nothing wrong with accessing them via that package's symbol table. That's what it's there for (well, somewhat).

What about just using a single hash for everything and then cloning methods? What would that look like? The only difference would be the closure used to produce new method entries for the class's symbol table.

    no strict "refs";	
    *$datum = sub {
	my $obclass = shift;	
	my $class   = ref($obclass) || $obclass;
	my $varname = $class . "::ClassData";
	$varname->{$datum} = shift if @_;
	return $varname->{$datum};
    }

The Eponymous Meta-Object

It could be argued that the %ClassData hash in the previous example is neither the most imaginative nor the most intuitive of names. Is there something else that might make more sense, be more useful, or both?

As it happens, yes, there is. For the "class meta-object", we'll use a package variable of the same name as the package itself. Within the scope of a package Some_Class declaration, we'll use the eponymously named hash %Some_Class as that class's meta-object. (Using an eponymously named hash is somewhat reminiscent of classes that name their constructors eponymously in the Python or C++ fashion. That is, class Some_Class would use &Some_Class::Some_Class as a constructor, probably even exporting that name as well. The StrNum class in Recipe 13.14 in The Perl Cookbook does this, if you're looking for an example.)

This predictable approach has many benefits, including having a well-known identifier to aid in debugging, transparent persistence, or checkpointing. It's also the obvious name for monadic classes and translucent attributes, discussed later.

Here's an example of such a class. Notice how the name of the hash storing the meta-object is the same as the name of the package used to implement the class.

    package Some_Class;
    use strict;

    # create class meta-object using that most perfect of names
    our %Some_Class = (   	# our() is new to perl5.6
	CData1 => "",
	CData2 => "",
    );

    # this accessor is calling-package-relative
    sub CData1 {
	my $obclass = shift;	
	my $class   = ref($obclass) || $obclass;
	no strict "refs"; 	# to access eponymous meta-object
	$class->{CData1} = shift if @_;
	return $class->{CData1};
    }

    # but this accessor is not
    sub CData2 {
	shift;			# XXX: ignore calling class/object
	no strict "refs"; 	# to access eponymous meta-object
	__PACKAGE__ -> {CData2} = shift if @_;
	return __PACKAGE__ -> {CData2};
    } 

In the second accessor method, the __PACKAGE__ notation was used for two reasons. First, to avoid hardcoding the literal package name in the code in case we later want to change that name. Second, to clarify to the reader that what matters here is the package currently being compiled into, not the package of the invoking object or class. If the long sequence of non-alphabetic characters bothers you, you can always put the __PACKAGE__ in a variable first.

    sub CData2 {
	shift;			# XXX: ignore calling class/object
	no strict "refs"; 	# to access eponymous meta-object
	my $class = __PACKAGE__;
	$class->{CData2} = shift if @_;
	return $class->{CData2};
    } 

Even though we're using symbolic references for good not evil, some folks tend to become unnerved when they see so many places with strict ref checking disabled. Given a symbolic reference, you can always produce a real reference (the reverse is not true, though). So we'll create a subroutine that does this conversion for us. If invoked as a function of no arguments, it returns a reference to the compiling class's eponymous hash. Invoked as a class method, it returns a reference to the eponymous hash of its caller. And when invoked as an object method, this function returns a reference to the eponymous hash for whatever class the object belongs to.

    package Some_Class;
    use strict;

    our %Some_Class = (   	# our() is new to perl5.6
	CData1 => "",
	CData2 => "",
    );

    # tri-natured: function, class method, or object method
    sub _classobj {
	my $obclass = shift || __PACKAGE__;
	my $class   = ref($obclass) || $obclass;
	no strict "refs";   # to convert sym ref to real one
	return \%$class;
    } 

    for my $datum (keys %{ _classobj() } ) { 
	# turn off strict refs so that we can
	# register a method in the symbol table
	no strict "refs";    	
	*$datum = sub {
	    use strict "refs";
	    my $self = shift->_classobj();
	    $self->{$datum} = shift if @_;
	    return $self->{$datum};
	}
    }

Indirect References to Class Data

A reasonably common strategy for handling class attributes is to store a reference to each package variable on the object itself. This is a strategy you've probably seen before, such as in perltoot and perlbot, but there may be variations in the example below that you haven't thought of before.

    package Some_Class;
    our($CData1, $CData2);      	# our() is new to perl5.6

    sub new {
	my $obclass = shift;
	return bless my $self = {
	    ObData1 => "",
	    ObData2 => "",
	    CData1  => \$CData1,
	    CData2  => \$CData2,
	} => (ref $obclass || $obclass);
    } 

    sub ObData1 {
	my $self = shift;
	$self->{ObData1} = shift if @_;
	return $self->{ObData1};
    } 

    sub ObData2 {
	my $self = shift;
	$self->{ObData2} = shift if @_;
	return $self->{ObData2};
    } 

    sub CData1 {
	my $self = shift;
	my $dataref = ref $self
			? $self->{CData1}
			: \$CData1;
	$$dataref = shift if @_;
	return $$dataref;
    } 

    sub CData2 {
	my $self = shift;
	my $dataref = ref $self
			? $self->{CData2}
			: \$CData2;
	$$dataref = shift if @_;
	return $$dataref;
    } 

As written above, a derived class will inherit these methods, which will consequently access package variables in the base class's package. This is not necessarily expected behavior in all circumstances. Here's an example that uses a variable meta-object, taking care to access the proper package's data.

package Some_Class;
use strict;

our %Some_Class = (   	# our() is new to perl5.6
    CData1 => "",
    CData2 => "",
);

sub _classobj {
    my $self  = shift;
    my $class = ref($self) || $self;
    no strict "refs";
    # get (hard) ref to eponymous meta-object
    return \%$class;
} 

sub new {
    my $obclass  = shift;
    my $classobj = $obclass->_classobj();
    bless my $self = {
	ObData1 => "",
	ObData2 => "",
	CData1  => \$classobj->{CData1},
	CData2  => \$classobj->{CData2},
    } => (ref $obclass || $obclass);
    return $self;
} 

sub ObData1 {
    my $self = shift;
    $self->{ObData1} = shift if @_;
    return $self->{ObData1};
} 

sub ObData2 {
    my $self = shift;
    $self->{ObData2} = shift if @_;
    return $self->{ObData2};
} 

sub CData1 {
    my $self = shift;
    $self = $self->_classobj() unless ref $self;
    my $dataref = $self->{CData1};
    $$dataref = shift if @_;
    return $$dataref;
} 

sub CData2 {
    my $self = shift;
    $self = $self->_classobj() unless ref $self;
    my $dataref = $self->{CData2};
    $$dataref = shift if @_;
    return $$dataref;
} 

Not only are we now strict refs clean, using an eponymous meta-object seems to make the code cleaner. Unlike the previous version, this one does something interesting in the face of inheritance: it accesses the class meta-object in the invoking class instead of the one into which the method was initially compiled.

You can easily access data in the class meta-object, making it easy to dump the complete class state using an external mechanism such as when debugging or implementing a persistent class. This works because the class meta-object is a package variable, has a well-known name, and clusters all its data together. (Transparent persistence is not always feasible, but it's certainly an appealing idea.)

There's still no check that object accessor methods have not been invoked on a class name. If strict ref checking is enabled, you'd blow up. If not, then you get the eponymous meta-object. What you do with--or about--this is up to you. The next two sections demonstrate innovative uses for this powerful feature.