source: trunk/essentials/dev-lang/perl/pod/perldsc.pod@ 3439

Last change on this file since 3439 was 3181, checked in by bird, 19 years ago

perl 5.8.8

=head1 NAME
X<data structure> X<complex data structure> X<struct>

perldsc - Perl Data Structures Cookbook

=head1 DESCRIPTION

The single feature most sorely lacking in the Perl programming language
prior to its 5.0 release was complex data structures. Even without direct
language support, some valiant programmers did manage to emulate them, but
it was hard work and not for the faint of heart. You could occasionally
get away with the C<$m{$x,$y}> notation borrowed from B<awk>, in which the
key is actually a single string, C<"$x$;$y">, built by joining the
subscripts with the C<$;> separator, but traversal and sorting were
difficult. More desperate programmers even hacked Perl's internal symbol
table directly, a strategy that proved hard to develop and maintain, to
put it mildly.

The 5.0 release of Perl let us have complex data structures. You
may now write something like this, and all of a sudden you'll have an array
with three dimensions!

    for $x (1 .. 10) {
        for $y (1 .. 10) {
            for $z (1 .. 10) {
                $AoA[$x][$y][$z] =
                    $x ** $y + $z;
            }
        }
    }

Alas, however simple this may appear, underneath it's a much more
elaborate construct than meets the eye!
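
A minimal sketch that repeats the loop above under C<use strict> (which
means declaring the loop variables with my(), a change from the original
snippet) and then inspects what it built with the built-in ref() function.
Every level turns out to be an ordinary one-dimensional array whose
elements are references, created on first use (autovivification).

```perl

    use strict;
    use warnings;

    # The three-dimensional loop from above, with lexical loop variables.
    my @AoA;
    for my $x (1 .. 10) {
        for my $y (1 .. 10) {
            for my $z (1 .. 10) {
                $AoA[$x][$y][$z] = $x ** $y + $z;
            }
        }
    }

    # Each level holds references to the next level down.
    print ref($AoA[1]), "\n";        # ARRAY
    print ref($AoA[1][1]), "\n";     # ARRAY
    print scalar(@AoA), "\n";        # 11: index 0 exists but was never set
    print defined($AoA[0]) ? "defined\n" : "undef\n";   # undef
    print $AoA[2][3][4], "\n";       # 12 (2 ** 3 + 4)

```

Note the unused slot at index 0: because the loops start at 1, every level
has one more element than you might expect.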

How do you print it out? Why can't you just say C<print @AoA>? How do
you sort it? How can you pass it to a function or get one of these back
from a function? Is it an object? Can you save it to disk to read
back later? How do you access whole rows or columns of that matrix? Do
all the values have to be numeric?

As you see, it's quite easy to become confused. While some small portion
of the blame for this can be attributed to the reference-based
implementation, it's really more due to a lack of existing documentation with
examples designed for the beginner.

This document is meant to be a detailed but understandable treatment of the
many different sorts of data structures you might want to develop. It
should also serve as a cookbook of examples. That way, when you need to
create one of these complex data structures, you can just pinch, pilfer, or
purloin a drop-in example from here.

Let's look at each of these possible constructs in detail. There are separate
sections on each of the following:

=over 5

=item * arrays of arrays

=item * hashes of arrays

=item * arrays of hashes

=item * hashes of hashes

=item * more elaborate constructs

=back

But for now, let's look at general issues common to all
these types of data structures.

=head1 REFERENCES
X<reference> X<dereference> X<dereferencing> X<pointer>

The most important thing to understand about all data structures in Perl,
including multidimensional arrays, is that even though they might
appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
one-dimensional. They can hold only scalar values (meaning a string,
number, or a reference). They cannot directly contain other arrays or
hashes, but instead contain I<references> to other arrays or hashes.
X<multidimensional array> X<array, multidimensional>
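
A minimal sketch of this rule: putting an array directly into a list
flattens it, while a backslash stores a single scalar reference to it.
The variable names here are illustrative only.

```perl

    use strict;
    use warnings;

    my @inner = (1, 2, 3);
    my %info  = (name => 'fred');

    # Without a backslash, a list flattens: @flat gets 4 plain scalars.
    my @flat = (@inner, 'x');

    # An array holds only scalars, so store references to the aggregates.
    my @outer = (\@inner, \%info, 'plain string');

    # Writing through the stored reference changes the original @inner.
    push @{ $outer[0] }, 4;

    print scalar(@flat), "\n";      # 4
    print scalar(@outer), "\n";     # 3
    print "@inner\n";               # 1 2 3 4
    print ref($outer[1]), "\n";     # HASH

```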

You can't use a reference to an array or hash in quite the same way that you
would a real array or hash. For C or C++ programmers unused to
distinguishing between arrays and pointers to the same, this can be
confusing. If so, just think of it as the difference between a structure
and a pointer to a structure.

You can (and should) read more about references in the perlref(1) man
page. Briefly, references are rather like pointers that know what they
point to. (Objects are also a kind of reference, but we won't be needing
them right away, if ever.) This means that when you have something which
looks to you like an access to a two-or-more-dimensional array and/or hash,
what's really going on is that the base type is
merely a one-dimensional entity that contains references to the next
level. It's just that you can I<use> it as though it were a
two-dimensional one. This is also the way multidimensional arrays built
from arrays of pointers work in C.

    $array[7][12]                       # array of arrays
    $array[7]{string}                   # array of hashes
    $hash{string}[7]                    # hash of arrays
    $hash{string}{'another string'}     # hash of hashes
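
Each of the four access forms above also works on the left of an
assignment. In this minimal sketch the variable names (@AoA, @AoH, %HoA,
%HoH) are illustrative; inspecting the top level with ref() shows that each
slot holds a reference to the next level.

```perl

    use strict;
    use warnings;

    my (@AoA, @AoH, %HoA, %HoH);

    # Assigning through a level that doesn't exist yet creates it
    # (autovivification), so no explicit [] or {} is needed here.
    $AoA[7][12]                    = 'AoA';
    $AoH[7]{string}                = 'AoH';
    $HoA{string}[7]                = 'HoA';
    $HoH{string}{'another string'} = 'HoH';

    # The top level of each holds only a reference to the next level.
    print join(' ', ref($AoA[7]), ref($AoH[7]),
                    ref($HoA{string}), ref($HoH{string})), "\n";
    # prints "ARRAY HASH ARRAY HASH"

    print join(' ', $AoA[7][12], $AoH[7]{string},
                    $HoA{string}[7], $HoH{string}{'another string'}), "\n";
    # prints "AoA AoH HoA HoH"

```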

Now, because the top level contains only references, if you try to print
out your array with a simple print() function, you'll get something
that doesn't look very nice, like this:

    @AoA = ( [2, 3], [4, 5, 7], [0] );
    print $AoA[1][2];
  7
    print @AoA;
  ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)

That's because Perl doesn't (ever) implicitly dereference your variables.
If you want to get at the thing a reference is referring to, then you have
to do this yourself using either prefix typing indicators, like
C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows,
like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>.
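
Applied to the @AoA from the print example above, the prefix and arrow
forms reach the same element, and dereferencing each row explicitly is one
way to print the whole structure readably. A minimal sketch:

```perl

    use strict;
    use warnings;

    my @AoA = ( [2, 3], [4, 5, 7], [0] );

    # Three spellings of the same element:
    my $prefix = ${ $AoA[1] }[2];    # prefix dereference of the row
    my $arrow  = $AoA[1]->[2];       # postfix arrow
    my $short  = $AoA[1][2];         # arrow is optional between subscripts
    print "$prefix $arrow $short\n"; # 7 7 7

    # print @AoA shows only the row references, so dereference each row.
    for my $row (@AoA) {
        print join(' ', @{$row}), "\n";
    }
    # 2 3
    # 4 5 7
    # 0

```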

=head1 COMMON MISTAKES

The two most common mistakes made in constructing something like
an array of arrays are either accidentally counting the number of
elements or else taking a reference to the same memory location
repeatedly. Here's the case where you just get the count instead
of a nested array:

    for $i (1..10) {
        @array = somefunc($i);
        $AoA[$i] = @array;      # WRONG!
    }

That's just the simple case of assigning an array to a scalar and getting
its element count. If that's what you really and truly want, then you
might do well to consider being a tad more explicit about it, like this:

    for $i (1..10) {
        @array = somefunc($i);
        $counts[$i] = scalar @array;
    }
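
A short sketch, with a fixed three-element list standing in for
somefunc(), that puts the accidental count, the explicit count, and a real
nested array side by side:

```perl

    use strict;
    use warnings;

    my @array = ('a', 'b', 'c');
    my (@AoA, @counts);

    $AoA[0]    = @array;          # scalar context: stores 3, not the array
    $counts[0] = scalar @array;   # same value, but the intent is obvious
    $AoA[1]    = [ @array ];      # a reference to a copy of the array

    print "$AoA[0] $counts[0] ", ref($AoA[1]), "\n";   # 3 3 ARRAY

```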

Here's the case of taking a reference to the same memory location
again and again:

    for $i (1..10) {
        @array = somefunc($i);
        $AoA[$i] = \@array;     # WRONG!
    }

So, what's the big problem with that? It looks right, doesn't it?
After all, I just told you that you need an array of references, so by
golly, you've made me one!

Unfortunately, while this is true, it's still broken. All the references
in @AoA refer to the I<very same place>, and they will therefore all hold
whatever was last in @array! It's similar to the problem demonstrated in
the following C program:

    #include <pwd.h>
    #include <stdio.h>

    int main(void) {
        struct passwd *rp, *dp;
        rp = getpwnam("root");
        dp = getpwnam("daemon");

        printf("daemon name is %s\nroot name is %s\n",
               dp->pw_name, rp->pw_name);
        return 0;
    }

Because getpwnam() may return a pointer to a static area that each call
overwrites (and on most systems it does), this typically prints

    daemon name is daemon
    root name is daemon

The problem is that both C<rp> and C<dp> are pointers to the same location
in memory! In C, you'd have to remember to malloc() yourself some new
memory. In Perl, you'll want to use the array constructor C<[]> or the
hash constructor C<{}> instead. Here's the right way to do the preceding
broken code fragments:
X<[]> X<{}>

    for $i (1..10) {
        @array = somefunc($i);
        $AoA[$i] = [ @array ];
    }

The square brackets make a reference to a new array with a I<copy>
of what's in @array at the time of the assignment. This is what
you want.
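
To see the difference concretely, here is a minimal sketch that fills two
arrays in the same loop, one with the WRONG C<\@array> form and one with
the C<[ @array ]> constructor. The somefunc() body is a hypothetical
stand-in, since the original function is never defined.

```perl

    use strict;
    use warnings;

    # Hypothetical stand-in for somefunc(): returns the list 1 .. $i.
    sub somefunc { my ($i) = @_; return (1 .. $i) }

    my @array;               # declared once, so it is the same array every time
    my (@wrong, @right);
    for my $i (1 .. 3) {
        @array = somefunc($i);
        $wrong[$i] = \@array;     # every slot refers to the one @array
        $right[$i] = [ @array ];  # every slot gets its own fresh copy
    }

    print "@{ $wrong[1] }\n";    # 1 2 3  (whatever @array held last)
    print "@{ $right[1] }\n";    # 1

```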

Note that this will produce something similar, but it's
much harder to read:

    for $i (1..10) {
        @array = somefunc($i);
        @{$AoA[$i]} = @array;
    }

Is it the same? Well, maybe so, and maybe not. The subtle difference
is that when you assign something in square brackets, you know for sure
it's always a brand new reference with a new I<copy> of the data.
Something else could be going on in this new case with the C<@{$AoA[$i]}>
dereference on the left-hand side of the assignment. It all depends on
whether C<$AoA[$i]> had been undefined to start with (in which case Perl
quietly creates a new anonymous array for it), or whether it
already contained a reference. If you had already populated @AoA with
references, as in

    $AoA[3] = \@another_array;

then the assignment with the indirection on the left-hand side would
use the existing reference that was already there:

    @{$AoA[3]} = @array;

Of course, this I<would> have the "interesting" effect of clobbering
@another_array. (Have you ever noticed that when a programmer says
something is "interesting", rather than meaning "intriguing",
they're disturbingly more apt to mean that it's "annoying",
"difficult", or both? :-)
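
Both cases can be seen side by side in this minimal sketch; the sample
contents of @array and @another_array are made up for illustration.

```perl

    use strict;
    use warnings;

    my @another_array = ('keep', 'me');
    my @array         = ('new', 'data');
    my @AoA;

    $AoA[3] = \@another_array;
    @{ $AoA[3] } = @array;        # writes through the existing reference
    print "@another_array\n";     # new data  (the original contents are gone)

    @{ $AoA[5] } = @array;        # $AoA[5] was undef, so a new array is created
    print ref($AoA[5]), "\n";     # ARRAY

```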

So just remember always to use the array or hash constructors with C<[]>
or C<{}>, and you'll be fine, although it's not always optimally
efficient.

Surprisingly, the following dangerous-looking construct will
actually work out fine:

    for $i (1..10) {
        my @array = somefunc($i);
        $AoA[$i] = \@array;
    }

That's because my() is more of a run-time statement than it is a
compile-time declaration I<per se>. This means that the my() variable is
remade afresh each time through the loop. So even though it I<looks> as
though you stored the same variable reference each time, you actually did
not! This is a subtle distinction that can produce more efficient code at
the risk of misleading all but the most experienced of programmers. So I
usually advise against teaching it to beginners. In fact, except for
passing arguments to functions, I seldom like to see the gimme-a-reference
operator (backslash) used much at all in code. Instead, I advise
beginners that they (and most of the rest of us) should try to use the
much more easily understood constructors C<[]> and C<{}> instead of
relying upon lexical (or dynamic) scoping and hidden reference-counting to
do the right thing behind the scenes.
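
A minimal sketch that confirms the claim: with my() inside the loop body,
each stored reference points to a different array. As before, the
somefunc() body is a hypothetical stand-in.

```perl

    use strict;
    use warnings;

    # Hypothetical stand-in for somefunc(): returns the list 1 .. $i.
    sub somefunc { my ($i) = @_; return (1 .. $i) }

    my @AoA;
    for my $i (1 .. 3) {
        my @array = somefunc($i);   # a brand-new @array on every iteration
        $AoA[$i] = \@array;         # still valid after the loop moves on
    }

    print "@{ $AoA[1] } | @{ $AoA[3] }\n";   # 1 | 1 2 3

```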

In summary:

    $AoA[$i] = [ @array ];      # usually best
    $AoA[$i] = \@array;         # perilous; just how my() was that array?
    @{ $AoA[$i] } = @array;     # way too tricky for most programmers

=head1 CAVEAT ON PRECEDENCE
X<dereference, precedence> X<dereferencing, precedence>

Speaking of things like C<@{$AoA[$i]}>, the following are actually the
same thing:
X<< -> >>

    $aref->[2][2]       # clear
    $$aref[2][2]        # confusing

That's because Perl's precedence rules on its five prefix dereferencers
(which look like someone swearing: C<$ @ * % &>) make them bind more
tightly than the postfix subscripting brackets or braces! This will no
doubt come as a great shock to the C or C++ programmer, who is quite
accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th>
element of C<a>. That is, they first take the subscript, and only then
dereference the thing at that subscript. That's fine in C, but this isn't C.

The seemingly equivalent construct in Perl, C<$$aref[$i]>, first
dereferences $aref, treating it as a reference to an array, and only
then subscripts that array, giving you the I<i'th> value of the array
pointed to by $aref. If you wanted the C notion, you'd have to write
C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first
before the leading C<$> dereferencer.
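
A small sketch of both readings, with made-up data: the first part shows
that C<$$aref[1][2]> dereferences $aref before subscripting, and the second
uses the C<${$AoA[$i]}> form on an array whose element is a scalar
reference.

```perl

    use strict;
    use warnings;

    my $aref = [ [ 'a0', 'a1' ], [ 'b0', 'b1', 'b2' ] ];

    # Prefix $ binds first: $$aref[1][2] means ${$aref}[1][2].
    my $elem = $$aref[1][2];        # same as $aref->[1][2]
    print "$elem\n";                # b2

    # The C-style reading needs braces so the subscript happens first.
    my $name    = 'fred';
    my @AoA     = ( \$name );       # element 0 is a reference to a scalar
    my $pointee = ${ $AoA[0] };     # subscript, then dereference
    print "$pointee\n";             # fred

```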

=head1 WHY YOU SHOULD ALWAYS C<use strict>

If this is starting to sound scarier than it's worth, relax. Perl has
some features to help you avoid its most common pitfalls. The best
way to avoid getting confused is to start every program like this:

    #!/usr/bin/perl -w
    use strict;

This way, you'll be forced to declare all your variables with my(), and
accidental "symbolic dereferencing" will be disallowed. Therefore, if
you'd done this:

    my $aref = [
        [ "fred", "barney", "pebbles", "bambam", "dino", ],
        [ "homer", "bart", "marge", "maggie", ],
        [ "george", "jane", "elroy", "judy", ],
    ];

    print $aref[2][2];

the compiler would immediately flag that as an error I<at compile time>,
because you were accidentally accessing C<@aref>, an undeclared
variable, and it would thereby remind you to write instead:

    print $aref->[2][2];
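
To watch C<use strict> catch this mistake, the following minimal sketch
compiles the mistaken line inside a string eval and prints the error
instead of dying. The exact wording of the message varies between Perl
versions, so only its beginning is shown.

```perl

    use strict;
    use warnings;

    # Compile the mistaken code inside a string eval so the error can be
    # inspected instead of stopping this script.
    my $ok = eval q{
        use strict;
        my $aref = [ [ "fred", "barney" ], [ "george", "jane", "elroy" ] ];
        my $oops = $aref[1][2];    # refers to @aref, which was never declared
        1;
    };
    my $err = $@;

    print $ok ? "compiled\n" : "rejected: $err";
    # The message starts: Global symbol "@aref" requires explicit package name

```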

=head1 DEBUGGING
X<data structure, debugging> X<complex data structure, debugging>
X<AoA, debugging> X<HoA, debugging> X<AoH, debugging> X<HoH, debugging>
X<array of arrays, debugging> X<hash of arrays, debugging>
X<array of hashes, debugging> X<hash of hashes, debugging>

Before version 5.002, the standard Perl debugger didn't do a very nice job of
printing out complex data structures. With 5.002 or above, the
debugger includes several new features, including command-line editing as
well as the C<x> command to dump out complex data structures. For
example, given the assignment to $aref above, here's the debugger output:

  DB<1> x $aref
  $aref = ARRAY(0x13b5a0)
     0  ARRAY(0x1f0a24)
        0  'fred'
        1  'barney'
        2  'pebbles'
        3  'bambam'
        4  'dino'
     1  ARRAY(0x13b558)
        0  'homer'
        1  'bart'
        2  'marge'
        3  'maggie'
     2  ARRAY(0x13b540)
        0  'george'
        1  'jane'
        2  'elroy'
        3  'judy'
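
Outside the debugger, the core Data::Dumper module (a different tool from
the debugger's C<x> command) prints the same structure as Perl source that
can be read back in. A minimal sketch using the $aref from above:

```perl

    use strict;
    use warnings;
    use Data::Dumper;

    my $aref = [
        [ "fred", "barney", "pebbles", "bambam", "dino", ],
        [ "homer", "bart", "marge", "maggie", ],
        [ "george", "jane", "elroy", "judy", ],
    ];

    # Indent = 1 uses a fixed amount of indentation per nesting level.
    local $Data::Dumper::Indent = 1;
    my $dump = Data::Dumper->Dump([ $aref ], [ 'aref' ]);
    print $dump;    # starts with: $aref = [

```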

=head1 CODE EXAMPLES

Presented with little comment (these will get their own manpages someday),
here are short code examples illustrating access of various
types of data structures.

=head1 ARRAYS OF ARRAYS
X<array of arrays> X<AoA>

=head2 Declaration of an ARRAY OF ARRAYS
