1=head1 NAME
2
3perlhack - How to hack at the Perl internals
4
5=head1 DESCRIPTION
6
7This document attempts to explain how Perl development takes place,
8and ends with some suggestions for people wanting to become bona fide
9porters.
10
11The perl5-porters mailing list is where the Perl standard distribution
12is maintained and developed. The list can get anywhere from 10 to 150
13messages a day, depending on the heatedness of the debate. Most days
14there are two or three patches, extensions, features, or bugs being
15discussed at a time.
16
17A searchable archive of the list is at either:
18
19 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
20
21or
22
23 http://archive.develooper.com/[email protected]/
24
25List subscribers (the porters themselves) come in several flavours.
26Some are quiet curious lurkers, who rarely pitch in and instead watch
27the ongoing development to ensure they're forewarned of new changes or
28features in Perl. Some are representatives of vendors, who are there
29to make sure that Perl continues to compile and work on their
30platforms. Some patch any reported bug that they know how to fix,
31some are actively patching their pet area (threads, Win32, the regexp
32engine), while others seem to do nothing but complain. In other
33words, it's your usual mix of technical people.
34
35Over this group of porters presides Larry Wall. He has the final word
36in what does and does not change in the Perl language. Various
37releases of Perl are shepherded by a "pumpking", a porter
38responsible for gathering patches, deciding on a patch-by-patch,
39feature-by-feature basis what will and will not go into the release.
40For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
41Perl, and Jarkko Hietaniemi was the pumpking for the 5.8 release, and
42Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release.
43
44In addition, various people are pumpkings for different things. For
45instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the
46I<Configure> pumpkin up till the 5.8 release. For the 5.10 release
47H.Merijn Brand took over.
48
49Larry sees Perl development along the lines of the US government:
50there's the Legislature (the porters), the Executive branch (the
51pumpkings), and the Supreme Court (Larry). The legislature can
52discuss and submit patches to the executive branch all they like, but
53the executive branch is free to veto them. Rarely, the Supreme Court
54will side with the executive branch over the legislature, or the
55legislature over the executive branch. Mostly, however, the
56legislature and the executive branch are supposed to get along and
57work out their differences without impeachment or court cases.
58
59You might sometimes see reference to Rule 1 and Rule 2. Larry's power
60as Supreme Court is expressed in The Rules:
61
62=over 4
63
64=item 1
65
66Larry is always by definition right about how Perl should behave.
67This means he has final veto power on the core functionality.
68
69=item 2
70
71Larry is allowed to change his mind about any matter at a later date,
72regardless of whether he previously invoked Rule 1.
73
74=back
75
76Got that? Larry is always right, even when he was wrong. It's rare
77to see either Rule exercised, but they are often alluded to.
78
79New features and extensions to the language are contentious, because
80the criteria used by the pumpkings, Larry, and other porters to decide
81which features should be implemented and incorporated are not codified
82in a few small design goals as with some other languages. Instead,
83the heuristics are flexible and often difficult to fathom. Here is
84one person's list, roughly in decreasing order of importance, of
85heuristics that new features have to be weighed against:
86
87=over 4
88
89=item Does concept match the general goals of Perl?
90
91These haven't been written anywhere in stone, but one approximation
92is:
93
94 1. Keep it fast, simple, and useful.
95 2. Keep features/concepts as orthogonal as possible.
96 3. No arbitrary limits (platforms, data sizes, cultures).
97 4. Keep it open and exciting to use/patch/advocate Perl everywhere.
98 5. Either assimilate new technologies, or build bridges to them.
99
100=item Where is the implementation?
101
102All the talk in the world is useless without an implementation. In
103almost every case, the person or people who argue for a new feature
104will be expected to be the ones who implement it. Porters capable
105of coding new features have their own agendas, and are not available
106to implement your (possibly good) idea.
107
108=item Backwards compatibility
109
It's a cardinal sin to break existing Perl programs. New warnings are
contentious--some say that a program that emits warnings is not
broken, while others say it is. Adding keywords has the potential to
break programs, and changing the meaning of existing token sequences or
functions might break them too.
115
116=item Could it be a module instead?
117
118Perl 5 has extension mechanisms, modules and XS, specifically to avoid
119the need to keep changing the Perl interpreter. You can write modules
120that export functions, you can give those functions prototypes so they
121can be called like built-in functions, you can even write XS code to
122mess with the runtime data structures of the Perl interpreter if you
123want to implement really complicated things. If it can be done in a
124module instead of in the core, it's highly unlikely to be added.
125
126=item Is the feature generic enough?
127
128Is this something that only the submitter wants added to the language,
129or would it be broadly useful? Sometimes, instead of adding a feature
130with a tight focus, the porters might decide to wait until someone
131implements the more generalized feature. For instance, instead of
132implementing a "delayed evaluation" feature, the porters are waiting
133for a macro system that would permit delayed evaluation and much more.
134
135=item Does it potentially introduce new bugs?
136
137Radical rewrites of large chunks of the Perl interpreter have the
138potential to introduce new bugs. The smaller and more localized the
139change, the better.
140
141=item Does it preclude other desirable features?
142
143A patch is likely to be rejected if it closes off future avenues of
144development. For instance, a patch that placed a true and final
145interpretation on prototypes is likely to be rejected because there
146are still options for the future of prototypes that haven't been
147addressed.
148
149=item Is the implementation robust?
150
151Good patches (tight code, complete, correct) stand more chance of
152going in. Sloppy or incorrect patches might be placed on the back
153burner until the pumpking has time to fix, or might be discarded
154altogether without further notice.
155
156=item Is the implementation generic enough to be portable?
157
The worst patches make use of system-specific features. It's highly
unlikely that nonportable additions to the Perl language will be
accepted.
161
162=item Is the implementation tested?
163
164Patches which change behaviour (fixing bugs or introducing new features)
165must include regression tests to verify that everything works as expected.
166Without tests provided by the original author, how can anyone else changing
167perl in the future be sure that they haven't unwittingly broken the behaviour
168the patch implements? And without tests, how can the patch's author be
169confident that his/her hard work put into the patch won't be accidentally
170thrown away by someone in the future?
171
172=item Is there enough documentation?
173
174Patches without documentation are probably ill-thought out or
175incomplete. Nothing can be added without documentation, so submitting
176a patch for the appropriate manpages as well as the source code is
177always a good idea.
178
179=item Is there another way to do it?
180
181Larry said "Although the Perl Slogan is I<There's More Than One Way
182to Do It>, I hesitate to make 10 ways to do something". This is a
183tricky heuristic to navigate, though--one man's essential addition is
184another man's pointless cruft.
185
186=item Does it create too much work?
187
188Work for the pumpking, work for Perl programmers, work for module
189authors, ... Perl is supposed to be easy.
190
191=item Patches speak louder than words
192
193Working code is always preferred to pie-in-the-sky ideas. A patch to
194add a feature stands a much higher chance of making it to the language
195than does a random feature request, no matter how fervently argued the
196request might be. This ties into "Will it be useful?", as the fact
197that someone took the time to make the patch demonstrates a strong
198desire for the feature.
199
200=back
201
202If you're on the list, you might hear the word "core" bandied
203around. It refers to the standard distribution. "Hacking on the
204core" means you're changing the C source code to the Perl
205interpreter. "A core module" is one that ships with Perl.
206
207=head2 Keeping in sync
208
The source code to the Perl interpreter, in its different versions, is
kept in a repository managed by a revision control system (currently
Perforce; see http://perforce.com/ ). The pumpkings and a few others
have access to the repository to check in changes. Periodically the
pumpking for the development version of Perl will release a new
version, so the rest of the porters can see what's changed. The
current state of the main trunk of the repository, and patches that
describe the individual changes that have happened since the last
public release, are available at this location:
218
219 http://public.activestate.com/pub/apc/
220 ftp://public.activestate.com/pub/apc/
221
222If you're looking for a particular change, or a change that affected
223a particular set of files, you may find the B<Perl Repository Browser>
224useful:
225
226 http://public.activestate.com/cgi-bin/perlbrowse
227
228You may also want to subscribe to the perl5-changes mailing list to
229receive a copy of each patch that gets submitted to the maintenance
230and development "branches" of the perl repository. See
231http://lists.perl.org/ for subscription information.
232
If you are a member of the perl5-porters mailing list, it is a good
thing to keep in touch with the most recent changes, if only to
verify that what you would have posted as a bug report isn't already
solved in the most recent available perl development branch, also
known as perl-current, bleading edge perl, bleedperl or bleadperl.
238
239Needless to say, the source code in perl-current is usually in a perpetual
240state of evolution. You should expect it to be very buggy. Do B<not> use
241it for any purpose other than testing and development.
242
243Keeping in sync with the most recent branch can be done in several ways,
244but the most convenient and reliable way is using B<rsync>, available at
245ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent
246branch by FTP.)
247
248If you choose to keep in sync using rsync, there are two approaches
249to doing so:
250
251=over 4
252
253=item rsync'ing the source tree
254
255Presuming you are in the directory where your perl source resides
256and you have rsync installed and available, you can "upgrade" to
257the bleadperl using:
258
259 # rsync -avz rsync://public.activestate.com/perl-current/ .
260
261This takes care of updating every single item in the source tree to
262the latest applied patch level, creating files that are new (to your
263distribution) and setting date/time stamps of existing files to
264reflect the bleadperl status.
265
266Note that this will not delete any files that were in '.' before
267the rsync. Once you are sure that the rsync is running correctly,
268run it with the --delete and the --dry-run options like this:
269
270 # rsync -avz --delete --dry-run rsync://public.activestate.com/perl-current/ .
271
272This will I<simulate> an rsync run that also deletes files not
273present in the bleadperl master copy. Observe the results from
274this run closely. If you are sure that the actual run would delete
275no files precious to you, you could remove the '--dry-run' option.
276
You can then check which patch was applied most recently by
looking in the file B<.patch>, which shows the number of the
latest patch.
280
281If you have more than one machine to keep in sync, and not all of
282them have access to the WAN (so you are not able to rsync all the
283source trees to the real source), there are some ways to get around
284this problem.
285
286=over 4
287
288=item Using rsync over the LAN
289
290Set up a local rsync server which makes the rsynced source tree
291available to the LAN and sync the other machines against this
292directory.
293
294From http://rsync.samba.org/README.html :
295
296 "Rsync uses rsh or ssh for communication. It does not need to be
297 setuid and requires no special privileges for installation. It
298 does not require an inetd entry or a daemon. You must, however,
299 have a working rsh or ssh system. Using ssh is recommended for
300 its security features."
301
=item Pushing over NFS

With the other systems mounted over NFS, you can take an
active pushing approach by checking the just-updated tree against
the other not-yet-synced trees. An example would be
307
 #!/usr/bin/perl -w

 use strict;
 use File::Copy;

 # Record mode, size and mtime for every file listed in MANIFEST.
 my %MF = map {
     m/(\S+)/;
     $1 => [ (stat $1)[2, 7, 9] ];    # mode, size, mtime
 } `cat MANIFEST`;

 # Each remote tree is reached through its NFS mount point.
 my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);

 foreach my $host (keys %remote) {
     unless (-d $remote{$host}) {
         print STDERR "Cannot Xsync for host $host\n";
         next;
     }
     foreach my $file (keys %MF) {
         my $rfile = "$remote{$host}/$file";
         my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
         defined $size or ($mode, $size, $mtime) = (0, 0, 0);
         # Skip files whose size and mtime already match the local copy.
         $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
         printf "%4s %-34s %8d %9d %8d %9d\n",
             $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
         unlink $rfile;
         copy ($file, $rfile);
         utime time, $MF{$file}[2], $rfile;
         chmod $MF{$file}[0], $rfile;
     }
 }
338
though this is not perfect. It could be improved by checking
file checksums before updating. Not all NFS systems provide
reliable utime support (when used over NFS).
342
343=back
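
The checksum idea could look something like the following. This is
only a sketch: the names C<same_content> and C<push_file> are made up
for illustration, and it assumes a POSIX shell with the standard
C<cksum> and C<cp> utilities rather than the Perl script shown above.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Perl distribution.

# same_content A B: true when both files exist and cksum reports the
# same CRC and byte count for them.
same_content() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    a=$(cksum < "$1") || return 1
    b=$(cksum < "$2") || return 1
    [ "$a" = "$b" ]
}

# push_file SRC DST: copy SRC over DST only when the contents differ.
# cp -p asks cp to carry over the mode and modification time.
push_file() {
    same_content "$1" "$2" && return 0
    cp -p "$1" "$2"
}
```

Comparing checksums costs a full read of both files, so over NFS it
trades network traffic for not having to trust remote timestamps.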
344
345=item rsync'ing the patches
346
347The source tree is maintained by the pumpking who applies patches to
348the files in the tree. These patches are either created by the
349pumpking himself using C<diff -c> after updating the file manually or
350by applying patches sent in by posters on the perl5-porters list.
351These patches are also saved and rsync'able, so you can apply them
352yourself to the source files.
353
354Presuming you are in a directory where your patches reside, you can
355get them in sync with
356
357 # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
358
359This makes sure the latest available patch is downloaded to your
360patch directory.
361
362It's then up to you to apply these patches, using something like
363
364 # last=`ls -t *.gz | sed q`
365 # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
366 # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
367 # cd ../perl-current
368 # patch -p1 -N <../perl-current-diffs/blead.patch
369
370or, since this is only a hint towards how it works, use CPAN-patchaperl
371from Andreas König to have better control over the patching process.
372
373=back
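
C<ls -t> orders patches by timestamp, which can drift once files have
been copied around. As a rough alternative, the sketch below picks
patches by number instead. It assumes, and this is only an assumption,
that the patch files are named F<NNNN.gz> after their change number and
that you already know your current patch level (for instance from
F<.patch>); C<pending_patches> is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper; the NNNN.gz naming scheme is an assumption.

# pending_patches DIR LEVEL: print the patch files in DIR whose number
# is greater than LEVEL, lowest number first.
pending_patches() {
    dir=$1
    level=$2
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue
        n=$(basename "$f" .gz)
        # Ignore anything whose name is not purely numeric.
        case $n in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$n" -gt "$level" ]; then
            echo "$n"
        fi
    done | sort -n | while read -r n; do
        echo "$dir/$n.gz"
    done
}
```

The numeric sort matters: a plain lexical sort would place 10001 ahead
of 7583 and the patches would be applied out of order.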
374
375=head2 Why rsync the source tree
376
377=over 4
378
379=item It's easier to rsync the source tree
380
381Since you don't have to apply the patches yourself, you are sure all
382files in the source tree are in the right state.
383
384=item It's more reliable
385
386While both the rsync-able source and patch areas are automatically
387updated every few minutes, keep in mind that applying patches may
388sometimes mean careful hand-holding, especially if your version of
389the C<patch> program does not understand how to deal with new files,
390files with 8-bit characters, or files without trailing newlines.
391
392=back
393
394=head2 Why rsync the patches
395
396=over 4
397
398=item It's easier to rsync the patches
399
If you have more than one machine that you want to keep on track with
bleadperl, it's easier to rsync the patches only once and then apply
them to all the source trees on the different machines.

If you try to keep pace on 5 different machines, of which only one
has access to the WAN, rsync'ing all the source trees would then have
to be done 5 times over NFS. Having rsync'ed the patches only once,
you can apply them to all the source trees automatically. Need I say
more ;-)
409
410=item It's a good reference
411
If you want not only to have the most recent development branch,
but also to B<fix> bugs or extend features, you will want to dive
into the sources. If you are a seasoned perl core diver, you don't
need no manuals, tips, roadmaps, perlguts.pod or other aids to find
your way around. But if you are a starter, the patches may help you
find where you should start and how to change the bits that
bug you.
419
The file B<Changes> is updated on occasions the pumpking sees as his
own little sync points. On those occasions, he releases a tar-ball of
the current source tree (i.e. perl@7582.tar.gz), which will be an
excellent point to start with when choosing to use the 'rsync the
patches' scheme. Starting with perl@7582, which means a set of source
files on which the latest applied patch is number 7582, you apply all
succeeding patches available from then on (7583, 7584, ...).
427
428You can use the patches later as a kind of search archive.
429
430=over 4
431
432=item Finding a start point
433
If you want to fix/change the behaviour of function/feature Foo, just
scan the patches for ones that mention Foo either in the subject,
the comments, or the body of the fix. There's a good chance the patch
shows you the files that are affected, which are very likely
to be the starting point of your journey into the guts of perl.
439
440=item Finding how to fix a bug
441
If you've found I<where> the function/feature Foo misbehaves, but you
don't know how to fix it (but you do know the change you want to
make), you can, again, peruse the patches for similar changes and
see how others applied the fix.
446
447=item Finding the source of misbehaviour
448
449When you keep in sync with bleadperl, the pumpking would love to
450I<see> that the community efforts really work. So after each of his
451sync points, you are to 'make test' to check if everything is still
452in working order. If it is, you do 'make ok', which will send an OK
453report to [email protected]. (If you do not have access to a mailer
on the system where you just successfully ran 'make test', you can
do 'make okfile', which creates the file C<perl.ok>, which you can
then take to your favourite mailer and mail yourself).
457
But of course, as always, things will not always lead to a success
path, and one or more tests may not pass the 'make test'. Before
sending in a bug report (using 'make nok' or 'make nokfile'), check
the mailing list to see whether someone else has reported the bug
already and if so, confirm it by replying to that message. If not, you
might want to trace the source of that misbehaviour B<before> sending
in the bug report, which will help all the other porters in finding
the solution.
465
Here the saved patches come in very handy. You can check the list of
patches to see which patch changed what file and what change caused
the misbehaviour. If you note that in the bug report, it saves the
person trying to solve it from having to look for that point.
470
471=back
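
Scanning the saved patches by hand gets tedious quickly. A small helper
along these lines can narrow the search. It is a sketch that assumes
the patches are gzip-compressed files sitting in one directory, and
C<patches_mentioning> is a hypothetical name, not a tool shipped with
Perl.

```shell
#!/bin/sh
# Hypothetical helper for searching a directory of gzipped patches.

# patches_mentioning STRING DIR: print every *.gz file in DIR whose
# uncompressed text contains STRING.
patches_mentioning() {
    pattern=$1
    dir=$2
    for p in "$dir"/*.gz; do
        [ -e "$p" ] || continue
        # grep -F treats STRING as a fixed string, so the dot in
        # "toke.c" only matches a literal dot.
        if gzip -dc "$p" | grep -F -- "$pattern" >/dev/null; then
            echo "$p"
        fi
    done
}
```

Something like C<patches_mentioning toke.c ../perl-current-diffs> would
then list the candidates worth reading in full.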
472
473If searching the patches is too bothersome, you might consider using
474perl's bugtron to find more information about discussions and
475ramblings on posted bugs.
476
477If you want to get the best of both worlds, rsync both the source
478tree for convenience, reliability and ease and rsync the patches
479for reference.
480
481=back
482
483=head2 Working with the source
484
485Because you cannot use the Perforce client, you cannot easily generate
486diffs against the repository, nor will merges occur when you update
487via rsync. If you edit a file locally and then rsync against the
488latest source, changes made in the remote copy will I<overwrite> your
489local versions!
490
491The best way to deal with this is to maintain a tree of symlinks to
492the rsync'd source. Then, when you want to edit a file, you remove
493the symlink, copy the real file into the other tree, and edit it. You
494can then diff your edited file against the original to generate a
495patch, and you can safely update the original tree.
496
497Perl's F<Configure> script can generate this tree of symlinks for you.
498The following example assumes that you have used rsync to pull a copy
499of the Perl source into the F<perl-rsync> directory. In the directory
500above that one, you can execute the following commands:
501
502 mkdir perl-dev
503 cd perl-dev
504 ../perl-rsync/Configure -Dmksymlinks -Dusedevel -D"optimize=-g"
505
506This will start the Perl configuration process. After a few prompts,
507you should see something like this:
508
509 Symbolic links are supported.
510
511 Checking how to test for symbolic links...
512 Your builtin 'test -h' may be broken.
513 Trying external '/usr/bin/test -h'.
514 You can test for symbolic links with '/usr/bin/test -h'.
515
516 Creating the symbolic links...
517 (First creating the subdirectories...)
518 (Then creating the symlinks...)
519
520The specifics may vary based on your operating system, of course.
521After you see this, you can abort the F<Configure> script, and you
522will see that the directory you are in has a tree of symlinks to the
523F<perl-rsync> directories and files.
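
To make the idea concrete, here is a hand-rolled sketch of the kind of
tree being described: real directories, with each file a symlink back
into the pristine copy. It is not what F<Configure> runs internally;
C<mirror_symlinks> is a hypothetical helper, and it assumes file names
without embedded newlines.

```shell
#!/bin/sh
# Hypothetical illustration of a symlink tree; Configure's own
# implementation is not shown here.

# mirror_symlinks SRC DST: recreate SRC's directory layout under DST
# and point a symlink at every regular file in SRC.
mirror_symlinks() {
    src=$(cd "$1" && pwd) || return 1
    mkdir -p "$2" || return 1
    dst=$(cd "$2" && pwd) || return 1
    (cd "$src" && find . -type d) | while read -r d; do
        mkdir -p "$dst/$d"
    done
    (cd "$src" && find . -type f) | while read -r f; do
        # Leave anything that already exists alone, so locally edited
        # copies are never replaced by a link.
        [ -e "$dst/$f" ] || [ -L "$dst/$f" ] || ln -s "$src/$f" "$dst/$f"
    done
}
```

Because existing entries are skipped, running it again after an rsync
only adds links for files that are new upstream.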
524
If you plan to do a lot of work with the Perl source, here are some
Bourne shell functions that can make your life easier:

 # Replace the symlink with a private copy, then edit it.
 edit() {
     if [ -L "$1" ]; then
         mv "$1" "$1.orig"
         cp "$1.orig" "$1"
         vi "$1"
     else
         vi "$1"
     fi
 }

 # Throw away the private copy and put the symlink back.
 unedit() {
     if [ -L "$1.orig" ]; then
         rm "$1"
         mv "$1.orig" "$1"
     fi
 }
544
545Replace "vi" with your favorite flavor of editor.
546
547Here is another function which will quickly generate a patch for the
548files which have been edited in your symlink tree:
549
 mkpatchorig() {
     local diffopts
     for f in `find . -name '*.orig' | sed 's,^\./,,'`
     do
         case `echo $f | sed 's,\.orig$,,;s,.*\.,,'` in
             c)   diffopts=-p ;;
             pod) diffopts='-F^=' ;;
             *)   diffopts= ;;
         esac
         diff -du $diffopts $f `echo $f | sed 's,\.orig$,,'`
     done
 }
562
563This function produces patches which include enough context to make
564your changes obvious. This makes it easier for the Perl pumpking(s)
565to review them when you send them to the perl5-porters list, and that
566means they're more likely to get applied.
567
This function assumes GNU diff, and may require some tweaking for
other diff variants.
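
Before sending a patch it helps to double-check which files you
actually touched. With the edit/unedit convention above, those are the
files that have a F<.orig> symlink sitting next to them. The following
sketch lists them; C<edited_files> is a made-up name, and it assumes a
C<find> that supports C<-type l>.

```shell
#!/bin/sh
# Hypothetical companion to the edit/unedit functions.

# edited_files [DIR]: list files under DIR (default: current directory)
# that have a NAME.orig symlink beside them, i.e. files checked out
# for editing.
edited_files() {
    find "${1:-.}" -name '*.orig' -type l | sed 's,\.orig$,,' | sort
}
```

Plain files that merely end in F<.orig>, such as leftovers from
C<patch>, are skipped because they are not symlinks.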
570
571=head2 Perlbug administration
572
573There is a single remote administrative interface for modifying bug status,
574category, open issues etc. using the B<RT> I<bugtracker> system, maintained
575by I<Robert Spier>. Become an administrator, and close any bugs you can get
576your sticky mitts on:
577
578 http://rt.perl.org
579
580The bugtracker mechanism for B<perl5> bugs in particular is at:
581
582 http://bugs6.perl.org/perlbug
583
584To email the bug system administrators:
585
586 "perlbug-admin" <[email protected]>
587
588
589=head2 Submitting patches
590
591Always submit patches to I<[email protected]>. If you're
592patching a core module and there's an author listed, send the author a
593copy (see L<Patching a core module>). This lets other porters review
594your patch, which catches a surprising number of errors in patches.
Either use the diff program (available in source code form from
ftp://ftp.gnu.org/pub/gnu/ ), or use Johan Vromans' I<makepatch>
(available from I<CPAN/authors/id/JV/>). Unified diffs are preferred,
598but context diffs are accepted. Do not send RCS-style diffs or diffs
599without context lines. More information is given in the
600I<Porting/patching.pod> file in the Perl source distribution. Please
601patch against the latest B<development> version (e.g., if you're
602fixing a bug in the 5.005 track, patch against the latest 5.005_5x
603version). Only patches that survive the heat of the development
604branch get applied to maintenance versions.
605
606Your patch should update the documentation and test suite. See
607L<Writing a test>.
608
609To report a bug in Perl, use the program I<perlbug> which comes with
610Perl (if you can't get Perl to work, send mail to the address
611I<[email protected]> or I<[email protected]>). Reporting bugs through
612I<perlbug> feeds into the automated bug-tracking system, access to
613which is provided through the web at http://bugs.perl.org/ . It
614often pays to check the archives of the perl5-porters mailing list to
615see whether the bug you're reporting has been reported before, and if
616so whether it was considered a bug. See above for the location of
617the searchable archives.
618
619The CPAN testers ( http://testers.cpan.org/ ) are a group of
620volunteers who test CPAN modules on a variety of platforms. Perl
621Smokers ( http://archives.develooper.com/[email protected]/ )
automatically test Perl source releases on platforms with various
623configurations. Both efforts welcome volunteers.
624
625It's a good idea to read and lurk for a while before chipping in.
626That way you'll get to see the dynamic of the conversations, learn the
627personalities of the players, and hopefully be better prepared to make
a useful contribution when you do speak up.
629
630If after all this you still think you want to join the perl5-porters
631mailing list, send mail to I<[email protected]>. To
632unsubscribe, send mail to I<[email protected]>.
633
634To hack on the Perl guts, you'll need to read the following things:
635
636=over 3
637
638=item L<perlguts>
639
640This is of paramount importance, since it's the documentation of what
641goes where in the Perl source. Read it over a couple of times and it
642might start to make sense - don't worry if it doesn't yet, because the
643best way to study it is to read it in conjunction with poking at Perl
644source, and we'll do that later on.
645
646You might also want to look at Gisle Aas's illustrated perlguts -
647there's no guarantee that this will be absolutely up-to-date with the
648latest documentation in the Perl core, but the fundamentals will be
649right. ( http://gisle.aas.no/perl/illguts/ )
650
651=item L<perlxstut> and L<perlxs>
652
653A working knowledge of XSUB programming is incredibly useful for core
654hacking; XSUBs use techniques drawn from the PP code, the portion of the
655guts that actually executes a Perl program. It's a lot gentler to learn
656those techniques from simple examples and explanation than from the core
657itself.
658
659=item L<perlapi>
660
661The documentation for the Perl API explains what some of the internal
662functions do, as well as the many macros used in the source.
663
664=item F<Porting/pumpkin.pod>
665
666This is a collection of words of wisdom for a Perl porter; some of it is
667only useful to the pumpkin holder, but most of it applies to anyone
668wanting to go about Perl development.
669
670=item The perl5-porters FAQ
671
672This should be available from http://simon-cozens.org/writings/p5p-faq ;
673alternatively, you can get the FAQ emailed to you by sending mail to
674C<[email protected]>. It contains hints on reading perl5-porters,
675information on how perl5-porters works and how Perl development in general
676works.
677
678=back
679
680=head2 Finding Your Way Around
681
682Perl maintenance can be split into a number of areas, and certain people
683(pumpkins) will have responsibility for each area. These areas sometimes
684correspond to files or directories in the source kit. Among the areas are:
685
686=over 3
687
688=item Core modules
689
690Modules shipped as part of the Perl core live in the F<lib/> and F<ext/>
691subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
692contains the core XS modules.
693
694=item Tests
695
696There are tests for nearly all the modules, built-ins and major bits
697of functionality. Test files all have a .t suffix. Module tests live
698in the F<lib/> and F<ext/> directories next to the module being
tested. Others live in F<t/>. See L<Writing a test>.
700
701=item Documentation
702
703Documentation maintenance includes looking after everything in the
704F<pod/> directory, (as well as contributing new documentation) and
705the documentation to the modules in core.
706
707=item Configure
708
709The configure process is the way we make Perl portable across the
710myriad of operating systems it supports. Responsibility for the
711configure, build and installation process, as well as the overall
712portability of the core code rests with the configure pumpkin - others
713help out with individual operating systems.
714
715The files involved are the operating system directories, (F<win32/>,
716F<os2/>, F<vms/> and so on) the shell scripts which generate F<config.h>
717and F<Makefile>, as well as the metaconfig files which generate
718F<Configure>. (metaconfig isn't included in the core distribution.)
719
720=item Interpreter
721
722And of course, there's the core of the Perl interpreter itself. Let's
723have a look at that in a little more detail.
724
725=back
726
727Before we leave looking at the layout, though, don't forget that
728F<MANIFEST> contains not only the file names in the Perl distribution,
729but short descriptions of what's in them, too. For an overview of the
730important files, try this:
731
732 perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST
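
If perl itself is not yet built on the machine, roughly the same
listing can be had from C<grep>. This is a sketch assuming a C<grep>
that understands C<-E> and POSIX character classes;
C<top_level_c_files> is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper mirroring the perl one-liner above.

# top_level_c_files [MANIFEST]: print the MANIFEST entries for .c and
# .h files that live in the top-level directory (no slash in the name).
top_level_c_files() {
    grep -E '^[^/]+\.[ch][[:space:]]+' "${1:-MANIFEST}"
}
```

As with the perl version, entries without a description after the file
name are not printed, since the pattern requires trailing whitespace.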
733
734=head2 Elements of the interpreter
735
736The work of the interpreter has two main stages: compiling the code
737into the internal representation, or bytecode, and then executing it.
738L<perlguts/Compiled code> explains exactly how the compilation stage
739happens.
740
741Here is a short breakdown of perl's operation:
742
743=over 3
744
745=item Startup
746
The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.
751
752First, F<perlmain.c> allocates some memory and constructs a Perl
753interpreter:
754
    1     PERL_SYS_INIT3(&argc,&argv,&env);
    2
    3     if (!PL_do_undump) {
    4         my_perl = perl_alloc();
    5         if (!my_perl)
    6             exit(1);
    7         perl_construct(my_perl);
    8         PL_perl_destruct_level = 0;
    9     }
764
765Line 1 is a macro, and its definition is dependent on your operating
766system. Line 3 references C<PL_do_undump>, a global variable - all
767global variables in Perl start with C<PL_>. This tells you whether the
768current running program was created with the C<-u> flag to perl and then
769F<undump>, which means it's going to be false in any sane context.
770
771Line 4 calls a function in F<perl.c> to allocate memory for a Perl
772interpreter. It's quite a simple function, and the guts of it looks like
773this:
774
775 my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
776
777Here you see an example of Perl's system abstraction, which we'll see
778later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
779own C<malloc> as defined in F<malloc.c> if you selected that option at
780configure time.
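
The idea of hiding the allocator behind one name can be sketched in
isolation. The following is a minimal illustration and not code from the
Perl source: C<Toy_malloc>, C<TOY_USE_MYMALLOC>, C<toy_custom_malloc> and
C<ToyInterpreter> are all hypothetical names, and the "custom" allocator
here merely counts calls before deferring to the system C<malloc>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bundled allocator; it only counts calls
 * and then defers to the system malloc. */
static size_t toy_custom_calls = 0;

void *toy_custom_malloc(size_t n)
{
    assert(n > 0);
    toy_custom_calls++;
    return malloc(n);
}

/* The "configure time" decision: build with -DTOY_USE_MYMALLOC to route
 * every allocation through the custom allocator instead. */
#ifdef TOY_USE_MYMALLOC
#  define Toy_malloc(n) toy_custom_malloc(n)
#else
#  define Toy_malloc(n) malloc(n)
#endif

/* Hypothetical interpreter struct, far smaller than the real thing. */
typedef struct {
    int dummy_state;
} ToyInterpreter;

/* Callers name only Toy_malloc, never the underlying allocator. */
ToyInterpreter *toy_alloc(void)
{
    ToyInterpreter *interp =
        (ToyInterpreter *)Toy_malloc(sizeof(ToyInterpreter));
    if (interp)
        interp->dummy_state = 0;
    return interp;
}
```

The point of the design is that C<toy_alloc> compiles unchanged whichever
allocator was selected; only the macro definition differs.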

Next, in line 7, we construct the interpreter; this sets up all the
special variables that Perl needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
    if (!exitstatus) {
        exitstatus = perl_run(my_perl);
    }

C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
in F<perl.c>, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls C<yyparse> to
parse it.

=item Parsing

The aim of this stage is to take the Perl source, and turn it into an op
tree. We'll see what one of those looks like later. Strictly speaking,
there are three things going on here.

C<yyparse>, the parser, lives in F<perly.c>, although you're better off
reading the original YACC input in F<perly.y>. (Yes, Virginia, there
B<is> a YACC grammar for Perl!) The job of the parser is to take your
code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so on.
The main point of entry to the lexer is C<yylex>, and that and its
associated routines can be found in F<toke.c>. Perl isn't much like
other computer languages; it's highly context sensitive at times, and it
can be tricky to work out what sort of token something is, or where a
token ends. As such, there's a lot of interplay between the tokeniser and
the parser, which can get pretty frightening if you're not used to it.
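
To make "context sensitive" concrete, here is a toy lexer in which the
meaning of C</> depends on what came before it: after an operand it is a
division operator, otherwise it opens a pattern. This is only a sketch of
the general problem. Every name in it (C<ToyLexer>, C<toy_next_token>,
C<expect_operator> and so on) is hypothetical, and it makes no attempt to
reproduce how F<toke.c> actually works.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum {
    TOK_NUMBER, TOK_IDENT, TOK_DIVIDE, TOK_PATTERN, TOK_OTHER, TOK_END
} ToyTokType;

/* Hypothetical lexer state: the only context kept is whether the
 * previous token was an operand, so that an operator is expected. */
typedef struct {
    const char *pos;
    int expect_operator;
} ToyLexer;

ToyTokType toy_next_token(ToyLexer *lx)
{
    ToyTokType type;

    while (*lx->pos == ' ')
        lx->pos++;
    if (*lx->pos == '\0')
        return TOK_END;

    if (isdigit((unsigned char)*lx->pos)) {
        while (isdigit((unsigned char)*lx->pos))
            lx->pos++;
        type = TOK_NUMBER;
    }
    else if (isalpha((unsigned char)*lx->pos) || *lx->pos == '_') {
        while (isalnum((unsigned char)*lx->pos) || *lx->pos == '_')
            lx->pos++;
        type = TOK_IDENT;
    }
    else if (*lx->pos == '/' && lx->expect_operator) {
        /* Operator position: a lone slash means division. */
        lx->pos++;
        type = TOK_DIVIDE;
    }
    else if (*lx->pos == '/') {
        /* Term position: swallow "/.../" as one pattern token. */
        lx->pos++;
        while (*lx->pos != '\0' && *lx->pos != '/')
            lx->pos++;
        if (*lx->pos == '/')
            lx->pos++;
        type = TOK_PATTERN;
    }
    else {
        /* Simplification: every other character is its own token. */
        lx->pos++;
        type = TOK_OTHER;
    }

    /* After an operand or a pattern, an operator is expected next. */
    lx->expect_operator =
        (type == TOK_NUMBER || type == TOK_IDENT || type == TOK_PATTERN);
    return type;
}

/* Count how many tokens of one type appear in src. */
int toy_count_tokens(const char *src, ToyTokType wanted)
{
    ToyLexer lx;
    ToyTokType t;
    int count = 0;

    assert(src != NULL);
    lx.pos = src;
    lx.expect_operator = 0;
    while ((t = toy_next_token(&lx)) != TOK_END)
        if (t == wanted)
            count++;
    return count;
}
```

With this sketch, C<toy_count_tokens("10 / 2", TOK_DIVIDE)> finds one
division, while the slashes in C<"x = /ab/"> are lexed as a single
pattern token. The same character yields different tokens purely because
of what preceded it.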

As the parser understands a Perl program, it builds up a tree of
operations for the interpreter to perform during execution. The routines
which construct and link together the various operations are to be found
in F<op.c>, and will be examined later.

=item Optimization

Now the parsing stage is complete, and the finished tree represents
the operations that the Perl interpreter needs to perform to execute our
program. Next, Perl does a dry run over the tree looking for
optimizations: constant expressions such as C<3 + 4> will be computed
now, and the optimizer will also see if any multiple operations can be
replaced with a single one. For instance, to fetch the variable C<$foo>,
instead of grabbing the glob C<*foo> and looking at the scalar
component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question. The main optimizer is C<peep>
in F<op.c>, and many ops have their own optimizing functions.
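
The C<3 + 4> case can be illustrated with a toy tree. The sketch below
folds constant additions bottom-up. It shows the general technique only:
the C<ToyOp> struct and the C<toy_new_op>, C<toy_fold> and C<toy_free>
functions are hypothetical, and nothing here is taken from F<op.c> or
claims to mirror how perl performs the folding.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD } ToyOpType;

/* Hypothetical op node, much simpler than a real op. */
typedef struct ToyOp {
    ToyOpType type;
    int value;              /* meaningful for TOY_CONST only */
    struct ToyOp *left;     /* children, used by TOY_ADD only */
    struct ToyOp *right;
} ToyOp;

ToyOp *toy_new_op(ToyOpType type, int value, ToyOp *left, ToyOp *right)
{
    ToyOp *op = (ToyOp *)malloc(sizeof(ToyOp));
    assert(op != NULL);
    op->type = type;
    op->value = value;
    op->left = left;
    op->right = right;
    return op;
}

/* Fold bottom-up: an add whose children are both constants is turned
 * into a single constant op holding the sum. */
ToyOp *toy_fold(ToyOp *op)
{
    if (op->type != TOY_ADD)
        return op;

    op->left = toy_fold(op->left);
    op->right = toy_fold(op->right);

    if (op->left->type == TOY_CONST && op->right->type == TOY_CONST) {
        int sum = op->left->value + op->right->value;
        free(op->left);
        free(op->right);
        op->type = TOY_CONST;
        op->value = sum;
        op->left = NULL;
        op->right = NULL;
    }
    return op;
}

/* Release a whole tree. */
void toy_free(ToyOp *op)
{
    if (op == NULL)
        return;
    toy_free(op->left);
    toy_free(op->right);
    free(op);
}
```

Folding the tree for C<3 + 4> leaves a single constant op with value 7,
so no addition remains to be done at run time. A tree such as
C<$x + (1 + 2)> is only partly folded: the inner add becomes the constant
3, but the outer add stays because one operand is a variable.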

=item Running

Now we're finally ready to go: we have compiled Perl bytecode, and all
that's left to do is run it. The actual execution is done by the
C<runops_standard> function in F<run.c>; more specifically, it's done by
these three innocent-looking lines:

    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
        PERL_ASYNC_CHECK();
    }

You may be more comfortable with the Perl version of that:

    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

Well, maybe not. Anyway, each op contains a function pointer, which
stipulates the function which will actually carry out the operation.
This function will return the next op in the sequence - this allows for
things like C<if> which choose the next op dynamically at run time.
The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
execution if required.
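
The shape of that loop is easy to reproduce in miniature. The following
sketch is a stand-alone toy, not an excerpt from F<run.c>: the C<ToyOp>
struct, the two C<toy_pp_> functions and the counter standing in for
C<PERL_ASYNC_CHECK> are all hypothetical. What it does share with the
real loop is the convention described above: each op's function returns
the next op to run, and a null pointer ends execution.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyOp ToyOp;
typedef ToyOp *(*ToyPPFunc)(ToyOp *self);

struct ToyOp {
    ToyPPFunc ppaddr;   /* function that carries out this op */
    ToyOp *next;        /* default successor */
    ToyOp *other;       /* alternative successor, used by the branch */
    int arg;
};

/* Hypothetical interpreter state for the toy. */
static int toy_acc = 0;
static int toy_async_checks = 0;

/* Add this op's argument to the accumulator, then fall through. */
ToyOp *toy_pp_add(ToyOp *self)
{
    toy_acc += self->arg;
    return self->next;
}

/* Pick the successor at run time, as a conditional would. */
ToyOp *toy_pp_branch_if_less(ToyOp *self)
{
    return toy_acc < self->arg ? self->other : self->next;
}

/* The run loop: call the current op's function, take what it returns
 * as the new current op, and stop on NULL. */
int toy_runops(ToyOp *start)
{
    ToyOp *op = start;

    assert(start != NULL);
    toy_acc = 0;
    toy_async_checks = 0;
    while ((op = op->ppaddr(op)) != NULL) {
        toy_async_checks++;     /* stand-in for PERL_ASYNC_CHECK() */
    }
    return toy_acc;
}

/* Program: "add 3; if the accumulator is below 10, go back to the add". */
int toy_demo(void)
{
    ToyOp add, branch;

    add.ppaddr = toy_pp_add;
    add.next = &branch;
    add.other = NULL;
    add.arg = 3;

    branch.ppaddr = toy_pp_branch_if_less;
    branch.next = NULL;         /* falling through ends the program */
    branch.other = &add;
    branch.arg = 10;

    return toy_runops(&add);
}
```

Calling C<toy_demo()> runs the add four times (3, 6, 9, 12) before the
branch falls through, so it returns 12. Note that the loop itself knows
nothing about branching; the control flow lives entirely in what each op
function chooses to return.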

The actual functions called are known as PP code, and they're spread
between four files: F<pp_hot.c> contains the "hot" code, which is most
often used and highly optimized, F<pp_sys.c> contains all the
system-specific functions, F<pp_ctl.c> contains the functions which
implement control structures (C<if>, C<while> and the like) and F<pp.c>
contains everything else. These are, if you like, the C code for Perl's
built-in functions and operators.

Note that each C<pp_> function is expected to return a pointer to the next
op. Calls to perl subs (and eval blocks) are handled within the same
runops loop, and do not consume extra space on the C stack. For example,
C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or C<CxEVAL> block
struct onto the context stack; this struct contains the address of the op
following the sub call or eval. They then return the first op of that sub
or eval block, and so execution continues with that sub or block. Later, a
C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB> or C<CxEVAL>,
retrieves the return op from it, and returns it.
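
The mechanism can be modelled in a few lines. The sketch below is a
hypothetical toy, not perl's implementation: its context stack is a plain
array of return ops rather than real C<CxSUB> structures, and every name
(C<toy_pp_entersub>, C<toy_cxstack>, C<toy_cxix> and so on) is invented
for illustration. It does show the key property, which is that a sub call
is just another op returning another op, with no recursion in the C run
loop.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyOp ToyOp;
typedef ToyOp *(*ToyPPFunc)(ToyOp *self);

struct ToyOp {
    ToyPPFunc ppaddr;
    ToyOp *next;        /* op following this one */
    ToyOp *sub_start;   /* first op of the called sub (entersub only) */
};

/* Hypothetical context stack: each entry is the op to resume at. */
#define TOY_CX_MAX 16
static ToyOp *toy_cxstack[TOY_CX_MAX];
static int toy_cxix = 0;        /* number of live contexts */
static int toy_max_depth = 0;   /* deepest nesting seen */
static int toy_work_done = 0;   /* how many "work" ops have run */

ToyOp *toy_pp_work(ToyOp *self)
{
    toy_work_done++;
    return self->next;
}

/* Remember where to resume, then continue with the sub's first op. */
ToyOp *toy_pp_entersub(ToyOp *self)
{
    assert(toy_cxix < TOY_CX_MAX);
    toy_cxstack[toy_cxix++] = self->next;
    if (toy_cxix > toy_max_depth)
        toy_max_depth = toy_cxix;
    return self->sub_start;
}

/* Pop the context and hand back the saved return op. */
ToyOp *toy_pp_leavesub(ToyOp *self)
{
    (void)self;
    assert(toy_cxix > 0);
    return toy_cxstack[--toy_cxix];
}

/* One flat loop runs both the caller and the sub. */
void toy_runops(ToyOp *op)
{
    while (op != NULL)
        op = op->ppaddr(op);
}

/* Main program: work, call sub, work.  Sub: work, leavesub. */
int toy_demo(void)
{
    ToyOp sub_leave = { toy_pp_leavesub, NULL, NULL };
    ToyOp sub_body  = { toy_pp_work, &sub_leave, NULL };
    ToyOp after     = { toy_pp_work, NULL, NULL };
    ToyOp call      = { toy_pp_entersub, &after, &sub_body };
    ToyOp first     = { toy_pp_work, &call, NULL };

    toy_cxix = 0;
    toy_max_depth = 0;
    toy_work_done = 0;
    toy_runops(&first);
    return toy_work_done;
}
```

Running C<toy_demo()> executes three work ops (one before the call, one
inside the sub, one after it) and leaves the context stack empty again.
The nesting depth is recorded on the toy's own stack, while
C<toy_runops> is entered exactly once.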