perlhack.pod@ 3397

Last change on this file since 3397 was 3181, checked in by bird, 19 years ago
perl 5.8.8
File size: 99.8 KB

1	=head1 NAME
2
3	perlhack - How to hack at the Perl internals
4
5	=head1 DESCRIPTION
6
7	This document attempts to explain how Perl development takes place,
8	and ends with some suggestions for people wanting to become bona fide
9	porters.
10
11	The perl5-porters mailing list is where the Perl standard distribution
12	is maintained and developed. The list can get anywhere from 10 to 150
13	messages a day, depending on the heatedness of the debate. Most days
14	there are two or three patches, extensions, features, or bugs being
15	discussed at a time.
16
17	A searchable archive of the list is at either:
18
19	http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
20
21	or
22
23	http://archive.develooper.com/perl5-porters@perl.org/
24
25	List subscribers (the porters themselves) come in several flavours.
26	Some are quiet curious lurkers, who rarely pitch in and instead watch
27	the ongoing development to ensure they're forewarned of new changes or
28	features in Perl. Some are representatives of vendors, who are there
29	to make sure that Perl continues to compile and work on their
30	platforms. Some patch any reported bug that they know how to fix,
31	some are actively patching their pet area (threads, Win32, the regexp
32	engine), while others seem to do nothing but complain. In other
33	words, it's your usual mix of technical people.
34
35	Over this group of porters presides Larry Wall. He has the final word
36	in what does and does not change in the Perl language. Various
37	releases of Perl are shepherded by a "pumpking", a porter
38	responsible for gathering patches, deciding on a patch-by-patch,
39	feature-by-feature basis what will and will not go into the release.
40	For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
41	Perl, Jarkko Hietaniemi was the pumpking for the 5.8 release, and
42	Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release.
43
44	In addition, various people are pumpkings for different things. For
45	instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the
46	I<Configure> pumpkin up till the 5.8 release. For the 5.10 release
47	H.Merijn Brand took over.
48
49	Larry sees Perl development along the lines of the US government:
50	there's the Legislature (the porters), the Executive branch (the
51	pumpkings), and the Supreme Court (Larry). The legislature can
52	discuss and submit patches to the executive branch all they like, but
53	the executive branch is free to veto them. Rarely, the Supreme Court
54	will side with the executive branch over the legislature, or the
55	legislature over the executive branch. Mostly, however, the
56	legislature and the executive branch are supposed to get along and
57	work out their differences without impeachment or court cases.
58
59	You might sometimes see reference to Rule 1 and Rule 2. Larry's power
60	as Supreme Court is expressed in The Rules:
61
62	=over 4
63
64	=item 1
65
66	Larry is always by definition right about how Perl should behave.
67	This means he has final veto power on the core functionality.
68
69	=item 2
70
71	Larry is allowed to change his mind about any matter at a later date,
72	regardless of whether he previously invoked Rule 1.
73
74	=back
75
76	Got that? Larry is always right, even when he was wrong. It's rare
77	to see either Rule exercised, but they are often alluded to.
78
79	New features and extensions to the language are contentious, because
80	the criteria used by the pumpkings, Larry, and other porters to decide
81	which features should be implemented and incorporated are not codified
82	in a few small design goals as with some other languages. Instead,
83	the heuristics are flexible and often difficult to fathom. Here is
84	one person's list, roughly in decreasing order of importance, of
85	heuristics that new features have to be weighed against:
86
87	=over 4
88
89	=item Does the concept match the general goals of Perl?
90
91	These haven't been written anywhere in stone, but one approximation
92	is:
93
94	1. Keep it fast, simple, and useful.
95	2. Keep features/concepts as orthogonal as possible.
96	3. No arbitrary limits (platforms, data sizes, cultures).
97	4. Keep it open and exciting to use/patch/advocate Perl everywhere.
98	5. Either assimilate new technologies, or build bridges to them.
99
100	=item Where is the implementation?
101
102	All the talk in the world is useless without an implementation. In
103	almost every case, the person or people who argue for a new feature
104	will be expected to be the ones who implement it. Porters capable
105	of coding new features have their own agendas, and are not available
106	to implement your (possibly good) idea.
107
108	=item Backwards compatibility
109
110	It's a cardinal sin to break existing Perl programs. New warnings are
111	contentious--some say that a program that emits warnings is not
112	broken, while others say it is. Adding keywords has the potential to
113	break programs; changing the meaning of existing token sequences or
114	functions might break programs too.
115
116	=item Could it be a module instead?
117
118	Perl 5 has extension mechanisms, modules and XS, specifically to avoid
119	the need to keep changing the Perl interpreter. You can write modules
120	that export functions, you can give those functions prototypes so they
121	can be called like built-in functions, you can even write XS code to
122	mess with the runtime data structures of the Perl interpreter if you
123	want to implement really complicated things. If it can be done in a
124	module instead of in the core, it's highly unlikely to be added.
125
126	=item Is the feature generic enough?
127
128	Is this something that only the submitter wants added to the language,
129	or would it be broadly useful? Sometimes, instead of adding a feature
130	with a tight focus, the porters might decide to wait until someone
131	implements the more generalized feature. For instance, instead of
132	implementing a "delayed evaluation" feature, the porters are waiting
133	for a macro system that would permit delayed evaluation and much more.
134
135	=item Does it potentially introduce new bugs?
136
137	Radical rewrites of large chunks of the Perl interpreter have the
138	potential to introduce new bugs. The smaller and more localized the
139	change, the better.
140
141	=item Does it preclude other desirable features?
142
143	A patch is likely to be rejected if it closes off future avenues of
144	development. For instance, a patch that placed a true and final
145	interpretation on prototypes is likely to be rejected because there
146	are still options for the future of prototypes that haven't been
147	addressed.
148
149	=item Is the implementation robust?
150
151	Good patches (tight code, complete, correct) stand more chance of
152	going in. Sloppy or incorrect patches might be placed on the back
153	burner until the pumpking has time to fix them, or might be discarded
154	altogether without further notice.
155
156	=item Is the implementation generic enough to be portable?
157
158	The worst patches make use of system-specific features. It's highly
159	unlikely that nonportable additions to the Perl language will be
160	accepted.
161
162	=item Is the implementation tested?
163
164	Patches which change behaviour (fixing bugs or introducing new features)
165	must include regression tests to verify that everything works as expected.
166	Without tests provided by the original author, how can anyone else changing
167	perl in the future be sure that they haven't unwittingly broken the behaviour
168	the patch implements? And without tests, how can the patch's author be
169	confident that his/her hard work put into the patch won't be accidentally
170	thrown away by someone in the future?
171
172	=item Is there enough documentation?
173
174	Patches without documentation are probably ill-thought out or
175	incomplete. Nothing can be added without documentation, so submitting
176	a patch for the appropriate manpages as well as the source code is
177	always a good idea.
178
179	=item Is there another way to do it?
180
181	Larry said "Although the Perl Slogan is I<There's More Than One Way
182	to Do It>, I hesitate to make 10 ways to do something". This is a
183	tricky heuristic to navigate, though--one man's essential addition is
184	another man's pointless cruft.
185
186	=item Does it create too much work?
187
188	Work for the pumpking, work for Perl programmers, work for module
189	authors, ... Perl is supposed to be easy.
190
191	=item Patches speak louder than words
192
193	Working code is always preferred to pie-in-the-sky ideas. A patch to
194	add a feature stands a much higher chance of making it to the language
195	than does a random feature request, no matter how fervently argued the
196	request might be. This ties into "Will it be useful?", as the fact
197	that someone took the time to make the patch demonstrates a strong
198	desire for the feature.
199
200	=back
201
202	If you're on the list, you might hear the word "core" bandied
203	around. It refers to the standard distribution. "Hacking on the
204	core" means you're changing the C source code to the Perl
205	interpreter. "A core module" is one that ships with Perl.
206
207	=head2 Keeping in sync
208
209	The source code to the Perl interpreter, in its different versions, is
210	kept in a repository managed by a revision control system (which is
211	currently the Perforce program, see http://perforce.com/ ). The
212	pumpkings and a few others have access to the repository to check in
213	changes. Periodically the pumpking for the development version of Perl
214	will release a new version, so the rest of the porters can see what's
215	changed. The current state of the main trunk of the repository, and patches
216	that describe the individual changes that have happened since the last
217	public release are available at this location:
218
219	http://public.activestate.com/pub/apc/
220	ftp://public.activestate.com/pub/apc/
221
222	If you're looking for a particular change, or a change that affected
223	a particular set of files, you may find the B<Perl Repository Browser>
224	useful:
225
226	http://public.activestate.com/cgi-bin/perlbrowse
227
228	You may also want to subscribe to the perl5-changes mailing list to
229	receive a copy of each patch that gets submitted to the maintenance
230	and development "branches" of the perl repository. See
231	http://lists.perl.org/ for subscription information.
232
233	If you are a member of the perl5-porters mailing list, it is a good
234	idea to keep in touch with the most recent changes, if only to
235	verify that what you would have posted as a bug report isn't already
236	solved in the most recent available perl development branch, also
237	known as perl-current, bleading edge perl, bleedperl or bleadperl.
238
239	Needless to say, the source code in perl-current is usually in a perpetual
240	state of evolution. You should expect it to be very buggy. Do B<not> use
241	it for any purpose other than testing and development.
242
243	Keeping in sync with the most recent branch can be done in several ways,
244	but the most convenient and reliable way is using B<rsync>, available at
245	ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent
246	branch by FTP.)
247
248	If you choose to keep in sync using rsync, there are two approaches
249	to doing so:
250
251	=over 4
252
253	=item rsync'ing the source tree
254
255	Presuming you are in the directory where your perl source resides
256	and you have rsync installed and available, you can "upgrade" to
257	the bleadperl using:
258
259	    # rsync -avz rsync://public.activestate.com/perl-current/ .
260
261	This takes care of updating every single item in the source tree to
262	the latest applied patch level, creating files that are new (to your
263	distribution) and setting date/time stamps of existing files to
264	reflect the bleadperl status.
265
266	Note that this will not delete any files that were in '.' before
267	the rsync. Once you are sure that the rsync is running correctly,
268	run it with the --delete and the --dry-run options like this:
269
270	    # rsync -avz --delete --dry-run rsync://public.activestate.com/perl-current/ .
271
272	This will I<simulate> an rsync run that also deletes files not
273	present in the bleadperl master copy. Observe the results from
274	this run closely. If you are sure that the actual run would delete
275	no files precious to you, you could remove the '--dry-run' option.
276
277	You can then check what patch was the latest that was applied by
278	looking in the file B<.patch>, which will show the number of the
279	latest patch.
280
281	If you have more than one machine to keep in sync, and not all of
282	them have access to the WAN (so you are not able to rsync all the
283	source trees to the real source), there are some ways to get around
284	this problem.
285
286	=over 4
287
288	=item Using rsync over the LAN
289
290	Set up a local rsync server which makes the rsynced source tree
291	available to the LAN and sync the other machines against this
292	directory.
293
294	From http://rsync.samba.org/README.html :
295
296	"Rsync uses rsh or ssh for communication. It does not need to be
297	setuid and requires no special privileges for installation. It
298	does not require an inetd entry or a daemon. You must, however,
299	have a working rsh or ssh system. Using ssh is recommended for
300	its security features."
301
302	=item Pushing over NFS
303
304	With the other systems mounted over NFS, you can take an
305	active pushing approach by checking the just-updated tree against
306	the other not-yet-synced trees. An example would be
307
308	    #!/usr/bin/perl -w
309
310	    use strict;
311	    use File::Copy;
312
313	    my %MF = map {
314	        m/(\S+)/;
315	        $1 => [ (stat $1)[2, 7, 9] ]; # mode, size, mtime
316	    } `cat MANIFEST`;
317
318	    my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);
319
320	    foreach my $host (keys %remote) {
321	        unless (-d $remote{$host}) {
322	            print STDERR "Cannot Xsync for host $host\n";
323	            next;
324	        }
325	        foreach my $file (keys %MF) {
326	            my $rfile = "$remote{$host}/$file";
327	            my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
328	            defined $size or ($mode, $size, $mtime) = (0, 0, 0);
329	            $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
330	            printf "%4s %-34s %8d %9d %8d %9d\n",
331	                $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
332	            unlink $rfile;
333	            copy ($file, $rfile);
334	            utime time, $MF{$file}[2], $rfile;
335	            chmod $MF{$file}[0], $rfile;
336	        }
337	    }
338
339	though this is not perfect. It could be improved by checking
340	file checksums before updating. Not all NFS systems provide
341	reliable utime support (when used over NFS).
342
343	=back
344
345	=item rsync'ing the patches
346
347	The source tree is maintained by the pumpking who applies patches to
348	the files in the tree. These patches are either created by the
349	pumpking himself using C<diff -c> after updating the file manually or
350	by applying patches sent in by posters on the perl5-porters list.
351	These patches are also saved and rsync'able, so you can apply them
352	yourself to the source files.
353
354	Presuming you are in a directory where your patches reside, you can
355	get them in sync with
356
357	    # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
358
359	This makes sure the latest available patch is downloaded to your
360	patch directory.
361
362	It's then up to you to apply these patches, using something like
363
364	    # last=`ls -t *.gz | sed q`
365	    # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
366	    # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
367	    # cd ../perl-current
368	    # patch -p1 -N <../perl-current-diffs/blead.patch
369
370	or, since this is only a hint towards how it works, use CPAN-patchaperl
371	from Andreas König to have better control over the patching process.
372
373	=back
374
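The push script shown earlier notes that it could be improved by comparing
checksums before copying. A minimal sketch of such a check in shell,
assuming the POSIX C<cksum> utility is available; the function name
C<needs_update> is hypothetical and not part of the Perl distribution:

```shell
# needs_update SRC DST  (hypothetical helper, for illustration only)
# Succeeds (exit status 0) when DST is missing or its checksum/size
# differs from SRC, i.e. when the file should be copied again.
needs_update() {
    src=$1
    dst=$2
    [ -f "$dst" ] || return 0
    # Reading from stdin makes cksum print only "<crc> <size>", without a
    # file name, so the two results can be compared as plain strings.
    src_sum=$(cksum < "$src")
    dst_sum=$(cksum < "$dst")
    [ "$src_sum" != "$dst_sum" ]
}
```

Used in place of the size and mtime comparison, this no longer depends on
reliable utime support, at the cost of reading every file on both sides.
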
375	=head2 Why rsync the source tree
376
377	=over 4
378
379	=item It's easier to rsync the source tree
380
381	Since you don't have to apply the patches yourself, you are sure all
382	files in the source tree are in the right state.
383
384	=item It's more reliable
385
386	While both the rsync-able source and patch areas are automatically
387	updated every few minutes, keep in mind that applying patches may
388	sometimes mean careful hand-holding, especially if your version of
389	the C<patch> program does not understand how to deal with new files,
390	files with 8-bit characters, or files without trailing newlines.
391
392	=back
393
394	=head2 Why rsync the patches
395
396	=over 4
397
398	=item It's easier to rsync the patches
399
400	If you have more than one machine that you want to keep in sync with
401	bleadperl, it's easier to rsync the patches only once and then apply
402	them to all the source trees on the different machines.
403
404	If you try to keep pace on 5 different machines, of which
405	only one has access to the WAN, rsync'ing all the source
406	trees would then have to be done 5 times over NFS. Having
407	rsync'ed the patches only once, I can apply them to all the source
408	trees automatically. Need I say more ;-)
409
410	=item It's a good reference
411
412	If you not only want the most recent development branch,
413	but also want to B<fix> bugs or extend features, you will want to dive
414	into the sources. If you are a seasoned perl core diver, you don't
415	need no manuals, tips, roadmaps, perlguts.pod or other aids to find
416	your way around. But if you are a starter, the patches may help you
417	in finding where you should start and how to change the bits that
418	bug you.
419
420	The file B<Changes> is updated on occasions the pumpking sees as his
421	own little sync points. On those occasions, he releases a tar-ball of
422	the current source tree (i.e. perl@7582.tar.gz), which will be an
423	excellent point to start with when choosing to use the 'rsync the
424	patches' scheme. Starting with perl@7582, which means a set of source
425	files on which the latest applied patch is number 7582, you apply all
426	succeeding patches available from then on (7583, 7584, ...).
427
428	You can use the patches later as a kind of search archive.
429
430	=over 4
431
432	=item Finding a start point
433
434	If you want to fix/change the behaviour of function/feature Foo, just
435	scan the patches for patches that mention Foo either in the subject,
436	the comments, or the body of the fix. There is a good chance the patch
437	shows you the files that are affected by that patch, which are very likely
438	to be the starting point of your journey into the guts of perl.
439
440	=item Finding how to fix a bug
441
442	If you've found I<where> the function/feature Foo misbehaves, but you
443	don't know how to fix it (but you do know the change you want to
444	make), you can, again, peruse the patches for similar changes and
445	see how others applied the fix.
446
447	=item Finding the source of misbehaviour
448
449	When you keep in sync with bleadperl, the pumpking would love to
450	I<see> that the community efforts really work. So after each of his
451	sync points, you should run 'make test' to check if everything is still
452	in working order. If it is, you do 'make ok', which will send an OK
453	report to perlbug@perl.org. (If you do not have access to a mailer
454	on the system where you just successfully ran 'make test', you can
455	do 'make okfile', which creates the file C<perl.ok>, which you can
456	then take to your favourite mailer and mail yourself.)
457
458	But of course, things will not always lead to a success
459	path, and one or more tests may not pass the 'make test'. Before
460	sending in a bug report (using 'make nok' or 'make nokfile'), check
461	the mailing list if someone else has reported the bug already and if
462	so, confirm it by replying to that message. If not, you might want to
463	trace the source of that misbehaviour B<before> sending in the bug,
464	which will help all the other porters in finding the solution.
465
466	Here the saved patches come in very handy. You can check the list of
467	patches to see which patch changed what file and what change caused
468	the misbehaviour. If you note that in the bug report, it saves
469	whoever tries to solve it from having to look for that point.
470
471	=back
472
473	If searching the patches is too bothersome, you might consider using
474	perl's bugtron to find more information about discussions and
475	ramblings on posted bugs.
476
477	If you want to get the best of both worlds, rsync both the source
478	tree for convenience, reliability and ease and rsync the patches
479	for reference.
480
481	=back
482
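Using the saved patches as a search archive, as described in the list
above, can be sketched in shell. This assumes the patches sit in one
directory as gzip-compressed files (the C<*.gz> files fetched by the rsync
command); the function name C<patches_mentioning> and the numbered file
names in the usage line are hypothetical:

```shell
# patches_mentioning PATTERN [DIR]  (hypothetical helper)
# Prints the names of the gzip-compressed patch files in DIR (default:
# the current directory) whose subject, comments or body match PATTERN.
patches_mentioning() {
    pattern=$1
    dir=${2:-.}
    for f in "$dir"/*.gz; do
        [ -f "$f" ] || continue
        if gzip -dc "$f" | grep -q -- "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, C<patches_mentioning Foo ../perl-current-diffs> would list
every patch that touches or mentions Foo, which you can then read with
C<gzip -dc> to see which source files were involved.
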
483	=head2 Working with the source
484
485	Because you cannot use the Perforce client, you cannot easily generate
486	diffs against the repository, nor will merges occur when you update
487	via rsync. If you edit a file locally and then rsync against the
488	latest source, changes made in the remote copy will I<overwrite> your
489	local versions!
490
491	The best way to deal with this is to maintain a tree of symlinks to
492	the rsync'd source. Then, when you want to edit a file, you remove
493	the symlink, copy the real file into the other tree, and edit it. You
494	can then diff your edited file against the original to generate a
495	patch, and you can safely update the original tree.
496
497	Perl's F<Configure> script can generate this tree of symlinks for you.
498	The following example assumes that you have used rsync to pull a copy
499	of the Perl source into the F<perl-rsync> directory. In the directory
500	above that one, you can execute the following commands:
501
502	    mkdir perl-dev
503	    cd perl-dev
504	    ../perl-rsync/Configure -Dmksymlinks -Dusedevel -D"optimize=-g"
505
506	This will start the Perl configuration process. After a few prompts,
507	you should see something like this:
508
509	    Symbolic links are supported.
510
511	    Checking how to test for symbolic links...
512	    Your builtin 'test -h' may be broken.
513	    Trying external '/usr/bin/test -h'.
514	    You can test for symbolic links with '/usr/bin/test -h'.
515
516	    Creating the symbolic links...
517	    (First creating the subdirectories...)
518	    (Then creating the symlinks...)
519
520	The specifics may vary based on your operating system, of course.
521	After you see this, you can abort the F<Configure> script, and you
522	will see that the directory you are in has a tree of symlinks to the
523	F<perl-rsync> directories and files.
524
525	If you plan to do a lot of work with the Perl source, here are some
526	Bourne shell script functions that can make your life easier:
527
528	    edit() {
529	        if [ -L "$1" ]; then
530	            mv "$1" "$1.orig"
531	            cp "$1.orig" "$1"
532	            vi "$1"
533	        else
534	            vi "$1"
535	        fi
536	    }
537
538	    unedit() {
539	        if [ -L "$1.orig" ]; then
540	            rm "$1"
541	            mv "$1.orig" "$1"
542	        fi
543	    }
544
545	Replace "vi" with your favorite flavor of editor.
546
547	Here is another function which will quickly generate a patch for the
548	files which have been edited in your symlink tree:
549
550	    mkpatchorig() {
551	        local diffopts
552	        for f in `find . -name '*.orig' | sed s,^\./,,`
553	        do
554	            case `echo $f | sed 's,.orig$,,;s,.*\.,,'` in
555	                c)   diffopts=-p ;;
556	                pod) diffopts='-F^=' ;;
557	                *)   diffopts= ;;
558	            esac
559	            diff -du $diffopts $f `echo $f | sed 's,.orig$,,'`
560	        done
561	    }
562
563	This function produces patches which include enough context to make
564	your changes obvious. This makes it easier for the Perl pumpking(s)
565	to review them when you send them to the perl5-porters list, and that
566	means they're more likely to get applied.
567
568	This function assumes a GNU diff, and may require some tweaking for
569	other diff variants.
570
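The C<case> statement in C<mkpatchorig> picks diff options from the file
extension: C<-p> labels each hunk with the enclosing C function, and
C<-F^=> labels hunks in pod files with the nearest preceding line that
starts with C<=>. The small sketch below isolates the C<sed> expression
that extracts the extension; the helper name C<orig_ext> is hypothetical
and exists only to show what the expression yields:

```shell
# orig_ext FILE.orig  (hypothetical helper, for illustration only)
# Prints what mkpatchorig's case statement sees for a given ".orig" file:
# the name with ".orig" removed and everything up to the last dot stripped.
orig_ext() {
    echo "$1" | sed 's,.orig$,,;s,.*\.,,'
}

orig_ext sv.c.orig              # prints "c"
orig_ext pod/perlhack.pod.orig  # prints "pod"
orig_ext Makefile.orig          # prints "Makefile"
```

A name without an extension comes through unchanged, so it falls into the
C<*)> branch and is diffed with no extra options.
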
571	=head2 Perlbug administration
572
573	There is a single remote administrative interface for modifying bug status,
574	category, open issues etc. using the B<RT> I<bugtracker> system, maintained
575	by I<Robert Spier>. Become an administrator, and close any bugs you can get
576	your sticky mitts on:
577
578	http://rt.perl.org
579
580	The bugtracker mechanism for B<perl5> bugs in particular is at:
581
582	http://bugs6.perl.org/perlbug
583
584	To email the bug system administrators:
585
586	"perlbug-admin" <perlbug-admin@perl.org>
587
588
589	=head2 Submitting patches
590
591	Always submit patches to I<perl5-porters@perl.org>. If you're
592	patching a core module and there's an author listed, send the author a
593	copy (see L<Patching a core module>). This lets other porters review
594	your patch, which catches a surprising number of errors in patches.
595	Either use the diff program (available in source code form from
596	ftp://ftp.gnu.org/pub/gnu/ ), or use Johan Vromans' I<makepatch>
597	(available from I<CPAN/authors/id/JV/>). Unified diffs are preferred,
598	but context diffs are accepted. Do not send RCS-style diffs or diffs
599	without context lines. More information is given in the
600	I<Porting/patching.pod> file in the Perl source distribution. Please
601	patch against the latest B<development> version (e.g., if you're
602	fixing a bug in the 5.005 track, patch against the latest 5.005_5x
603	version). Only patches that survive the heat of the development
604	branch get applied to maintenance versions.
605
606	Your patch should update the documentation and test suite. See
607	L<Writing a test>.
608
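If you keep an untouched copy of the source next to your working copy, a
unified diff covering code, documentation and tests can be produced in one
go. The following is only a sketch: it assumes a diff that supports
C<-r>, C<-u> and C<-N> (GNU diff does), and the function name
C<make_unified_patch> and the directory names in the usage line are
hypothetical.

```shell
# make_unified_patch ORIG_DIR NEW_DIR  (hypothetical helper)
# Writes a recursive unified diff to stdout. -N treats files that exist
# on only one side as empty on the other, so newly added test or
# documentation files are included. diff exits with status 1 when the
# trees differ, which is the expected case here, so only a status
# greater than 1 (trouble) is reported as failure.
make_unified_patch() {
    status=0
    diff -ruN "$1" "$2" || status=$?
    [ "$status" -le 1 ]
}
```

A typical call would be C<< make_unified_patch perl-orig perl-dev >
my.patch >>, run from the directory that contains both trees, after which
you should read the result before mailing it.
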
609	To report a bug in Perl, use the program I<perlbug> which comes with
610	Perl (if you can't get Perl to work, send mail to the address
611	I<perlbug@perl.org> or I<perlbug@perl.com>). Reporting bugs through
612	I<perlbug> feeds into the automated bug-tracking system, access to
613	which is provided through the web at http://bugs.perl.org/ . It
614	often pays to check the archives of the perl5-porters mailing list to
615	see whether the bug you're reporting has been reported before, and if
616	so whether it was considered a bug. See above for the location of
617	the searchable archives.
618
619	The CPAN testers ( http://testers.cpan.org/ ) are a group of
620	volunteers who test CPAN modules on a variety of platforms. Perl
621	Smokers ( http://archives.develooper.com/daily-build@perl.org/ )
622	automatically test Perl source releases on platforms with various
623	configurations. Both efforts welcome volunteers.
624
625	It's a good idea to read and lurk for a while before chipping in.
626	That way you'll get to see the dynamic of the conversations, learn the
627	personalities of the players, and hopefully be better prepared to make
628	a useful contribution when you do speak up.
629
630	If after all this you still think you want to join the perl5-porters
631	mailing list, send mail to I<perl5-porters-subscribe@perl.org>. To
632	unsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>.
633
634	To hack on the Perl guts, you'll need to read the following things:
635
636	=over 3
637
638	=item L<perlguts>
639
640	This is of paramount importance, since it's the documentation of what
641	goes where in the Perl source. Read it over a couple of times and it
642	might start to make sense - don't worry if it doesn't yet, because the
643	best way to study it is to read it in conjunction with poking at Perl
644	source, and we'll do that later on.
645
646	You might also want to look at Gisle Aas's illustrated perlguts -
647	there's no guarantee that this will be absolutely up-to-date with the
648	latest documentation in the Perl core, but the fundamentals will be
649	right. ( http://gisle.aas.no/perl/illguts/ )
650
651	=item L<perlxstut> and L<perlxs>
652
653	A working knowledge of XSUB programming is incredibly useful for core
654	hacking; XSUBs use techniques drawn from the PP code, the portion of the
655	guts that actually executes a Perl program. It's a lot gentler to learn
656	those techniques from simple examples and explanation than from the core
657	itself.
658
659	=item L<perlapi>
660
661	The documentation for the Perl API explains what some of the internal
662	functions do, as well as the many macros used in the source.
663
664	=item F<Porting/pumpkin.pod>
665
666	This is a collection of words of wisdom for a Perl porter; some of it is
667	only useful to the pumpkin holder, but most of it applies to anyone
668	wanting to go about Perl development.
669
670	=item The perl5-porters FAQ
671
672	This should be available from http://simon-cozens.org/writings/p5p-faq ;
673	alternatively, you can get the FAQ emailed to you by sending mail to
674	C<perl5-porters-faq@perl.org>. It contains hints on reading perl5-porters,
675	information on how perl5-porters works and how Perl development in general
676	works.
677
678	=back
679
680	=head2 Finding Your Way Around
681
682	Perl maintenance can be split into a number of areas, and certain people
683	(pumpkins) will have responsibility for each area. These areas sometimes
684	correspond to files or directories in the source kit. Among the areas are:
685
686	=over 3
687
688	=item Core modules
689
690	Modules shipped as part of the Perl core live in the F<lib/> and F<ext/>
691	subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
692	contains the core XS modules.
693
694	=item Tests
695
696	There are tests for nearly all the modules, built-ins and major bits
697	of functionality. Test files all have a .t suffix. Module tests live
698	in the F<lib/> and F<ext/> directories next to the module being
699	tested. Others live in F<t/>. See L<Writing a test>.
700
701	=item Documentation
702
703	Documentation maintenance includes looking after everything in the
704	F<pod/> directory, (as well as contributing new documentation) and
705	the documentation to the modules in core.
706
707	=item Configure
708
709	The configure process is the way we make Perl portable across the
710	myriad of operating systems it supports. Responsibility for the
711	configure, build and installation process, as well as the overall
712	portability of the core code rests with the configure pumpkin - others
713	help out with individual operating systems.
714
715	The files involved are the operating system directories, (F<win32/>,
716	F<os2/>, F<vms/> and so on) the shell scripts which generate F<config.h>
717	and F<Makefile>, as well as the metaconfig files which generate
718	F<Configure>. (metaconfig isn't included in the core distribution.)
719
720	=item Interpreter
721
722	And of course, there's the core of the Perl interpreter itself. Let's
723	have a look at that in a little more detail.
724
725	=back
726
727	Before we leave looking at the layout, though, don't forget that
728	F<MANIFEST> contains not only the file names in the Perl distribution,
729	but short descriptions of what's in them, too. For an overview of the
730	important files, try this:
731
732	    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST
733
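The pattern in that one-liner keeps only entries whose path contains no
slash and ends in C<.c> or C<.h>, that is, the C sources and headers at
the top level of the distribution. A rough shell equivalent, run here
against a tiny sample whose descriptions are made up for illustration
(the function name C<top_level_c_files> is hypothetical):

```shell
# top_level_c_files MANIFEST_FILE  (hypothetical helper)
# Prints MANIFEST entries for .c and .h files in the top-level directory.
top_level_c_files() {
    grep -E '^[^/]+\.[ch][[:space:]]+' "$1"
}

# Build a five-line sample in the "path<TAB>description" layout.
sample=$(mktemp)
printf '%s\t%s\n' \
    'av.c'          'Array value code' \
    'ext/B/B.xs'    'Compiler backend glue' \
    'perl.h'        'Global declarations' \
    'win32/win32.c' 'Win32 support code' \
    'README'        'The instructions' > "$sample"

top_level_c_files "$sample"   # prints the av.c and perl.h lines only
rm -f "$sample"
```

Entries under F<ext/> or F<win32/> are skipped because the C<[^/]+> part
cannot match across a slash.
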
734	=head2 Elements of the interpreter
735
736	The work of the interpreter has two main stages: compiling the code
737	into the internal representation, or bytecode, and then executing it.
738	L<perlguts/Compiled code> explains exactly how the compilation stage
739	happens.
740
741	Here is a short breakdown of perl's operation:
742
743	=over 3
744
745	=item Startup
746
747	The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
748	This is very high-level code, enough to fit on a single screen, and it
749	resembles the code found in L<perlembed>; most of the real action takes
750	place in F<perl.c>.
751
752	First, F<perlmain.c> allocates some memory and constructs a Perl
753	interpreter:
754
755	    1     PERL_SYS_INIT3(&argc,&argv,&env);
756	    2
757	    3     if (!PL_do_undump) {
758	    4         my_perl = perl_alloc();
759	    5         if (!my_perl)
760	    6             exit(1);
761	    7         perl_construct(my_perl);
762	    8         PL_perl_destruct_level = 0;
763	    9     }
764
765	Line 1 is a macro, and its definition is dependent on your operating
766	system. Line 3 references C<PL_do_undump>, a global variable - all
767	global variables in Perl start with C<PL_>. This tells you whether the
768	currently running program was built by dumping core with perl's C<-u> flag
769	and running F<undump> on it, so it's going to be false in any sane context.
770
771	Line 4 calls a function in F<perl.c> to allocate memory for a Perl
772	interpreter. It's quite a simple function, and the guts of it look like
773	this:
774
775	    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
776
777	Here you see an example of Perl's system abstraction, which we'll look at
778	later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
779	own C<malloc> as defined in F<malloc.c> if you selected that option at
780	configure time.
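
The idea behind this kind of abstraction can be sketched in standalone C.
This is only an illustration under stated assumptions, not the definition
used in the Perl source: C<DemoMem_malloc>, C<DemoMem_free>,
C<DEMO_USE_OWN_MALLOC>, C<demo_malloc> and the C<DemoInterpreter> struct are
all hypothetical names, and the "own" allocator here simply wraps the system
one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical configure-time switch: when DEMO_USE_OWN_MALLOC is
 * defined, allocations go through our own functions instead of the
 * system's malloc and free. */
#ifdef DEMO_USE_OWN_MALLOC
static size_t demo_live_blocks = 0;

/* Stand-in for a private allocator: it only counts live blocks and
 * defers to the system allocator for the real work. */
static void *demo_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        demo_live_blocks++;
    return p;
}

static void demo_free(void *p)
{
    if (p != NULL)
        demo_live_blocks--;
    free(p);
}
#  define DemoMem_malloc(n) demo_malloc(n)
#  define DemoMem_free(p)   demo_free(p)
#else
#  define DemoMem_malloc(n) malloc(n)
#  define DemoMem_free(p)   free(p)
#endif

/* Hypothetical, drastically reduced interpreter structure. */
typedef struct demo_interpreter {
    int destruct_level;
} DemoInterpreter;

/* Callers use the abstract name and never learn which allocator ran. */
static DemoInterpreter *demo_alloc(void)
{
    return (DemoInterpreter *)DemoMem_malloc(sizeof(DemoInterpreter));
}

/* Mirrors the allocate-then-construct order of the startup code. */
static void demo_construct(DemoInterpreter *interp)
{
    assert(interp != NULL);
    interp->destruct_level = 0;
}
```

Code that calls C<demo_alloc> compiles unchanged whichever branch of the
C<#ifdef> is selected, which is the point of hiding the allocator behind a
macro.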
781
782	Next, in line 7, we construct the interpreter; this sets up all the
783	special variables that Perl needs, the stacks, and so on.
784
785	Now we pass Perl the command line options, and tell it to go:
786
787	    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
788	    if (!exitstatus) {
789	        exitstatus = perl_run(my_perl);
790	    }
791
792
793	C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
794	in F<perl.c>, which processes the command line options, sets up any
795	statically linked XS modules, opens the program and calls C<yyparse> to
796	parse it.
797
798	=item Parsing
799
800	The aim of this stage is to take the Perl source, and turn it into an op
801	tree. We'll see what one of those looks like later. Strictly speaking,
802	there are three things going on here.
803
804	C<yyparse>, the parser, lives in F<perly.c>, although you're better off
805	reading the original YACC input in F<perly.y>. (Yes, Virginia, there
806	B<is> a YACC grammar for Perl!) The job of the parser is to take your
807	code and "understand" it, splitting it into sentences, deciding which
808	operands go with which operators and so on.
809
810	The parser is nobly assisted by the lexer, which chunks up your input
811	into tokens, and decides what type of thing each token is: a variable
812	name, an operator, a bareword, a subroutine, a core function, and so on.
813	The main point of entry to the lexer is C<yylex>, and that and its
814	associated routines can be found in F<toke.c>. Perl isn't much like
815	other computer languages; it's highly context sensitive at times, and it can
816	be tricky to work out what sort of token something is, or where a token
817	ends. As such, there's a lot of interplay between the tokeniser and the
818	parser, which can get pretty frightening if you're not used to it.
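
A toy example may make this context sensitivity more concrete. The sketch
below is not taken from F<toke.c>, and every name in it is made up for
illustration. It shows one classic ambiguity: a C</> is a division operator
when an operand has just been seen, but starts a pattern match when an
operand is expected. A single flag stands in for the information a real
lexer would need from the parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum {
    DEMO_TOK_END, DEMO_TOK_NUMBER, DEMO_TOK_VARIABLE,
    DEMO_TOK_DIVIDE, DEMO_TOK_MATCH, DEMO_TOK_OTHER
} DemoTok;

typedef struct {
    const char *pos;   /* next unread character */
    int expect_term;   /* nonzero while an operand is expected */
} DemoLexer;

static void demo_lex_init(DemoLexer *lx, const char *src)
{
    assert(lx != NULL && src != NULL);
    lx->pos = src;
    lx->expect_term = 1;   /* a statement starts with an operand */
}

/* Return the type of the next token and advance past it. */
static DemoTok demo_lex(DemoLexer *lx)
{
    const char *p = lx->pos;
    DemoTok tok;

    while (*p == ' ')
        p++;
    if (*p == '\0') {
        lx->pos = p;
        return DEMO_TOK_END;
    }
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        tok = DEMO_TOK_NUMBER;
    } else if (*p == '$') {
        p++;
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        tok = DEMO_TOK_VARIABLE;
    } else if (*p == '/' && lx->expect_term) {
        /* Operand position: treat /.../ as a pattern match. */
        p++;
        while (*p != '\0' && *p != '/')
            p++;
        if (*p == '/')
            p++;
        tok = DEMO_TOK_MATCH;
    } else if (*p == '/') {
        /* Operator position: the same character means division. */
        p++;
        tok = DEMO_TOK_DIVIDE;
    } else {
        p++;
        tok = DEMO_TOK_OTHER;
    }
    /* After an operand we expect an operator, and vice versa. This is a
     * crude rule: it misjudges closing brackets, for instance. */
    lx->expect_term = !(tok == DEMO_TOK_NUMBER || tok == DEMO_TOK_VARIABLE
                        || tok == DEMO_TOK_MATCH);
    lx->pos = p;
    return tok;
}
```

Feeding it C<$x / 2> yields a variable, a division and a number, while
C<$y = /ab/> yields a variable, an unclassified character and a pattern
match, even though both inputs contain the same C</> character.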
819
820	As the parser understands a Perl program, it builds up a tree of
821	operations for the interpreter to perform during execution. The routines
822	which construct and link together the various operations are to be found
823	in F<op.c>, and will be examined later.
824
825	=item Optimization
826
827	Now the parsing stage is complete, and the finished tree represents
828	the operations that the Perl interpreter needs to perform to execute our
829	program. Next, Perl does a dry run over the tree looking for
830	optimizations: constant expressions such as C<3 + 4> will be computed
831	now, and the optimizer will also see if any multiple operations can be
832	replaced with a single one. For instance, to fetch the variable C<$foo>,
833	instead of grabbing the glob C<*foo> and looking at the scalar
834	component, the optimizer fiddles the op tree to use a function which
835	directly looks up the scalar in question. The main optimizer is C<peep>
836	in F<op.c>, and many ops have their own optimizing functions.
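
The first kind of optimization, computing constant expressions ahead of
time, can be illustrated on a toy expression tree. This is a minimal sketch
that makes no attempt to mirror F<op.c>; the C<DemoNode> type and
C<demo_fold> function are hypothetical, and the tree only knows about
constants, variables, addition and multiplication.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_CONST, DEMO_VAR, DEMO_ADD, DEMO_MUL } DemoNodeType;

typedef struct demo_node {
    DemoNodeType type;
    long value;                /* meaningful only for DEMO_CONST */
    struct demo_node *left;    /* operands, NULL for leaves */
    struct demo_node *right;
} DemoNode;

/* Fold constant subtrees in place, working bottom-up.
 * Returns the number of nodes that were replaced by a constant. */
static int demo_fold(DemoNode *n)
{
    int folds = 0;

    assert(n != NULL);
    if (n->type == DEMO_CONST || n->type == DEMO_VAR)
        return 0;              /* leaves cannot be simplified */

    folds += demo_fold(n->left);
    folds += demo_fold(n->right);

    if (n->left->type == DEMO_CONST && n->right->type == DEMO_CONST) {
        long a = n->left->value;
        long b = n->right->value;

        /* Both operands are known now, so do the arithmetic once here
         * instead of every time the program runs. */
        n->value = (n->type == DEMO_ADD) ? a + b : a * b;
        n->type = DEMO_CONST;
        n->left = NULL;
        n->right = NULL;
        folds++;
    }
    return folds;
}
```

For a tree representing C<$x * (3 + 4)>, one fold takes place: the addition
node becomes the constant 7, and the multiplication is left alone because
one of its operands is only known at run time.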
837
838	=item Running
839
840	Now we're finally ready to go: we have compiled Perl byte code, and all
841	that's left to do is run it. The actual execution is done by the
842	C<runops_standard> function in F<run.c>; more specifically, it's done by
843	these three innocent-looking lines:
844
845	    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
846	        PERL_ASYNC_CHECK();
847	    }
848
849	You may be more comfortable with the Perl version of that:
850
851	    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
852
853	Well, maybe not. Anyway, each op contains a function pointer, which
854	stipulates the function which will actually carry out the operation.
855	This function will return the next op in the sequence - this allows for
856	things like C<if> which choose the next op dynamically at run time.
857	The C<PERL_ASYNC_CHECK> macro makes sure that things like signals interrupt
858	execution if required.
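
The shape of this loop can be reproduced in a few lines of standalone C.
The following is a simplified model, not code from F<run.c>: C<DemoOp>,
C<demo_runops>, the three C<demo_pp_> functions and the signal counters are
hypothetical, and the current op is passed as an argument rather than kept
in a global such as C<PL_op>.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_op DemoOp;
typedef DemoOp *(*DemoPPFunc)(DemoOp *self);

struct demo_op {
    DemoPPFunc ppaddr;   /* function that carries out this op */
    DemoOp *next;        /* usual successor */
    DemoOp *other;       /* alternative successor for branching ops */
    long arg;
};

static long demo_acc = 0;              /* a single "register" */
static int demo_pending_signal = 0;    /* set when a signal is waiting */
static int demo_signals_handled = 0;

static DemoOp *demo_pp_const(DemoOp *op)
{
    demo_acc = op->arg;
    return op->next;
}

static DemoOp *demo_pp_add(DemoOp *op)
{
    demo_acc += op->arg;
    return op->next;
}

/* A branching op: the successor is chosen at run time. */
static DemoOp *demo_pp_if_zero(DemoOp *op)
{
    return demo_acc == 0 ? op->other : op->next;
}

/* Stand-in for an asynchronous check between ops. */
static void demo_async_check(void)
{
    if (demo_pending_signal) {
        demo_pending_signal = 0;
        demo_signals_handled++;
    }
}

/* Call each op's function and continue with whatever op it returns,
 * stopping when an op returns NULL. */
static void demo_runops(DemoOp *start)
{
    DemoOp *op = start;

    assert(start != NULL);
    while ((op = op->ppaddr(op)) != NULL)
        demo_async_check();
}
```

Because each function returns its successor, the loop itself needs no
knowledge of branching: C<demo_pp_if_zero> simply hands back a different
op depending on the value in C<demo_acc>.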
859
860	The actual functions called are known as PP code, and they're spread
861	between four files: F<pp_hot.c> contains the "hot" code, which is most
862	often used and highly optimized, F<pp_sys.c> contains all the
863	system-specific functions, F<pp_ctl.c> contains the functions which
864	implement control structures (C<if>, C<while> and the like) and F<pp.c>
865	contains everything else. These are, if you like, the C code for Perl's
866	built-in functions and operators.
867
868	Note that each C<pp_> function is expected to return a pointer to the next
869	op. Calls to perl subs (and eval blocks) are handled within the same
870	runops loop, and do not consume extra space on the C stack. For example,
871	C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or C<CxEVAL> block
872	struct onto the context stack; that struct contains the address of the op
873	following the sub call or eval. They then return the first op of that sub
874	or eval block, and so execution continues with that sub or block. Later, a
875	C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB> or C<CxEVAL>,
876	retrieves the return op from it, and returns it.
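
The mechanism can be modelled in standalone C as follows. This is a sketch
under simplifying assumptions rather than the code in F<pp_hot.c> or
F<pp_ctl.c>: the names C<CallOp>, C<call_cxstack>, C<call_pp_entersub>,
C<call_pp_leavesub> and C<call_runops> are hypothetical, and a context is
reduced to nothing more than the saved return op.

```c
#include <assert.h>
#include <stddef.h>

typedef struct call_op CallOp;
typedef CallOp *(*CallPPFunc)(CallOp *self);

struct call_op {
    CallPPFunc ppaddr;
    CallOp *next;        /* op following this one */
    CallOp *sub_start;   /* for entersub: first op of the called sub */
    long arg;
};

#define CALL_CX_MAX 16
static CallOp *call_cxstack[CALL_CX_MAX];  /* saved return ops */
static int call_cxix = -1;                 /* top of stack, -1 if empty */
static int call_max_depth = 0;             /* deepest nesting seen */
static long call_acc = 0;

static CallOp *call_pp_add(CallOp *op)
{
    call_acc += op->arg;
    return op->next;
}

/* Remember where to resume, then continue with the sub's first op.
 * Note that this function returns immediately: the sub body is run by
 * the same loop that called us, not by a nested C call. */
static CallOp *call_pp_entersub(CallOp *op)
{
    assert(call_cxix + 1 < CALL_CX_MAX);
    call_cxstack[++call_cxix] = op->next;
    if (call_cxix + 1 > call_max_depth)
        call_max_depth = call_cxix + 1;
    return op->sub_start;
}

/* Pop the context and resume at the saved return op. */
static CallOp *call_pp_leavesub(CallOp *op)
{
    (void)op;
    assert(call_cxix >= 0);
    return call_cxstack[call_cxix--];
}

static void call_runops(CallOp *op)
{
    while (op != NULL)
        op = op->ppaddr(op);
}
```

However deeply the modelled subs nest, C<call_runops> is entered only once;
the nesting is recorded in C<call_cxstack> rather than on the C stack.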