source: trunk/essentials/dev-lang/perl/pod/perlguts.pod

Last change on this file was 3181, checked in by bird, 19 years ago

perl 5.8.8

=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (Size_t, usually defined as size_t in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

In the unlikely case of an SV requiring more complex initialisation, you
can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the NUL) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has value undef.

    SV *sv = newSV(0);   /* no storage allocated */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.

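The difference between a length computed by C<strlen> and a length supplied by the caller shows up as soon as the data contains a NUL byte. The following is a minimal standalone sketch in plain C; it does not use the Perl API, and the names C<length_by_strlen>, C<count_byte>, C<sample> and C<sample_len> are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the length a strlen-based routine would see.
   Counting stops at the first NUL byte. */
static size_t length_by_strlen(const char *buf)
{
    return strlen(buf);
}

/* Hypothetical helper: count occurrences of byte c in the first len
   bytes of buf, trusting the length supplied by the caller. */
static size_t count_byte(const char *buf, size_t len, char c)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++) {
        if (buf[i] == c)
            n++;
    }
    return n;
}

/* Five bytes of data with an embedded NUL; the compiler appends one
   more NUL after the 'd'. */
static const char sample[] = "ab\0cd";
static const size_t sample_len = sizeof(sample) - 1;
```

With this data, the C<strlen>-based length covers only the bytes before the embedded NUL, so anything after it is silently ignored. Passing an explicit length, as C<sv_setpvn> and C<newSVpvn> allow, avoids that.
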
The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated, there is a risk of
core dumps and corruption from code that passes the string to C
functions or system calls that expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>, because the order in which function arguments are evaluated is
unspecified. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);

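The following standalone sketch models why C<SvPV> takes C<len> by name and why the call should be split in two. It is plain C and does not use the Perl API: C<struct toy_sv>, C<TOY_PV>, C<count_spaces> and C<toy_count_spaces> are hypothetical names, and the macro is only a simplified stand-in for C<SvPV>.

```c
#include <stddef.h>

/* Toy stand-in for a string scalar; not Perl's real SV layout. */
struct toy_sv {
    const char *buf;   /* string data */
    size_t      cur;   /* current string length in bytes */
};

/* Toy analogue of SvPV: because it is a macro, it assigns to the
   variable named by lenvar directly, so callers pass len, not &len. */
#define TOY_PV(sv, lenvar) ((lenvar) = (sv)->cur, (sv)->buf)

/* Hypothetical consumer that needs both the pointer and the length. */
static size_t count_spaces(const char *p, size_t len)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++) {
        if (p[i] == ' ')
            n++;
    }
    return n;
}

/* Safe pattern: fetch the pointer and length first, then make the call.
   Writing count_spaces(TOY_PV(sv, len), len) would read len in one
   argument while the other argument assigns it, and C does not define
   the order in which function arguments are evaluated. */
static size_t toy_count_spaces(const struct toy_sv *sv)
{
    size_t len;
    const char *ptr = TOY_PV(sv, len);
    return count_spaces(ptr, len);
}
```

The separate assignment guarantees that C<len> has been written before it is read, whichever order the compiler chooses for evaluating arguments.
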
If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for the trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

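The two properties described above, growing but never shrinking and leaving the extra byte for the NUL to the caller, can be modelled in a few lines of plain C. This is only an illustrative sketch and not how C<sv_grow> is implemented; C<struct toy_buf>, C<toy_grow> and C<toy_set> are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string buffer; not Perl's real SV layout. */
struct toy_buf {
    char  *pv;    /* allocated storage, or NULL */
    size_t cur;   /* bytes of string data in use */
    size_t len;   /* bytes allocated */
};

/* Toy analogue of SvGROW: reallocate only when newlen exceeds the
   current allocation; never shrink. Returns pv, or NULL on failure. */
static char *toy_grow(struct toy_buf *b, size_t newlen)
{
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv  = p;
        b->len = newlen;
    }
    return b->pv;
}

/* Store n bytes and a trailing NUL. The caller asks for n + 1 bytes
   because the grow step does not add room for the NUL by itself. */
static int toy_set(struct toy_buf *b, const char *src, size_t n)
{
    if (toy_grow(b, n + 1) == NULL)
        return -1;
    memcpy(b->pv, src, n);
    b->pv[n] = '\0';
    b->cur = n;
    return 0;
}
```

Asking C<toy_grow> for fewer bytes than are already allocated leaves the buffer untouched, which mirrors the "increase only" behaviour described for C<SvGROW>.
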
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example,
when interfacing with Perl code, the comparison will work correctly for:

    foo(undef);

But it won't work when called as:

    $x = undef;
    foo($x);

So, to repeat, always use SvOK() to check whether an sv is defined.

Also, you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L<AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

  % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
  SV = PVIV(0x8128450) at 0x81340f0
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    IV = 1  (OFFSET)
    PV = 0x8135781 ( "1" . ) "2345"\0
    CUR = 4
    LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.

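The bookkeeping behind the offset hack can be modelled in a few lines of standalone C. This is a toy sketch under simplifying assumptions, not Perl's implementation: C<struct toy_str> and the functions C<toy_init>, C<toy_chop> and C<toy_real_start> are hypothetical, and the C<offset> field plays the role that the IV field plays in the description above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of the offset hack; not Perl's real SV layout. */
struct toy_str {
    char  *pv;      /* visible start of the string ("fake" beginning) */
    size_t cur;     /* visible string length */
    size_t len;     /* allocated bytes counted from pv */
    size_t offset;  /* bytes chopped so far (the IV field in the text) */
};

/* Allocate a toy string holding a copy of s. Returns 0 on success. */
static int toy_init(struct toy_str *t, const char *s)
{
    size_t n = strlen(s);
    t->pv = malloc(n + 1);
    if (t->pv == NULL)
        return -1;
    memcpy(t->pv, s, n + 1);
    t->cur = n;
    t->len = n + 1;
    t->offset = 0;
    return 0;
}

/* Toy analogue of sv_chop: discard everything before ptr, which must
   point inside the visible string, without moving any bytes. */
static void toy_chop(struct toy_str *t, char *ptr)
{
    size_t delta = (size_t)(ptr - t->pv);
    t->pv      = ptr;
    t->cur    -= delta;
    t->len    -= delta;
    t->offset += delta;
}

/* The real start of the allocation, needed when freeing. */
static char *toy_real_start(const struct toy_str *t)
{
    return t->pv - t->offset;
}
```

Chopping one byte from C<"12345"> in this model leaves a visible string of C<"2345"> with a length of 4, an allocation of 5 and an offset of 1, the same figures that the C<Devel::Peek> dump reports.
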
Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

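The same idea can be sketched for arrays. The block below is a standalone toy model in plain C, not Perl's AV implementation: C<struct toy_av> and C<toy_shift> are hypothetical, and the elements are plain C<int> values rather than SV pointers.

```c
#include <stddef.h>

/* Toy model of an array with a movable visible start; not Perl's real
   AV layout. */
struct toy_av {
    int      *alloc;  /* real start of the C array (like AvALLOC) */
    int      *array;  /* first visible element (like AvARRAY) */
    ptrdiff_t fill;   /* index of the last visible element, -1 if empty */
    ptrdiff_t max;    /* highest usable index counted from array */
};

/* Shift off the first element by advancing the visible start rather
   than moving the remaining elements. Returns 0 and stores the element
   in *out on success, or -1 if the array is empty. */
static int toy_shift(struct toy_av *av, int *out)
{
    if (av->fill < 0)
        return -1;
    *out = av->array[0];
    av->array++;
    av->fill--;
    av->max--;
    return 0;
}
```

After a shift, C<array> is one element ahead of C<alloc>, and only C<alloc> would be handed back to the allocator when the storage is released.
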
=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
usually these macros will always return TRUE and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or