=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (Size_t, usually defined as size_t in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

In the unlikely case of an SV requiring more complex initialisation, you
can create an empty SV with newSV(len). If C<len> is 0, an empty SV of
type NULL is returned; otherwise an SV of type PV is returned with len + 1
(for the NUL) bytes of storage allocated, accessible via SvPVX. In both
cases the SV has value undef.

    SV *sv = newSV(0);   /* no storage allocated */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.

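The consequence of relying on C<strlen> is easiest to see with data that
contains an embedded NUL. The following is a plain C sketch that does not
use the Perl API; C<length_seen_by_strlen> is a hypothetical helper
introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Perl API): reports how many bytes
 * a strlen-based routine would see in a buffer whose true length,
 * excluding the final NUL, is real_len. */
static size_t length_seen_by_strlen(const char *buf, size_t real_len)
{
    size_t n = strlen(buf);   /* stops at the first NUL byte */
    assert(n <= real_len);    /* buf must be NUL-terminated within real_len */
    return n;
}
```

For the seven bytes of C<"abc\0def">, a C<strlen>-based routine sees only
the first three, so anything after the embedded NUL is silently dropped
unless the length is passed explicitly.
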
The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated, there is a risk of
core dumps and corruption from code which passes the string to C
functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char * ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);

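The same hazard can be reproduced without Perl. The sketch below is a toy
model in plain C: C<TOY_PV>, C<foo> and C<call_safely> are hypothetical
names, and C<TOY_PV> only imitates the way C<SvPV> stores the length as a
side effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for SvPV: yields the string and stores its length in
 * `len` as a side effect. TOY_PV is hypothetical, not a Perl macro. */
#define TOY_PV(s, len) ((len) = strlen(s), (s))

/* Hypothetical consumer that records the length it was given. */
static size_t last_len;

static void foo(const char *ptr, size_t len)
{
    (void)ptr;
    last_len = len;
}

static void call_safely(const char *s)
{
    size_t len = 0;
    const char *ptr;

    assert(s != NULL);

    /* Unsafe: foo(TOY_PV(s, len), len);
     * The order in which C evaluates function arguments is unspecified,
     * so the second argument may be read before the macro assigns len. */

    ptr = TOY_PV(s, len);   /* the assignment to len is complete here */
    foo(ptr, len);
}
```

Splitting the call into two statements places a sequence point between the
assignment to C<len> and its use, which is exactly what the recommended
form above achieves.
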
If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for a trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

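The grow-only behaviour and the caller-supplied extra byte can be sketched
with a toy buffer in plain C. Nothing here is the Perl API: C<struct
toy_str>, C<toy_grow>, C<TOY_GROW> and C<toy_append> are hypothetical
names, and the real C<sv_grow> may choose a different allocation size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string buffer; fields loosely mirror SvPVX, SvCUR and SvLEN. */
struct toy_str {
    char  *pv;   /* start of the buffer */
    size_t cur;  /* bytes in use, excluding the trailing NUL */
    size_t len;  /* bytes allocated */
};

/* Unconditionally reallocate the buffer to newlen bytes. */
static char *toy_grow(struct toy_str *s, size_t newlen)
{
    char *p = realloc(s->pv, newlen);
    assert(p != NULL);
    s->pv = p;
    s->len = newlen;
    return s->pv;
}

/* Only reallocates when the buffer is too small, so it never shrinks. */
#define TOY_GROW(s, n) ((s)->len < (n) ? toy_grow((s), (n)) : (s)->pv)

/* Append n bytes; the caller-side "+ 1" reserves room for the NUL,
 * because TOY_GROW does not add it. */
static void toy_append(struct toy_str *s, const char *src, size_t n)
{
    (void)TOY_GROW(s, s->cur + n + 1);
    memcpy(s->pv + s->cur, src, n);
    s->cur += n;
    s->pv[s->cur] = '\0';
}
```

Starting from a zeroed C<struct toy_str>, appending C<"abc"> allocates four
bytes, and a later C<TOY_GROW> request for two bytes leaves the allocation
unchanged.
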
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example,
when interfacing with Perl code, such a comparison will work correctly for:

    foo(undef);

But it won't work when called as:

    $x = undef;
    foo($x);

So, to repeat: always use C<SvOK()> to check whether an sv is defined.

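The difference between "is this the shared undef scalar" and "is this
scalar undefined" can be modelled in a few lines of plain C. This is a toy
sketch, not the Perl API: C<struct toy_sv>, C<toy_undef> and both
predicates are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy scalar: `ok` plays the role of the flags that SvOK() inspects. */
struct toy_sv {
    bool ok;   /* true once the scalar holds a defined value */
    long iv;   /* the value, meaningful only when ok is true */
};

/* Toy counterpart of PL_sv_undef: one shared, never-defined scalar. */
static struct toy_sv toy_undef = { false, 0 };

/* Fragile test: true only for the shared undef scalar itself. */
static bool is_the_undef_scalar(const struct toy_sv *sv)
{
    return sv == &toy_undef;
}

/* Robust test, analogous to !SvOK(sv): looks at the scalar's state. */
static bool is_undefined(const struct toy_sv *sv)
{
    assert(sv != NULL);
    return !sv->ok;
}
```

A separate scalar that merely holds no value, like C<$x> after C<$x =
undef>, fails the address comparison but is still reported as undefined by
the state-based check.
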
Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L<AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

    % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
    SV = PVIV(0x8128450) at 0x81340f0
      REFCNT = 1
      FLAGS = (POK,OOK,pPOK)
      IV = 1  (OFFSET)
      PV = 0x8135781 ( "1" . ) "2345"\0
      CUR = 4
      LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.

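The bookkeeping behind the offset hack can be imitated in plain C. The
sketch below is a toy model under stated assumptions, not the code in
F<sv.c>: C<struct toy_sv>, C<toy_init>, C<toy_chop> and C<toy_free> are
hypothetical, and the C<offset> field stands in for the IV slot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy scalar; fields loosely mirror SvPVX, SvCUR, SvLEN and the IV slot. */
struct toy_sv {
    char  *pv;      /* visible start of the string */
    size_t cur;     /* visible length, excluding the trailing NUL */
    size_t len;     /* bytes allocated from pv onwards */
    size_t offset;  /* bytes chopped off the front */
};

static void toy_init(struct toy_sv *sv, const char *s)
{
    sv->cur = strlen(s);
    sv->len = sv->cur + 1;
    sv->pv = malloc(sv->len);
    assert(sv->pv != NULL);
    memcpy(sv->pv, s, sv->len);
    sv->offset = 0;
}

/* Discard everything before ptr without moving any bytes. */
static void toy_chop(struct toy_sv *sv, char *ptr)
{
    size_t delta;
    assert(ptr >= sv->pv && ptr <= sv->pv + sv->cur);
    delta = (size_t)(ptr - sv->pv);
    sv->pv     += delta;
    sv->cur    -= delta;
    sv->len    -= delta;
    sv->offset += delta;
}

/* The allocation still begins at pv - offset; free that, not pv. */
static void toy_free(struct toy_sv *sv)
{
    free(sv->pv - sv->offset);
    sv->pv = NULL;
}
```

Initialising the toy scalar with C<"12345"> and chopping one byte leaves an
offset of 1, a current length of 4 and an allocated length of 5, the same
figures as in the dump above.
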
Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvLEN>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

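A constant-time shift of this kind can be sketched with a toy array of
integers in plain C. This is not the implementation in F<av.c>: C<struct
toy_av> and its functions are hypothetical, and the fields only borrow the
roles of C<AvALLOC>, C<AvARRAY>, C<AvFILL> and C<AvLEN> from the text.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy array of ints modelled on the description above. */
struct toy_av {
    int  *alloc;  /* real start of the C array */
    int  *array;  /* first element visible to the user */
    long  fill;   /* index of the last visible element, -1 if empty */
    long  len;    /* slots available from array onwards */
};

static void toy_av_init(struct toy_av *av, const int *src, long n)
{
    long i;
    assert(n > 0);
    av->alloc = malloc((size_t)n * sizeof *av->alloc);
    assert(av->alloc != NULL);
    for (i = 0; i < n; i++)
        av->alloc[i] = src[i];
    av->array = av->alloc;
    av->fill = n - 1;
    av->len = n;
}

/* Remove and return the first element without moving the others. */
static int toy_av_shift(struct toy_av *av)
{
    int first;
    assert(av->fill >= 0);
    first = av->array[0];
    av->array++;   /* the visible start moves forward by one slot */
    av->fill--;
    av->len--;
    return first;
}

/* Only the real start of the allocation may be passed to free(). */
static void toy_av_free(struct toy_av *av)
{
    free(av->alloc);
    av->alloc = av->array = NULL;
}
```

After one shift the visible pointer sits one slot past the real start,
which is why the toy keeps both pointers and frees only the latter.
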
=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
usually these macros will always return TRUE and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or