=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

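Because an IV is at least as wide as a pointer, an address can be parked in one and recovered unchanged. The stand-alone C sketch below models that guarantee with C<intptr_t>; C<toy_IV>, C<toy_UV>, C<ptr_to_iv> and C<iv_to_ptr> are hypothetical names, not part of the Perl API, and the sketch assumes a C11 compiler on a platform that provides the optional C<intptr_t> type.

```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for IV and UV, assuming the platform provides
       intptr_t/uintptr_t (optional in ISO C, but present almost everywhere). */
    typedef intptr_t  toy_IV;
    typedef uintptr_t toy_UV;

    /* Compile-time check of the property the text promises for IV. */
    static_assert(sizeof(toy_IV) >= sizeof(void *), "IV must be able to hold a pointer");

    /* Park an address in an integer, then get the same address back. */
    static toy_IV ptr_to_iv(void *p)  { return (toy_IV)p; }
    static void  *iv_to_ptr(toy_IV iv) { return (void *)iv; }
```

Converting a C<void *> to C<intptr_t> and back is guaranteed by C to yield a pointer that compares equal to the original, which is the property that lets Perl keep addresses in integer slots.
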
=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (Size_t, usually defined as size_t in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

In the unlikely case of an SV requiring more complex initialisation, you
can create an empty SV with C<newSV(len)>. If C<len> is 0, an empty SV of
type NULL is returned; otherwise an SV of type PV is returned with len + 1
(for the NUL) bytes of storage allocated, accessible via C<SvPVX>. In both
cases the SV has value undef.

    SV *sv1 = newSV(0);   /* no storage allocated */
    SV *sv2 = newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character and stops at the first NUL it
finds. Data that contains embedded NULs must therefore be passed with an
explicit length.

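The difference between the two styles of setter can be sketched in plain C. The code below is a toy model, not Perl's implementation: C<struct toy_pv>, C<toy_setpvn> and C<toy_setpv> are hypothetical names standing in for an SV's string fields, C<sv_setpvn> and C<sv_setpv> respectively.

```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for an SV's string fields (not Perl's layout). */
    struct toy_pv {
        char  *pv;   /* buffer, always NUL-terminated */
        size_t cur;  /* data bytes, excluding the trailing NUL */
    };

    /* Like sv_setpvn(): the caller gives the length, so embedded NULs are kept. */
    static void toy_setpvn(struct toy_pv *s, const char *src, size_t len)
    {
        char *buf = malloc(len + 1);   /* +1 for the trailing NUL */
        assert(buf != NULL);
        memcpy(buf, src, len);
        buf[len] = '\0';
        free(s->pv);                   /* free(NULL) is a no-op */
        s->pv = buf;
        s->cur = len;
    }

    /* Like sv_setpv(): the length comes from strlen(), which stops at the first NUL. */
    static void toy_setpv(struct toy_pv *s, const char *src)
    {
        toy_setpvn(s, src, strlen(src));
    }
```

Passing C<"ab\0cd"> with an explicit length of 5 keeps all five bytes, while the C<strlen>-based variant stores only C<"ab">.
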
The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated, there is a risk of core dumps and
corruption from code that passes the string to C functions or system
calls that expect a NUL-terminated string. Perl's own functions
typically add a trailing NUL for this reason. Nevertheless, you should
be very careful when you pass a string stored in an SV to a C function
or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. C does not specify the order in which function arguments are
evaluated, so the second C<len> may be read before C<SvPV> has assigned
it. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;                /* assumed to point to an existing SV */
    STRLEN len;
    char *ptr;
    ptr = SvPV(s, len);   /* len is set here... */
    foo(ptr, len);        /* ...and only read here */

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for a trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

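The two C<SvGROW> rules (grow only, no implicit NUL byte) are modelled below with a hypothetical C<struct toy_buf> whose C<pv>, C<cur> and C<len> members loosely mirror C<SvPVX>, C<SvCUR> and C<SvLEN>. This is an illustrative sketch built on C<realloc>, not how C<sv_grow> is actually written.

```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical toy buffer loosely mirroring an SV's PV, CUR and LEN. */
    struct toy_buf {
        char  *pv;
        size_t cur;   /* bytes in use, excluding the trailing NUL */
        size_t len;   /* bytes allocated */
    };

    /* Like SvGROW(): only ever enlarges the allocation and does NOT add
       room for a trailing NUL by itself. Returns the (possibly moved) buffer. */
    static char *toy_grow(struct toy_buf *b, size_t newlen)
    {
        if (newlen > b->len) {
            char *p = realloc(b->pv, newlen);   /* realloc(NULL, n) acts as malloc */
            assert(p != NULL);
            b->pv = p;
            b->len = newlen;
        }
        return b->pv;
    }

    /* Append len bytes, asking for the extra NUL byte explicitly, in the
       same spirit as SvGROW(sv, len + 1) in perl's own string functions. */
    static void toy_cat(struct toy_buf *b, const char *src, size_t len)
    {
        char *p = toy_grow(b, b->cur + len + 1);
        memcpy(p + b->cur, src, len);
        b->cur += len;
        p[b->cur] = '\0';
    }
```

Asking C<toy_grow> for fewer bytes than are already allocated leaves the buffer untouched, just as C<SvGROW> never shrinks an SV.
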
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare a random SV with C<&PL_sv_undef>. For example,
when interfacing Perl code, it'll work correctly for:

    foo(undef);

but it won't work when called as:

    $x = undef;
    foo($x);

This fails because C<$x> is a separate SV that merely holds an undefined
value; it is not C<PL_sv_undef> itself. So, to repeat: always use
C<SvOK()> to check whether an SV is defined.

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L<AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

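The following sketch models both pieces of advice above in plain C, under the assumption that a single shared object plays the role of C<PL_sv_undef>. C<struct toy_sv>, C<toy_sv_undef>, C<maybe_value> and C<toy_ok> are hypothetical names: the function returns the sentinel rather than a null pointer, and definedness is tested through a flag (as C<SvOK> does) rather than by comparing addresses.

```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical toy scalar with a "defined" flag. */
    struct toy_sv { int defined; long iv; };

    /* Stand-in for PL_sv_undef: one shared undefined value. Treat it as
       read-only; callers must never modify it. */
    static struct toy_sv toy_sv_undef = { 0, 0 };

    /* Returns either a real value or the shared undef sentinel, never NULL,
       so the result can always be dereferenced safely. */
    static struct toy_sv *maybe_value(int have_value, struct toy_sv *storage)
    {
        struct toy_sv *sv = &toy_sv_undef;   /* not (struct toy_sv *)0 */
        if (have_value) {
            storage->defined = 1;
            storage->iv = 42;
            sv = storage;
        }
        return sv;
    }

    /* Check definedness through the flag, in the spirit of SvOK(),
       instead of comparing the pointer with &toy_sv_undef. */
    static int toy_ok(const struct toy_sv *sv)
    {
        assert(sv != NULL);
        return sv->defined;
    }
```

A separately allocated undefined value, like C<$x> above, is reported as undefined by C<toy_ok> even though its address differs from C<&toy_sv_undef>.
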
To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

    % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
    SV = PVIV(0x8128450) at 0x81340f0
      REFCNT = 1
      FLAGS = (POK,OOK,pPOK)
      IV = 1  (OFFSET)
      PV = 0x8135781 ( "1" . ) "2345"\0
      CUR = 4
      LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.

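A toy model of the offset hack, following the description above, can make the pointer arithmetic concrete. C<struct toy_oook> and its helper functions are hypothetical; the offset is kept in a separate C<offset> member standing in for the IV slot, which is a simplification rather than perl's real SV layout.

```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical toy model of a string SV that uses the offset hack. */
    struct toy_oook {
        char  *pv;      /* like SvPVX: start of the visible string */
        size_t cur;     /* like SvCUR */
        size_t len;     /* like SvLEN: bytes available from pv onwards */
        size_t offset;  /* stands in for the IV slot while OOK is set */
    };

    static void toy_init(struct toy_oook *s, const char *str)
    {
        size_t n = strlen(str);
        s->pv = malloc(n + 1);
        assert(s->pv != NULL);
        memcpy(s->pv, str, n + 1);   /* copy the trailing NUL too */
        s->cur = n;
        s->len = n + 1;
        s->offset = 0;
    }

    /* Like sv_chop(): discard everything before ptr without moving any bytes. */
    static void toy_chop(struct toy_oook *s, char *ptr)
    {
        size_t delta;
        assert(ptr >= s->pv && ptr <= s->pv + s->cur);
        delta = (size_t)(ptr - s->pv);
        s->offset += delta;
        s->pv = ptr;
        s->cur -= delta;
        s->len -= delta;
    }

    /* The real allocation starts at pv - offset, and that is what must be freed. */
    static void toy_free(struct toy_oook *s)
    {
        free(s->pv - s->offset);
    }
```

Chopping one byte from C<"12345"> leaves C<cur> at 4 and C<len> at 5, matching the C<CUR> and C<LEN> values in the C<Devel::Peek> output, and C<toy_free> releases the block from its real start.
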
Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILLp> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

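The same idea applied to arrays can be sketched with a hypothetical C<struct toy_av>. Its C<alloc> and C<array> members play the roles of C<AvALLOC> and C<AvARRAY>, while C<fill> and C<max> stand in for the array's highest-used and highest-allocated index fields. It is a simplified model, not the code of C<av_shift>.

```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical toy array using the AvALLOC/AvARRAY trick. */
    struct toy_av {
        int  *alloc;  /* real start of the C array (what would be freed) */
        int  *array;  /* first element visible to the user */
        long  fill;   /* highest visible index, -1 if empty */
        long  max;    /* highest index usable without reallocating */
    };

    /* Shift by advancing the visible start instead of moving elements. */
    static int toy_shift(struct toy_av *av)
    {
        int first;
        assert(av->fill >= 0);   /* caller must not shift an empty array */
        first = av->array[0];
        av->array++;
        av->fill--;
        av->max--;
        return first;
    }
```

No element is copied: after the shift, C<alloc> still points at the original storage, and only C<array> has moved forward by one slot.
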
=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
routine rather than directly, so SvIOK is false. Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

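The lossy-conversion case can be pictured with a small flag model. Everything in it is hypothetical: C<TOY_IOK>, C<TOY_IOKp> and their siblings merely mimic the public/private flag pairs, and the single rule used here (set the public IOK flag only when the IV converts back to the same NV) is a simplification of perl's real conversion logic.

```c
    #include <assert.h>

    /* Hypothetical flag bits mirroring the public/private pairs above. */
    enum {
        TOY_IOK  = 1 << 0, TOY_IOKp = 1 << 1,
        TOY_NOK  = 1 << 2, TOY_NOKp = 1 << 3
    };

    struct toy_num { unsigned flags; double nv; long iv; };

    /* Convert the NV to an IV, assuming the NV fits in a long. The private
       flags are always set; public IOK is set only for a lossless result. */
    static void toy_nv_to_iv(struct toy_num *n)
    {
        assert(n->flags & TOY_NOK);
        n->iv = (long)n->nv;                 /* truncates toward zero */
        n->flags |= TOY_IOKp | TOY_NOKp;
        if ((double)n->iv == n->nv)
            n->flags |= TOY_IOK;             /* no precision was lost */
    }
```

Converting 3.7 yields an IV of 3 with only the private integer flag set, while 5.0 converts exactly and also gets the public one.
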
In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(I32 num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired, because
C<av_make> stores copies of them.

Once the AV has been created, the following operations are possible on AVs:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    I32   av_len(AV*);
    SV**  av_fetch(AV*, I32 key, I32 lval);
    SV**  av_store(AV*, I32 key, SV* val);

The C<av_len> function returns the highest index value in the array (just
like C<$#array> in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero and there is no element at that index yet, C<av_fetch> will
first store an undef value there.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

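Because C<av_fetch> hands back a pointer to the array slot rather than the element itself, the caller must test for NULL before dereferencing. The plain-C model below shows that calling pattern; C<struct toy_array>, C<toy_av_len> and C<toy_av_fetch> are hypothetical, store C<int *> in place of C<SV *>, and simply treat any missing or out-of-range element, including a negative key, as NULL.

```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical toy array of element pointers, loosely like an AV. */
    struct toy_array {
        int **slots;   /* element pointers; NULL marks a missing element */
        long  fill;    /* highest index, -1 if empty (like av_len) */
    };

    static long toy_av_len(const struct toy_array *av) { return av->fill; }

    /* Like av_fetch() with lval == 0: returns the ADDRESS of the slot, or
       NULL if there is no element there. Check before dereferencing. */
    static int **toy_av_fetch(struct toy_array *av, long key)
    {
        assert(av != NULL);
        if (key < 0 || key > av->fill || av->slots[key] == NULL)
            return NULL;
        return &av->slots[key];
    }
```

The typical caller writes C<if ((svp = fetch(...)) != NULL)> and only then uses C<*svp>, which is the same discipline required for real C<SV**> results.
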
    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, I32 key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on HVs:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

The first of these two functions checks whether a hash table entry exists,
and the second deletes it.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void  hv_clear(HV*);
    void  hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval. The key and retlen
               arguments are return values for the key and its
               length. The value is the function's return value */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:

    hash = 0;
    while (klen--)
        hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5);                  /* after 5.6 */

The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.

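Written out as a C function, the algorithm above looks like the sketch below. C<toy_perl_hash> is a hypothetical name; it reads the key as C<unsigned char> so that bytes above 127 contribute the same value on every platform, which is a choice made here rather than something the macro above specifies. Newer perls have replaced this function with different, randomly seeded hashes, so its values should not be expected to match a current interpreter's.

```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* The multiply-by-33 hash described above, plus the post-5.6 final step. */
    static uint32_t toy_perl_hash(const char *key, size_t klen)
    {
        const unsigned char *p = (const unsigned char *)key;
        uint32_t hash = 0;
        assert(key != NULL || klen == 0);
        while (klen--)
            hash = (hash * 33) + *p++;   /* unsigned arithmetic wraps mod 2^32 */
        hash = hash + (hash >> 5);       /* improves the distribution of low bits */
        return hash;
    }
```

For the one-byte key C<"a"> (97), the loop gives 97 and the final step adds 97 shifted right by 5 bits, which is 3, for a result of 100.
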
See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*   hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*   hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool  hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*   hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*   hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
extension code that deals with hash structures. These functions
also allow passing C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that neither C<hv_store> nor C<hv_store_ent> increments the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. AVs use C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code.

Other problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including references).

To create a reference, use either of the following functions:

    SV*  newRV_inc((SV*) thing);
    SV*  newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

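The difference between the two constructors comes down to who owns the target's initial reference count. The C sketch below models that with hypothetical types (C<struct toy_thing>, C<struct toy_ref>) and functions; it is an analogy for C<newRV_inc> and C<newRV_noinc>, not their implementation.

```c
    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical toy "thing" carrying a reference count, as every SV does. */
    struct toy_thing { unsigned refcnt; };
    struct toy_ref   { struct toy_thing *target; };

    static struct toy_thing *toy_new_thing(void)
    {
        struct toy_thing *t = malloc(sizeof *t);
        assert(t != NULL);
        t->refcnt = 1;   /* the creator owns one reference */
        return t;
    }

    /* Like newRV_inc(): the new reference takes an extra count on the target,
       so the creator still has to release its own count later. */
    static struct toy_ref toy_rv_inc(struct toy_thing *t)
    {
        t->refcnt++;
        return (struct toy_ref){ t };
    }

    /* Like newRV_noinc(): the new reference takes over the creator's count. */
    static struct toy_ref toy_rv_noinc(struct toy_thing *t)
    {
        return (struct toy_ref){ t };
    }
```

In real XS code the same reasoning is why a freshly created container is usually wrapped with C<newRV_noinc>, as in C<newRV_noinc((SV*)newAV())>: using C<newRV_inc> there would leave the AV with a count of 2, so it would leak once the reference is freed.
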
Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value:

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar

See the F<sv.h> header file for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the reference can be used to access the
various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L</"Stashes and Globs"> for information on converting class names into stashes.

/* Still under construction */

Upgrades C<rv> to a reference if it is not already one, and creates a new
SV for C<rv> to point to. If C<classname> is non-null, the new SV is blessed
into the specified class. The new SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

Copies an integer, unsigned integer or double into an SV whose reference is
C<rv>. The SV is blessed if C<classname> is non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

Copies the pointer value (I<the address, not the string!>) into an SV whose
reference is C<rv>. The SV is blessed if C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

Copies a string into an SV whose reference is C<rv>. Set length to 0 to let
Perl calculate the string length. The SV is blessed if C<classname> is
non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, const char* pv, STRLEN length);

Tests whether the SV is blessed into the specified class. It does not
check inheritance relationships.

    int sv_isa(SV* sv, const char* name);

Tests whether the SV is a reference to a blessed object.

    int sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. The SV can be
either a reference to a blessed object or a string containing a class name.
This is the function implementing the C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class, you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

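The difference between C<sv_isa> and C<sv_derived_from> can be modelled with a toy class hierarchy. This is only an illustration: the C<toy_class> and C<toy_obj> types and both functions are hypothetical, inheritance is reduced to a single parent pointer (Perl's C<@ISA> allows several), and only the blessed-object case of C<sv_derived_from> is covered, not the class-name-string case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical class record with a single parent; Perl itself consults the
   stash and @ISA, which may list several parents. */
typedef struct toy_class {
    const char             *name;
    const struct toy_class *parent;   /* NULL at the top of the hierarchy */
} toy_class;

/* Hypothetical object: blessing records only the class it was blessed into. */
typedef struct { const toy_class *blessed_into; } toy_obj;

/* Models sv_isa: exact class-name match, inheritance ignored. */
int toy_isa(const toy_obj *obj, const char *name) {
    assert(obj != NULL && name != NULL);
    return obj->blessed_into != NULL
        && strcmp(obj->blessed_into->name, name) == 0;
}

/* Models sv_derived_from for a blessed object: walk up the parent chain. */
int toy_derived_from(const toy_obj *obj, const char *name) {
    assert(obj != NULL && name != NULL);
    for (const toy_class *c = obj->blessed_into; c != NULL; c = c->parent)
        if (strcmp(c->name, name) == 0)
            return 1;
    return 0;
}
```

With a C<Dog> class whose parent is C<Animal>, an object blessed into C<Dog> passes C<toy_isa(obj, "Dog")> but fails C<toy_isa(obj, "Animal")>, while C<toy_derived_from(obj, "Animal")> succeeds, mirroring the behaviour described for the two real functions.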
=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV* get_sv("package::varname", TRUE);
    AV* get_av("package::varname", TRUE);
    HV* get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter: it tells Perl to create the
variable if it does not already exist. The new variable can now be set, using
the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<TRUE> argument to enable certain extra features. Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the warning:

    Name <varname> used only once: possible typo

=item GV_ADDWARN

Issues the warning:

    Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

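How the create flag and the extra bits combine can be sketched with a toy symbol table. Everything here is hypothetical: the C<TOY_*> flag values are chosen for the sketch (real code should use C<GV_ADDMULTI> and C<GV_ADDWARN> by name rather than assume their values), the table is a fixed-size array rather than a stash, and warnings are only counted, not printed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values chosen for this sketch only. */
enum {
    TOY_ADD      = 0x01,   /* plays the role of the TRUE "create" argument */
    TOY_ADDMULTI = 0x02,   /* plays the role of GV_ADDMULTI */
    TOY_ADDWARN  = 0x04    /* plays the role of GV_ADDWARN */
};

#define TOY_MAX_VARS 8

typedef struct {
    const char *name;      /* must outlive the table (string literals do) */
    int         multi;     /* set when created with TOY_ADDMULTI */
} toy_var;

typedef struct {
    toy_var vars[TOY_MAX_VARS];
    size_t  count;
    int     warnings;      /* counts "Had to create ... unexpectedly" warnings */
} toy_symtab;

/* Models get_sv(name, flags): look the name up and create it if allowed. */
toy_var *toy_get_var(toy_symtab *tab, const char *name, int flags) {
    assert(tab != NULL && name != NULL);
    for (size_t i = 0; i < tab->count; i++)
        if (strcmp(tab->vars[i].name, name) == 0)
            return &tab->vars[i];            /* existing variable: just return it */
    if (!(flags & TOY_ADD) || tab->count == TOY_MAX_VARS)
        return NULL;                         /* missing and not allowed to create */
    if (flags & TOY_ADDWARN)
        tab->warnings++;                     /* would warn: created unexpectedly */
    toy_var *v = &tab->vars[tab->count++];
    v->name  = name;
    v->multi = (flags & TOY_ADDMULTI) != 0;  /* would suppress "used only once" */
    return v;
}
```

A call such as C<toy_get_var(&tab, "main::x", TOY_ADD | TOY_ADDMULTI | TOY_ADDWARN)> corresponds roughly to C<get_sv("main::x", TRUE | GV_ADDMULTI | GV_ADDWARN)>; the warning is counted only on the call that actually creates the variable.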
=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten. At the internal level, however, reference counts can be
manipulated with the following macros:

    int  SvREFCNT(SV* sv);
    SV*  SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument. The C<newRV_inc> function, you will recall,
creates a reference to the specified argument. As a side effect,
it increments the argument's reference count. If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one. Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two. Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't! Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates. This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>. Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.

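The leak described above can be reproduced with a minimal reference-counting model. The C<toy_*> names are hypothetical stand-ins for C<SvREFCNT_dec>, C<newRV_inc> and C<newRV_noinc>; the model keeps only a count, a "refers to" pointer and a tally of live objects, which is enough to show the difference between the two constructors.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal refcounted object standing in for an SV. */
typedef struct toy_sv {
    int            refcnt;
    struct toy_sv *rv;       /* non-NULL when this toy SV is a reference */
} toy_sv;

static int toy_live = 0;     /* how many toy SVs have not been freed */

toy_sv *toy_new(void) {
    toy_sv *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;          /* new values start with a count of one */
    toy_live++;
    return sv;
}

/* Models SvREFCNT_dec: free at zero, releasing whatever it refers to. */
void toy_dec(toy_sv *sv) {
    if (sv != NULL && --sv->refcnt == 0) {
        toy_dec(sv->rv);
        free(sv);
        toy_live--;
    }
}

/* Model newRV_inc and newRV_noinc: both make a new reference to thing. */
toy_sv *toy_newrv_inc(toy_sv *thing) {
    toy_sv *ref = toy_new();
    thing->refcnt++;         /* the extra count that causes the leak */
    ref->rv = thing;
    return ref;
}

toy_sv *toy_newrv_noinc(toy_sv *thing) {
    toy_sv *ref = toy_new();
    ref->rv = thing;         /* the reference takes over the existing count */
    return ref;
}

/* The XSUB scenario from the text: create an SV, return a reference to it,
   forget the SV, and later destroy the reference. Returns how many toy SVs
   are still alive afterwards. */
int leaked_after_return(int use_inc) {
    int before = toy_live;
    toy_sv *sv  = toy_new();
    toy_sv *ref = use_inc ? toy_newrv_inc(sv) : toy_newrv_noinc(sv);
    toy_dec(ref);
    return toy_live - before;
}
```

C<leaked_after_return(1)> (the C<newRV_inc> path) leaves one object alive that nothing refers to any more, while C<leaked_after_return(0)> (the C<newRV_noinc> path) leaves none.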
There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later". Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function. The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, C<SAVETMPS> and
C<FREETMPS>. See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

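Mortality as a deferred decrement can be sketched as a small stack of pending decrements. This is a hypothetical model, not Perl's implementation: C<toy_2mortal> only queues the SV, C<toy_savetmps> marks where the current scope's mortals begin, and C<toy_freetmps> performs the queued C<SvREFCNT_dec>-style decrements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted value; "freed" is recorded instead of calling free(). */
typedef struct { int refcnt; int freed; } toy_sv;

#define TOY_TMPS_MAX 16
static toy_sv *toy_tmps[TOY_TMPS_MAX];  /* stand-in for the temps stack */
static size_t  toy_tmps_ix    = 0;      /* next free slot */
static size_t  toy_tmps_floor = 0;      /* entries below this belong to outer scopes */

/* Models SvREFCNT_dec. */
static void toy_dec(toy_sv *sv) {
    if (--sv->refcnt == 0)
        sv->freed = 1;
}

/* Models sv_2mortal: nothing is decremented yet, the SV is only queued. */
toy_sv *toy_2mortal(toy_sv *sv) {
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Models SAVETMPS: mortals queued from now on belong to this scope.
   (The real macro also saves the old floor so it can be restored later.) */
void toy_savetmps(void) { toy_tmps_floor = toy_tmps_ix; }

/* Models FREETMPS: carry out the decrements deferred since SAVETMPS. */
void toy_freetmps(void) {
    while (toy_tmps_ix > toy_tmps_floor)
        toy_dec(toy_tmps[--toy_tmps_ix]);
}
```

Queuing the same SV twice makes C<toy_freetmps> decrement it twice, which is exactly the double-mortalization hazard: an SV that started with a count of 2, one of which belongs to some other owner, ends up freed.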
"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
For example, an SV which is created just to pass a number to a called sub
is made mortal to have it cleaned up automatically when it's popped off
the stack. Similarly, results returned by XSUBs (which are pushed on the
stack) are often made mortal.

To create a mortal variable, use the functions:

    SV* sv_newmortal()
    SV* sv_2mortal(SV*)
    SV* sv_mortalcopy(SV*)

The first call creates a mortal SV (with no value), the second converts an
existing SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and
the third creates a mortal copy of an existing SV. Because C<sv_newmortal>
gives the new SV no value, it must normally be given one via C<sv_setpv>,
C<sv_setiv>, etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that is multiple C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

You should be careful about creating mortal variables. Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times. Thinking of "Mortalization"
as deferred C<SvREFCNT_dec> should help to minimize such problems.
For example, if you are passing an SV which you I<know> has a high enough
reference count to survive its use on the stack, you need not do any
mortalization. If you are not sure, then doing an C<SvREFCNT_inc> and
C<sv_2mortal>, or making a C<sv_mortalcopy>, is safer.

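The two safe options mentioned above can be compared with another toy model. As before, the names are hypothetical stand-ins (C<toy_inc> for C<SvREFCNT_inc>, C<toy_2mortal> for C<sv_2mortal>, C<toy_mortalcopy> for C<sv_mortalcopy>, C<toy_freetmps> for C<FREETMPS>), and copies come from a static pool instead of the heap to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted value standing in for an SV. */
typedef struct { int refcnt; int value; int freed; } toy_sv;

#define TOY_MAX 16
static toy_sv *toy_tmps[TOY_MAX];   /* stand-in for the temps stack */
static size_t  toy_tmps_ix = 0;
static toy_sv  toy_pool[TOY_MAX];   /* storage for copies, to avoid malloc here */
static size_t  toy_pool_ix = 0;

/* Models SvREFCNT_inc. */
toy_sv *toy_inc(toy_sv *sv) {
    sv->refcnt++;
    return sv;
}

/* Models sv_2mortal: queue one deferred decrement. */
toy_sv *toy_2mortal(toy_sv *sv) {
    assert(toy_tmps_ix < TOY_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Models sv_mortalcopy: a new value (count 1) holding a copy, made mortal. */
toy_sv *toy_mortalcopy(const toy_sv *sv) {
    assert(toy_pool_ix < TOY_MAX);
    toy_sv *copy = &toy_pool[toy_pool_ix++];
    copy->refcnt = 1;
    copy->value  = sv->value;
    copy->freed  = 0;
    return toy_2mortal(copy);
}

/* Models FREETMPS over the whole stack (no SAVETMPS floor in this sketch). */
void toy_freetmps(void) {
    while (toy_tmps_ix > 0) {
        toy_sv *sv = toy_tmps[--toy_tmps_ix];
        if (--sv->refcnt == 0)
            sv->freed = 1;
    }
}
```

Incrementing before mortalizing makes the deferred decrement cancel out, so the original owner's count is unchanged after C<toy_freetmps>; C<toy_mortalcopy> never touches the original and lets the copy be freed instead.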
The mortal routines are not just for SVs; AVs and HVs can be made mortal
by passing their address (type-cast to C<SV*>) to the C<sv_2mortal> or
C<sv_mortalcopy> routines.

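The cast works because Perl's C<SV>, C<AV> and C<HV> structs all begin with the same fields (a body pointer, the reference count and the flags), so code that only needs those fields can treat any of them as an C<SV>. The sketch below models that idea with hypothetical C<toy_*> types, embedding the shared header as the first member, which is the strictly portable way to express the same layout in standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared header, standing in for the fields every SV, AV and HV
   starts with. It is the first member of each toy type below. */
typedef struct { int refcnt; unsigned type; } toy_sv;

typedef struct { toy_sv head; int  *items;   int fill; } toy_av;  /* AV stand-in */
typedef struct { toy_sv head; void *buckets; int keys; } toy_hv;  /* HV stand-in */

/* A generic routine written against the header only, as sv_2mortal is
   written against SV*. */
void toy_dec(toy_sv *sv) {
    assert(sv != NULL && sv->refcnt > 0);
    sv->refcnt--;
}
```

A call like C<toy_dec((toy_sv *)&av)> plays the role of C<sv_2mortal((SV*)av)>: the generic routine sees only the shared header, whatever the concrete type.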
=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same