=head1 NAME

perlpacktut - tutorial on C<pack> and C<unpack>

=head1 DESCRIPTION

C<pack> and C<unpack> are two functions for transforming data according
to a user-defined template, between the guarded way Perl stores values
and some well-defined representation as might be required in the
environment of a Perl program. Unfortunately, they're also two of
the most misunderstood and most often overlooked functions that Perl
provides. This tutorial will demystify them for you.


=head1 The Basic Principle


Most programming languages don't shelter the memory where variables are
stored. In C, for instance, you can take the address of some variable,
and the C<sizeof> operator tells you how many bytes are allocated to
the variable. Using the address and the size, you may access the storage
to your heart's content.

In Perl, you just can't access memory at random, but the structural and
representational conversion provided by C<pack> and C<unpack> is an
excellent alternative. The C<pack> function converts values to a byte
sequence containing representations according to a given specification,
the so-called "template" argument. C<unpack> is the reverse process,
deriving some values from the contents of a string of bytes. (Be cautioned,
however, that not all that has been packed together can be neatly unpacked -
a very common experience as seasoned travellers are likely to confirm.)

Why, you may ask, would you need a chunk of memory containing some values
in binary representation? One good reason is input and output involving
a file, a device, or a network connection, where this binary
representation is either forced on you or gives you some benefit
in processing. Another cause is passing data to some system call that
is not available as a Perl function: C<syscall> requires you to provide
parameters laid out the way a C program would store them. Even text
processing (as shown in the next section) may be simplified with judicious
usage of these two functions.

To see how (un)packing works, we'll start with a simple template
code where the conversion is in low gear: between the contents of a byte
sequence and a string of hexadecimal digits. Let's use C<unpack>, since
this is likely to remind you of a dump program, or some desperate last
message unfortunate programs are wont to throw at you before they expire
into the wild blue yonder. Assuming that the variable C<$mem> holds a
sequence of bytes that we'd like to inspect without assuming anything
about its meaning, we can write

   my( $hex ) = unpack( 'H*', $mem );
   print "$hex\n";

whereupon we might see something like this, with each pair of hex digits
corresponding to a byte:

   41204d414e204120504c414e20412043414e414c2050414e414d41

What was in this chunk of memory? Numbers, characters, or a mixture of
both? Assuming that we're on a computer where ASCII (or some similar)
encoding is used: hexadecimal values in the range C<0x41> - C<0x5A>
indicate an uppercase letter, and C<0x20> encodes a space. So we might
assume it is a piece of text, which some are able to read like a tabloid;
but others will have to get hold of an ASCII table and relive that
first-grader feeling. Not caring too much about which way to read this,
we note that C<unpack> with the template code C<H> converts the contents
of a sequence of bytes into the customary hexadecimal notation. Since
"a sequence of" is a pretty vague indication of quantity, C<H> has been
defined to convert just a single hexadecimal digit unless it is followed
by a repeat count. An asterisk for the repeat count means to use whatever
remains.

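
To make the effect of the repeat count concrete, here is a small sketch
that unpacks the same bytes with C<H>, C<H2> and C<H*>. It assumes an
ASCII platform, and the sample string is made up for illustration:

   use strict;
   use warnings;

   my $mem = "A MAN";                   # illustrative sample bytes

   my ($one) = unpack( 'H',  $mem );    # one hex digit: the high nybble of "A"
   my ($two) = unpack( 'H2', $mem );    # two digits: one complete byte
   my ($all) = unpack( 'H*', $mem );    # every remaining byte

   print "$one\n$two\n$all\n";

Note that C<H> emits the high nybble first, which is why a single digit
gives C<4> rather than C<1> for the byte C<0x41>.
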
The inverse operation - packing byte contents from a string of hexadecimal
digits - is just as easily written. For instance:

   my $s = pack( 'H2' x 10, map { "3$_" } ( 0..9 ) );
   print "$s\n";

Since we feed a list of ten 2-digit hexadecimal strings to C<pack>, the
pack template should contain ten pack codes. If this is run on a computer
with ASCII character coding, it will print C<0123456789>.

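
The ten C<H2> codes and the C<map> make the one-item-per-code
correspondence visible. As a sketch (again assuming ASCII), the same
bytes can be produced by joining the digits first and using a single
C<H*>, and C<unpack> with C<H*> takes them apart again:

   use strict;
   use warnings;

   # Ten two-digit items, one per 'H2' code ...
   my $s1 = pack( 'H2' x 10, map { "3$_" } ( 0..9 ) );

   # ... or one long hex string consumed by a single 'H*'.
   my $s2 = pack( 'H*', join( '', map { "3$_" } ( 0..9 ) ) );

   # Round trip: back to the hexadecimal digits.
   my ($hex) = unpack( 'H*', $s2 );

   print "$s1\n$s2\n$hex\n";
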

=head1 Packing Text

Let's suppose you've got to read in a data file like this:

   Date      |Description                | Income|Expenditure
   01/24/2001 Ahmed's Camel Emporium                  1147.99
   01/28/2001 Flea spray                                24.99
   01/29/2001 Camel rides to tourists     1235.00

How do we do it? You might think first to use C<split>; however, since
C<split> collapses blank fields, you'll never know whether a record was
income or expenditure. Oops. Well, you could always use C<substr>:

   while (<>) {
       my $date   = substr($_,  0, 11);
       my $desc   = substr($_, 12, 27);
       my $income = substr($_, 40,  7);
       my $expend = substr($_, 52,  7);
       ...
   }

It's not really a barrel of laughs, is it? In fact, it's worse than it
may seem; the eagle-eyed may notice that the first field should only be
10 characters wide, and the error has propagated right through the other
numbers - which we've had to count by hand. So it's error-prone as well
as horribly unfriendly.

Or maybe we could use regular expressions:

   while (<>) {
       my($date, $desc, $income, $expend) =
           m|(\d\d/\d\d/\d{4}) (.{27}) (.{7})(.*)|;
       ...
   }

Urgh. Well, it's a bit better, but - well, would you want to maintain
that?

Hey, isn't Perl supposed to make this sort of thing easy? Well, it does,
if you use the right tools. C<pack> and C<unpack> are designed to help
you out when dealing with fixed-width data like the above. Let's have a
look at a solution with C<unpack>:

   while (<>) {
       my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
       ...
   }

That looks a bit nicer; but we've got to take apart that weird template.
Where did I pull that out of?

OK, let's have a look at some of our data again; in fact, we'll include
the headers, and a handy ruler so we can keep track of where we are.

            1         2         3         4         5
   1234567890123456789012345678901234567890123456789012345678
   Date      |Description                | Income|Expenditure
   01/28/2001 Flea spray                                24.99
   01/29/2001 Camel rides to tourists     1235.00

From this, we can see that the date column stretches from column 1 to
column 10 - ten characters wide. The C<pack>-ese for "character" is
C<A>, and ten of them are C<A10>. So if we just wanted to extract the
dates, we could say this:

   my($date) = unpack("A10", $_);

OK, what's next? Between the date and the description is a blank column;
we want to skip over that. The C<x> template means "skip forward", so we
want one of those. Next, we have another batch of characters, from 12 to
38. That's 27 more characters, hence C<A27>. (Don't make the fencepost
error - there are 27 characters between 12 and 38, not 26. Count 'em!)

Now we skip another character and pick up the next 7 characters:

   my($date,$description,$income) = unpack("A10xA27xA7", $_);

Now comes the clever bit. Lines in our ledger which are just income and
not expenditure might end at column 46. Hence, we don't want to tell our
C<unpack> pattern that we B<need> to find another 12 characters; we'll
just say "if there's anything left, take it". As you might guess from
regular expressions, that's what the C<*> means: "use everything
remaining".

=over 3

=item *

Be warned, though, that unlike regular expressions, C<unpack> does not
quietly report a mismatch: some template codes, such as an C<x> asked to
skip past the end of the data, make Perl scream and die.

=back


Hence, putting it all together:

   my($date,$description,$income,$expend) = unpack("A10xA27xA7xA*", $_);

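
A quick way to convince yourself that the template lines up with the
columns is to run it over the sample ledger. The sketch below rebuilds
the three data lines with C<sprintf> (an illustrative choice that
guarantees exact column widths) and keeps the trailing newline that a
line read with C<< <> >> carries:

   use strict;
   use warnings;

   # Column widths: date (10), blank, description (27), blank,
   # income (7), blank, expenditure (11), then the newline.
   my @rows = (
       [ '01/24/2001', "Ahmed's Camel Emporium",  '',        '1147.99' ],
       [ '01/28/2001', 'Flea spray',              '',        '24.99'   ],
       [ '01/29/2001', 'Camel rides to tourists', '1235.00', ''        ],
   );
   my @ledger = map { sprintf( "%-10s %-27s %7s %11s\n", @$_ ) } @rows;

   # Parse every line with the template from the text.
   my @records = map { [ unpack( 'A10xA27xA7xA*', $_ ) ] } @ledger;

   for my $rec (@records) {
       # "A" strips trailing whitespace (including the newline), so an
       # empty column comes back as the empty string.
       print join( '|', @$rec ), "\n";
   }

Leading spaces are not stripped, so the right-justified expenditure
keeps them; Perl's numeric conversion ignores them when you add the
values up.
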
Now, that's our data parsed. I suppose what we might want to do now is
total up our income and expenditure, and add another line to the end of
our ledger - in the same format - saying how much we've brought in and
how much we've spent:

   use POSIX ();    # for POSIX::strftime

   my ($tot_income, $tot_expend) = (0, 0);
   while (<>) {
       my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
       $tot_income += $income;
       $tot_expend += $expend;
   }

   $tot_income = sprintf("%.2f", $tot_income); # Get them into
   $tot_expend = sprintf("%.2f", $tot_expend); # "financial" format

   my $date = POSIX::strftime("%m/%d/%Y", localtime);

   # OK, let's go:

   print pack("A10xA27xA7xA*", $date, "Totals", $tot_income, $tot_expend);

Oh, hmm. That didn't quite work. Let's see what happened:

   01/24/2001 Ahmed's Camel Emporium                  1147.99
   01/28/2001 Flea spray                                24.99
   01/29/2001 Camel rides to tourists     1235.00
   03/23/2001Totals                     1235.001172.98

OK, it's a start, but what happened to the spaces? We put C<x>, didn't
we? Shouldn't it skip forward? Let's look at what L<perlfunc/pack> says:

   x   A null byte.

Urgh. No wonder. There's a big difference between "a null byte",
character zero, and "a space", character 32. Perl's put something
between the date and the description - but unfortunately, we can't see
it!

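
If you want to see the invisible byte, dump the packed string in hex as
in the first section. A minimal sketch with made-up two-character
fields:

   use strict;
   use warnings;

   my $packed = pack( 'A2xA2', 'ab', 'cd' );

   # 'x' contributed a NUL (chr 0), not a space (chr 32).
   print length($packed), "\n";           # 5 bytes
   print unpack( 'H*', $packed ), "\n";   # the 00 in the middle is the NUL
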
What we actually need to do is expand the width of the fields. The C<A>
format pads any non-existent characters with spaces, so we can use the
additional spaces to line up our fields, like this:

   print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);

(Note that you can put spaces in the template to make it more readable,
but they don't translate to spaces in the output.) Here's what we got
this time:

   01/24/2001 Ahmed's Camel Emporium                  1147.99
   01/28/2001 Flea spray                                24.99
   01/29/2001 Camel rides to tourists     1235.00
   03/23/2001 Totals                      1235.00 1172.98

That's a bit better, but we still have that last column which needs to
be moved further over. There's an easy way to fix this up. Unfortunately,
we can't get C<pack> to right-justify our fields, but we can get
C<sprintf> to do it:

   $tot_income = sprintf("%.2f", $tot_income);
   $tot_expend = sprintf("%11.2f", $tot_expend); # width of "Expenditure"
   $date = POSIX::strftime("%m/%d/%Y", localtime);
   print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);

This time we get the right answer:

   01/28/2001 Flea spray                                24.99
   01/29/2001 Camel rides to tourists     1235.00
   03/23/2001 Totals                      1235.00     1172.98

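
Because the totals line uses the same column positions as the input, the
parsing template should be able to read it back. Here is a sketch that
checks this, using a fixed date instead of C<strftime> so the result is
reproducible; the amounts are those of the sample ledger:

   use strict;
   use warnings;

   my $tot_income = sprintf( "%.2f",   1235.00 );
   my $tot_expend = sprintf( "%11.2f", 1147.99 + 24.99 );  # right-justified

   my $date = '03/23/2001';    # fixed for reproducibility
   my $line = pack( "A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend );
   print "$line\n";

   # The parsing template recovers the fields, since the columns line up.
   my @fields = unpack( 'A10xA27xA7xA*', $line );

The spaces that C<A11>, C<A28> and C<A8> add as padding land exactly in
the three blank columns that the C<x> codes of the parsing template skip.
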
So that's how we consume and produce fixed-width data. Let's recap what
we've seen of C<pack> and C<unpack> so far:

=over 3

=item *

Use C<pack> to go from several pieces of data to one fixed-width
version; use C<unpack> to turn a fixed-width-format string into several
pieces of data.

=item *

The pack format C<A> means "any character"; if you're C<pack>ing and
you've run out of things to pack, C<pack> will fill the rest up with
spaces.

=item *

C<x> means "skip a byte" when C<unpack>ing; when C<pack>ing, it means
"introduce a null byte" - that's probably not what you mean if you're
dealing with plain text.

=item *

You can follow the formats with numbers to say how many characters
should be affected by that format: C<A12> means "take 12 characters";
C<x6> means "skip 6 bytes" or "character 0, 6 times".

=item *

Instead of a number, you can use C<*> to mean "consume everything else
left".

B<Warning>: when packing multiple pieces of data, C<*> only means
"consume all of the current piece of data". That's to say

   pack("A*A*", $one, $two)

packs all of C<$one> into the first C<A*> and then all of C<$two> into
the second. This is a general principle: each format character
corresponds to one piece of data to be C<pack>ed.

=back

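
Each point of the recap can be checked in a line or two. The sketch
below uses illustrative values only; it shows C<A> padding with spaces,
a fixed count truncating longer data, C<*> taking all of the current
item, and C<x> producing NULs:

   use strict;
   use warnings;

   my $padded    = pack( 'A5',   'ab' );          # "ab" plus 3 spaces
   my $truncated = pack( 'A3',   'abcdef' );      # only the first 3 characters
   my $joined    = pack( 'A*A*', 'one', 'two' );  # each A* takes one whole item
   my $nuls      = pack( 'x6' );                  # six NUL bytes, no data needed

   # On the way back in, A strips the trailing padding again.
   my ($back) = unpack( 'A5', $padded );
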

=head1 Packing Numbers

So much for textual data. Let's get onto the meaty stuff that C<pack>
and C<unpack> are best at: handling binary formats for numbers. There is,
of course, not just one binary format - life would be too simple - but
Perl will do all the finicky labor for you.


=head2 Integers

Packing and unpacking numbers implies conversion to and from some
I<specific> binary representation. Leaving floating point numbers
aside for the moment, the salient properties of any such representation
are:

=over 4

=item *

the number of bytes used for storing the integer,

=item *

whether the contents are interpreted as a signed or unsigned number,

=item *

the byte ordering: whether the first byte is the least or most
significant byte (or: little-endian or big-endian, respectively).

=back

So, for instance, to pack 20302 into a signed 16-bit integer in your
computer's representation you write

   my $ps = pack( 's', 20302 );

Again, the result is a string, now containing 2 bytes. If you print
this string (which is, generally, not recommended) you might see
C<ON> or C<NO> (depending on your system's byte ordering) - or something
entirely different if your computer doesn't use ASCII character encoding.
Unpacking C<$ps> with the same template returns the original integer value:

   my( $s ) = unpack( 's', $ps );

This is true for all numeric template codes. But don't expect miracles:
if the packed value exceeds the allotted byte capacity, high order bits
are silently discarded, and unpack certainly won't be able to pull them
back out of some magic hat. And, when you pack using a signed template
code such as C<s>, an excess value may result in the sign bit
getting set, and unpacking this will smartly return a negative value.

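
Here is a sketch of these points in code, using only C<s> and C<S>; the
specific values are just examples, and the C<ON>/C<NO> check assumes an
ASCII platform:

   use strict;
   use warnings;

   my $ps  = pack( 's', 20302 );     # 2 bytes, native byte order
   my ($s) = unpack( 's', $ps );     # round trip: 20302 again

   # Too big for 16 bits: 40000 sets the sign bit of a signed short ...
   my ($neg) = unpack( 's', pack( 's', 40000 ) );    # 40000 - 65536
   # ... and 70000 loses its high-order bits altogether.
   my ($cut) = unpack( 'S', pack( 'S', 70000 ) );    # 70000 - 65536

   printf "%d bytes: %s -> %d; %d; %d\n", length $ps, $ps, $s, $neg, $cut;
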
16 bits won't get you too far with integers, but there are C<l> and C<L>
for signed and unsigned 32-bit integers. And if this is not enough and
your system supports 64-bit integers you can push the limits much closer
to infinity with pack codes C<q> and C<Q>. A notable exception is provided
by pack codes C<i> and C<I> for signed and unsigned integers of the
"local custom" variety: such an integer will take up as many bytes as
a local C compiler returns for C<sizeof(int)>, but it'll use I<at least>
32 bits.

Each of the integer pack codes C<sSlLqQ> results in a fixed number of bytes,
no matter where you execute your program. This may be useful for some
applications, but it does not provide a portable way to pass data
structures between Perl and C programs (bound to happen when you call
XS extensions or the Perl function C<syscall>), or when you read or
write binary files. What you'll need in this case are template codes that
depend on what your local C compiler compiles when you code C<short> or
C<unsigned long>, for instance. These codes and their corresponding
byte lengths are shown in the table below. Since the C standard leaves
much leeway with respect to the relative sizes of these data types, actual
values may vary, and that's why the values are given as expressions in
C and Perl. (If you'd like to use values from C<%Config> in your program
you have to import it with C<use Config>.)

   signed unsigned  byte length in C   byte length in Perl
     s!     S!      sizeof(short)      $Config{shortsize}
     i!     I!      sizeof(int)        $Config{intsize}
     l!     L!      sizeof(long)       $Config{longsize}
     q!     Q!      sizeof(long long)  $Config{longlongsize}

The C<i!> and C<I!> codes aren't different from C<i> and C<I>; they are
tolerated for completeness' sake.

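
To see the native sizes on your own machine, you can compare the packed
lengths with the C<%Config> entries from the table. A minimal sketch;
C<q!> is left out because it is only available where Perl was built with
64-bit integer support:

   use strict;
   use warnings;
   use Config;

   # Native-sized codes versus the sizes the C compiler reported
   # when Perl was built.
   my %size = (
       's!' => $Config{shortsize},
       'i!' => $Config{intsize},
       'l!' => $Config{longsize},
   );

   for my $code ( sort keys %size ) {
       my $len = length pack( $code, 0 );
       printf "%-3s %d bytes (Config says %d)\n", $code, $len, $size{$code};
   }

   # Fixed-size codes do not depend on the platform.
   my @fixed = map { length pack( $_, 0 ) } qw( s l );
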

=head2 Unpacking a Stack Frame

Requesting a particular byte ordering may be necessary when you work with
binary data coming from some specific architecture whereas your program could
run on a totally different system. As an example, assume you have 24 bytes
containing a stack frame as it would appear on an Intel 8086:

         +---------+        +----+----+               +---------+
   TOS:  |   IP    |  TOS+4:| FL | FH | FLAGS  TOS+14:|   SI    |
         +---------+        +----+----+               +---------+
         |   CS    |        | AL | AH | AX            |   DI    |
         +---------+        +----+----+               +---------+
                            | BL | BH | BX            |   BP    |
                            +----+----+               +---------+
                            | CL | CH | CX            |   DS    |
                            +----+----+               +---------+
                            | DL | DH | DX            |   ES    |
                            +----+----+               +---------+

First, we note that this time-honored 16-bit CPU uses little-endian order,
and that's why the low order byte is stored at the lower address. To
unpack such an (unsigned) short we'll have to use code C<v>. A repeat
count unpacks all 12 shorts:

   my( $ip, $cs, $flags, $ax, $bx, $cx, $dx, $si, $di, $bp, $ds, $es ) =
     unpack( 'v12', $frame );

Alternatively, we could have used C<C> to unpack the individually
accessible byte registers FL, FH, AL, AH, etc.:

   my( $fl, $fh, $al, $ah, $bl, $bh, $cl, $ch, $dl, $dh ) =
     unpack( 'C10', substr( $frame, 4, 10 ) );

It would be nice if we could do this in one fell swoop: unpack a short,
back up a little, and then unpack 2 bytes. Since Perl I<is> nice, it
proffers the template code C<X> to back up one byte. Putting this all
together, we may now write:

   my( $ip, $cs,
       $flags,$fl,$fh,
       $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
       $si, $di, $bp, $ds, $es ) =
   unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );

(The clumsy construction of the template can be avoided - just read on!)

We've taken some pains to construct the template so that it matches
the contents of our frame buffer. Otherwise we'd either get undefined values,
or C<unpack> could not unpack all of the data. If C<pack> runs out of items,
it will supply null strings (which are coerced into zeroes whenever the pack
code says so).

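
Without a real 8086 at hand, one way to try the template is to build a
fake frame with C<pack> itself. The register values below are invented
for illustration; C<v12> lays them out as the little-endian shorts shown
in the diagram:

   use strict;
   use warnings;

   # Invented contents: IP, CS, FLAGS, AX, BX, CX, DX, SI, DI, BP, DS, ES.
   my @regs  = ( 0x0100, 0x1234, 0x0202, 0xA1B2, 0x0001, 0x00FF,
                 0x8000, 0x0010, 0x0020, 0x0030, 0x2000, 0x3000 );
   my $frame = pack( 'v12', @regs );    # 24 bytes

   my( $ip, $cs,
       $flags,$fl,$fh,
       $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
       $si, $di, $bp, $ds, $es ) =
   unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );

   # Each 16-bit register is its high byte * 256 plus its low byte.
   printf "AX=%04X AH=%02X AL=%02X\n", $ax, $ah, $al;
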

=head2 How to Eat an Egg on a Net

The pack code for big-endian (high order byte at the lowest address) is
C<n> for 16 bit and C<N> for 32 bit integers. You use these codes
if you know that your data comes from a compliant architecture, but,
surprisingly enough, you should also use these pack codes if you
exchange binary data, across the network, with some system that you
know next to nothing about. The simple reason is that this
order has been chosen as the I<network order>, and all standard-fearing
programs ought to follow this convention. (This is, of course, a stern
backing for one of the Lilliputian parties and may well influence the
political development there.) So, if the protocol expects you to send
a message by sending the length first, followed by just so many bytes,
you could write:

   my $buf = pack( 'N', length( $msg ) ) . $msg;

or even:

   my $buf = pack( 'NA*', length( $msg ), $msg );

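
On the receiving side the length prefix is read back with the same code.
The sketch below assumes the whole message has already arrived in one
buffer (a real network reader would have to keep reading until enough
bytes are available). It takes the payload with C<substr> rather than
C<A*>, because C<A> strips trailing whitespace when unpacking:

   use strict;
   use warnings;

   my $msg = "Hello, world  ";                  # note the trailing spaces
   my $buf = pack( 'NA*', length( $msg ), $msg );

   # Big-endian: the most significant byte of the length comes first.
   my $hdr = unpack( 'H8', $buf );              # "0000000e" for 14 bytes

   my ($len)    = unpack( 'N', $buf );
   my $received = substr( $buf, 4, $len );      # payload, spaces intact
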