package PerlIO;

our $VERSION = '1.04';

# Map layer name to package that defines it
our %alias;

sub import
{
    my $class = shift;
    while (@_)
    {
        my $layer = shift;
        if (exists $alias{$layer})
        {
            $layer = $alias{$layer}
        }
        else
        {
            $layer = "${class}::$layer";
        }
        eval "require $layer";
        warn $@ if $@;
    }
}

sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

  open($fh,"<:crlf", "my.txt"); # support platform-native and CRLF text files

  open($fh,"<","his.jpg");      # portably open a binary file for reading
  binmode($fh);

  Shell:
    PERLIO=perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification then C code performs the equivalent of:

  use PerlIO 'foo';

The perl code in PerlIO.pm then attempts to locate a layer by doing

  require PerlIO::foo;

Otherwise the C<PerlIO> package is a placeholder for additional
PerlIO-related functions.

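One way to observe the on-demand loading described above is to check C<%INC> around an C<open> that names a layer which is not built in. This is a minimal sketch: it assumes C<PerlIO::encoding> is installed (it ships with Perl) and uses a temporary file from C<File::Temp> purely for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for illustration only; removed at program exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";

my $before = exists $INC{'PerlIO/encoding.pm'} ? "yes" : "no";

# ":encoding" is not a built-in layer, so its first use should trigger
# the equivalent of "use PerlIO 'encoding'" and hence
# "require PerlIO::encoding".
open(my $fh, "<:encoding(UTF-8)", $path) or die "open $path: $!";
my $after = exists $INC{'PerlIO/encoding.pm'} ? "yes" : "no";
close $fh;

print "PerlIO::encoding loaded before open: $before, after open: $after\n";
```

If some other module has already loaded C<PerlIO::encoding>, the "before" value will also be "yes"; only the "after" value is guaranteed by the mechanism.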

The following layers are currently defined:

=over 4

=item :unix

Lowest level layer which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).

=item :stdio

Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc.  Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.

=item :perlio

A from-scratch implementation of buffering for PerlIO.  Provides fast
access to the buffer for C<sv_gets>, which implements perl's
readline/E<lt>E<gt>, and in general attempts to minimize data copying.

C<:perlio> will insert a C<:unix> layer below itself to do low level IO.

=item :crlf

A layer that implements DOS/Windows like CRLF line endings.  On read
converts pairs of CR,LF to a single "\n" newline character.  On write
converts each "\n" to a CR,LF pair.  Note that this layer likes to be
one of its kind: it silently ignores attempts to be pushed into the
layer stack more than once.

It currently does I<not> mimic MS-DOS in treating Control-Z
as an end-of-file marker.

(Gory details follow) To be more exact what happens is this: after
pushing itself to the stack, the C<:crlf> layer checks all the layers
below itself to find the first layer that is capable of being a CRLF
layer but is not yet enabled to be a CRLF layer.  If it finds such a
layer, it enables the CRLFness of that other deeper layer, and then
pops itself off the stack.  If not, fine, use the one we just pushed.

The end result is that a C<:crlf> means "please enable the first CRLF
layer you can find, and if you can't find one, here would be a good
spot to place a new one."

Based on the C<:perlio> layer.

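The read and write translation done by C<:crlf> can be checked by writing through the layer and then reading the same file back both raw and through C<:crlf>. This is a sketch under the assumption of an ASCII platform; the temporary file and the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for illustration only; removed at program exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";

# Write two lines through :crlf; each "\n" should become CR,LF on disk.
open(my $out, ">:crlf", $path) or die "open $path: $!";
print {$out} "one\ntwo\n";
close $out or die "close: $!";

# Read the octets exactly as stored.
open(my $raw, "<:raw", $path) or die "open $path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read through :crlf again; CR,LF pairs collapse back to "\n".
open(my $in, "<:crlf", $path) or die "open $path: $!";
my @lines = <$in>;
close $in;

printf "stored %d octets, read back %d lines\n", length $bytes, scalar @lines;
```

Because the layer ignores attempts to push it twice, the same code should behave identically on platforms where C<:crlf> is already part of the default stack.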

=item :mmap

A layer which implements "reading" of files by using C<mmap()> to
make the (whole) file appear in the process's address space, and then
using that as PerlIO's "buffer".  This I<may> be faster in certain
circumstances for large files, and may result in less physical memory
use when multiple processes are reading the same file.

Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
layer.  Writes also behave like the C<:perlio> layer, as C<mmap()> for write
needs extra house-keeping (to extend the file) which negates any advantage.

The C<:mmap> layer will not exist if the platform does not support C<mmap()>.

=item :utf8

Declares that the stream accepts perl's internal encoding of
characters.  (Which really is UTF-8 on ASCII machines, but is
UTF-EBCDIC on EBCDIC machines.)  This allows any character perl can
represent to be read from or written to the stream.  The UTF-X encoding
is chosen to render simple text parts (i.e.  non-accented letters,
digits and common punctuation) human readable in the encoded file.

Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
and then read it back in.

	open(F, ">:utf8", "data.utf");
	print F $out;
	close(F);

	open(F, "<:utf8", "data.utf");
	$in = <F>;
	close(F);

=item :bytes

This is the inverse of the C<:utf8> layer.  It turns off the flag
on the layer below so that data read from it is considered to
be "octets" i.e. characters in range 0..255 only.  Likewise
on output perl will warn if a "wide" character is written
to such a stream.

=item :raw

The C<:raw> layer is I<defined> as being identical to calling
C<binmode($fh)> - the stream is made suitable for passing binary data
i.e. each byte is passed as-is.  The stream will still be
buffered.

In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
referred to as a "discipline") is documented as the inverse of the
C<:crlf> layer.  That is no longer the case - other layers which would
alter the binary nature of the stream are also disabled.  If you want UNIX
line endings on a platform that normally does CRLF translation, but still
want UTF-8 or encoding defaults, the appropriate thing to do is to add
C<:perlio> to the PERLIO environment variable.

The implementation of C<:raw> is as a pseudo-layer which when "pushed"
pops itself and then any layers which do not declare themselves as suitable
for binary data.  (Undoing :utf8 and :crlf are implemented by clearing
flags rather than popping layers but that is an implementation detail.)

As a consequence of the fact that C<:raw> normally pops layers
it usually only makes sense to have it as the only or first element in
a layer specification.  When used as the first element it provides
a known base on which to build e.g.

    open($fh,"<:raw:utf8",...)

will construct a "binary" stream, but then enable UTF-8 translation.

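The difference between a plain C<:raw> stream and C<:raw:utf8> can be seen by writing one wide character and reading the file back both ways. This is a sketch assuming an ASCII (UTF-8) platform, where U+263A encodes as three octets; the temporary file and the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for illustration only; removed at program exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";

my $text = "\x{263A}\n";    # one wide character plus a newline

# Binary base with UTF-8 enabled on top: characters in, octets on disk.
open(my $out, ">:raw:utf8", $path) or die "open $path: $!";
print {$out} $text;
close $out or die "close: $!";

# Plain :raw returns the encoded octets untouched.
open(my $raw, "<:raw", $path) or die "open $path: $!";
my $octets = do { local $/; <$raw> };
close $raw;

# :raw:utf8 returns characters again.
open(my $in, "<:raw:utf8", $path) or die "open $path: $!";
my $chars = do { local $/; <$in> };
close $in;

printf "octets: %d, characters: %d\n", length $octets, length $chars;
```

On an EBCDIC platform the octet count would differ, since the stream would then carry UTF-EBCDIC.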

=item :pop

A pseudo layer that removes the top-most layer.  Gives perl code
a way to manipulate the layer stack.  Should be considered
experimental.  Note that C<:pop> only works on real layers
and will not undo the effects of pseudo layers like C<:utf8>.
An example of a possible use might be:

    open($fh,...);
    ...
    binmode($fh,":encoding(...)");  # next chunk is encoded
    ...
    binmode($fh,":pop");            # back to un-encoded

A more elegant (and safer) interface is needed.

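The effect of C<:pop> on the stack can be inspected with C<PerlIO::get_layers>. The following is only a sketch: it assumes C<PerlIO::encoding> is available, picks UTF-8 as an arbitrary encoding, and uses a C<File::Temp> handle for illustration. The exact layer names printed depend on the platform and Perl build.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch handle for illustration only; the file is removed at exit.
my ($fh, $path) = tempfile(UNLINK => 1);

my @before = PerlIO::get_layers($fh);

# Push a real layer on top of the default stack...
binmode($fh, ":encoding(UTF-8)") or die "binmode: $!";
my @pushed = PerlIO::get_layers($fh);

# ...and remove it again with :pop.
binmode($fh, ":pop") or die "binmode: $!";
my @after = PerlIO::get_layers($fh);
close $fh;

print "before: @before\n";
print "pushed: @pushed\n";
print "after:  @after\n";
```

Since the encoding layer is a real layer, popping it should leave the handle with the same layers it started with.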

=item :win32

On Win32 platforms this I<experimental> layer uses native "handle" IO
rather than the unix-like numeric file descriptor layer.  Known to be
buggy as of perl 5.8.2.

=back

=head2 Custom Layers

It is possible to write custom layers in addition to the above builtin
ones, both in C/XS and Perl.  Two such layers (and one example written
in Perl using the latter) come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> either in open() or binmode() to install
a layer that transparently does character set and encoding transformations,
for example from Shift-JIS to Unicode.  Note that under C<stdio>
an C<:encoding> also enables C<:utf8>.  See L<PerlIO::encoding>
for more information.

=item :via

Use C<:via(MODULE)> either in open() or binmode() to install a layer
that applies an arbitrary transformation (for example compression /
decompression, encryption / decryption) to the filehandle.
See L<PerlIO::via> for more information.

=back

=head2 Alternatives to raw

To get a binary stream an alternate method is to use:

    open($fh,"whatever");
    binmode($fh);

This has the advantage of being backward compatible with how such things have
had to be coded on some platforms for years.

To get an un-buffered stream specify an unbuffered layer (e.g. C<:unix>)
in the open call:

    open($fh,"<:unix",$path)

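Whether an C<open> with C<:unix> really yields an unbuffered stack can be checked with C<PerlIO::get_layers>. This sketch assumes a UNIX-like platform, where the low level layer is named C<unix>; the temporary file and its contents are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file with known contents; removed at program exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "hello\n";
close $tmp or die "close: $!";

# Open with only the low level layer, so no buffering layer is on the stack.
open(my $fh, "<:unix", $path) or die "open $path: $!";
my @layers = PerlIO::get_layers($fh);

# read() on this handle goes straight to the descriptor-level layer.
my $count = read($fh, my $buf, 5);
die "read: $!" unless defined $count;
close $fh;

print "layers: @layers\n";
print "read $count octets: $buf\n";
```

Line-oriented reads on such a handle still work but tend to be slow, since there is no buffer to scan for the line terminator.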

=head2 Defaults and how to override them

If the platform is MS-DOS like and normally does CRLF to "\n"
translation for text files then the default layers are:

  unix crlf

(The low level "unix" layer may be replaced by a platform specific low
level layer.)

Otherwise if C<Configure> found out how to do "fast" IO using the system's
stdio, then the default layers are:

  unix stdio

Otherwise the default layers are:

  unix perlio

These defaults may change once perlio has been better tested and tuned.

The default can be overridden by setting the environment variable
PERLIO to a space separated list of layers (C<unix> or the platform low
level layer is always pushed first).

This can be used to see the effect of/bugs in the various layers e.g.

  cd .../perl/t
  PERLIO=stdio  ./perl harness
  PERLIO=perlio ./perl harness

For the various values of PERLIO see L<perlrun/PERLIO>.

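The effect of the PERLIO environment variable can also be observed from Perl itself by starting a child interpreter with the variable set and asking it for the layers on its own STDOUT. This is a sketch only: the helper name C<child_layers> is hypothetical, the list form of pipe C<open> is assumed to be available on the platform, and the reported names depend on how perl was built.

```perl
use strict;
use warnings;

# Hypothetical helper: run a child perl with PERLIO set to $perlio and
# return the layer names it reports for its STDOUT (a pipe back to us).
sub child_layers {
    my ($perlio) = @_;
    local $ENV{PERLIO} = $perlio;
    open(my $pipe, "-|", $^X, "-e",
         'print join " ", PerlIO::get_layers(STDOUT)')
        or die "cannot run $^X: $!";
    my $out = do { local $/; <$pipe> };
    close $pipe;
    return defined $out ? $out : "";
}

for my $setting (qw(perlio stdio)) {
    printf "PERLIO=%-6s -> %s\n", $setting, child_layers($setting);
}
```

The variable has to be set before the child starts, because the default layers are chosen when the interpreter initialises its standard handles.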

=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

   my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them.  Note that the "default stack" depends on the operating
system and on the Perl version, and both the compile-time and
runtime configurations of Perl.

The following table summarizes the default layers on UNIX-like and
DOS-like platforms and depending on the setting of the C<$ENV{PERLIO}>:

 PERLIO     UNIX-like                DOS-like
 ------     ---------                --------
 unset / "" unix perlio / stdio [1]  unix crlf
 stdio      unix perlio / stdio [1]  stdio
 perlio     unix perlio              unix perlio
 mmap       unix mmap                unix mmap

 # [1] "stdio" if Configure found out how to do "fast stdio" (depends
 # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"

By default the layers from the input side of the filehandle are
returned; to get the output side use the optional C<output> argument:

   my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle but
for example with sockets there may be differences, or if you have
been using the C<open> pragma.)

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that.  This is not
accidental or unintentional.  The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<utf8>) are not real
layers but instead flags on real layers: to get all of these returned
separately use the optional C<details> argument:

   my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.

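The flat list returned with C<details> can be walked three elements at a time. The sketch below does that for a handle opened with C<:utf8> and tests each flags value against C<PerlIO::F_UTF8>. Treating that constant as the bit that marks UTF-8 on a layer is an assumption based on its name; the temporary file is illustrative only.

```perl
use strict;
use warnings;
use PerlIO ();                  # makes PerlIO::F_UTF8 available
use File::Temp qw(tempfile);

# Scratch file for illustration only; removed at program exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";

open(my $fh, "<:utf8", $path) or die "open $path: $!";
my @details = PerlIO::get_layers($fh, details => 1);
close $fh;

# Regroup the flat list into [name, args, flags] triples, bottom layer first.
my @triples;
while (my ($name, $args, $flags) = splice(@details, 0, 3)) {
    $flags = 0 unless defined $flags;
    push @triples, [$name, $args, $flags];
    printf "%-8s args=%-6s flags=0x%08x utf8=%s\n",
        $name,
        defined $args ? $args : "undef",
        $flags,
        ($flags & PerlIO::F_UTF8()) ? "yes" : "no";
}
```

Note that C<utf8> does not show up as a name of its own here; it is only visible as a bit in the flags of a real layer.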

B<You may open your eyes now.>

=head1 AUTHOR

Nick Ing-Simmons E<lt>[email protected]E<gt>

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut