=head1 NAME

perliol - C API for Perl's implementation of IO in Layers.

=head1 SYNOPSIS

    /* Defining a layer ... */
    #include <perliol.h>

=head1 DESCRIPTION

This document describes the behavior and implementation of the PerlIO
abstraction described in L<perlapio> when C<USE_PERLIO> is defined (and
C<USE_SFIO> is not).

=head2 History and Background

The PerlIO abstraction was introduced in perl5.003_02 but languished as
just an abstraction until perl5.7.0. However, during that time a number
of perl extensions switched to using it, so the API is mostly fixed to
maintain (source) compatibility.

The aim of the implementation is to provide the PerlIO API in a flexible
and platform neutral manner. It is also a trial of an "Object Oriented
C, with vtables" approach which may be applied to perl6.

=head2 Basic Structure

PerlIO is a stack of layers.

The low levels of the stack work with the low-level operating system
calls (file descriptors in C) getting bytes in and out; the higher
layers of the stack buffer, filter, and otherwise manipulate the I/O,
and return characters (or bytes) to Perl. The terms I<above> and I<below>
are used to refer to the relative positioning of the stack layers.

A layer contains a "vtable", the table of I/O operations (at C level
a table of function pointers), and status flags. The functions in the
vtable implement operations like "open", "read", and "write".

When I/O, for example "read", is requested, the request goes from Perl
first down the stack using "read" functions of each layer, then at the
bottom the input is requested from the operating system services, then
the result is returned up the stack, finally being interpreted as Perl
data.

Requests do not necessarily go all the way down to the
operating system: that's where PerlIO buffering comes into play.

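The flow described above can be sketched in a few lines of plain C. This
is only an illustration, not code from the perl source: the C<demo_>
names, the single-slot vtable and the 8-byte buffer are all assumptions
made for the example. A bottom layer stands in for the operating system,
and a buffering layer above it goes down the stack only when its own
buffer is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a layer: a vtable pointer plus a link downwards. */
typedef struct demo_layer demo_layer;

typedef struct {
    const char *name;
    /* Returns the number of bytes placed in buf, 0 at end of input. */
    size_t (*read)(demo_layer *self, char *buf, size_t count);
} demo_vtable;

struct demo_layer {
    demo_layer *next;        /* layer below, NULL at the bottom */
    const demo_vtable *tab;  /* operations for this layer */
    const char *src;         /* bottom layer: remaining "OS" data */
    int os_calls;            /* bottom layer: simulated system calls made */
    char buf[8];             /* buffering layer: private buffer */
    size_t len, pos;         /* buffering layer: valid bytes, read position */
};

/* Bottom layer: stands in for read(2) on a file descriptor. */
static size_t source_read(demo_layer *self, char *buf, size_t count)
{
    size_t avail = strlen(self->src);
    size_t n = count < avail ? count : avail;
    memcpy(buf, self->src, n);
    self->src += n;
    self->os_calls++;
    return n;
}

/* Buffering layer: refills from the layer below only when its buffer is empty. */
static size_t buffered_read(demo_layer *self, char *buf, size_t count)
{
    size_t n;
    assert(self->next != NULL);
    if (self->pos == self->len) {
        self->len = self->next->tab->read(self->next, self->buf,
                                          sizeof self->buf);
        self->pos = 0;
    }
    n = self->len - self->pos;
    if (n > count)
        n = count;
    memcpy(buf, self->buf + self->pos, n);
    self->pos += n;
    return n;
}

static const demo_vtable source_tab   = { "source",   source_read };
static const demo_vtable buffered_tab = { "buffered", buffered_read };

/* Reads "abcdef" one byte at a time through the top layer and reports
 * how many times the bottom layer was actually called. */
int demo_os_calls_for_six_reads(void)
{
    demo_layer bottom = { NULL, &source_tab, "abcdef", 0, {0}, 0, 0 };
    demo_layer top    = { &bottom, &buffered_tab, NULL, 0, {0}, 0, 0 };
    char c;
    int i;
    for (i = 0; i < 6; i++)
        top.tab->read(&top, &c, 1);
    return bottom.os_calls;
}
```

Six one-byte reads on the top layer reach the bottom layer once, because
the first refill pulls all six bytes into the buffer.
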
When you do an open() and specify extra PerlIO layers to be deployed,
the layers you specify are "pushed" on top of the already existing
default stack. One way to see it is that "operating system is
on the left" and "Perl is on the right".

What exact layers are in this default stack depends on a lot of
things: your operating system, Perl version, Perl compile time
configuration, and Perl runtime configuration. See L<PerlIO>,
L<perlrun/PERLIO>, and L<open> for more information.

binmode() operates similarly to open(): by default the specified
layers are pushed on top of the existing stack.

However, note that even as the specified layers are "pushed on top"
for open() and binmode(), this doesn't mean that the effects are
limited to the "top": PerlIO layers can be very 'active' and inspect
and affect layers also deeper in the stack. As an example there
is a layer called "raw" which repeatedly "pops" layers until
it reaches the first layer that has declared itself capable of
handling binary data. The "pushed" layers are processed in left-to-right
order.

sysopen() operates (unsurprisingly) at a lower level in the stack than
open(). For example in UNIX or UNIX-like systems sysopen() operates
directly at the level of file descriptors: in the terms of PerlIO
layers, it uses only the "unix" layer, which is a rather thin wrapper
on top of the UNIX file descriptors.

=head2 Layers vs Disciplines

Initial discussion of the ability to modify IO streams behaviour used
the term "discipline" for the entities which were added. This came (I
believe) from the use of the term in "sfio", which in turn borrowed it
from "line disciplines" on Unix terminals. However, this document (and
the C code) uses the term "layer".

This is, I hope, a natural term given the implementation, and should
avoid connotations that are inherent in earlier uses of "discipline"
for things which are rather different.

=head2 Data Structures

The basic data structure is a PerlIOl:

    typedef struct _PerlIO PerlIOl;
    typedef struct _PerlIO_funcs PerlIO_funcs;
    typedef PerlIOl *PerlIO;

    struct _PerlIO
    {
        PerlIOl *      next;   /* Lower layer */
        PerlIO_funcs * tab;    /* Functions for this layer */
        IV             flags;  /* Various flags for state */
    };

A C<PerlIOl *> is a pointer to the struct, and the I<application>
level C<PerlIO *> is a pointer to a C<PerlIOl *> - i.e. a pointer
to a pointer to the struct. This allows the application level C<PerlIO *>
to remain constant while the actual C<PerlIOl *> underneath
changes. (Compare perl's C<SV *> which remains constant while its
C<sv_any> field changes as the scalar's type changes.) An IO stream is
then in general represented as a pointer to this linked-list of
"layers".

It should be noted that because of the double indirection in a C<PerlIO *>,
a C<< &(perlio->next) >> "is" a C<PerlIO *>, and so to some degree
at least one layer can use the "standard" API on the next layer down.

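The double indirection can be illustrated with a small stand-alone
sketch. The types and functions below (C<demo_iol>, C<demo_io_t>,
C<demo_push>, C<demo_next>) are hypothetical stand-ins for the example,
not the perl API, and a name string takes the place of the function
table. The point is that pushing a layer rewrites the slot the handle
points at, never the handle itself, and that the address of a layer's
C<next> member can be used wherever a handle is expected.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_io demo_iol;   /* plays the role of PerlIOl */
typedef demo_iol *demo_io_t;       /* plays the role of PerlIO  */

struct demo_io {
    demo_iol *next;     /* lower layer */
    const char *name;   /* stands in for the function table */
};

/* Push a layer: the slot behind the handle is updated, the handle is not. */
static void demo_push(demo_io_t *f, demo_iol *layer)
{
    assert(f != NULL && layer != NULL);
    layer->next = *f;
    *f = layer;
}

/* The address of a layer's next member is itself usable as a handle. */
static demo_io_t *demo_next(demo_io_t *f)
{
    return (f && *f) ? &(*f)->next : NULL;
}

/* Returns 1 if all the properties described in the text hold. */
int demo_handle_is_stable(void)
{
    demo_iol unix_layer = { NULL, "unix" };
    demo_iol buf_layer  = { NULL, "perlio" };
    demo_io_t slot = NULL;     /* one entry in the table of handles */
    demo_io_t *f = &slot;      /* what the application holds */
    demo_io_t *before = f;

    demo_push(f, &unix_layer);
    demo_push(f, &buf_layer);

    return f == before                        /* handle unchanged */
        && *f == &buf_layer                   /* top layer changed */
        && *demo_next(f) == &unix_layer       /* layer below, via a handle */
        && *demo_next(demo_next(f)) == NULL;  /* bottom of the stack */
}
```
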
A "layer" is composed of two parts:

=over 4

=item 1.

The functions and attributes of the "layer class".

=item 2.

The per-instance data for a particular handle.

=back

=head2 Functions and Attributes

The functions and attributes are accessed via the "tab" (for table)
member of C<PerlIOl>. The functions (methods of the layer "class") are
fixed, and are defined by the C<PerlIO_funcs> type. They are broadly the
same as the public C<PerlIO_xxxxx> functions:

    struct _PerlIO_funcs
    {
        Size_t     fsize;
        char *     name;
        Size_t     size;
        IV         kind;
        IV         (*Pushed)(pTHX_ PerlIO *f, const char *mode, SV *arg,
                             PerlIO_funcs *tab);
        IV         (*Popped)(pTHX_ PerlIO *f);
        PerlIO *   (*Open)(pTHX_ PerlIO_funcs *tab,
                           AV *layers, IV n,
                           const char *mode,
                           int fd, int imode, int perm,
                           PerlIO *old,
                           int narg, SV **args);
        IV         (*Binmode)(pTHX_ PerlIO *f);
        SV *       (*Getarg)(pTHX_ PerlIO *f, CLONE_PARAMS *param, int flags);
        IV         (*Fileno)(pTHX_ PerlIO *f);
        PerlIO *   (*Dup)(pTHX_ PerlIO *f, PerlIO *o, CLONE_PARAMS *param,
                          int flags);
        /* Unix-like functions - cf sfio line disciplines */
        SSize_t    (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count);
        SSize_t    (*Unread)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
        SSize_t    (*Write)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
        IV         (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence);
        Off_t      (*Tell)(pTHX_ PerlIO *f);
        IV         (*Close)(pTHX_ PerlIO *f);
        /* Stdio-like buffered IO functions */
        IV         (*Flush)(pTHX_ PerlIO *f);
        IV         (*Fill)(pTHX_ PerlIO *f);
        IV         (*Eof)(pTHX_ PerlIO *f);
        IV         (*Error)(pTHX_ PerlIO *f);
        void       (*Clearerr)(pTHX_ PerlIO *f);
        void       (*Setlinebuf)(pTHX_ PerlIO *f);
        /* Perl's snooping functions */
        STDCHAR *  (*Get_base)(pTHX_ PerlIO *f);
        Size_t     (*Get_bufsiz)(pTHX_ PerlIO *f);
        STDCHAR *  (*Get_ptr)(pTHX_ PerlIO *f);
        SSize_t    (*Get_cnt)(pTHX_ PerlIO *f);
        void       (*Set_ptrcnt)(pTHX_ PerlIO *f, STDCHAR *ptr, SSize_t cnt);
    };

The first few members of the struct give a function table size for
compatibility check, the "name" for the layer, the size to C<malloc>
for the per-instance data, and some flags which are attributes of the
class as a whole (such as whether it is a buffering layer). Then follow
the functions, which fall into four basic groups:

=over 4

=item 1.

Opening and setup functions

=item 2.

Basic IO operations

=item 3.

Stdio class buffering options.

=item 4.

Functions to support Perl's traditional "fast" access to the buffer.

=back

A layer does not have to implement all the functions, but the whole
table has to be present. Unimplemented slots can be NULL (which will
result in an error when called) or can be filled in with stubs to
"inherit" behaviour from a "base class". This "inheritance" is fixed
for all instances of the layer, but as the layer chooses which stubs
to populate the table, limited "multiple inheritance" is possible.

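As a rough sketch of this table-filling style of "inheritance", the
following stand-alone C uses a deliberately tiny two-slot table. The
names (C<demo_funcs>, C<base_flush>, C<demo_call_flush> and so on) are
invented for the example and the real table is the much larger
C<PerlIO_funcs> shown above. One layer borrows a base stub for one slot
and overrides the other; a second layer leaves a slot NULL, which the
dispatch helper reports as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-slot function table. */
typedef struct demo_funcs {
    const char *name;
    int (*flush)(void *layer);   /* may be NULL: not implemented */
    int (*eof)(void *layer);     /* may be NULL: not implemented */
} demo_funcs;

/* "Base class" stubs that any layer may place in its table. */
static int base_flush(void *layer) { (void)layer; return 0; }
static int base_eof(void *layer)   { (void)layer; return 0; }

/* A layer-specific override. */
static int upper_eof(void *layer)  { (void)layer; return 1; }

/* Inherits flush from the base, overrides eof. */
const demo_funcs demo_upper   = { "upper",   base_flush, upper_eof };
/* Leaves flush unimplemented, inherits eof. */
const demo_funcs demo_partial = { "partial", NULL,       base_eof };

/* Dispatch helpers: a NULL slot is reported as an error (-1). */
int demo_call_flush(const demo_funcs *tab, void *layer)
{
    assert(tab != NULL);
    return tab->flush ? tab->flush(layer) : -1;
}

int demo_call_eof(const demo_funcs *tab, void *layer)
{
    assert(tab != NULL);
    return tab->eof ? tab->eof(layer) : -1;
}
```

Because each slot is chosen independently when the table is written, a
layer can take stubs from more than one "base", which is the limited
"multiple inheritance" mentioned above.
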
=head2 Per-instance Data

The per-instance data are held in memory beyond the basic PerlIOl
struct, by making a PerlIOl the first member of the layer's struct
thus:

    typedef struct
    {
        struct _PerlIO base;    /* Base "class" info */
        STDCHAR *      buf;     /* Start of buffer */
        STDCHAR *      end;     /* End of valid part of buffer */
        STDCHAR *      ptr;     /* Current position in buffer */
        Off_t          posn;    /* Offset of buf into the file */
        Size_t         bufsiz;  /* Real size of buffer */
        IV             oneword; /* Emergency buffer */
    } PerlIOBuf;

In this way (as for perl's scalars) a pointer to a PerlIOBuf can be
treated as a pointer to a PerlIOl.

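The same first-member technique can be shown with a cut-down pair of
structs. Everything named C<demo_> here is an assumption made for the
illustration; only the layout idea is taken from the text. C guarantees
that a pointer to a struct, suitably converted, points to its first
member, which is what makes both casts below valid.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the base struct. */
typedef struct demo_base {
    struct demo_base *next;
    long flags;
} demo_base;

/* Cut-down stand-in for a buffering layer's per-instance data. */
typedef struct {
    demo_base base;      /* must be the first member */
    char *buf;
    size_t bufsiz;
} demo_buf;

/* Generic code sees only the base part. */
static void demo_set_flag(demo_base *l, long bit)
{
    assert(l != NULL);
    l->flags |= bit;
}

/* Layer-specific code recovers its own struct from the base pointer. */
static demo_buf *demo_self(demo_base *l)
{
    return (demo_buf *)l;
}

/* Returns 1 if the base view and the layer view refer to the same object. */
int demo_roundtrip(void)
{
    char storage[16];
    demo_buf b = { { NULL, 0 }, storage, sizeof storage };
    demo_base *l = (demo_base *)&b;   /* valid because base is first */

    demo_set_flag(l, 0x1);
    return offsetof(demo_buf, base) == 0
        && b.base.flags == 0x1
        && demo_self(l)->bufsiz == 16;
}
```
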
=head2 Layers in action.

                 table           perlio          unix
             |           |
             +-----------+    +----------+    +--------+
    PerlIO ->|           |--->|  next    |--->|  NULL  |
             +-----------+    +----------+    +--------+
             |           |    |  buffer  |    |   fd   |
             +-----------+    |          |    +--------+
             |           |    +----------+


The above attempts to show how the layer scheme works in a simple case.
The application's C<PerlIO *> points to an entry in the table(s)
representing open (allocated) handles. For example the first three slots
in the table correspond to C<stdin>, C<stdout> and C<stderr>. The table
in turn points to the current "top" layer for the handle - in this case
an instance of the generic buffering layer "perlio". That layer in turn
points to the next layer down - in this case the low-level "unix" layer.

The above is roughly equivalent to a "stdio" buffered stream, but with
much more flexibility:

=over 4

=item *

If Unix level C<read>/C<write>/C<lseek> is not appropriate for (say)
sockets then the "unix" layer can be replaced (at open time or even
dynamically) with a "socket" layer.

=item *

Different handles can have different buffering schemes. The "top"
layer could be the "mmap" layer if reading disk files was quicker
using C<mmap> than C<read>. An "unbuffered" stream can be implemented
simply by not having a buffer layer.

=item *

Extra layers can be inserted to process the data as it flows through.
This was the driving need for including the scheme in perl 5.7.0+ - we
needed a mechanism to allow data to be translated between perl's
internal encoding (conceptually at least Unicode as UTF-8), and the
"native" format used by the system. This is provided by the
":encoding(xxxx)" layer which typically sits above the buffering layer.

=item *

A layer can be added that does "\n" to CRLF translation. This layer
can be used on any platform, not just those that normally do such
things.

=back

=head2 Per-instance flag bits

The generic flag bits are a hybrid of C<O_XXXXX> style flags deduced
from the mode string passed to C<PerlIO_open()>, and state bits for
typical buffer layers.

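To give a feel for how flag bits might be deduced from a mode string,
here is a minimal sketch. It is not the perl implementation: the
C<DEMO_F_> names and their numeric values are made up for the example
(the real C<PERLIO_F_> constants are defined in F<perliol.h>), and only
the read, write, truncate and append bits are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, chosen only for this example. */
enum {
    DEMO_F_CANWRITE = 1 << 0,
    DEMO_F_CANREAD  = 1 << 1,
    DEMO_F_TRUNCATE = 1 << 2,
    DEMO_F_APPEND   = 1 << 3
};

/* Returns the flag bits for a stdio-style mode string, or -1 if the
 * first character is not one of 'r', 'w' or 'a'. */
long demo_mode_flags(const char *mode)
{
    long flags = 0;
    assert(mode != NULL);
    switch (*mode++) {
    case 'r': flags = DEMO_F_CANREAD; break;
    case 'w': flags = DEMO_F_CANWRITE | DEMO_F_TRUNCATE; break;
    case 'a': flags = DEMO_F_CANWRITE | DEMO_F_APPEND; break;
    default:  return -1;
    }
    /* A '+' anywhere after the first character allows both directions. */
    for (; *mode; mode++) {
        if (*mode == '+')
            flags |= DEMO_F_CANREAD | DEMO_F_CANWRITE;
    }
    return flags;
}
```

With these assumptions, "r+" yields both the read and write bits, and
"a+" yields read, write and append.
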
=over 4

=item PERLIO_F_EOF

End of file.

=item PERLIO_F_CANWRITE

Writes are permitted, i.e. opened as "w" or "r+" or "a", etc.

=item PERLIO_F_CANREAD

Reads are permitted, i.e. opened "r" or "w+" (or even "a+" - ick).

=item PERLIO_F_ERROR

An error has occurred (for C<PerlIO_error()>).

=item PERLIO_F_TRUNCATE

Truncate file suggested by open mode.

=item PERLIO_F_APPEND

All writes should be appends.

=item PERLIO_F_CRLF

Layer is performing Win32-like "\n" mapped to CR,LF for output and CR,LF
mapped to "\n" for input. Normally the provided "crlf" layer is the only
layer that need bother about this. C<PerlIO_binmode()> will mess with this
flag rather than add/remove layers if the C<PERLIO_K_CANCRLF> bit is set
for the layer's class.

=item PERLIO_F_UTF8

Data written to this layer should be UTF-8 encoded; data provided
by this layer should be considered UTF-8 encoded. Can be set on any layer
by the ":utf8" dummy layer. Also set on the ":encoding" layer.

=item PERLIO_F_UNBUF

Layer is unbuffered - i.e. write to next layer down should occur for
each write to this layer.

=item PERLIO_F_WRBUF

The buffer for this layer currently holds data written to it but not sent
to the next layer.

=item PERLIO_F_RDBUF

The buffer for this layer currently holds unconsumed data read from
the layer below.

=item PERLIO_F_LINEBUF

Layer is line buffered. Write data should be passed to next layer down
whenever a "\n" is seen. Any data beyond the "\n" should then be
processed.

=item PERLIO_F_TEMP

File has been C<unlink()>ed, or should be deleted on C<close()>.

=item PERLIO_F_OPEN

Handle is open.

|
|---|
|
|---|