This is the version of this documentation from Perl 5.30.3.

NAME

perliol - C API for Perl's implementation of IO in Layers.

SYNOPSIS

/* Defining a layer ... */
#include <perliol.h>

DESCRIPTION

This document describes the behavior and implementation of the PerlIO abstraction described in perlapio when USE_PERLIO is defined.

History and Background

The PerlIO abstraction was introduced in perl5.003_02 but languished as just an abstraction until perl5.7.0. However, during that time a number of perl extensions switched to using it, so the API is mostly fixed to maintain (source) compatibility.

The aim of the implementation is to provide the PerlIO API in a flexible and platform neutral manner. It is also a trial of an "Object Oriented C, with vtables" approach which may be applied to Perl 6.

Basic Structure

PerlIO is a stack of layers.

The lower layers of the stack work with the low-level operating system calls (file descriptors in C) to get bytes in and out; the higher layers buffer, filter, and otherwise manipulate the I/O, and return characters (or bytes) to Perl. The terms "above" and "below" refer to the relative positions of the layers in the stack.

A layer contains a "vtable", the table of I/O operations (at C level a table of function pointers), and status flags. The functions in the vtable implement operations like "open", "read", and "write".

When an I/O operation such as "read" is requested, the request first travels from Perl down the stack through the "read" function of each layer. At the bottom the input is requested from the operating system services, and the result is then returned up the stack, finally being interpreted as Perl data.

The requests do not always go all the way down to the operating system: that's where PerlIO buffering comes into play.
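The following is a minimal sketch of that request path, not the real PerlIO code. The Layer, SourceLayer and BufferLayer types and the stack_init helper are hypothetical names invented for illustration; the "operating system" is simulated by a string in memory so that the number of requests reaching the bottom can be counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature layer: NOT the real PerlIOl. */
typedef struct Layer Layer;
struct Layer {
    Layer  *next;                                     /* lower layer, or NULL */
    size_t (*read)(Layer *self, char *dst, size_t n); /* this layer's "read"  */
};

/* Bottom layer: stands in for the operating system. */
typedef struct {
    Layer       base;
    const char *data;     /* bytes the simulated OS can supply */
    size_t      len, pos;
    int         calls;    /* how many requests reached the bottom */
} SourceLayer;

static size_t source_read(Layer *self, char *dst, size_t n)
{
    SourceLayer *s = (SourceLayer *)self;
    size_t avail = s->len - s->pos;
    if (n > avail)
        n = avail;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    s->calls++;
    return n;
}

/* Buffering layer: asks the layer below only when its buffer is empty. */
typedef struct {
    Layer  base;
    char   buf[8];
    size_t ptr, end;      /* next unread byte, end of valid data */
} BufferLayer;

static size_t buffer_read(Layer *self, char *dst, size_t n)
{
    BufferLayer *b = (BufferLayer *)self;
    size_t done = 0;
    assert(self->next != NULL);
    while (done < n) {
        if (b->ptr == b->end) {               /* buffer empty: go down a layer */
            b->end = self->next->read(self->next, b->buf, sizeof b->buf);
            b->ptr = 0;
            if (b->end == 0)
                break;                        /* nothing more below */
        }
        dst[done++] = b->buf[b->ptr++];
    }
    return done;
}

/* Wire a buffering layer on top of a source layer. */
static void stack_init(BufferLayer *b, SourceLayer *s, const char *data)
{
    s->base.next = NULL;
    s->base.read = source_read;
    s->data = data;
    s->len = strlen(data);
    s->pos = 0;
    s->calls = 0;
    b->base.next = &s->base;
    b->base.read = buffer_read;
    b->ptr = b->end = 0;
}
```

With an 8-byte buffer, two consecutive 3-byte reads through the top layer reach the simulated operating system only once; the second read is served entirely from the buffer.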

When you do an open() and specify extra PerlIO layers to be deployed, the layers you specify are "pushed" on top of the already existing default stack. One way to see it is that "operating system is on the left" and "Perl is on the right".

What exact layers are in this default stack depends on a lot of things: your operating system, Perl version, Perl compile time configuration, and Perl runtime configuration. See PerlIO, "PERLIO" in perlrun, and open for more information.

binmode() operates similarly to open(): by default the specified layers are pushed on top of the existing stack.

However, note that even though the specified layers are "pushed on top" for open() and binmode(), this doesn't mean that the effects are limited to the "top": PerlIO layers can be very "active", inspecting and affecting layers deeper in the stack. As an example, there is a layer called "raw" which repeatedly "pops" layers until it reaches the first layer that has declared itself capable of handling binary data. The "pushed" layers are processed in left-to-right order.
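The popping behaviour described for "raw" can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: DemoLayer, its handles_binary field and pop_to_binary are hypothetical, and a real implementation would also release each popped layer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layer record; handles_binary stands in for whatever
 * a real layer uses to declare itself capable of binary data. */
typedef struct DemoLayer DemoLayer;
struct DemoLayer {
    DemoLayer  *next;            /* lower layer, or NULL */
    const char *name;
    int         handles_binary;
};

/* Pop layers from the top of the stack until the top layer declares
 * itself capable of handling binary data (or the stack is empty).
 * Returns the number of layers popped. */
static int pop_to_binary(DemoLayer **top)
{
    int popped = 0;
    assert(top != NULL);
    while (*top != NULL && !(*top)->handles_binary) {
        *top = (*top)->next;     /* a real layer would be freed here */
        popped++;
    }
    return popped;
}
```

Note that the function changes what the handle points at, not just the layer it was "pushed" onto, which is the sense in which a layer can affect the stack beneath it.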

sysopen() operates (unsurprisingly) at a lower level in the stack than open(). For example in Unix or Unix-like systems sysopen() operates directly at the level of file descriptors: in the terms of PerlIO layers, it uses only the "unix" layer, which is a rather thin wrapper on top of the Unix file descriptors.

Layers vs Disciplines

Initial discussion of the ability to modify IO streams behaviour used the term "discipline" for the entities which were added. This came (I believe) from the use of the term in "sfio", which in turn borrowed it from "line disciplines" on Unix terminals. However, this document (and the C code) uses the term "layer".

This is, I hope, a natural term given the implementation, and should avoid connotations that are inherent in earlier uses of "discipline" for things which are rather different.

Data Structures

The basic data structure is a PerlIOl:

typedef struct _PerlIO PerlIOl;
typedef struct _PerlIO_funcs PerlIO_funcs;
typedef PerlIOl *PerlIO;

struct _PerlIO
{
 PerlIOl *      next;       /* Lower layer */
 PerlIO_funcs * tab;        /* Functions for this layer */
 U32            flags;      /* Various flags for state */
};

A PerlIOl * is a pointer to the struct, and the application level PerlIO * is a pointer to a PerlIOl * - i.e. a pointer to a pointer to the struct. This allows the application level PerlIO * to remain constant while the actual PerlIOl * underneath changes. (Compare perl's SV * which remains constant while its sv_any field changes as the scalar's type changes.) An IO stream is then in general represented as a pointer to this linked-list of "layers".

It should be noted that because of the double indirection in a PerlIO *, a &(perlio->next) "is" a PerlIO *, and so to some degree at least one layer can use the "standard" API on the next layer down.
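The double indirection can be tried out in isolation. The sketch below uses hypothetical MiniLayer and MiniIO types that only mirror the shape of PerlIOl and PerlIO; top_name and push_layer are invented helpers, not part of the PerlIO API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring the shape of PerlIOl / PerlIO. */
typedef struct MiniLayer MiniLayer;
typedef MiniLayer *MiniIO;       /* compare: typedef PerlIOl *PerlIO; */

struct MiniLayer {
    MiniLayer  *next;            /* lower layer */
    const char *name;
};

/* "Standard API" style function: takes a pointer to a pointer. */
static const char *top_name(MiniIO *f)
{
    assert(f != NULL);
    return *f ? (*f)->name : NULL;
}

/* Push a layer: the caller's MiniIO * stays the same,
 * only the pointer it refers to is changed. */
static void push_layer(MiniIO *f, MiniLayer *l)
{
    assert(f != NULL && l != NULL);
    l->next = *f;
    *f = l;
}
```

Because next has the same type as the slot the handle points to, a layer can pass &(*f)->next to top_name (or any other function taking a MiniIO *) to address the layer below it.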

A "layer" is composed of two parts:

  1. The functions and attributes of the "layer class".

  2. The per-instance data for a particular handle.

Functions and Attributes

The functions and attributes are accessed via the "tab" (for table) member of PerlIOl. The functions (methods of the layer "class") are fixed, and are defined by the PerlIO_funcs type. They are broadly the same as the public PerlIO_xxxxx functions:

struct _PerlIO_funcs
{
 Size_t     fsize;
 char *     name;
 Size_t     size;
 IV         kind;
 IV         (*Pushed)(pTHX_ PerlIO *f,
                            const char *mode,
                            SV *arg,
                            PerlIO_funcs *tab);
 IV         (*Popped)(pTHX_ PerlIO *f);
 PerlIO *   (*Open)(pTHX_ PerlIO_funcs *tab,
                          PerlIO_list_t *layers, IV n,
                          const char *mode,
                          int fd, int imode, int perm,
                          PerlIO *old,
                          int narg, SV **args);
 IV         (*Binmode)(pTHX_ PerlIO *f);
 SV *       (*Getarg)(pTHX_ PerlIO *f, CLONE_PARAMS *param, int flags);
 IV         (*Fileno)(pTHX_ PerlIO *f);
 PerlIO *   (*Dup)(pTHX_ PerlIO *f,
                         PerlIO *o,
                         CLONE_PARAMS *param,
                         int flags);
 /* Unix-like functions - cf sfio line disciplines */
 SSize_t    (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count);
 SSize_t    (*Unread)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
 SSize_t    (*Write)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
 IV         (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence);
 Off_t      (*Tell)(pTHX_ PerlIO *f);
 IV         (*Close)(pTHX_ PerlIO *f);
 /* Stdio-like buffered IO functions */
 IV         (*Flush)(pTHX_ PerlIO *f);
 IV         (*Fill)(pTHX_ PerlIO *f);
 IV         (*Eof)(pTHX_ PerlIO *f);
 IV         (*Error)(pTHX_ PerlIO *f);
 void       (*Clearerr)(pTHX_ PerlIO *f);
 void       (*Setlinebuf)(pTHX_ PerlIO *f);
 /* Perl's snooping functions */
 STDCHAR *  (*Get_base)(pTHX_ PerlIO *f);
 Size_t     (*Get_bufsiz)(pTHX_ PerlIO *f);
 STDCHAR *  (*Get_ptr)(pTHX_ PerlIO *f);
 SSize_t    (*Get_cnt)(pTHX_ PerlIO *f);
 void       (*Set_ptrcnt)(pTHX_ PerlIO *f,STDCHAR *ptr,SSize_t cnt);
};

The first few members of the struct give a function table size for compatibility checking, a "name" for the layer, the size to malloc for the per-instance data, and some flags which are attributes of the class as a whole (such as whether it is a buffering layer). The functions then follow, and fall into four basic groups:

  1. Opening and setup functions

  2. Basic IO operations

  3. Stdio class buffering options.

  4. Functions to support Perl's traditional "fast" access to the buffer.
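As an illustration of how the leading members might be used when a layer is pushed, here is a minimal sketch. It assumes a cut-down, hypothetical table (DemoFuncs) with only the four header members, a hypothetical DemoLayer base struct, and an invented class bit DEMO_K_BUFFERED; it is not the real push logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cut-down table: only the header members, no functions. */
typedef struct {
    size_t      fsize;   /* sizeof the table as seen by the layer's author */
    const char *name;    /* layer name */
    size_t      size;    /* bytes to allocate for per-instance data */
    long        kind;    /* class-wide attribute flags */
} DemoFuncs;

typedef struct DemoLayer {
    struct DemoLayer *next;
    const DemoFuncs  *tab;
    unsigned long     flags;
} DemoLayer;

#define DEMO_K_BUFFERED 0x1L   /* hypothetical "buffering layer" class bit */

/* Push a new instance of the layer class `tab` on top of *f.
 * Returns 0 on success, -1 if the table is incompatible or
 * memory cannot be allocated. */
static int demo_push(DemoLayer **f, const DemoFuncs *tab)
{
    DemoLayer *l;
    assert(f != NULL && tab != NULL);
    if (tab->fsize != sizeof(DemoFuncs))   /* compiled against another layout */
        return -1;
    if (tab->size < sizeof(DemoLayer))     /* must at least hold the base */
        return -1;
    l = calloc(1, tab->size);              /* base plus per-instance data */
    if (l == NULL)
        return -1;
    l->tab  = tab;
    l->next = *f;
    *f = l;
    return 0;
}
```

The size check is the reason the table carries its own size: a layer built against a different table layout can be rejected before any of its function slots are called.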

A layer does not have to implement all the functions, but the whole table has to be present. Unimplemented slots can be NULL (which will result in an error when called) or can be filled in with stubs to "inherit" behaviour from a "base class". This "inheritance" is fixed for all instances of the layer, but as the layer chooses which stubs populate the table, limited "multiple inheritance" is possible.
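A small sketch of the two choices for an unimplemented slot follows. The TinyFuncs table, the Handle struct, and the base_tell, buffered_flush, handle_tell and handle_flush functions are all hypothetical; the point is only that every table has every slot, and that a slot holds either NULL, a shared "base class" stub, or the layer's own function.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Handle Handle;

/* Hypothetical two-slot function table. */
typedef struct {
    const char *name;
    long (*Tell)(Handle *h);
    int  (*Flush)(Handle *h);
} TinyFuncs;

struct Handle {
    const TinyFuncs *tab;
    long pos;
    int  flushed;      /* counts flushes, for demonstration only */
};

/* "Base class" stub that any layer may place in its table. */
static long base_tell(Handle *h) { return h->pos; }

/* A layer-specific implementation. */
static int buffered_flush(Handle *h) { h->flushed++; return 0; }

/* Both tables are complete; one leaves Flush as NULL. */
static const TinyFuncs buffered_tab   = { "buffered",   base_tell, buffered_flush };
static const TinyFuncs unbuffered_tab = { "unbuffered", base_tell, NULL };

/* Dispatchers: a NULL slot is reported as an error. */
static long handle_tell(Handle *h)
{
    assert(h != NULL && h->tab != NULL);
    return h->tab->Tell ? h->tab->Tell(h) : -1L;
}

static int handle_flush(Handle *h)
{
    assert(h != NULL && h->tab != NULL);
    return h->tab->Flush ? h->tab->Flush(h) : -1;
}
```

Both tables reuse base_tell, while only one supplies a Flush; mixing stubs from different sources in one table is what the text calls limited "multiple inheritance".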

Per-instance Data

The per-instance data are held in memory beyond the basic PerlIOl struct, by making a PerlIOl the first member of the layer's struct thus:

typedef struct
{
 struct _PerlIO base;       /* Base "class" info */
 STDCHAR *      buf;        /* Start of buffer */
 STDCHAR *      end;        /* End of valid part of buffer */
 STDCHAR *      ptr;        /* Current position in buffer */
 Off_t          posn;       /* Offset of buf into the file */
 Size_t         bufsiz;     /* Real size of buffer */
 IV             oneword;    /* Emergency buffer */
} PerlIOBuf;

In this way (as for perl's scalars) a pointer to a PerlIOBuf can be treated as a pointer to a PerlIOl.
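The pointer equivalence relies on the C rule that a pointer to a struct, suitably converted, points to its first member. A minimal sketch, using hypothetical DemoBase and DemoBuf types in place of PerlIOl and PerlIOBuf:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base struct, standing in for PerlIOl. */
typedef struct DemoBase {
    struct DemoBase *next;
    unsigned long    flags;
} DemoBase;

/* Hypothetical layer struct, standing in for PerlIOBuf. */
typedef struct {
    DemoBase base;     /* must be the first member */
    char    *buf;
    size_t   bufsiz;
} DemoBuf;

/* Generic code sees only the base struct. */
static void set_flag(DemoBase *l, unsigned long bit)
{
    assert(l != NULL);
    l->flags |= bit;
}

/* Layer-specific code casts back to reach its own fields.
 * This is only valid if l really is the base of a DemoBuf. */
static size_t demo_bufsiz(DemoBase *l)
{
    assert(l != NULL);
    return ((DemoBuf *)l)->bufsiz;
}
```

The cast in demo_bufsiz is safe only because the layer's own functions are reached through that layer's table, so they are only ever handed instances of their own struct.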

Layers in action.

             table           perlio          unix
         |           |
         +-----------+    +----------+    +--------+
PerlIO ->|           |--->|  next    |--->|  NULL  |
         +-----------+    +----------+    +--------+
         |           |    |  buffer  |    |   fd   |
         +-----------+    |          |    +--------+
         |           |    +----------+

The above attempts to show how the layer scheme works in a simple case. The application's PerlIO * points to an entry in the table(s) representing open (allocated) handles. For example the first three slots in the table correspond to stdin, stdout and stderr. The table in turn points to the current "top" layer for the handle - in this case an instance of the generic buffering layer "perlio". That layer in turn points to the next layer down - in this case the low-level "unix" layer.
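The picture can be modelled with a small table of slots. In this sketch TblLayer, handle_table, handle_depth and handle_fd are hypothetical names, and keeping an fd field in every layer is a simplification made for the example: only the bottom layer's value is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layer record for the diagram above. */
typedef struct TblLayer TblLayer;
struct TblLayer {
    TblLayer   *next;   /* lower layer, NULL at the bottom */
    const char *name;
    int         fd;     /* only meaningful in the bottom layer */
};

/* The table of handles: each slot points at a handle's top layer.
 * Static storage, so unused slots start out as NULL. */
#define TABLE_SLOTS 4
static TblLayer *handle_table[TABLE_SLOTS];

/* Number of layers on a handle (0 for an unused slot). */
static int handle_depth(TblLayer **f)
{
    const TblLayer *l;
    int depth = 0;
    assert(f != NULL);
    for (l = *f; l != NULL; l = l->next)
        depth++;
    return depth;
}

/* Walk to the bottom layer and return its file descriptor,
 * or -1 for an unused slot. */
static int handle_fd(TblLayer **f)
{
    const TblLayer *l;
    assert(f != NULL);
    if (*f == NULL)
        return -1;
    for (l = *f; l->next != NULL; l = l->next)
        ;
    return l->fd;
}
```

The application holds the address of a table slot, so layers can be pushed onto or popped off the handle without that address ever changing.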

The above is roughly equivalent to a "stdio" buffered stream, but with much more flexibility.

Per-instance flag bits

The generic flag bits are a hybrid of O_XXXXX style flags deduced from the mode string passed to PerlIO_open(), and state bits for typical buffer layers.
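How the read and write bits might be deduced from a mode string can be sketched as below. The DEMO_F_* names and their values are hypothetical (the real PERLIO_F_* constants are defined in perliol.h), and mode_to_flags is an invented helper, not the PerlIO routine that does this job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values, chosen only for this example. */
#define DEMO_F_CANREAD  0x1UL
#define DEMO_F_CANWRITE 0x2UL

/* Deduce read/write permission bits from a stdio-style mode string
 * such as "r", "w", "a", "r+", "w+" or "a+".
 * Returns 0 for a mode that does not start with r, w or a. */
static unsigned long mode_to_flags(const char *mode)
{
    unsigned long flags;
    assert(mode != NULL);
    switch (*mode) {
    case 'r':
        flags = DEMO_F_CANREAD;
        break;
    case 'w':
    case 'a':
        flags = DEMO_F_CANWRITE;
        break;
    default:
        return 0;
    }
    /* A '+' anywhere after the first character adds the other direction. */
    for (mode++; *mode != '\0'; mode++) {
        if (*mode == '+')
            flags |= DEMO_F_CANREAD | DEMO_F_CANWRITE;
    }
    return flags;
}
```

State bits such as end-of-file are not derived from the mode; they are set and cleared by the layer as the stream is used.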

PERLIO_F_EOF

End of file.

PERLIO_F_CANWRITE

Writes are permitted, i.e. opened as "w" or "r+" or "a", etc.

PERLIO_F_CANREAD

Reads are permitted, i.e. opened as "r" or "w+" (or even "a+" - ick).