Python Types and C-Structures
Several new types are defined in the C-code. Most of these are accessible from Python, but a few are not exposed due to their limited use. Every new Python type has an associated PyObject * with an internal structure that includes a pointer to a “method table” that defines how the new object behaves in Python. When you receive a Python object into C code, you always get a pointer to a PyObject structure. Because a PyObject structure is very generic and defines only PyObject_HEAD, by itself it is not very interesting. However, different objects contain more details after the PyObject_HEAD (but you have to cast to the correct type to access them, or use accessor functions or macros).
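The casting idea can be sketched without the Python headers. The stand-alone C below uses hypothetical stand-in types (DemoHead and DemoCounter are illustrative names, not the real PyObject or any NumPy structure). It only shows why a pointer to the generic head can be cast to the concrete structure when the head is the first member.

```c
#include <stddef.h>

/* Hypothetical stand-in for the generic object head (not the real PyObject). */
typedef struct {
    long refcount;
    const char *type_name;
} DemoHead;

/* A concrete object: the head comes first, the extra details follow it. */
typedef struct {
    DemoHead head;
    int value;
} DemoCounter;

/* Receives only the generic pointer and casts it to reach the extra member. */
int demo_counter_value(DemoHead *obj)
{
    DemoCounter *counter = (DemoCounter *)obj;
    return counter->value;
}
```

Because head is the first member, a pointer to a DemoCounter and a pointer to its head refer to the same address, which is what makes the cast valid.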
New Python Types Defined
Python types are the functional equivalent in C of classes in Python. By constructing a new Python type you make available a new object for Python. The ndarray object is an example of a new type defined in C. New types are defined in C by two basic steps:
- creating a C-structure (usually named Py{Name}Object) that is binary-compatible with the PyObject structure itself but holds the additional information needed for that particular object;
- populating the PyTypeObject table (pointed to by the ob_type member of the PyObject structure) with pointers to functions that implement the desired behavior for the type.
Instead of special method names which define behavior for Python classes, there are “function tables” which point to functions that implement the desired results. Since Python 2.2, the PyTypeObject itself has become dynamic, which allows C types to be “sub-typed” from other C types in C and sub-classed in Python. The child types inherit the attributes and methods from their parent(s).
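As a rough illustration of a function table with inheritance, the sketch below uses a hypothetical, much-reduced type table (DemoType, DemoObject, and demo_length are invented for this example and are not part of the Python or NumPy C-API). It walks the parent chain on each call, which is a simplification chosen for clarity rather than a description of how the interpreter resolves slots.

```c
#include <stddef.h>

/* Hypothetical, much-reduced "type table": one slot per behavior. */
typedef struct DemoType {
    const char *name;
    const struct DemoType *base;       /* parent type, or NULL */
    long (*length)(const void *self);  /* slot may be NULL to inherit */
} DemoType;

typedef struct {
    const DemoType *type;  /* plays the role of ob_type */
    long n;
} DemoObject;

/* One concrete implementation that a type can place in its length slot. */
long demo_length_impl(const void *self)
{
    return ((const DemoObject *)self)->n;
}

/* Look up the slot, falling back to parent types when it is not filled in. */
long demo_length(const DemoObject *obj)
{
    const DemoType *t;
    for (t = obj->type; t != NULL; t = t->base) {
        if (t->length != NULL) {
            return t->length(obj);
        }
    }
    return -1;  /* no type in the chain implements the slot */
}
```

A child type that leaves its length slot NULL picks up the parent's function, which mirrors the idea that child types inherit behavior from their parents.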
There are two major new types: the ndarray (PyArray_Type) and the ufunc (PyUFunc_Type). Additional types play a supportive role: the PyArrayIter_Type, the PyArrayMultiIter_Type, and the PyArrayDescr_Type. The PyArrayIter_Type is the type for a flat iterator for an ndarray (the object that is returned when getting the flat attribute). The PyArrayMultiIter_Type is the type of the object returned when calling broadcast(). It handles iteration and broadcasting over a collection of nested sequences. Also, the PyArrayDescr_Type is the data-type-descriptor type whose instances describe the data. Finally, there are 21 new scalar-array types which are new Python scalars corresponding to each of the fundamental data types available for arrays. An additional 10 other types are placeholders that allow the array scalars to fit into a hierarchy of actual Python types.
PyArray_Type
-
PyArray_Type
¶ The Python type of the ndarray is PyArray_Type. In C, every ndarray is a pointer to a PyArrayObject structure. The ob_type member of this structure contains a pointer to the PyArray_Type typeobject.
-
PyArrayObject
¶ The PyArrayObject C-structure contains all of the required information for an array. All instances of an ndarray (and its subclasses) will have this structure. For future compatibility, these structure members should normally be accessed using the provided macros. If you need a shorter name, then you can make use of NPY_AO, which is defined to be equivalent to PyArrayObject.
    typedef struct PyArrayObject {
        PyObject_HEAD
        char *data;
        int nd;
        npy_intp *dimensions;
        npy_intp *strides;
        PyObject *base;
        PyArray_Descr *descr;
        int flags;
        PyObject *weakreflist;
    } PyArrayObject;
-
char *
PyArrayObject.data
¶ A pointer to the first element of the array. This pointer can (and normally should) be recast to the data type of the array.
-
int
PyArrayObject.nd
¶ An integer providing the number of dimensions for this array. When nd is 0, the array is sometimes called a rank-0 array. For such arrays the dimensions and strides members are undefined and cannot be accessed. NPY_MAXDIMS is the largest number of dimensions for any array.
-
npy_intp *
PyArrayObject.dimensions
¶ An array of integers providing the shape in each dimension as long as nd ≥ 1. The integer is always large enough to hold a pointer on the platform, so the dimension size is only limited by memory.
-
npy_intp *
PyArrayObject.strides
¶ An array of integers providing for each dimension the number of bytes that must be skipped to get to the next element in that dimension.
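To make the role of data, nd, and strides concrete, here is a minimal stand-alone sketch of the address arithmetic. It assumes ptrdiff_t as a stand-in for npy_intp, and demo_element_ptr is a hypothetical helper, not a NumPy function.

```c
#include <stddef.h>

/* Byte address of element idx[0..nd-1], given a data pointer and the
 * per-dimension byte strides. ptrdiff_t stands in for npy_intp here. */
char *demo_element_ptr(char *data, int nd,
                       const ptrdiff_t *strides, const ptrdiff_t *idx)
{
    int i;
    for (i = 0; i < nd; i++) {
        data += idx[i] * strides[i];
    }
    return data;
}
```

For a C-ordered 2 x 3 array of double, the strides would be 3 * sizeof(double) and sizeof(double), so index (1, 2) lands 5 elements past the start. The returned pointer still has to be cast to the element type before it is dereferenced.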
-
PyObject *
PyArrayObject.base
¶ This member is used to hold a pointer to another Python object that is related to this array. There are two use cases: 1) If this array does not own its own memory, then base points to the Python object that owns it (perhaps another array object); 2) If this array has the (deprecated) NPY_ARRAY_UPDATEIFCOPY or NPY_ARRAY_WRITEBACKIFCOPY flag set, then this array is a working copy of a “misbehaved” array. When PyArray_ResolveWritebackIfCopy is called, the array pointed to by base will be updated with the contents of this array.
-
PyArray_Descr *
PyArrayObject.descr
¶ A pointer to a data-type descriptor object (see below). The data-type descriptor object is an instance of a new built-in type which allows a generic description of memory. There is a descriptor structure for each data type supported. This descriptor structure contains useful information about the type as well as a pointer to a table of function pointers to implement specific functionality.
-
int
PyArrayObject.flags
¶ Flags indicating how the memory pointed to by data is to be interpreted. Possible flags are NPY_ARRAY_C_CONTIGUOUS, NPY_ARRAY_F_CONTIGUOUS, NPY_ARRAY_OWNDATA, NPY_ARRAY_ALIGNED, NPY_ARRAY_WRITEABLE, NPY_ARRAY_WRITEBACKIFCOPY, and NPY_ARRAY_UPDATEIFCOPY.
PyArrayDescr_Type
-
PyArrayDescr_Type
¶ The PyArrayDescr_Type is the built-in type of the data-type-descriptor objects used to describe how the bytes comprising the array are to be interpreted. There are 21 statically-defined PyArray_Descr objects for the built-in data-types. While these participate in reference counting, their reference count should never reach zero. There is also a dynamic table of user-defined PyArray_Descr objects that is maintained. Once a data-type-descriptor object is “registered” it should never be deallocated either. The function PyArray_DescrFromType(...) can be used to retrieve a PyArray_Descr object from an enumerated type-number (either built-in or user-defined).
-
PyArray_Descr
¶ The format of the PyArray_Descr structure that lies at the heart of the PyArrayDescr_Type is
    typedef struct {
        PyObject_HEAD
        PyTypeObject *typeobj;
        char kind;
        char type;
        char byteorder;
        char unused;
        int flags;
        int type_num;
        int elsize;
        int alignment;
        PyArray_ArrayDescr *subarray;
        PyObject *fields;
        PyArray_ArrFuncs *f;
    } PyArray_Descr;
-
PyTypeObject *
PyArray_Descr.typeobj
¶ Pointer to a typeobject that is the corresponding Python type for the elements of this array. For the builtin types, this points to the corresponding array scalar. For user-defined types, this should point to a user-defined typeobject. This typeobject can either inherit from array scalars or not. If it does not inherit from array scalars, then the NPY_USE_GETITEM and NPY_USE_SETITEM flags should be set in the flags member.
-
char
PyArray_Descr.kind
¶ A character code indicating the kind of array (using the array interface typestring notation). A ‘b’ represents Boolean, an ‘i’ represents signed integer, a ‘u’ represents unsigned integer, an ‘f’ represents floating point, a ‘c’ represents complex floating point, an ‘S’ represents 8-bit zero-terminated bytes, a ‘U’ represents 32-bit/character unicode string, and a ‘V’ represents arbitrary.
-
char
PyArray_Descr.type
¶ A traditional character code indicating the data type.
-
char
PyArray_Descr.byteorder
¶ A character indicating the byte-order: ‘>’ (big-endian), ‘<’ (little-endian), ‘=’ (native), ‘|’ (irrelevant, ignore). All builtin data-types have byteorder ‘=’.
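The four byte-order codes can be turned into a "does this data need swapping on this host" decision. The sketch below is a stand-alone illustration under that reading; demo_native_order and demo_needs_swap are hypothetical helper names, not NumPy functions.

```c
#include <stdint.h>

/* Returns '<' on a little-endian host and '>' on a big-endian host. */
char demo_native_order(void)
{
    const uint16_t probe = 1;
    return (*(const unsigned char *)&probe == 1) ? '<' : '>';
}

/* Nonzero when data stored with the given byteorder code must be swapped
 * before the host can use it. '=' (native) and '|' (irrelevant) never
 * require a swap. */
int demo_needs_swap(char byteorder)
{
    if (byteorder == '=' || byteorder == '|') {
        return 0;
    }
    return byteorder != demo_native_order();
}
```

On any host exactly one of ‘<’ and ‘>’ reports that a swap is needed, and the code matching the host's own order never does.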
-
int
PyArray_Descr.flags
¶ A data-type bit-flag that determines if the data-type exhibits object-array like behavior. Each bit in this member is a flag, named as follows:
-
NPY_ITEM_REFCOUNT
¶
-
NPY_ITEM_HASOBJECT
¶ Indicates that items of this data-type must be reference counted (using Py_INCREF and Py_DECREF).
-
NPY_LIST_PICKLE
¶ Indicates arrays of this data-type must be converted to a list before pickling.
-
NPY_ITEM_IS_POINTER
¶ Indicates the item is a pointer to some other data-type.
-
NPY_NEEDS_INIT
¶ Indicates memory for this data-type must be initialized (set to 0) on creation.
-
NPY_NEEDS_PYAPI
¶ Indicates this data-type requires the Python C-API during access (so don’t give up the GIL if array access is going to be needed).
-
NPY_USE_GETITEM
¶ On array access use the
f->getitem
function pointer instead of the standard conversion to an array scalar. Must use if you don’t define an array scalar to go along with the data-type.
-
NPY_USE_SETITEM
¶ When creating a 0-d array from an array scalar use
f->setitem
instead of the standard copy from an array scalar. Must use if you don’t define an array scalar to go along with the data-type.
-
NPY_FROM_FIELDS
¶ The bits that are inherited for the parent data-type if these bits are set in any field of the data-type. Currently (NPY_NEEDS_INIT | NPY_LIST_PICKLE | NPY_ITEM_REFCOUNT | NPY_NEEDS_PYAPI).
-
NPY_OBJECT_DTYPE_FLAGS
¶ Bits set for the object data-type: (NPY_LIST_PICKLE | NPY_USE_GETITEM | NPY_ITEM_IS_POINTER | NPY_ITEM_REFCOUNT | NPY_NEEDS_INIT | NPY_NEEDS_PYAPI).
-
PyDataType_FLAGCHK
(PyArray_Descr *dtype, int flags)¶ Return true if all the given flags are set for the data-type object.
-
PyDataType_REFCHK
(PyArray_Descr *dtype)¶ Equivalent to PyDataType_FLAGCHK(dtype, NPY_ITEM_REFCOUNT).
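The "all the given flags are set" test described above is a standard bit-mask comparison. The sketch below shows that comparison on its own. The DEMO_ constants and their values are illustrative placeholders rather than the real NumPy definitions, and demo_flagchk is a hypothetical function that takes the flags integer directly instead of a PyArray_Descr pointer.

```c
/* Illustrative bit values only; real code takes the constants from the
 * NumPy headers. */
enum {
    DEMO_ITEM_REFCOUNT   = 0x01,
    DEMO_LIST_PICKLE     = 0x02,
    DEMO_ITEM_IS_POINTER = 0x04,
    DEMO_NEEDS_INIT      = 0x08,
    DEMO_NEEDS_PYAPI     = 0x10
};

/* True only when every bit in `wanted` is also set in `flags`. */
int demo_flagchk(int flags, int wanted)
{
    return (flags & wanted) == wanted;
}
```

Comparing against wanted, rather than testing for a nonzero result, is what makes the check require all of the requested bits instead of any one of them.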
-
int
PyArray_Descr.type_num
¶ A number that uniquely identifies the data type. For new data-types, this number is assigned when the data-type is registered.
-
int
PyArray_Descr.elsize
¶ For data types that are always the same size (such as long), this holds the size of the data type. For flexible data types, where different arrays can have a different element size, this should be 0.
-
int
PyArray_Descr.alignment
¶ A number providing alignment information for this data type. Specifically, it shows how far from the start of a 2-element structure (whose first element is a char) the compiler places an item of this type: offsetof(struct {char c; type v;}, v)
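The offsetof expression above can be tried directly in a small stand-alone program. The sketch below declares named probe structures through a hypothetical macro (DEMO_ALIGN_PROBE and the demo_ functions are invented for this example) instead of defining an anonymous structure inside offsetof, which keeps it portable across compilers.

```c
#include <stddef.h>

/* Declare a two-member probe struct whose first member is a char. */
#define DEMO_ALIGN_PROBE(name, type) struct name { char c; type v; }

DEMO_ALIGN_PROBE(demo_probe_char, char);
DEMO_ALIGN_PROBE(demo_probe_double, double);

/* The offset of the second member is the alignment the compiler applies. */
size_t demo_align_char(void)
{
    return offsetof(struct demo_probe_char, v);
}

size_t demo_align_double(void)
{
    return offsetof(struct demo_probe_double, v);
}
```

The value for char is always 1. The value for double depends on the platform and compiler, so it is best read at run time rather than assumed.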
-
PyArray_ArrayDescr *
PyArray_Descr.subarray
¶ If this is non-NULL, then this data-type descriptor is a C-style contiguous array of another data-type descriptor. In other words, each element that this descriptor describes is actually an array of some other base descriptor. This is most useful as the data-type descriptor for a field in another data-type descriptor. The fields member should be NULL if this is non-NULL (the fields member of the base descriptor can be non-NULL however). The PyArray_ArrayDescr structure is defined using
    typedef struct {
        PyArray_Descr *base;
        PyObject *shape;
    } PyArray_ArrayDescr;
The elements of this structure are:
-
PyArray_Descr *
PyArray_ArrayDescr.base
¶ The data-type-descriptor object of the base-type.
-
PyObject *
PyArray_ArrayDescr.shape
¶ The shape (always C-style contiguous) of the sub-array as a Python tuple.
-
PyObject *
PyArray_Descr.fields
¶ If this is non-NULL, then this data-type-descriptor has fields described by a Python dictionary whose keys are names (and also titles if given) and whose values are tuples that describe the fields. Recall that a data-type-descriptor always describes a fixed-length set of bytes. A field is a named sub-region of that total, fixed-length collection. A field is described by a tuple composed of another data-type-descriptor and a byte offset. Optionally, the tuple may contain a title which is normally a Python string. These tuples are placed in this dictionary keyed by name (and also title if given).
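The idea of a field as a named sub-region of a fixed-length record can be sketched in plain C. The table below is a hypothetical stand-in for the Python dictionary: DemoField, demo_find_field, and demo_read_field are invented for this example, and each entry carries a size where the real tuple carries a data-type descriptor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical field entry: a name, a byte offset, and an element size. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
} DemoField;

/* Return the entry for `name`, or NULL when no such field exists. */
const DemoField *demo_find_field(const DemoField *fields, size_t count,
                                 const char *name)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (strcmp(fields[i].name, name) == 0) {
            return &fields[i];
        }
    }
    return NULL;
}

/* Copy one field out of a record; returns 0 on success, -1 if not found. */
int demo_read_field(const DemoField *fields, size_t count, const char *name,
                    const void *record, void *out)
{
    const DemoField *f = demo_find_field(fields, count, name);
    if (f == NULL) {
        return -1;
    }
    memcpy(out, (const char *)record + f->offset, f->size);
    return 0;
}
```

With a 3-byte record described by a 1-byte field "a" at offset 0 and a 2-byte field "b" at offset 1, reading "b" copies the last two bytes of the record into the output buffer.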
-
PyArray_ArrFuncs *
PyArray_Descr.f
¶ A pointer to a structure containing functions that the type needs to implement internal features. These functions are not the same thing as the universal functions (ufuncs) described later. Their signatures can vary arbitrarily.
-
PyArray_ArrFuncs
¶ Functions implementing internal features. Not all of these function pointers must be defined for a given type. The required members are nonzero, copyswap, copyswapn, setitem, getitem, and cast. These are assumed to be non-NULL, and NULL entries will cause a program crash. The other functions may be NULL, which will just mean reduced functionality for that data-type. (Also, the nonzero function will be filled in with a default function if it is NULL when you register a user-defined data-type).
    typedef struct {
        PyArray_VectorUnaryFunc *cast[NPY_NTYPES];
        PyArray_GetItemFunc *getitem;
        PyArray_SetItemFunc *setitem;
        PyArray_CopySwapNFunc *copyswapn;
        PyArray_CopySwapFunc *copyswap;
        PyArray_CompareFunc *compare;
        PyArray_ArgFunc *argmax;
        PyArray_DotFunc *dotfunc;
        PyArray_ScanFunc *scanfunc;
        PyArray_FromStrFunc *fromstr;
        PyArray_NonzeroFunc *nonzero;
        PyArray_FillFunc *fill;
        PyArray_FillWithScalarFunc *fillwithscalar;
        PyArray_SortFunc *sort[NPY_NSORTS];
        PyArray_ArgSortFunc *argsort[NPY_NSORTS];
        PyObject *castdict;
        PyArray_ScalarKindFunc *scalarkind;
        int **cancastscalarkindto;
        int *cancastto;
        PyArray_FastClipFunc *fastclip;
        PyArray_FastPutmaskFunc *fastputmask;
        PyArray_FastTakeFunc *fasttake;
        PyArray_ArgFunc *argmin;
    } PyArray_ArrFuncs;
The concept of a behaved segment is used in the description of the function pointers. A behaved segment is one that is aligned and in native machine byte-order for the data-type. The nonzero, copyswap, copyswapn, getitem, and setitem functions can (and must) deal with mis-behaved arrays. The other functions require behaved memory segments.
-
void
cast
(void *from, void *to, npy_intp n, void *fromarr, void *toarr)¶ An array of function pointers to cast from the current type to all of the other builtin types. Each function casts a contiguous, aligned, and not-swapped buffer pointed at by from to a contiguous, aligned, and not-swapped buffer pointed at by to. The number of items to cast is given by n, and the arguments fromarr and toarr are interpreted as PyArrayObjects for flexible arrays to get itemsize information.
-
PyObject *
getitem
(void *data, void *arr)¶ A pointer to a function that returns a standard Python object from a single element of the array object arr pointed to by data. This function must be able to deal with “misbehaved” (misaligned and/or swapped) arrays correctly.
-
int
setitem
(PyObject *item, void *data, void *arr)¶ A pointer to a function that sets the Python object item into the array, arr, at the position pointed to by data. This function deals with “misbehaved” arrays. If successful, a zero is returned; otherwise, a negative one is returned (and a Python error set).
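What "dealing with misbehaved memory" involves can be shown for a single 32-bit integer. The sketch below is not a real getitem implementation (it returns a C integer rather than a Python object, and demo_get_int32 is a hypothetical name). It only illustrates the two precautions: copying the bytes out instead of dereferencing a possibly misaligned pointer, and reversing them when the data is byte-swapped.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit integer from a possibly misaligned, possibly byte-swapped
 * location. memcpy avoids an unaligned load; `swap` reverses the bytes. */
int32_t demo_get_int32(const void *data, int swap)
{
    unsigned char bytes[4];
    uint32_t value;

    memcpy(bytes, data, 4);
    if (swap) {
        unsigned char tmp;
        tmp = bytes[0]; bytes[0] = bytes[3]; bytes[3] = tmp;
        tmp = bytes[1]; bytes[1] = bytes[2]; bytes[2] = tmp;
    }
    memcpy(&value, bytes, 4);
    return (int32_t)value;
}
```

Passing an address that is one byte into a buffer is a simple way to exercise the misaligned case, and a setitem-style function would apply the same two steps in the opposite direction.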
-
void
copyswapn
(void *dest, npy_intp dstride, void *src, npy_intp sstride, npy_intp n, int swap, void *arr)¶
-
void
copyswap
(void *dest, void *src, int swap, void *arr)¶ These members are both pointers to functions to copy data from src to dest and swap if indicated. The value of arr is only used for flexible (NPY_STRING, NPY_UNICODE, and
-
void