Creating Extension Libraries for Ruby

This document explains how to make extension libraries for Ruby.

Basic Knowledge

In C, variables have types and data do not have types. In contrast, Ruby variables do not have a static type, and data themselves have types, so data will need to be converted between the languages.

Data in Ruby are represented by the C type VALUE. Every VALUE carries its own data type.

To retrieve C data from a VALUE, you need to:

  1. Identify the VALUE’s data type
  2. Convert the VALUE into C data

Converting to the wrong data type may cause serious problems.

Data Types

The Ruby interpreter has the following data types:

  • T_NIL: nil
  • T_OBJECT: ordinary object
  • T_CLASS: class
  • T_MODULE: module
  • T_FLOAT: floating point number
  • T_STRING: string
  • T_REGEXP: regular expression
  • T_ARRAY: array
  • T_HASH: associative array
  • T_STRUCT: (Ruby) structure
  • T_BIGNUM: multi precision integer
  • T_FIXNUM: Fixnum (31bit or 63bit integer)
  • T_COMPLEX: complex number
  • T_RATIONAL: rational number
  • T_FILE: IO
  • T_TRUE: true
  • T_FALSE: false
  • T_DATA: data
  • T_SYMBOL: symbol

In addition, there are several other types used internally:

  • T_ICLASS: included module
  • T_MATCH: MatchData object
  • T_UNDEF: undefined
  • T_NODE: syntax tree node
  • T_ZOMBIE: object awaiting finalization

Most of the types are represented by C structures.

Check Data Type of the VALUE

The macro TYPE(), defined in ruby.h, returns the data type of a VALUE as one of the T_XXXX constants described above. To handle data types, your code will look something like this:

switch (TYPE(obj)) {
  case T_FIXNUM:
    /* process Fixnum */
    break;
  case T_STRING:
    /* process String */
    break;
  case T_ARRAY:
    /* process Array */
    break;
  default:
    /* raise exception */
    rb_raise(rb_eTypeError, "not valid value");
    break;
}

There is also a data type check function

void Check_Type(VALUE value, int type)

which raises an exception if the VALUE does not have the type specified.

There are also faster check macros for fixnums and nil.

FIXNUM_P(obj)
NIL_P(obj)

Convert VALUE into C Data

The data for type T_NIL, T_FALSE, T_TRUE are nil, false, true respectively. They are singletons for the data type. The equivalent C constants are: Qnil, Qfalse, Qtrue. Note that Qfalse is false in C also (i.e. 0), but not Qnil.

The T_FIXNUM data is a 31bit or 63bit fixed-length integer. The size depends on the size of long: if long is 32bit then T_FIXNUM is 31bit, and if long is 64bit then T_FIXNUM is 63bit. T_FIXNUM can be converted to a C integer by using the FIX2INT() or FIX2LONG() macro. These macros are faster, but you have to check that the data is really a FIXNUM before using them. FIX2LONG() never raises exceptions, but FIX2INT() raises RangeError if the result does not fit in an int. There are also NUM2INT() and NUM2LONG(), which convert any Ruby number into a C integer. These macros include a type check, so an exception will be raised if the conversion fails. NUM2DBL() can be used to retrieve a double value in the same way.

You can use the macros StringValue() and StringValuePtr() to get a char* from a VALUE. StringValue(var) replaces var’s value with the result of “var.to_str()”. StringValuePtr(var) does the same replacement and returns the char* representation of var. These macros will skip the replacement if var is a String. Note that these macros accept only an lvalue as their argument, because they change the value of var in place.

You can also use the macro named StringValueCStr(). This is just like StringValuePtr(), but always adds a NUL character at the end of the result. If the result contains a NUL character, this macro causes the ArgumentError exception. StringValuePtr() doesn’t guarantee the existence of a NUL at the end of the result, and the result may contain NUL.

Other data types have corresponding C structures, e.g. struct RArray for T_ARRAY etc. The VALUE of the type which has the corresponding structure can be cast to retrieve the pointer to the struct. The casting macro will be of the form RXXXX for each data type; for instance, RARRAY(obj). See “ruby.h”. However, we do not recommend accessing RXXXX data directly because these data structures are complex. Use corresponding rb_xxx() functions to access the internal struct. For example, to access an entry of array, use rb_ary_entry(ary, offset) and rb_ary_store(ary, offset, obj).

There are some accessing macros for structure members, for example RSTRING_LEN(str) to get the size of the Ruby String object. The allocated region can be accessed by RSTRING_PTR(str).

Notice: Do not change the value of the structure directly, unless you are responsible for the result. This ends up being the cause of interesting bugs.

Convert C Data into VALUE

To convert C data to Ruby values:

  • FIXNUM: left shift 1 bit, and turn on its least significant bit (LSB).

  • Other pointer values: cast to VALUE.

You can determine whether a VALUE is a pointer or not by checking its LSB.
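The tagging rule above can be modeled in plain C. This is only a sketch of the idea, assuming that right-shifting a negative signed value is an arithmetic shift (which common compilers provide). The names value_t, tag_fixnum, is_fixnum and untag_fixnum are hypothetical and are not part of ruby.h; real extensions should use INT2FIX(), FIX2LONG() and FIXNUM_P() instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VALUE: an integer wide enough to hold a pointer. */
typedef uintptr_t value_t;

/* Shift left by one bit and turn on the least significant bit. */
static value_t
tag_fixnum(intptr_t n)
{
    return ((value_t)n << 1) | 1;
}

/* In this model, a set LSB marks a Fixnum; aligned pointers have it clear. */
static bool
is_fixnum(value_t v)
{
    return (v & 1) != 0;
}

/* Undo the tagging; assumes an arithmetic right shift for negative values. */
static intptr_t
untag_fixnum(value_t v)
{
    return (intptr_t)v >> 1;
}
```

One bit is spent on the tag, which is why a Fixnum holds only 31 or 63 bits of a long.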

Notice: Ruby does not allow arbitrary pointer values to be a VALUE. They should be pointers to the structures which Ruby knows about. The known structures are defined in <ruby.h>.

To convert C numbers to Ruby values, use these macros:

  • INT2FIX(): for integers within 31bits.
  • INT2NUM(): for arbitrary sized integers.

INT2NUM() converts an integer into a Bignum if it is out of the FIXNUM range, but is a bit slower.

Manipulating Ruby Data

As I already mentioned, it is not recommended to modify an object’s internal structure. To manipulate objects, use the functions supplied by the Ruby interpreter. Some (not all) of the useful functions are listed below:

String Functions

  • rb_str_new(const char *ptr, long len): Creates a new Ruby string.

  • rb_str_new2(const char *ptr), rb_str_new_cstr(const char *ptr): Creates a new Ruby string from a C string. This is equivalent to rb_str_new(ptr, strlen(ptr)).

  • rb_str_new_literal(const char *ptr): Creates a new Ruby string from a C string literal.

  • rb_sprintf(const char *format, ...), rb_vsprintf(const char *format, va_list ap): Creates a new Ruby string with printf(3) format.

    Note: In the format string, “%”PRIsVALUE can be used for Object#to_s (or Object#inspect if ‘+’ flag is set) output (and related argument must be a VALUE). Since it conflicts with “%i”, for integers in format strings, use “%d”.

  • rb_str_append(VALUE str1, VALUE str2): Appends Ruby string str2 to Ruby string str1.

  • rb_str_cat(VALUE str, const char *ptr, long len): Appends len bytes of data from ptr to the Ruby string.

  • rb_str_cat2(VALUE str, const char* ptr), rb_str_cat_cstr(VALUE str, const char* ptr): Appends C string ptr to Ruby string str. This function is equivalent to rb_str_cat(str, ptr, strlen(ptr)).

  • rb_str_catf(VALUE str, const char* format, ...), rb_str_vcatf(VALUE str, const char* format, va_list ap): Appends C string format and successive arguments to Ruby string str according to a printf-like format. These functions are equivalent to rb_str_append(str, rb_sprintf(format, ...)) and rb_str_append(str, rb_vsprintf(format, ap)), respectively.

  • rb_enc_str_new(const char *ptr, long len, rb_encoding *enc), rb_enc_str_new_cstr(const char *ptr, rb_encoding *enc): Creates a new Ruby string with the specified encoding.

  • rb_enc_str_new_literal(const char *ptr, rb_encoding *enc): Creates a new Ruby string from a C string literal with the specified encoding.

  • rb_usascii_str_new(const char *ptr, long len), rb_usascii_str_new_cstr(const char *ptr): Creates a new Ruby string with encoding US-ASCII.

  • rb_usascii_str_new_literal(const char *ptr): Creates a new Ruby string from a C string literal with encoding US-ASCII.

  • rb_utf8_str_new(const char *ptr, long len), rb_utf8_str_new_cstr(const char *ptr): Creates a new Ruby string with encoding UTF-8.

  • rb_utf8_str_new_literal(const char *ptr): Creates a new Ruby string from a C string literal with encoding UTF-8.

  • rb_str_resize(VALUE str, long len): Resizes a Ruby string to len bytes. If str is not modifiable, this function raises an exception. The length of str must be set in advance. If len is less than the old length the content beyond len bytes is discarded, else if len is greater than the old length the content beyond the old length bytes will not be preserved but will be garbage. Note that RSTRING_PTR(str) may change by calling this function.

  • rb_str_set_len(VALUE str, long len): Sets the length of a Ruby string. If str is not modifiable, this function raises an exception. This function preserves the content up to len bytes, regardless of RSTRING_LEN(str). len must not exceed the capacity of str.

  • rb_str_modify(VALUE str): Prepares a Ruby string for modification. If str is not modifiable, this function raises an exception; if the buffer of str is shared, this function allocates a new buffer to make it unshared. You MUST always call this function before modifying the contents using RSTRING_PTR and/or rb_str_set_len.

Array Functions

  • rb_ary_new(): Creates an array with no elements.

  • rb_ary_new2(long len), rb_ary_new_capa(long len): Creates an array with no elements, allocating internal buffer for len elements.

  • rb_ary_new3(long n, ...), rb_ary_new_from_args(long n, ...): Creates an n-element array from the arguments.

  • rb_ary_new4(long n, VALUE *elts), rb_ary_new_from_values(long n, VALUE *elts): Creates an n-element array from a C array.

  • rb_ary_to_ary(VALUE obj): Converts the object into an array. Equivalent to Object#to_ary.

There are many functions to operate on an array. They may dump core if other types are given.

  • rb_ary_aref(int argc, const VALUE *argv, VALUE ary): Equivalent to Array#[].

  • rb_ary_entry(VALUE ary, long offset): ary[offset]

  • rb_ary_store(VALUE ary, long offset, VALUE obj): ary[offset] = obj

  • rb_ary_subseq(VALUE ary, long beg, long len): ary[beg, len]

  • rb_ary_push(VALUE ary, VALUE val), rb_ary_pop(VALUE ary), rb_ary_shift(VALUE ary), rb_ary_unshift(VALUE ary, VALUE val): ary.push, ary.pop, ary.shift, ary.unshift

  • rb_ary_cat(VALUE ary, const VALUE *ptr, long len): Appends len elements of objects from ptr to the array.

Extending Ruby with C

Adding New Features to Ruby

You can add new features (classes, methods, etc.) to the Ruby interpreter. Ruby provides APIs for defining the following things:

  • Classes, Modules
  • Methods, Singleton Methods
  • Constants

Class and Module Definition

To define a class or module, use the functions below:

VALUE rb_define_class(const char *name, VALUE super)
VALUE rb_define_module(const char *name)

These functions return the newly created class or module. You may want to save this reference into a variable to use later.

To define nested classes or modules, use the functions below:

VALUE rb_define_class_under(VALUE outer, const char *name, VALUE super)
VALUE rb_define_module_under(VALUE outer, const char *name)

Method and Singleton Method Definition

To define methods or singleton methods, use these functions:

void rb_define_method(VALUE klass, const char *name,
                      VALUE (*func)(ANYARGS), int argc)

void rb_define_singleton_method(VALUE object, const char *name,
                                VALUE (*func)(ANYARGS), int argc)

The argc represents the number of the arguments to the C function, which must be less than 17. But I doubt you’ll need that many.

If argc is negative, it specifies the calling sequence, not the number of arguments.

If argc is -1, the function will be called as:

VALUE func(int argc, VALUE *argv, VALUE obj)

where argc is the actual number of arguments, argv is the C array of the arguments, and obj is the receiver.

If argc is -2, the arguments are passed in a Ruby array. The function will be called like:

VALUE func(VALUE obj, VALUE args)

where obj is the receiver, and args is the Ruby array containing actual arguments.

There are some more functions to define methods. One takes an ID as the name of method to be defined. See also ID or Symbol below.

void rb_define_method_id(VALUE klass, ID name,
                         VALUE (*func)(ANYARGS), int argc)

There are two functions to define private/protected methods:

void rb_define_private_method(VALUE klass, const char *name,
                              VALUE (*func)(ANYARGS), int argc)
void rb_define_protected_method(VALUE klass, const char *name,
                                VALUE (*func)(ANYARGS), int argc)

Finally, rb_define_module_function defines module functions, which are private AND singleton methods of the module. For example, sqrt is a module function defined in the Math module. It can be called in the following way:

Math.sqrt(4)

or

include Math
sqrt(4)

To define module functions, use:

void rb_define_module_function(VALUE module, const char *name,
                               VALUE (*func)(ANYARGS), int argc)

In addition, function-like methods, which are private methods defined in the Kernel module, can be defined using:

void rb_define_global_function(const char *name, VALUE (*func)(ANYARGS), int argc)

To define an alias for a method, use:

void rb_define_alias(VALUE module, const char* new, const char* old);

To define a reader/writer for an attribute, use:

void rb_define_attr(VALUE klass, const char *name, int read, int write)

To define and undefine the allocate class method, use:

void rb_define_alloc_func(VALUE klass, VALUE (*func)(VALUE klass));
void rb_undef_alloc_func(VALUE klass);

func has to take the klass as the argument and return a newly allocated instance. This instance should be as empty as possible, without any expensive (including external) resources.

If you are overriding an existing method of any ancestor of your class, you may rely on:

VALUE rb_call_super(int argc, const VALUE *argv)

To specify whether keyword arguments are passed when calling super:

VALUE rb_call_super_kw(int argc, const VALUE *argv, int kw_splat)

kw_splat can have these possible values (used by all methods that accept kw_splat argument):

  • RB_NO_KEYWORDS: Do not pass keywords
  • RB_PASS_KEYWORDS: Pass keywords, final argument should be a hash of keywords
  • RB_PASS_EMPTY_KEYWORDS: Pass empty keywords (not included in arguments) (this will be removed in Ruby 3.0)
  • RB_PASS_CALLED_KEYWORDS: Pass keywords if current method was called with keywords, useful for argument delegation

To obtain the receiver of the current scope (if no other way is available), you can use:

VALUE rb_current_receiver(void)

Constant Definition

We have 2 functions to define constants:

void rb_define_const(VALUE klass, const char *name, VALUE val)
void rb_define_global_const(const char *name, VALUE val)

The former is to define a constant under specified class/module. The latter is to define a global constant.

Use Ruby Features from C

There are several ways to invoke Ruby’s features from C code.

Evaluate Ruby Programs in a String

The easiest way to use Ruby’s functionality from a C program is to evaluate a string as a Ruby program. This function will do the job:

VALUE rb_eval_string(const char *str)

Evaluation is done under the current context, thus current local variables of the innermost method (which is defined by Ruby) can be accessed.

Note that the evaluation can raise an exception. There is a safer function:

VALUE rb_eval_string_protect(const char *str, int *state)

It returns nil when an error occurs. Moreover, *state is zero if str was successfully evaluated, or nonzero otherwise.

ID or Symbol

You can invoke methods directly, without parsing a string. First I need to explain ID. An ID is an integer that represents a Ruby identifier such as a variable name. The Ruby data type corresponding to ID is Symbol. It can be accessed from Ruby in the form:

:Identifier

or

:"any kind of string"

You can get the ID value from a string within C code by using

rb_intern(const char *name)
rb_intern_str(VALUE name)

You can retrieve an ID from a Ruby object (Symbol or String) given as an argument by using

rb_to_id(VALUE symbol)
rb_check_id(volatile VALUE *name)
rb_check_id_cstr(const char *name, long len, rb_encoding *enc)

These functions try to convert the argument to a String if it is neither a Symbol nor a String. The second function stores the converted result into *name, and returns 0 if the string is not a known symbol. If it returns a non-zero value, *name is a Symbol or a String; if it returns 0, *name is a String. The third function takes a NUL-terminated C string, not a Ruby VALUE.

You can retrieve a Symbol from a Ruby object (Symbol or String) given as an argument by using

rb_to_symbol(VALUE name)
rb_check_symbol(volatile VALUE *namep)
rb_check_symbol_cstr(const char *ptr, long len, rb_encoding *enc)

These functions are similar to above functions except that these return a Symbol instead of an ID.

You can convert C ID to Ruby Symbol by using

VALUE ID2SYM(ID id)

and to convert Ruby Symbol object to ID, use

ID SYM2ID(VALUE symbol)

Invoke Ruby Method from C

To invoke methods directly, you can use the function below

VALUE rb_funcall(VALUE recv, ID mid, int argc, ...)

This function invokes a method on the recv, with the method name specified by the symbol mid.

Accessing the Variables and Constants

You can access class variables and instance variables using access functions. Also, global variables can be shared between both environments. There’s no way to access Ruby’s local variables.

The functions to access/modify instance variables are below:

VALUE rb_ivar_get(VALUE obj, ID id)
VALUE rb_ivar_set(VALUE obj, ID id, VALUE val)

id must be an ID, which can be obtained with rb_intern().

To access the constants of the class/module:

VALUE rb_const_get(VALUE obj, ID id)

See also Constant Definition above.

Information Sharing Between Ruby and C

Ruby Constants That Can Be Accessed From C

As stated in “Convert VALUE into C Data” above, the following Ruby constants can be referred to from C.

  • Qtrue, Qfalse: Boolean values. Qfalse is false in C also (i.e. 0).

  • Qnil: Ruby nil in C scope.

Global Variables Shared Between C and Ruby

Information can be shared between the two environments using shared global variables. To define them, you can use functions listed below:

void rb_define_variable(const char *name, VALUE *var)

This function defines the variable which is shared by both environments. The value of the global variable pointed to by var can be accessed through Ruby’s global variable named name.

You can define read-only (from Ruby, of course) variables using the function below.

void rb_define_readonly_variable(const char *name, VALUE *var)

You can define hooked variables. The accessor functions (getter and setter) are called on access to the hooked variables.

void rb_define_hooked_variable(const char *name, VALUE *var,
                               VALUE (*getter)(), void (*setter)())

If you need to supply only one of the setter or getter, just supply 0 for the hook you don’t need. If both hooks are 0, rb_define_hooked_variable() works just like rb_define_variable().

The prototypes of the getter and setter functions are as follows:

VALUE (*getter)(ID id, VALUE *var);
void (*setter)(VALUE val, ID id, VALUE *var);

You can also define a Ruby global variable without a corresponding C variable. The value of the variable is read and written only through the hooks.

void rb_define_virtual_variable(const char *name,
                                VALUE (*getter)(), void (*setter)())

The prototypes of the getter and setter functions are as follows:

VALUE (*getter)(ID id);
void (*setter)(VALUE val, ID id);

Encapsulate C Data into a Ruby Object