Journey into OpenGL: Transformations
JiOGL
As a preface, let us begin with homogeneous coordinates. They are similar to our familiar Cartesian coordinates, except that they also allow us to denote points at infinity. You get a homogeneous coordinate system by simply adding another dimension. For a two-dimensional plane it is Z, and for three-dimensional space it is W. For any point given in Cartesian coordinates (x, y, z), there are infinitely many corresponding homogeneous coordinates: (x*w, y*w, z*w, w) for any nonzero w. Going the other way, the homogeneous coordinate (x, y, z, w) corresponds to the Cartesian point (x/w, y/w, z/w). When w = 0, however, this no longer holds. Instead, it is a point at infinity.
While any w is a valid homogeneous coordinate, in computer graphics you usually only come across two: 0 and 1. When w is zero, the coordinate is a vector, and when it is one, it is a point in the regular Cartesian sense. I won't go over every use of homogeneous coordinates, though I will mention that the perspective projection of a point is basically (x/z, y/z, z/z).
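To make the relationship between the two coordinate systems concrete, here is a minimal Python sketch. The function names `to_homogeneous` and `to_cartesian` are made up for illustration and do not come from OpenGL or any library.

```python
def to_homogeneous(point, w=1.0):
    """Lift a Cartesian point (x, y, z) to homogeneous (x*w, y*w, z*w, w)."""
    if w == 0:
        raise ValueError("w = 0 describes a point at infinity, not a finite point")
    x, y, z = point
    return (x * w, y * w, z * w, w)


def to_cartesian(h):
    """Project homogeneous (x, y, z, w) back to Cartesian by dividing by w."""
    x, y, z, w = h
    if w == 0:
        raise ValueError("a point at infinity has no Cartesian equivalent")
    return (x / w, y / w, z / w)


p = (1.0, 2.0, 3.0)
# Different values of w describe the same Cartesian point.
print(to_cartesian(to_homogeneous(p, 1.0)))  # (1.0, 2.0, 3.0)
print(to_cartesian(to_homogeneous(p, 2.0)))  # (1.0, 2.0, 3.0)
```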
When you draw a model, typically you want the model to move somehow. It might move on the X, Y, Z axes, or it might rotate this or that way. It might even inflate in size, who knows. All of these transformations can be done by changing the positions of individual vertices.
A translation operation can be defined as a simple 3-dimensional vector. We can write down this operation in pseudo-code like so:
for each vertex v:
    v <= v + translation
Each operator has an operand which makes no effect, called the identity. For translation, it is the zero vector (0, 0, 0).
Scaling is also simple. Here you multiply (component-wise) the vertex positions by a scaling vector. The identity for scaling is, unsurprisingly, (1, 1, 1).
for each vertex v:
    v <= v * scaling
It is important to note that scaling occurs with respect to a center point, which the operation leaves unchanged. In the above pseudo-code, the center is the origin point (0, 0, 0). If you wish to scale about a different center point, you must first translate that point to the origin, apply the scaling operator, then translate back. Models, however, are usually made with their center already at the origin, in which case you can skip the translations.
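The translate, scale, translate-back recipe can be sketched in a few lines of Python. The helper names (`translate`, `scale`, `scale_about`) and the sample numbers are my own, chosen only for illustration.

```python
def translate(v, t):
    """Add the translation vector t to the vertex v."""
    return tuple(a + b for a, b in zip(v, t))


def scale(v, s):
    """Multiply the vertex v component-wise by the scaling vector s."""
    return tuple(a * b for a, b in zip(v, s))


def scale_about(v, s, center):
    """Scale v by s while keeping `center` fixed."""
    to_origin = tuple(-c for c in center)
    moved = translate(v, to_origin)   # 1. move the center to the origin
    scaled = scale(moved, s)          # 2. scale about the origin
    return translate(scaled, center)  # 3. move the center back


center = (1.0, 1.0, 1.0)
print(scale_about((2.0, 3.0, 1.0), (2.0, 2.0, 2.0), center))  # (3.0, 5.0, 1.0)
print(scale_about(center, (2.0, 2.0, 2.0), center))           # (1.0, 1.0, 1.0)
```

The second call shows the defining property: the center point itself does not move.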
Rotation is trickier. In a two-dimensional scene a single angle is enough to describe a rotation, but in 3D we need three components. The most common form you find in UIs is Tait-Bryan angles (often incorrectly called Euler angles), which are three rotations around three different axes. They certainly seem simple, but that's a facade: the result depends on the order in which the three rotations are applied. Additionally, Tait-Bryan angles are prone to what is known as gimbal lock, where the model suddenly loses an entire degree of freedom. Such cases should be treated as catastrophic. We shall throw this one in the bin, and go over two other rotational structures.
First are quaternions. Because they're used for more than just rotation, there are many ways to formulate them, and none is that intuitive. For our purposes, my favorite is relating a quaternion to rotation around an axis. Choose a (normalized) axis of rotation (x, y, z), and an angle α. Then the equivalent quaternion is the 4D vector (x * sin(α/2), y * sin(α/2), z * sin(α/2), cos(α/2)). To be honest, quaternions are a whole field, so have this very nicely documented code:
for each vertex v:
    v <= Quaternion.rotate(v, q)
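For the curious, here is one possible way to fill in that black box, as a sketch rather than a reference implementation. It builds the quaternion from the axis-angle formula above (stored as (x, y, z, w)) and rotates a vertex with a common cross-product shortcut for the product q * v * q⁻¹. The names `quaternion_from_axis_angle` and `rotate` are my own, and the axis is assumed to be nonzero.

```python
import math


def quaternion_from_axis_angle(axis, angle):
    """Build (x*sin(a/2), y*sin(a/2), z*sin(a/2), cos(a/2)) from an axis and an angle in radians."""
    x, y, z = axis
    length = math.sqrt(x * x + y * y + z * z)  # normalize, in case the caller did not
    s = math.sin(angle / 2) / length
    return (x * s, y * s, z * s, math.cos(angle / 2))


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(v, q):
    """Rotate the 3D vertex v by the unit quaternion q = (x, y, z, w)."""
    qv, w = q[:3], q[3]
    t = tuple(2 * c for c in cross(qv, v))
    u = cross(qv, t)
    return tuple(v[i] + w * t[i] + u[i] for i in range(3))


# A quarter turn around the Z axis carries the X axis onto the Y axis.
q = quaternion_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
print([round(c, 6) for c in rotate((1.0, 0.0, 0.0), q)])
```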
Second are rotation matrices. These are triplets of 3D vectors, one for each axis.
/ Rx1 Rx2 Rx3 \
| Ry1 Ry2 Ry3 |
\ Rz1 Rz2 Rz3 /
The intuition behind this is that a matrix effectively changes a vertex's basis, from the familiar Cartesian (1, 0, 0), (0, 1, 0), (0, 0, 1) to (Rx1, Ry1, Rz1), (Rx2, Ry2, Rz2), (Rx3, Ry3, Rz3). A vertex before and after transformation will have similar "relations" between its old and new bases, so when the new basis has perpendicular vectors of length 1, it's equivalent to rotation.
An example would be easier to show in two dimensions, where rotation can be described with one angle. Here is the generic 2D rotation matrix defined by the angle α, and to its right is when α = 180°.
/ cos(α) sin(α) \ / -1 0 \
\ -sin(α) cos(α) /, \ 0 -1 /
The matrix changes the basis from (1, 0), (0, 1) to (-1, 0), (0, -1), which means to mirror the X and Y axes, and that is exactly what rotating by 180 degrees does.
The identity matrix is also the identity of rotation. Rotations may also be neatly combined by simply multiplying them in the order you wish. If you multiply the 180-degree rotation matrix by itself, you'll get the identity matrix, i.e. no rotation.
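The claim about multiplying the 180-degree matrix by itself is easy to check numerically. The sketch below uses the same matrix layout as the text. The helper names are hypothetical, and the results are rounded because floating-point sines and cosines are not exact.

```python
import math


def rotation_2d(angle):
    """2D rotation matrix for an angle in radians, laid out as in the text."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s],
            [-s, c]]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


half_turn = rotation_2d(math.pi)
full_turn = matmul(half_turn, half_turn)

# Adding 0.0 after rounding turns any -0.0 into 0.0 for cleaner printing.
print([[round(x, 6) + 0.0 for x in row] for row in half_turn])
print([[round(x, 6) + 0.0 for x in row] for row in full_turn])
```

One thing to keep in mind: with this layout and column vectors (M * v), a positive angle turns clockwise, while the transposed layout turns counter-clockwise. At 180 degrees the two agree.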
Like scaling, rotation also occurs relative to the origin.
The above structures are all important and have their uses, but they are toys in comparison to the mother of them all: The Transformation Matrix. Why, you may ask? I'm excited to show you. First, let us see what it is capable of. Can we write down the translation operator with a matrix? We need a matrix M, for which M * v = v + translation. As it turns out, we can if we use homogeneous coordinates, specifically when v = (Vx, Vy, Vz, 1) and translation = (Tx, Ty, Tz, 0).
/ 1 0 0 Tx \
| 0 1 0 Ty | * v = v + (Tx, Ty, Tz, 0) = (Vx + Tx, Vy + Ty, Vz + Tz, 1)
| 0 0 1 Tz |
\ 0 0 0 1 /
This is easy to prove if you know matrix multiplication, which you should.
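If you would rather see it than prove it, here is a small sketch that multiplies the translation matrix by a point (w = 1) and by a vector (w = 0). The helper names and the sample numbers are assumptions of mine.

```python
def translation_matrix(tx, ty, tz):
    """4x4 translation matrix, stored as a list of rows."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def transform(m, v):
    """Multiply the 4x4 matrix m by the homogeneous column vector v."""
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))


m = translation_matrix(5, -2, 3)
print(transform(m, (1, 1, 1, 1)))  # a point moves: (6, -1, 4, 1)
print(transform(m, (1, 1, 1, 0)))  # a vector does not: (1, 1, 1, 0)
```

The second line shows why the w = 0 versus w = 1 distinction matters: directions are unaffected by translation, and the last column of the matrix is simply multiplied away.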
Rotation is possible by simply extending the rotation matrix into 4x4. Note the three columns can be viewed as vectors in homogeneous coordinates.
/ Rx1 Rx2 Rx3 0 \
| Ry1 Ry2 Ry3 0 | * v = (Rx1 * Vx + Rx2 * Vy + Rx3 * Vz, Ry1 * Vx + Ry2 * Vy + Ry3 * Vz, Rz1 * Vx + Rz2 * Vy + Rz3 * Vz, 1)
| Rz1 Rz2 Rz3 0 |
\ 0 0 0 1 /
Scaling is also possible, like so:
/ Sx 0 0 0 \
| 0 Sy 0 0 | * v = ...
| 0 0 Sz 0 |
\ 0 0 0 1 /
Now for the grand finale. What makes the transformation matrix so powerful is that it can accumulate transformations: if you multiply two matrices, the product applies the effects of both. This also implies the matrix's ability to store all three operations in one! In fact, a translation, a rotation and a scaling combine into a single transformation matrix of this form:
/ Rx1*Sx Rx2*Sy Rx3*Sz Tx \
| Ry1*Sx Ry2*Sy Ry3*Sz Ty |
| Rz1*Sx Rz2*Sy Rz3*Sz Tz |
\ 0 0 0 1 /
You can see it for yourself by, say, setting some parts of the transformation to identity. And if you set all parts to identity, you will get the 4x4 identity matrix.
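The combined matrix above can also be checked numerically. The sketch below multiplies a translation, a rotation and a scaling matrix together (in that order) and compares the product with a matrix filled in directly from the formula. All function names and sample values are hypothetical. The rotation is a quarter turn around Z so that every entry stays an exact integer.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(t):
    return [[1, 0, 0, t[0]], [0, 1, 0, t[1]], [0, 0, 1, t[2]], [0, 0, 0, 1]]


def scaling(s):
    return [[s[0], 0, 0, 0], [0, s[1], 0, 0], [0, 0, s[2], 0], [0, 0, 0, 1]]


def rotation_4x4(r):
    """Extend a 3x3 rotation matrix into 4x4."""
    return [list(r[i]) + [0] for i in range(3)] + [[0, 0, 0, 1]]


def compose_trs(t, r, s):
    """Fill in the combined matrix directly: R[i][j] * S[j] in the 3x3 block, T in the last column."""
    return [[r[i][j] * s[j] for j in range(3)] + [t[i]] for i in range(3)] + [[0, 0, 0, 1]]


t = (5, 6, 7)
s = (2, 3, 4)
r = [[0, -1, 0],   # quarter turn around the Z axis
     [1, 0, 0],
     [0, 0, 1]]

product = matmul(translation(t), matmul(rotation_4x4(r), scaling(s)))
print(product == compose_trs(t, r, s))  # True
```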
Not only that, but the inverse of a transformation matrix also gives its inverse effects! It's beautiful, really.
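Here is a sketch of that idea, under a few assumptions of my own. Instead of a general-purpose matrix inversion, it builds the inverse out of the inverse parts: the opposite translation, the transposed rotation and the reciprocal scaling, multiplied in the opposite order. The sample values are powers of two so the floating-point arithmetic stays exact.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def transpose(m):
    return [list(row) for row in zip(*m)]


ROT_Z_90 = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
IDENTITY = scaling(1, 1, 1)

# M = T * R * S
m = matmul(translation(5, 6, 7), matmul(ROT_Z_90, scaling(2, 4, 0.5)))

# M^-1 = S^-1 * R^-1 * T^-1; the inverse of a rotation matrix is its transpose.
m_inv = matmul(scaling(1 / 2, 1 / 4, 1 / 0.5),
               matmul(transpose(ROT_Z_90), translation(-5, -6, -7)))

print(matmul(m, m_inv) == IDENTITY)  # True
print(matmul(m_inv, m) == IDENTITY)  # True
```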
Matrix multiplication is non-commutative, which is to say the order of operations is important! To the right is an example of this. Furthermore, the operations have a certain quirk: the product of transformation matrices A * B * C, etc. applies its effects in reverse order (..., C, then B, then A), because the matrix nearest the vertex acts on it first. So although the orange cube is the result of rotating before translating, when laid out with matrices, it is a translation matrix times a rotation matrix.
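A tiny numeric sketch of the ordering quirk, using made-up values (a quarter turn around Z and a translation of 5 along X) and hypothetical helper names:

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Multiply the 4x4 matrix m by the homogeneous column vector v."""
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))


ROTATE = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]    # quarter turn around Z
TRANSLATE = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # move 5 along X

point = (1, 0, 0, 1)

# TRANSLATE * ROTATE: the rotation is nearest the point, so it happens first.
print(transform(matmul(TRANSLATE, ROTATE), point))  # (5, 1, 0, 1)

# ROTATE * TRANSLATE: the translation happens first, then the rotation.
print(transform(matmul(ROTATE, TRANSLATE), point))  # (0, 6, 0, 1)
```

Same two matrices, two different results, and in both cases the matrix written last is the one applied first.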
Lastly, all three of these operations are affine. This means if you apply any combination of these operators onto a line, it will result in another line. It is impossible to get a curve from a line, or a line from a curve. This makes some effects impossible to perform with transformation matrices, such as a fisheye distortion effect. Nonetheless, they remain powerful for most uses.
Understanding transformations will greatly assist us onward.
Which data structures store translation?
Translation vectors & transformation matrices.
Which data structures store rotation?
Quaternions, rotation matrices & transformation matrices.
Which data structures store scaling?
Scaling vectors, quaternions (some) & transformation matrices.
What is an affine transformation?
One which maps straight lines only to straight lines.
Why are Tait-Bryan angles and Euler angles discouraged?
They're a real dang mess: deceptively simple, and prone to gimbal lock.
What is the advantage of a transformation matrix?
It is simple to combine or invert operations.
