by Markus Winand.

Clustering Data

The Second Power of Indexing


The term cluster is used in various fields. A star cluster, for example, is a group of stars. A computer cluster, on the other hand, is a group of computers that work closely together—either to solve a complex problem (high-performance computing cluster) or to increase availability (failover cluster). Generally speaking, clusters are related things that appear together.

In the field of computing there is one more type of cluster—one that is often misunderstood: the data cluster. Clustering data means to store consecutively accessed data closely together so that accessing it requires fewer IO operations. Data clusters are very important in terms of database tuning. Computer clusters, on the other hand, are also very common in a database context—thus making the term cluster very ambiguous. The sentence “Let’s use a cluster to improve database performance” is just one example; it might refer to a computer cluster but could also mean a data cluster. In this chapter, cluster generally refers to data clusters.

If you like this page, you might also like …

… to subscribe my mailing lists, get free stickers, buy my book or join a training.

The simplest data cluster in an SQL database is the row. Databases store all columns of a row in the same database block if possible. Exceptions apply if a row doesn’t fit into a single block—e.g., when LOB types are involved.

Column Stores

Column oriented databases, or column-stores, organize tables in a columned way. This model is beneficial when accessing many rows but only a few columns—a pattern that is very common in data warehouses (OLAP).

Indexes allow one to cluster data. The basis for this was already explained in Chapter 1, “Anatomy of an SQL Index: the index leaf nodes store the indexed columns in an ordered fashion so that similar values are stored next to each other. That means that indexes build clusters of rows with similar values. This capability to cluster data is so important that I refer to it as the second power of indexing.

Note

The B-tree traversal is the first power of indexing.

Clustering is the second power of indexing.

The following sections explain how to use indexes to cluster data and improve query performance.

Contents

  1. Index Filter Predicates Intentionally Used — to tune LIKE

  2. Index-Only Scan — Avoiding table access

  3. Index-Organized Table — Clustered indexes without tables

Previous pageNext page

You can’t learn everything in one day. Subscribe the newsletter via E-Mail, Bluesky or RSS to gradually catch up. Have a look at modern-⁠sql.com as well.

About the Author

Photo of Markus Winand

Markus Winand provides insights into SQL and shows how different systems support it at modern-sql.com. Previously he made use-the-index-luke.com, which is still actively maintained. Markus can be hired as trainer, speaker and consultant via winand.at.

Buy the Book

Cover of “SQL Performance Explained”: Squirrel running on grass

The essence of SQL tuning in 200 pages

Buy now!
(paperback and/or PDF)

Paperback also available at Amazon.com.

Hire Markus

Markus offers SQL training and consulting for developers working at companies of all sizes.
Learn more »

Connect with Markus Winand

Subscribe mailinglistsSubscribe the RSS feedMarkus Winand on LinkedInMarkus Winand on XINGMarkus Winand on TwitterMarkus Winand on Bluesky
Copyright 2010-2025 Markus Winand. All righs reserved.
Legal | Contact | NO WARRANTY | Trademarks | Privacy and GDPR