Making an Audio Waveform Visualizer with Vanilla JavaScript

Matthew Ström

As a UI designer, I’m constantly reminded of the value of knowing how to code. I pride myself on thinking of the developers on my team while designing user interfaces. But sometimes, I step on a technical landmine.

A few years ago, as the design director of wsj.com, I was helping to redesign the Wall Street Journal’s podcast directory. One of the designers on the project was working on the podcast player, and I came upon Megaphone’s embedded player.

I previously worked at SoundCloud and knew that these kinds of visualizations were useful for users who skip through audio. I wondered if we could achieve a similar look for the player on the Wall Street Journal’s site.

The answer from engineering: definitely not. Given the timeline and constraints, it wasn’t a possibility for that project. We eventually shipped the redesigned pages with a much simpler podcast player.

But I was hooked on the problem. Over nights and weekends, I hacked away trying to achieve this effect. I learned a lot about how audio works on the web, and ultimately was able to achieve the look with less than 100 lines of JavaScript!

It turns out that this example is a perfect way to get acquainted with the Web Audio API, and how to visualize audio data using the Canvas API.

But first, a lesson in how digital audio works

In the real, analog world, sound is a wave. As sound travels from a source (like a speaker) to your ears, it compresses and decompresses air in a pattern that your ears and brain hear as music, or speech, or a dog’s bark, etc. etc.

An analog sound wave is a smooth, continuous function.

But in a computer’s world of electronic signals, sound isn’t a wave. To turn a smooth, continuous wave into data they can store, computers do something called sampling. Sampling means measuring the sound waves hitting a microphone thousands of times every second, then storing those data points. When playing back audio, your computer reverses the process: it recreates the sound, one tiny split-second of audio at a time.

A digital sound file is made up of tiny slices of the original audio, roughly re-creating the smooth continuous wave.

The number of data points in a sound file depends on its sample rate. You might have seen this number before; the typical sample rate for mp3 files is 44.1 kHz. This means that, for every second of audio, there are 44,100 individual data points. For stereo files, there are 88,200 every second — 44,100 for the left channel, and 44,100 for the right. That means a 30-minute podcast has 158,760,000 individual data points describing the audio!
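
To make those numbers concrete, here’s the same back-of-the-envelope math as a few lines of JavaScript:

// Rough math: data points in a 30-minute stereo podcast
const sampleRate = 44100;          // samples per second, per channel
const channels = 2;                // stereo: left and right
const durationInSeconds = 30 * 60; // 30 minutes
const totalDataPoints = sampleRate * channels * durationInSeconds;
console.log(totalDataPoints); // 158760000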

How can a web page read an mp3?

Over the past nine years, the W3C (the folks who help maintain web standards) has developed the Web Audio API to help web developers work with audio. The Web Audio API is a very deep topic; we’ll barely scratch the surface in this essay. But it all starts with something called the AudioContext.

Think of the AudioContext like a sandbox for working with audio. We can initialize it with a few lines of JavaScript:

// Set up audio context
window.AudioContext = window.AudioContext || window.webkitAudioContext;
const audioContext = new AudioContext();
let currentBuffer = null;

The first line after the comment is necessary because Safari has implemented AudioContext as webkitAudioContext.

Next, we need to give our new audioContext the mp3 file we’d like to visualize. Let’s fetch it using… fetch()!

const visualizeAudio = url => {
  fetch(url)
    .then(response => response.arrayBuffer())
    .then(arrayBuffer => audioContext.decodeAudioData(arrayBuffer))
    .then(audioBuffer => visualize(audioBuffer));
};

This function takes a URL, fetches it, then transforms the Response object a few times.

  • First, it calls the arrayBuffer() method, which returns — you guessed it — an ArrayBuffer! An ArrayBuffer is just a container for binary data; it’s an efficient way to move lots of data around in JavaScript.
  • We then send the ArrayBuffer to our audioContext via the decodeAudioData() method. decodeAudioData() takes an ArrayBuffer and returns an AudioBuffer, which is a specialized ArrayBuffer for reading audio data. Did you know that browsers came with all these convenient objects? I definitely did not when I started this project.
  • Finally, we send our AudioBuffer off to be visualized.
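
If you prefer async/await, the same chain could look something like the sketch below — visualize() is still whatever function we hand the decoded audio to, and the error handling is the bare minimum:

// The same flow with async/await and minimal error handling
const visualizeAudio = async url => {
  try {
    const response = await fetch(url);
    const arrayBuffer = await response.arrayBuffer();
    const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
    visualize(audioBuffer);
  } catch (error) {
    console.error('Unable to fetch or decode the audio:', error);
  }
};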

Filtering the data

To visualize our AudioBuffer, we need to reduce the amount of data we’re working with. Like I mentioned before, we started off with millions of data points, but we’ll have far fewer in our final visualization.

First, let’s limit the channels we are working with. A channel represents the audio sent to an individual speaker. In stereo sound, there are two channels; in 5.1 surround sound, there are six. AudioBuffer has a built-in method to do this: getChannelData(). Call audioBuffer.getChannelData(0), and we’ll be left with one channel’s worth of data.
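
If you’re curious what getChannelData() hands back, it’s a Float32Array with one entry per sample, each a value between -1 and 1. Here’s a quick way to poke at it — purely for inspection, and it assumes we’re inside the visualize() function with an audioBuffer in hand:

// Inspect the decoded audio before filtering it
const rawData = audioBuffer.getChannelData(0); // Float32Array for the first channel
console.log(audioBuffer.numberOfChannels);     // 2 for a stereo file
console.log(audioBuffer.sampleRate);           // e.g. 44100
console.log(rawData.length);                   // total samples in one channel
console.log(rawData.slice(0, 5));              // a handful of raw sample values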

Next, the hard part: loop through the channel’s data, and select a smaller set of data points. There are a few ways we could go about this. Let’s say I want my final visualization to have 70 bars; I can divide up the audio data into 70 equal parts, and look at a data point from each one.

const filterData = audioBuffer => {
  const rawData = audioBuffer.getChannelData(0); // We only need to work with one channel of data
  const samples = 70; // Number of samples we want to have in our final data set
  const blockSize = Math.floor(rawData.length / samples); // Number of samples in each subdivision
  const filteredData = [];
  for (let i = 0; i < samples; i++) {
    filteredData.push(rawData[i * blockSize]); // take the first sample from each block
  }
  return filteredData;
};

This was the first approach I took. To get an idea of what the filtered data looks like, I put the result into a spreadsheet and charted it.

The output caught me off guard! It doesn’t look like the visualization we’re emulating at all. There are lots of data points that are close to, or at zero. But that makes a lot of sense: in a podcast, there is a lot of silence between words and sentences. By only looking at the first sample in each of our blocks, it’s highly likely that we’ll catch a very quiet moment.

Let’s modify the algorithm to find the average of the samples. And while we’re at it, we should take the absolute value of our data, so that it’s all positive.

const filterData = audioBuffer => {
  const rawData = audioBuffer.getChannelData(0); // We only need to work with one channel of data
  const samples = 70; // Number of samples we want to have in our final data set
  const blockSize = Math.floor(rawData.length / samples); // the number of samples in each subdivision
  const filteredData = [];
  for (let i = 0; i < samples; i++) {
    let blockStart = blockSize * i; // the location of the first sample in the block
    let sum = 0;
    for (let j = 0; j < blockSize; j++) {
      sum = sum + Math.abs(rawData[blockStart + j]); // find the sum of all the samples in the block
    }
    filteredData.push(sum / blockSize); // divide the sum by the block size to get the average
  }
  return filteredData;
};
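
To check the output yourself, you could temporarily log the filtered data from the same promise chain we built earlier — the URL here is just a placeholder:

// Temporary: log the filtered data instead of visualizing it
fetch('/path/to/podcast-episode.mp3') // placeholder URL
  .then(response => response.arrayBuffer())
  .then(arrayBuffer => audioContext.decodeAudioData(arrayBuffer))
  .then(audioBuffer => console.log(filterData(audioBuffer)));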

Let’s see what that data looks like.