Automating resource selection with client hints

Ilya Grigorik

Building for the web gives you unparalleled reach. Your web application is a click away and available on almost every connected device (smartphone, tablet, laptop, desktop, TV, and more), regardless of brand or platform. To deliver the best experience you've built a responsive site that adapts its presentation and functionality to each form factor, and now you're running down your performance checklist to make sure the application loads as quickly as possible: you've optimized your critical rendering path, you've compressed and cached your text resources, and now you're looking at your image resources, which often account for the majority of transferred bytes. The problem is, image optimization is hard:

  • Determine the appropriate format (vector vs. raster)
  • Determine the optimal encoding formats (jpeg, webp, etc.)
  • Determine the right compression settings (lossy vs. lossless)
  • Determine which metadata should be kept or stripped
  • Generate multiple variants of each image for every display size and DPR
  • ...
  • Account for the user's network type, speed, and preferences

Individually, these are well-understood problems. Collectively, they create a large optimization space that we (the developers) often overlook or neglect. Humans do a poor job of exploring the same search space repetitively, especially when many steps are involved. Computers, on the other hand, excel at these types of tasks.
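To put a rough number on "a large optimization space", the sketch below simply counts combinations for a single image. The choices in each dimension (three formats, five quality levels, keep-or-strip metadata, six output widths) are hypothetical placeholders, not recommendations; the point is that the dimensions multiply rather than add.

```python
from itertools import product

# Hypothetical choices for each optimization dimension (placeholders, not advice).
FORMATS = ["jpeg", "webp", "jpegxr"]
QUALITIES = [50, 60, 70, 80, 90]
METADATA = ["keep", "strip"]
WIDTHS = [200, 400, 800, 1200, 1600, 2000]


def candidate_variants():
    """Enumerate every (format, quality, metadata, width) combination for one image."""
    return list(product(FORMATS, QUALITIES, METADATA, WIDTHS))


if __name__ == "__main__":
    per_image = len(candidate_variants())
    print(per_image)        # 3 * 5 * 2 * 6 = 180 candidates for a single image
    print(per_image * 100)  # 18000 candidates for a site with 100 images
```

Network- and preference-dependent choices are not even part of this count, because they can only be resolved when a request arrives.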

The answer to a good and sustainable optimization strategy for images, and for other resources with similar properties, is simple: automation. If you're hand-tuning your resources, you're doing it wrong: you'll forget, you'll get lazy, or someone else will make these mistakes for you. Guaranteed.

The saga of the performance-conscious developer

The search through image optimization space has two distinct phases: build-time and run-time.

  • Some optimizations are intrinsic to the resource itself - e.g. selecting the appropriate format and encoding type, tuning compression settings for each encoder, stripping unnecessary metadata and so on. These steps can be performed at "build-time".
  • Other optimizations depend on the type and properties of the client requesting the resource and must be performed at "run-time": selecting the appropriate resource for the client's DPR and intended display width, accounting for the client's network speed, user and application preferences, and so on.

The build-time tooling exists but could be made better. For example, there are significant savings to be had by dynamically tuning the "quality" setting for each image and each image format, but I have yet to see anyone actually do it outside of research. This is an area ripe for innovation, but for the purposes of this post I'll leave it at that. Let's focus on the run-time part of the story.
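One way to approach per-image quality tuning, sketched here as an assumption rather than a description of any existing tool: binary-search for the lowest quality setting whose perceptual score still clears a threshold. The `score_at` callable is a hypothetical stand-in for a real "encode at this quality, then compare against the original" step (for example, computing SSIM), and the search is only valid if that score never decreases as quality rises.

```python
from typing import Callable


def lowest_acceptable_quality(
    score_at: Callable[[int], float],
    threshold: float,
    lo: int = 1,
    hi: int = 100,
) -> int:
    """Return the smallest quality in [lo, hi] whose score meets `threshold`.

    Assumes score_at is monotonically non-decreasing in quality. If even
    `hi` misses the threshold, `hi` is returned as the best available.
    """
    if score_at(hi) < threshold:
        return hi
    while lo < hi:
        mid = (lo + hi) // 2
        if score_at(mid) >= threshold:
            hi = mid      # mid is good enough; look for something smaller
        else:
            lo = mid + 1  # mid is too lossy; search higher
    return lo


if __name__ == "__main__":
    # Toy stand-in for "encode, then measure similarity": score equals quality.
    print(lowest_acceptable_quality(lambda q: float(q), 72.0))  # 72
```

Each probe costs one encode and one comparison, so a binary search needs about seven probes over a 1 to 100 range instead of a hundred.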

<img src="/image/thing" sizes="50vw"
        alt="image thing displayed at 50% of viewport width">

The application intent is very simple: fetch and display the image at 50% of the user's viewport. This is where most every designer washes their hands and heads for the bar. Meanwhile, the performance-conscious developer on the team is in for a long night:

  1. To get the best compression, she wants to use the optimal image format for each client: WebP for Chrome, JPEG XR for Edge, and JPEG for the rest.
  2. To get the best visual quality she needs to generate multiple variants of each image at different resolutions: 1x, 1.5x, 2x, 2.5x, 3x, and maybe even a few more in between.
  3. To avoid delivering unnecessary pixels, she needs to understand what "50% of the user's viewport" actually means; there are a lot of different viewport widths out there!
  4. Ideally, she also wants to deliver a resilient experience where users on slower networks will automatically fetch a lower resolution. After all, it's all about time to glass.
  5. The application also exposes some user controls that affect which image resource ought to be fetched, so there's that to factor in as well.
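The arithmetic behind items 2 and 3 is simple even if the bookkeeping is not: the physical pixels needed equal the image's CSS width (here, half the viewport) times the device pixel ratio, and the best candidate is the smallest pre-generated variant that is at least that wide. A minimal sketch, assuming a hypothetical set of variant widths from 200 to 2000 pixels:

```python
import math

# Hypothetical set of pre-generated variant widths, in physical pixels.
VARIANT_WIDTHS = [200, 400, 800, 1200, 1600, 2000]


def required_pixels(viewport_css_px: int, fraction: float, dpr: float) -> int:
    """Physical pixels needed to fill `fraction` of the viewport at this DPR."""
    return math.ceil(viewport_css_px * fraction * dpr)


def pick_variant(target_px: int, widths=VARIANT_WIDTHS) -> int:
    """Smallest variant at least target_px wide; the largest one if none is big enough."""
    for w in sorted(widths):
        if w >= target_px:
            return w
    return max(widths)


if __name__ == "__main__":
    # A 360 CSS px wide phone at DPR 3, image at 50vw: 540 physical px.
    need = required_pixels(360, 0.5, 3.0)
    print(need, pick_variant(need))  # 540 800
    # A 1440 CSS px wide laptop at DPR 2, image at 50vw: 1440 physical px.
    need = required_pixels(1440, 0.5, 2.0)
    print(need, pick_variant(need))  # 1440 1600
```

Multiply this by every viewport width and DPR in the wild, and the case for letting a machine do it becomes obvious.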

Oh, and then the designer realizes that she needs to display a different image at 100% width if the viewport size is small to optimize legibility. This means we now have to repeat the same process for one more asset, and then make the fetch conditional on viewport size. Have I mentioned this stuff is hard? Well, ok, let's get to it. The picture element will get us pretty far:

<picture>
    <!-- serve WebP to Chrome and Opera -->
    <source
        media="(min-width: 50em)"
        sizes="50vw"
        srcset="/image/thing-200.webp 200w, /image/thing-400.webp 400w,
            /image/thing-800.webp 800w, /image/thing-1200.webp 1200w,
            /image/thing-1600.webp 1600w, /image/thing-2000.webp 2000w"
        type="image/webp">
    <source
        sizes="(min-width: 30em) 100vw"
        srcset="/image/thing-crop-200.webp 200w, /image/thing-crop-400.webp 400w,
            /image/thing-crop-800.webp 800w, /image/thing-crop-1200.webp 1200w,
            /image/thing-crop-1600.webp 1600w, /image/thing-crop-2000.webp 2000w"
        type="image/webp">
    <!-- serve JPEGXR to Edge -->
    <source
        media="(min-width: 50em)"
        sizes="50vw"
        srcset="/image/thing-200.jpgxr 200w, /image/thing-400.jpgxr 400w,
            /image/thing-800.jpgxr 800w, /image/thing-1200.jpgxr 1200w,
            /image/thing-1600.jpgxr 1600w, /image/thing-2000.jpgxr 2000w"
        type="image/vnd.ms-photo">
    <source
        sizes="(min-width: 30em) 100vw"
        srcset="/image/thing-crop-200.jpgxr 200w, /image/thing-crop-400.jpgxr 400w,
            /image/thing-crop-800.jpgxr 800w, /image/thing-crop-1200.jpgxr 1200w,
            /image/thing-crop-1600.jpgxr 1600w, /image/thing-crop-2000.jpgxr 2000w"
        type="image/vnd.ms-photo">
    <!-- serve JPEG to others -->
    <source
        media="(min-width: 50em)"
        sizes="50vw"
        srcset="/image/thing-200.jpg 200w, /image/thing-400.jpg 400w,
            /image/thing-800.jpg 800w, /image/thing-1200.jpg 1200w,
            /image/thing-1600.jpg 1600w, /image/thing-2000.jpg 2000w">
    <source
        sizes="(min-width: 30em) 100vw"
        srcset="/image/thing-crop-200.jpg 200w, /image/thing-crop-400.jpg 400w,
            /image/thing-crop-800.jpg 800w, /image/thing-crop-1200.jpg 1200w,
            /image/thing-crop-1600.jpg 1600w, /image/thing-crop-2000.jpg 2000w">
    <!-- fallback for browsers that don't support picture -->
    <img src="/image/thing.jpg" width="50%" alt="image thing">
</picture>

We've handled the art direction, format selection, and provided six variants of each image to account for variability in DPR and viewport width of the client's device. Impressive!
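To see what the browser does with all of this, here is a simplified model of the selection: walk the source elements in order, skip any whose type is unsupported or whose media query fails, then pick the smallest srcset candidate whose effective density (candidate width divided by slot width) covers the DPR. This is a sketch, not the HTML Standard's algorithm: it assumes 1em = 16px in media queries, treats both sizes values as fixed fractions of the viewport, and uses one plausible candidate-picking heuristic, while real user agents are free to choose differently.

```python
from dataclasses import dataclass
from typing import Optional, Set

EM_PX = 16  # assumption: media-query em resolves to the default 16px font size
WIDTHS = [200, 400, 800, 1200, 1600, 2000]


@dataclass
class Source:
    name: str                    # base file name, "thing" or "thing-crop"
    ext: str                     # file extension for this format
    mime: Optional[str]          # the type attribute, or None if omitted
    min_width_em: Optional[int]  # media="(min-width: Nem)", or None if no media
    vw: float                    # slot width as a fraction of the viewport (sizes)


# The six <source> elements from the markup above, in document order.
SOURCES = [
    Source("thing", "webp", "image/webp", 50, 0.5),
    Source("thing-crop", "webp", "image/webp", None, 1.0),
    Source("thing", "jpgxr", "image/vnd.ms-photo", 50, 0.5),
    Source("thing-crop", "jpgxr", "image/vnd.ms-photo", None, 1.0),
    Source("thing", "jpg", None, 50, 0.5),
    Source("thing-crop", "jpg", None, None, 1.0),
]


def choose(viewport_css_px: int, dpr: float, supported: Set[str]) -> str:
    """Simplified model of <picture>: first matching <source>, then a srcset candidate."""
    for s in SOURCES:
        if s.mime is not None and s.mime not in supported:
            continue  # unsupported type: skip this <source>
        if s.min_width_em is not None and viewport_css_px < s.min_width_em * EM_PX:
            continue  # media query does not match
        slot_px = viewport_css_px * s.vw
        # Smallest candidate whose effective density covers the device's DPR.
        for w in WIDTHS:
            if w / slot_px >= dpr:
                return f"/image/{s.name}-{w}.{s.ext}"
        return f"/image/{s.name}-{WIDTHS[-1]}.{s.ext}"
    # Unreachable with these sources (the last <source> always matches);
    # kept to mirror the <img> fallback in the markup.
    return "/image/thing.jpg"


if __name__ == "__main__":
    print(choose(1280, 2.0, {"image/webp"}))          # /image/thing-1600.webp
    print(choose(375, 2.0, {"image/vnd.ms-photo"}))   # /image/thing-crop-800.jpgxr
    print(choose(1024, 1.0, set()))                   # /image/thing-800.jpg
```

Under these assumptions, a WebP-capable browser with a 1280px viewport at DPR 2 fetches the 1600w WebP, while a JPEG XR-only browser on a 375px viewport at DPR 2 gets the 800w cropped JPEG XR variant.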

Unfortunately, the picture element does not allow us to define any rules for how it should behave based on the client's connection type or speed. That said, its processing algorithm does allow the user agent to adjust which resource it fetches in some cases (see step 5 of that algorithm). We'll just have to hope that the user agent is smart enough. (Note: none of the current implementations are.) Similarly, there are no hooks in the picture element for app-specific logic that accounts for app or user preferences. To get these last two bits we'd have to move all of the above logic into JavaScript, but that forfeits the preload scanner optimizations offered by picture. Hmm.
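Here is the kind of rule picture cannot express, written as a plain function that the application (in JavaScript on the client, or on the server) would have to evaluate itself. The inputs `slow_network` and `prefers_low_res` are hypothetical signals the application would need to obtain on its own, and the 1.5x cap is an arbitrary illustrative threshold.

```python
def effective_dpr(device_dpr: float, slow_network: bool, prefers_low_res: bool) -> float:
    """Hypothetical policy: trade resolution for speed when the app asks for it.

    - A slow network caps the density at 1.5x (illustrative threshold).
    - A user preference for low-resolution images caps it at 1x.
    - Otherwise the device's real DPR is used; the policy never raises it.
    """
    dpr = device_dpr
    if slow_network:
        dpr = min(dpr, 1.5)
    if prefers_low_res:
        dpr = min(dpr, 1.0)
    return dpr


if __name__ == "__main__":
    print(effective_dpr(3.0, slow_network=True, prefers_low_res=False))  # 1.5
    print(effective_dpr(3.0, slow_network=False, prefers_low_res=True))  # 1.0
```

The resulting value would then replace the real DPR wherever a variant is chosen, which is exactly the hook the picture element does not offer.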

Those limitations aside, it works. Well, at least for this particular asset. The real, long-term challenge is that we can't expect the designer or the developer to hand-craft code like this for each and every asset. It's a fun brain puzzle on the first try, but it loses its appeal immediately after that. We need automation. Perhaps IDEs or other content-transform tooling can save us by automatically generating the boilerplate above.
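As a taste of what such tooling might do, here is a hypothetical build-step helper that emits the repetitive source elements from a small description of formats and widths. It reuses the widths, type values, and media and sizes attributes of the hand-written markup and gives the fallback img an alt attribute; the function names are illustrative, not an existing tool.

```python
# Hypothetical generator for the <picture> boilerplate (illustrative only).
WIDTHS = [200, 400, 800, 1200, 1600, 2000]

# (comment, file extension, type attribute or None), in preference order.
FORMATS = [
    ("serve WebP to Chrome and Opera", "webp", "image/webp"),
    ("serve JPEGXR to Edge", "jpgxr", "image/vnd.ms-photo"),
    ("serve JPEG to others", "jpg", None),
]


def srcset(base: str, ext: str) -> str:
    """Build a width-descriptor srcset, e.g. "/image/thing-200.webp 200w, ..."."""
    return ", ".join(f"{base}-{w}.{ext} {w}w" for w in WIDTHS)


def picture(base: str, crop: str, alt: str) -> str:
    """Emit a wide <source> and a cropped <source> per format, plus the <img>."""
    lines = ["<picture>"]
    for comment, ext, mime in FORMATS:
        type_attr = f' type="{mime}"' if mime else ""
        lines.append(f"    <!-- {comment} -->")
        lines.append(
            f'    <source media="(min-width: 50em)" sizes="50vw"'
            f' srcset="{srcset(base, ext)}"{type_attr}>'
        )
        lines.append(
            f'    <source sizes="(min-width: 30em) 100vw"'
            f' srcset="{srcset(crop, ext)}"{type_attr}>'
        )
    lines.append(f'    <img src="{base}.jpg" alt="{alt}">')
    lines.append("</picture>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(picture("/image/thing", "/image/thing-crop", "image thing"))
```

This removes the typing, but not the conceptual weight: the markup is just as long, and connection- and preference-aware selection is still out of reach.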

Automating resource selection with client hints

Take a deep breath, suspend your disbelief, and now consider the following example:

<meta http-equiv="Accept-CH" content="DPR, Viewport-Width, Width">
...
<picture>
    <source media="(min-width: 50em)" sizes="50vw" srcset="/image/thing">
    <img sizes="100vw" src="/image/thing-crop">
</picture>

Believe it or not, this short example is sufficient to deliver all the same capabilities as the much longer picture markup above, and, as we will see, it also enables full developer control over how, which, and when image resources are fetched. The "magic" is in the first line, which enables client hints reporting and tells the browser to advertise the device pixel ratio (DPR), the layout viewport width (Viewport-Width), and the intended display width (Width) of the resources to the server.

With client hints enabled, the resulting client-side markup retains just the presentation requirements. The designer does not have to worry about image types, client resolutions, optimal breakpoints to reduce delivered bytes, or other resource selection criteria. Let's face it, they never did, and they shouldn't have to. Better still, the developer does not need to rewrite and expand the markup either, because the actual resource selection is negotiated by the client and server.
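On the server, the negotiation can be as small as reading a few request headers. The sketch below assumes the hint semantics Chrome uses: Width is the intended display width in physical pixels, Viewport-Width is in CSS pixels, and DPR is the device pixel ratio. It picks the smallest pre-generated variant that covers the requested width, falls back to Viewport-Width times DPR when Width is absent, and chooses WebP when the Accept header advertises it. The variant catalog, the default width, and the base-width.ext naming scheme are hypothetical.

```python
import math
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical catalog: widths (physical px) pre-generated at build time.
VARIANT_WIDTHS = [200, 400, 800, 1200, 1600, 2000]
DEFAULT_WIDTH = 800  # hypothetical choice when the client sends no hints


def _number(value: Optional[str]) -> Optional[float]:
    """Parse a numeric hint value, returning None if absent or malformed."""
    try:
        return float(value) if value is not None else None
    except ValueError:
        return None


def select_variant(base: str, headers: Mapping[str, str]) -> Tuple[str, Dict[str, str]]:
    """Pick a variant URL from client hints plus the Accept header (a sketch)."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    dpr = _number(h.get("dpr")) or 1.0
    width = _number(h.get("width"))  # intended display width, physical pixels
    if width is None:
        viewport = _number(h.get("viewport-width"))  # CSS pixels
        if viewport is not None:
            width = viewport * dpr  # fall back to a full-viewport-wide image
    target = math.ceil(width) if width is not None else DEFAULT_WIDTH

    chosen = next((w for w in VARIANT_WIDTHS if w >= target), VARIANT_WIDTHS[-1])
    ext = "webp" if "image/webp" in h.get("accept", "") else "jpg"

    response_headers = {
        "Content-Type": "image/webp" if ext == "webp" else "image/jpeg",
        # The response depends on these request headers, so caches must key on them.
        "Vary": "Accept, DPR, Width, Viewport-Width",
    }
    return f"{base}-{chosen}.{ext}", response_headers


if __name__ == "__main__":
    hinted = {"DPR": "2", "Width": "1280", "Accept": "image/webp,*/*;q=0.8"}
    print(select_variant("/image/thing", hinted)[0])  # /image/thing-1600.webp
    print(select_variant("/image/thing", {"DPR": "3", "Viewport-Width": "360"})[0])  # /image/thing-1200.jpg
    print(select_variant("/image/thing", {})[0])  # /image/thing-800.jpg
```

Because the response now depends on request headers, the Vary header tells caches to key on them; without it, a shared cache could hand a 2x WebP image to a 1x client that only understands JPEG.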

Chrome 46 provides native support for the DPR, Width, and Viewport-Width hints. The hints are disabled by default and the <meta http-equiv="Accept-CH" content="..."> above serves as an opt-in signal that tells Chrome to append the specified headers to outgoing requests. With that in place, let's examine the request and response headers for a sample image request: