mattgodbolt

@mattgodbolt@hachyderm.io

Husband, father, coder, sometime verb, real person. Fond of old hardware. Co-host @twoscomplement podcast. #BlackLivesMatter. Trans Rights are Human Rights
He/him

mattgodbolt (@mattgodbolt@hachyderm.io), to random

A bit of New Year fun: behind the scenes outtakes from my Advent of Compiler Optimisation series.

Happy New Year everyone!

https://youtu.be/8gASKxx6mCc

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 25 of Advent of Compiler Optimisations!

We've reached the end of this journey through compiler magic—from simple arithmetic tricks to mind-bending loop transformations. Thank you for following along! Whether you celebrate Christmas or just enjoy a good compiler optimisation, I hope you've discovered something that made you see your code differently.

Read more: https://xania.org/202512/25-thank-you
Watch: https://youtu.be/N1sRfYwzmso

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 24 of Advent of Compiler Optimisations!

A simple loop that sums integers from 0 to n. GCC cleverly unrolls it to process two numbers at once. But clang? The loop completely disappears—replaced by a few multiplies and shifts that compute the answer directly. How does it recognise this pattern and transform O(n) code into O(1)?

Read more: https://xania.org/202512/24-cunning-clang
Watch: https://youtu.be/V9dy34slaxA
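A minimal sketch of the idea, assuming the loop sums 0 up to n - 1 (the post's exact bounds and types may differ). The function names are hypothetical, and the textbook closed form n(n - 1)/2 below stands in for Clang's actual multiply-and-shift sequence rather than reproducing it.

```cpp
#include <cstdint>

// O(n): the straightforward loop, summing 0, 1, ..., n - 1.
constexpr std::uint64_t sum_loop(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

// O(1): the closed form n(n - 1) / 2. One of n and n - 1 is always even,
// so the division is exact. Valid while n * (n - 1) fits in 64 bits
// (roughly n < 2^32).
constexpr std::uint64_t sum_closed_form(std::uint64_t n) {
    return n == 0 ? 0 : n * (n - 1) / 2;
}
```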

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Interested in Intel Skylake's front end? Not yet bored of me? Then you might enjoy this talk I presented at Jane Street in November:

https://youtu.be/BVVNtG5dgks?si=OK8KlYve_TEMzHkX

I'm sure I've made some errors but I put a ton of work into trying to verify what I could. If you know of any inaccuracies do let me know!

I hope to do a follow-up/updated version for a conference next year sometime!

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 21 of Advent of Compiler Optimisations!

Summing an array of integers? The compiler vectorises it beautifully, processing 8 at a time with SIMD. Switch to floats and... the compiler refuses to vectorise, doing each add one by one. Same loop, same code structure — why does the compiler treat floats so differently?

Read more: https://xania.org/202512/21-vectorising-floats
Watch: https://youtu.be/lUTvi_96-D8
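A minimal sketch of the likely reason, assuming ordinary IEEE 754 single-precision arithmetic: a vectorised sum adds the elements in a different order from the loop as written, and float addition is not associative, so the reordered sum can give a different answer. Integer addition is associative at the machine level, so the compiler is free to regroup it. GCC and Clang will typically vectorise the float loop if you explicitly allow reassociation, for example with -ffast-math, at the cost of strict IEEE semantics. The function names here are hypothetical.

```cpp
// Float addition rounds after every step, so the order of additions matters.
// With these values, 1e20f + 1.0f rounds straight back to 1e20f.
constexpr float values[] = {1e20f, -1e20f, 1.0f};

// The loop as written: strictly left to right.
constexpr float sum_in_order(const float* v, int n) {
    float total = 0.0f;
    for (int i = 0; i < n; ++i) total += v[i];
    return total;
}

// Roughly what a 2-wide vectorised loop computes: separate partial sums
// for even and odd indices, combined at the end.
constexpr float sum_two_lanes(const float* v, int n) {
    float even = 0.0f;
    float odd = 0.0f;
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0) even += v[i];
        else odd += v[i];
    }
    return even + odd;
}

// sum_in_order(values, 3) is 1.0f, but sum_two_lanes(values, 3) is 0.0f.
```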

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 19 of Advent of Compiler Optimisations!

Recursive functions need to call themselves over and over — that must mean unbounded stack growth, right? Wrong! When a function ends by calling another function (even itself), the compiler can replace the call with a simple jump. Recursion becomes iteration, no stack overhead at all. How does this transformation work?

Read more: https://xania.org/202512/19-tail-call-optimisation
Watch: https://youtu.be/J1vtP0QDLLU
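A hand-written sketch of what the transformation amounts to (the function names are hypothetical). Note the accumulator: a version written as return n + sum(n - 1) is not a tail call, because the addition happens after the recursive call returns. Also, neither C nor C++ requires this optimisation; GCC and Clang typically perform it at -O2, but an unoptimised build may still grow the stack.

```cpp
#include <cstdint>

// Tail-recursive: the recursive call is the very last thing the function
// does, so nothing in this frame is needed after it returns.
constexpr std::uint64_t sum_to_rec(std::uint64_t n, std::uint64_t acc = 0) {
    if (n == 0) return acc;
    return sum_to_rec(n - 1, acc + n);  // tail call
}

// What tail-call optimisation effectively turns it into: update the
// parameters in place and jump back to the top, with no new stack frame.
constexpr std::uint64_t sum_to_iter(std::uint64_t n, std::uint64_t acc = 0) {
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}
```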

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 18 of Advent of Compiler Optimisations!

You have a function with a fast path and a slow path. Inline it everywhere? Massive code bloat. Don't inline? You miss the fast path performance gains. It's an impossible choice—or is it? The compiler finds a way to get the performance benefits of inlining without paying the full code size cost. But how?

Read more: https://xania.org/202512/18-partial-inlining
Watch: https://youtu.be/STZb5K5sPDs
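A hand-written sketch of the shape partial inlining aims for, using a hypothetical one-entry cache in front of a hash-map lookup (all names are illustrative, not from the post). GCC, for example, does this split automatically via its partial-inlining pass (-fpartial-inlining, enabled at -O2): the cheap check gets inlined at each call site, while the bulky slow path stays out of line as a single shared copy. [[gnu::noinline]] is a GCC/Clang attribute used here only to make the split explicit.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical example: a lookup with a cheap fast path and an
// expensive slow path.
struct Cache {
    int last_key = -1;
    int last_value = 0;
    std::unordered_map<int, int> table;
};

// Slow path kept out of line: one copy in the binary, however many callers.
[[gnu::noinline]] int lookup_slow(Cache& c, int key) {
    auto it = c.table.find(key);
    int value = (it != c.table.end()) ? it->second : 0;
    c.last_key = key;  // remember it for the fast path next time
    c.last_value = value;
    return value;
}

// Fast path: small enough to inline everywhere. This is roughly what
// partial inlining produces from a single function containing both paths.
inline int lookup(Cache& c, int key) {
    if (key == c.last_key) return c.last_value;  // fast path, inlined
    return lookup_slow(c, key);                  // slow path, a real call
}
```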

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 15 of Advent of Compiler Optimisations!

Two nearly identical loops: one accumulates ints into an int, the other accumulates ints into a long. You'd expect similar assembly—just different register sizes, right? Wrong! One loop writes to memory on every iteration, the other keeps everything in registers. Same algorithm, wildly different performance. What's going on?

Read more: https://xania.org/202512/15-aliasing-in-general
Watch: https://youtu.be/PPJtJzT2U04
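A minimal sketch, assuming the two loops accumulate through a pointer as below (the function names are hypothetical). In the int version, total might point into the very array being summed, since both are int, so the compiler must store after every addition. C and C++'s strict aliasing rules say a long* can't refer to an int object, so the long version is free to keep the sum in a register.

```cpp
#include <cassert>
#include <cstddef>

// *total may alias an element of v (both are int), so the compiler must
// write the running sum back to memory on every iteration.
void sum_into_int(const int* v, std::size_t n, int* total) {
    for (std::size_t i = 0; i < n; ++i) *total += v[i];
}

// Under strict aliasing a long* cannot point at an int object, so the
// compiler may keep the running sum in a register and store it once.
void sum_into_long(const int* v, std::size_t n, long* total) {
    for (std::size_t i = 0; i < n; ++i) *total += v[i];
}

// Manual fix for the int case: accumulate in a local that nothing can
// alias. Only equivalent if the caller guarantees total doesn't point into v.
void sum_into_int_local(const int* v, std::size_t n, int* total) {
    int sum = *total;
    for (std::size_t i = 0; i < n; ++i) sum += v[i];
    *total = sum;
}
```

Calling sum_into_int(w, 3, &w[2]) on w = {1, 1, 1}, where total really does point into the array, leaves w[2] == 6; the local-accumulator version would give 4. That possibility is what forces the int loop to keep going back to memory.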

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 13 of Advent of Compiler Optimisations!

You're calling a function inside a loop, but its result never changes between iterations. Does the compiler spot this and hoist it out? Turns out the answer depends on which compiler you use! Clang pulls off the optimisation beautifully, but gcc stumbles—even with explicit hints. What's going on?

Read more: https://xania.org/202512/13-licking-licm
Watch: https://youtu.be/dIwaqJG0WDo
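A hypothetical example of the pattern, using std::strlen as the loop-invariant call (the post's own example may use a different function). Hoisting the call is only safe if the compiler can prove nothing in the loop changes what the call would return; the second function shows the hoisted form written by hand, which sidesteps the question regardless of compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// As written, std::strlen(s) is evaluated on every iteration, making the
// loop O(n^2) unless the compiler proves the string never changes.
std::size_t count_spaces(const char* s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < std::strlen(s); ++i) {
        if (s[i] == ' ') ++count;
    }
    return count;
}

// The hoisted form: compute the loop-invariant value once, before the loop.
std::size_t count_spaces_hoisted(const char* s) {
    const std::size_t len = std::strlen(s);
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (s[i] == ' ') ++count;
    }
    return count;
}
```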

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 6 of Advent of Compiler Optimisations!

Divide by 512—that's just a shift right by 9, right? But look at the generated code: extra instructions appear! The compiler seems to be doing unnecessary work. Or is it? Turns out there's a subtle difference between what you asked for and what you probably meant. One keyword fixes everything.

Read more: https://xania.org/202512/06-dividing-to-conquer
Watch: https://youtu.be/7Rtk0qOX9zs
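A minimal sketch of the likely explanation, assuming the dividend is a signed int: C and C++ integer division rounds toward zero, while an arithmetic right shift rounds toward negative infinity, so -1 / 512 is 0 but -1 >> 9 is -1. To compensate, the compiler adds 511 to negative values before shifting. Declaring the value unsigned (presumably the keyword the post has in mind) removes the negative case, leaving a single shift. The function names are hypothetical; right-shifting a negative int is implementation-defined before C++20, though mainstream compilers have always made it arithmetic.

```cpp
#include <cstdint>

// What signed x / 512 (rounding toward zero) costs when built from shifts:
// bias negative values by 511 first. Assumes an arithmetic right shift.
constexpr std::int32_t div512_signed(std::int32_t x) {
    std::int32_t bias = (x >> 31) & 511;  // 511 if x < 0, else 0
    return (x + bias) >> 9;
}

// With unsigned there is no negative case: a single shift suffices.
constexpr std::uint32_t div512_unsigned(std::uint32_t x) {
    return x >> 9;
}
```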

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 3 of Advent of Compiler Optimisations!

Four completely different ways to add two numbers—direct addition, a while loop, recursive calls—yet they all compile to the exact same single instruction. The compiler's pattern recognition sees through the obfuscation and finds the canonical form underneath. How does it know?

Read more: https://xania.org/202512/03-more-adding-integers
Watch: https://youtu.be/wHg9lYPMvvE
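Three hypothetical spellings of addition in the spirit of the post (not the post's exact code). Using unsigned keeps every intermediate step well-defined, which gives an optimiser licence to reduce the loop and the tail-recursive version to the same single instruction as the direct form; whether a particular compiler version does so depends on flags.

```cpp
// Direct addition.
constexpr unsigned add_direct(unsigned x, unsigned y) {
    return x + y;
}

// A loop that moves one unit from y to x at a time.
constexpr unsigned add_loop(unsigned x, unsigned y) {
    while (y != 0) {
        ++x;
        --y;
    }
    return x;
}

// The same idea as a tail-recursive function.
constexpr unsigned add_recursive(unsigned x, unsigned y) {
    return y == 0 ? x : add_recursive(x + 1, y - 1);
}
```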

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 2 of Advent of Compiler Optimisations!

How do you add two integers on x86-64? You might expect add, but the compiler has other ideas—it uses an instruction designed for calculating memory addresses instead! Why would it choose this unusual approach, and what advantages does it bring? The answer reveals something fascinating about x86's quirky architecture.

Read more: https://xania.org/202512/02-adding-integers
Watch: https://youtu.be/BOvg0sGJnes
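An illustrative sketch; the assembly in the comment is what GCC and Clang typically produce at -O2 for x86-64 Linux, and may vary with version and flags.

```cpp
#include <cassert>

// A plain addition of two ints. Under the x86-64 System V ABI the
// arguments arrive in edi and esi and the result goes in eax. GCC and
// Clang at -O2 typically emit (Intel syntax):
//
//     lea eax, [rdi + rsi]
//     ret
//
// lea writes its sum to a third register, so neither input is overwritten
// (add is two-operand and would need a mov first), and it leaves the
// flags untouched.
int add(int x, int y) {
    return x + y;
}
```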

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Day 1 of Advent of Compiler Optimisations!

Why do compilers love xor eax, eax for zeroing registers? It's brilliant: saves bytes compared to mov eax, 0, AND x86 CPUs recognise this "zeroing idiom" early in the pipeline—breaking register dependencies and removing it from execution entirely. Even better: writing to eax zeroes the top 32 bits of rax for free, handling 64-bit longs in one instruction.

Read more: https://xania.org/202512/01-xor-eax-eax
Watch: https://youtu.be/eLjZ48gqbyg
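A sketch of the size argument using the standard x86-64 encodings of each instruction (the array names are hypothetical). The assembly claim for zero() reflects typical GCC and Clang output at -O2 on x86-64 Linux, where long is 64 bits.

```cpp
// x86-64 machine-code encodings of three ways to zero a register.
constexpr unsigned char xor_eax_eax[] = {0x31, 0xC0};                    // xor eax, eax
constexpr unsigned char mov_eax_0[] = {0xB8, 0x00, 0x00, 0x00, 0x00};  // mov eax, 0
constexpr unsigned char xor_rax_rax[] = {0x48, 0x31, 0xC0};            // xor rax, rax (REX.W prefix)

// Writing eax also zeroes the upper 32 bits of rax, so the 2-byte form
// already clears all 64 bits; the REX-prefixed 64-bit form wastes a byte.
static_assert(sizeof(xor_eax_eax) < sizeof(xor_rax_rax), "32-bit form is shorter");
static_assert(sizeof(xor_eax_eax) < sizeof(mov_eax_0), "xor beats mov on size");

// Returning a 64-bit zero: GCC and Clang typically emit just
// "xor eax, eax" followed by "ret".
long zero() { return 0; }
```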

mattgodbolt (OP, @mattgodbolt@hachyderm.io)

@hp any write to a 32-bit register clears the top 32 bits. That's not true of writes to the 8- or 16-bit registers: when AMD extended the registers to 64 bits, they changed the behaviour for 32-bit writes only. So xor ax, ax only clears the bottom 16 bits, leaving the other 48 alone.
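A toy model of the rule described in this reply, treating rax as a plain 64-bit integer (the function names are hypothetical, and this models only the architectural result, not how a CPU implements it).

```cpp
#include <cstdint>

// 32-bit writes (e.g. to eax) zero-extend: the old upper 32 bits are discarded.
constexpr std::uint64_t write32(std::uint64_t /*old_rax*/, std::uint32_t v) {
    return v;
}

// 16-bit writes (e.g. to ax) preserve the upper 48 bits.
constexpr std::uint64_t write16(std::uint64_t old_rax, std::uint16_t v) {
    return (old_rax & ~std::uint64_t{0xFFFF}) | v;
}

// 8-bit writes (e.g. to al) preserve the upper 56 bits.
constexpr std::uint64_t write8(std::uint64_t old_rax, std::uint8_t v) {
    return (old_rax & ~std::uint64_t{0xFF}) | v;
}
```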

mattgodbolt (OP, @mattgodbolt@hachyderm.io)

@ljrk @hp I'm 99.99% sure it's the same for all registers, including the new ones. It would be a pain for compilers' register allocators to have to treat registers differently depending on whether they were "new" or not.

mattgodbolt (@mattgodbolt@hachyderm.io), to random

I just posted my little intro to the Advent of Compiler Optimisation that I announced yesterday. Check it out (and laugh at my terrible editing) at https://youtu.be/j-BwR-Cw0Gk?si=LxGP1oF4OTi6zoN4 and if you want to watch the series, subscribe! :)

mattgodbolt (@mattgodbolt@hachyderm.io), to random

This December, I'll be posting an article & video each day until Christmas in the Advent of Compiler Optimisations!

Each day we'll explore a fun optimisation in C or C++; some low-level, x86 or ARM-specific, some high-level. Hope you'll join me!

YT: https://youtube.com/mattgodbolt
Blog: https://xania.org

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Another chance to see (a slightly updated version of) my ZX Spectrum meets modern C++ talk, this time at the excellent C++ on Sea conference.

https://youtube.com/watch?v=gg4pLJNCV9I&si=gxHLdHeUT7yRUNcq

Next year ACCU and C++ on Sea join forces too! Can't wait for that!

mattgodbolt (@mattgodbolt@hachyderm.io), to random

If you're like me and can't remember if it's RCX or R8 for the fourth parameter... Get the Compiler Explorer ABI Mug: actual calling conventions for x86-64 System V, Windows, and ARM64. Support @compiler_explorer : grab yours here -> https://shop.compiler-explorer.com/collections/abi-mugs
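For reference, a sketch of the integer argument registers each of those conventions specifies, written as C++ tables (the array names are hypothetical; floating-point and vector arguments are omitted). In System V the fourth integer argument goes in RCX; on Windows x64 it goes in R9.

```cpp
#include <array>
#include <string_view>

// Registers used for the first integer/pointer arguments, in order.
constexpr std::array<std::string_view, 6> sysv_x86_64 = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"};  // Linux, macOS, BSD
constexpr std::array<std::string_view, 4> windows_x64 = {
    "rcx", "rdx", "r8", "r9"};  // Microsoft x64
constexpr std::array<std::string_view, 8> aarch64 = {
    "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"};  // AAPCS64

// The fourth parameter (index 3) differs between the two x86-64 ABIs.
static_assert(sysv_x86_64[3] == "rcx");
static_assert(windows_x64[3] == "r9");
```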

mattgodbolt (OP, @mattgodbolt@hachyderm.io)

@grumpygamer I'll add that! I think it was out of stock when I added the designs! Thanks!

mattgodbolt (OP, @mattgodbolt@hachyderm.io)

@grumpygamer try now! I've enabled it, at least (!). We will see: I get a wee alert telling me "may not be in stock".

I'm a bit new to all this...

mattgodbolt (OP, @mattgodbolt@hachyderm.io)

@grumpygamer darn it! I think it's sold out as I've added it but it's not giving me the option to list it! Sorry.

And thanks for the order!! :-)

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Desperate times. We rebuilt CE's OS image and now something SOMETIMES takes cpu cgroup delegation away from us. We suspect systemd and some kind of start-up race condition, but we've been unable to reliably reproduce or diagnose it.

Does anyone here know anything about such things?

https://github.com/compiler-explorer/infra/issues/1761

mattgodbolt (@mattgodbolt@hachyderm.io), to random

I'm very lucky to have been in a few @computerphile videos recently - I'm now up to 13 appearances! Thanks Sean for the opportunity to share my love of (trying to) explain how computers "Really Work".

The playlist of my appearances: https://www.youtube.com/playlist?list=PLzH6n4zXuckpwdGMHgRH5N9xNHzVGCxwf

jon_valdes (@jon_valdes@mastodon.gamedev.place), to random

Just realized "Godbolt is the programmer, not the tool" is the new "Frankenstein is the doctor, not the monster"

mattgodbolt (@mattgodbolt@hachyderm.io)

@lritter @jon_valdes the website has its name at the top left of every page, and it's not that (but I get why most people call it that and have come to terms with it :) )

mattgodbolt (@mattgodbolt@hachyderm.io), to random

I'm frequently asked "how much does @compiler_explorer cost to run?" I've done some digging and posted https://xania.org/202506/compiler-explorer-cost-transparency, a high-level breakdown of the costs (and revenue), with links to a more in-depth report if you want all the gory details.

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Wonderful news to share: @compiler_explorer has received a $10K grant from NVIDIA's FOSS Fund!

After years of collaboration with their engineers to support GPU compilation, this funding will help us maintain the AWS GPU instances that make CUDA experimentation accessible to all.

A huge thank you to NVIDIA for supporting open source tools that benefit the entire developer community!

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Some folks have asked what changed about MS compilers on @compiler_explorer and why, so I wrote a blog post explaining: https://xania.org/202407/msvc-on-ce

mattgodbolt (@mattgodbolt@hachyderm.io), to random

Is optimization actually a type of refactoring? Ben poses this interesting question in the latest @twoscomplement episode at https://www.twoscomplement.org/#podcast

grumpygamer (@grumpygamer@mastodon.gamedev.place), to random

Current status:

mattgodbolt (@mattgodbolt@hachyderm.io)

@grumpygamer are the signals strong?

grumpygamer (@grumpygamer@mastodon.gamedev.place), to random

RPGTBD now has a city hall where both justice and bureaucracy can be served. Increase your bureaucracy stat to get more paperwork done.

mattgodbolt (@mattgodbolt@hachyderm.io)

@grumpygamer I love this! I went back over prior toots; is there a top-level link that explains what you're up to, or is this just for fun right now?

mattgodbolt (@mattgodbolt@hachyderm.io)

@grumpygamer Oh. My. Word. Sounds tailor-made for me !! Amazing :) will be following along very closely!!

mattgodbolt (@mattgodbolt@hachyderm.io)

@penryu @grumpygamer absolutely, me too! Is there a Patreon/similar we can subscribe to?

mattgodbolt (@mattgodbolt@hachyderm.io)

@grumpygamer oh! You're so welcome; I'm delighted it can be used to help /all/ types of programs (not just boring finance that I do)