
chandlerc

@chandlerc@hachyderm.io

Software, performance, optimization, programming languages, security, open source, #CarbonLang lead, #LLVM, #Clang, C++. 🏳️‍🌈 http://pronoun.is/he or http://pronoun.is/they


meowray , to random

Quality, Velocity, Open Contribution — pick two. If you try for all three, you get none — the maintainers burn out, the project becomes unsustainable.
Lua and SQLite picked quality, and dropped both velocity and open contribution.
When your project is mature enough, you can afford to.
For a project like LLVM, open contribution is not optional — so you're really choosing between quality and velocity.
LLM-aided development dramatically increases contribution volume without increasing reviewer capacity.
LLM-aided review may help at the margins — catching mechanical issues, summarizing patches — but the core bottleneck is human judgment.

chandlerc ,

@meowray FWIW, strongly disagree here.

I think it is entirely possible to have quality, velocity, and open contribution.

I'm not saying there isn't a tradeoff, but I think the above three can be preserved sufficiently.

For example, in LLVM, I think the bigger challenge than quality is that people view "contribution" as much more about "sending a patch" than about "reviewing a patch". As a consequence, the project has lost community and cultural prioritization of code review as an active and necessary part of contribution.

Also, "open contribution" doesn't mean you have to accept contributions. I think a project can still have meaningfully open contribution while insisting contributors balance their contributions between patches and review, and where contributions that are extractive are rejected until the contributor figures out how to make them constructive.

IMO, criteria for sustaining both quality & velocity in OSS:

  • Strong expectation of total community code review in balance to total new patches -- this means that long-time contributors (maintainers) must do more review than new patches.
  • Strong expectation of patches from new contributors rapidly rising to the quality bar where they are efficient to review and non-extractive.
  • Strong testing culture that ensures a large fraction of quality is mechanically ensured
  • Excellent infrastructure use to provide efficient review and CI so tests are effective

I think LLVM struggles with the first and last of these. The last is improving recently though!

chandlerc , to random

I think this is the spirit animal I want for 2026....

chandlerc , to random

Super exciting: Rust is no longer experimental in the Linux Kernel, it is here to stay!!!

https://lwn.net/Articles/1049831/

chandlerc , to random

Also, this poll does a disservice by misleadingly framing the question IMO.

Basically no one uses terminating assertions literally "always". That's actually more absurd than having no assertions at all. The idea that you literally never disable an assertion is IMO completely detached from the reality of software engineering.

By framing the first choice as that, rather than as "by default" with exceptions, the poll skews its results in silly ways. This is artificially rigging data to support "we only need 'optionally' terminating assertions" by making the closest option to "terminating by default" be an untenable position.

https://mastodon.online/@meetingcpp/115371492810105057

chandlerc , to random

The Oatmeal has beautifully and fully captured my feelings on AI art here:
https://theoatmeal.com/comics/ai_art

TBH, I could never express my thoughts and feelings about this half as well, so I will just fully subscribe to this as my persistent stance.

It's a bit long, but I encourage giving it a full read. It's worth it.

pervognsen , to random

Wrote up some thoughts on extrinsically tagged sum types on the lobster site: https://lobste.rs/s/x9s2mu/store_tags_after_payloads#c_c4gteg

chandlerc ,

@pervognsen This is a really nice insight, and not what I had guessed it to be from the term "extrinsically tagged sum types"... I wonder if there is a way to refer to this technique that more immediately cues the reader to the insight / design tradeoff, because it does seem really valuable to teach and entrench as the default way you want to look at this problem space.

chandlerc , to random

I suspect the python code isn't great (not my strong suit), but really happy with the nice output and analysis I've gotten into a new micro benchmark running script: https://github.com/carbon-language/carbon-lang/pull/5706

chandlerc , to random

Would be interested to understand how the real-world experience of the keywords val and var in programming languages works out for native Japanese speakers, for example Kotlin: https://kotlinlang.org/docs/basic-syntax.html#variables

I have read that 'l' and 'r' are not distinguished in Japanese and are really difficult for Japanese speakers to tell apart, but not being one I have no idea.

Specifically, we're considering using val in a much narrower / more specialized position than Kotlin, but it would technically exist alongside var. Wondering if the similarity is a problem if only used in a very niche or specialized part of the language?

Maybe @jfbastien or @barrelshifter know folks or could help connect me with Japanese native speakers who could shed light on this?

chandlerc OP ,

@pervognsen Yeah, the visual similarity was an easy reason to exclude them from any widely used part of the language.

The question we're running into is that we have a relatively niche use case (a couple of orders of magnitude below Kotlin's usage) where val makes a great deal of sense. var does come up in that use case too, but yet another order of magnitude (at least) less often than val would. At that point, the visual confusion seems much less worrisome. There aren't going to be confusing runs with a slight difference, etc.

But if it makes just reading the descriptions of the language hard / impossible / infuriating for Japanese native speakers, that maybe argues against having both keywords even in a niche use case.

chandlerc OP ,

@pervognsen https://discuss.kotlinlang.org/t/var-and-val-are-you-kidding-me-give-me-con/15621/6 is also a good citation for the frustration this can cause when in a widely used part of the language. =/

thephd , (edited) to random

Suppose there's a C language type _Any_func (with a <stdfunc.h> that has typedef _Any_func any_func;) that allows any func ptr type to be converted to _Any_func*, and back. (Like void* for functions.)

_Any_func* af = f;
printf("??", af);

?? is ...

chandlerc ,

@thephd Given that %q was used, and %lp was used by CHERI, I think I would suggest continuing down the %lp path. Model this as a length field for the p type. I'd re-use I as a reference to an instruction pointer with %Ip. Re-using I has the advantage of it already being a length field, so it shouldn't break much, but it does mean re-using / aliasing the meaning when applied to pointer types. No worse than %lp though, and that seems to have worked out OK.

If that doesn't work for folks, could either use %fp if you can re-use a type character in the length field, or if not, %Cp (with C for "code") maybe?

chandlerc ,

@thephd I was trying to type capital I which I think is a length modifier?

pervognsen , (edited) to random

Reading Cliff's reply, https://github.com/SeaOfNodes/Simple/blob/main/ASimpleReply.md, I don't have very strong opinions but "show me the numbers" doesn't seem useful one way or the other. You're not going to implement otherwise equivalent optimized optimizing compilers side by side in a way that lets you easily compare costs for IRs.

chandlerc ,

@pervognsen FWIW, I remain firmly in the "CFG with use/def graph layered on top". Had several debates about this over the years with V8 team and others. Somewhat interesting to see all this coming back around. =]

chandlerc ,

@zwarich @pervognsen I don't really buy it...

You can always just write the optimization on the use/def graph and get SoN behavior?

And as much as I agree about the compile time benefit potential you mention, the larger thing for me is the huge advantage of a source-code hint for the schedule. Like, yes, you can compute a schedule easily enough. But the degrees of freedom on that are huge. Letting source code paint the path out of that impossibly-large search space is an advantage that is hard to beat. And not just for actual program scheduling (which I'm not sure matters that much), but also for all the optimizations that must rely on scheduling in some way because we don't have a proper dependency based model for them.

chandlerc ,

@moonchild @zwarich @pervognsen This seems to presuppose that clarity is in opposition to scheduling, but that isn't always the case...

For performance critical code, the scheduling often is a hugely important aspect to make clear.

The scheduling that is completely unconstrained (side-effect-free scheduling) is also the one that even a CFG compiler will nearly limitlessly fix for you, and so essentially never requires sacrificing clarity. And CFG remains strictly superior: in SoN, you write clear code and get bad performance in a very rare case, but have zero recourse to fix it. While in CFG, you write clear code and get bad performance in a very rare case, and at least have a recourse to fix it locally (at a clarity cost) until the compiler bug is addressed and clarity can be restored.

The scheduling where the program source has to reflect it is the effectful schedule, and there changes are often semantically relevant and so not best (IMO) communicated out-of-band. Or if they are, we already have the tools for this in terms of factoring functions, and organizing the source code in a way that is clear and then assembling those components in a way that enacts the desired schedule.

chandlerc ,

@moonchild @zwarich @pervognsen SoN literally throws out the source code and starts from scratch. The CFG approach preserves the CFG at each step to fall back on, but can still make purely dependency-based optimizations.

That's some of what I mean by saying that, in many ways, this is about it being easy to have the best of both worlds.

Another way to see it is that SoN focuses on throwing information away. My whole point is to keep both sets of information, and use whichever one is most helpful for a specific case.

chandlerc ,

@moonchild @zwarich @pervognsen A somewhat simplified example, but about the best to fit into a social media post...

Imagine a huge bunch of loads and stores. None alias, and so the SoN compiler is free to select any order of loads and stores it likes. How does it guess at a likely-locality-clustered order? No chance.

But it turns out, humans can easily group loads and stores together that have locality. And if there is no other information (and there often is no other info), that default order is likely near-optimal. And even if it isn't near-optimal, now you can document that the source order is the default lacking information to the contrary, and the author can arrange for it to be near-optimal.

You could build a side channel to carry this information. But you do have source order already, and in practice it turns out to be a very solid starting point. Why not use it (and document it)?
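
A toy illustration of the kind of case I mean (Node and CopyPair are invented for this post, and assume the four objects don't alias):

struct Node { long a, b, c, d; };

// Assuming the four objects don't alias, every load and store below is
// independent, so a dependence-only IR is free to emit them in any
// interleaving it likes. Source order already clusters the accesses to each
// Node, i.e. to each cache line, which is exactly the locality hint a
// scheduler would want.
void CopyPair(Node& d0, Node& d1, const Node& s0, const Node& s1) {
  d0.a = s0.a; d0.b = s0.b; d0.c = s0.c; d0.d = s0.d;
  d1.a = s1.a; d1.b = s1.b; d1.c = s1.c; d1.d = s1.d;
}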

chandlerc ,

@pervognsen @zwarich @moonchild That could work. Or the load/store widening in codegen.

But sadly, neither of those are super powerful. And you'd have to teach a decent amount of machinery about the extract being cheap. The problem I suspect is that they see loading a vector (or a large integer) and extracting the parts, and a bunch of cost models go "EEEEP!", not knowing that this unpacking is built into the HW. =/

chandlerc , to random

Gah... Naming conventions in code are so hard despite "not mattering"...

And yet they do seem to matter to folks. Myself included.

Both knowing that debating and trying to get it "right" is a waste of time, and also being unable to stop debating because it feels like the differences involve very material tradeoffs in readability is ... frustrating.

chandlerc OP ,

When I wrote this, I said "naming convention".

What I was trying to talk about were the subtleties, not the main issue of "how do you name things".

Like:

  • Do you use abbreviations? That means shorter / simpler names, but may be less discoverable than the full word.
  • Do you separate different kinds of names into different conventions like snake_case vs. CamelCase? Which ones?

The stumbling point that motivated it: how do you handle acronyms and abbreviations with CamelCase?

Again, not suggesting there exists a "right" answer. But also it doesn't feel wrong to want to question when the answer you have doesn't quite fit.

chandlerc OP ,

@pervognsen This was the source of ... an inordinate time loss today.

Some folks like JpegEncoder. But it appears that a decent number of the folks who are comfortable with that rule in general start to dislike it, potentially strongly, when confronted with a specific case that ends up unreadable.

Interesting examples that came up: Llvm, Ir, Lld
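
To make the stumbling point concrete, here are the two conventions applied to those examples (the Parser suffix is just made up for illustration):

class LlvmIrParser;   // strict word-capitalization: acronyms get lowercased after the first letter
class LLVMIRParser;   // acronyms stay upper-case: now "LLVMIR" reads as one blob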

chandlerc , to random

As a long-standing open source contributor and, in various capacities, maintainer, I have a bunch of thoughts following the Rust4Linux and Asahi developments...

  1. Humans are going to complain, and in a world w/ social media do so on social media, about public & persistent frustrations in their lives. Not super useful to ask them not to do that.

  2. Brigading is a big deal, and not a term that should be used lightly. It specifically means intentionally creating a coordinated group harassment campaign. Don't call stuff brigading without very clear evidence of intent. People who are intentionally brigading people are actively engaging in organized and serious harassment. It can escalate easily and is unacceptable behavior.

  3. Both those said, people with large online followings are somewhat responsible for any group-harassment or full-on brigading that their following does, even when they're not directly involved. If informed, they need to take very active steps to end this and prevent it going forward. There is no "well I wasn't doing any harassing" excuse nonsense.

  4. Telling someone with a large following "you're brigading me" when their followers were doing something independently isn't a great way to let them know what's going on and get (3) to kick in. Especially folks in powerful, leadership positions should be much better at communicating than that. Tell them what is happening, by whom, and what you're hoping they can do. Ya know, like you would otherwise.

  5. Last, but far from least: the behavior allowed on LKML is ridiculously toxic. This predates Asahi, R4L, and probably Rust. It is incredibly unsurprising for folks to be frustrated about this and to post about that frustration. There's a long line of previous posts from a wide range of authors (who weren't accused of brigading). I would suggest that improving this is a better focus for leadership of the Linux Kernel, and has been for a long time.

  6. Culture problems in open source communities are ultimately the responsibility of leaders in those communities. Harassing leaders because of culture problems will not improve things. But talking about these problems and how leadership can and must step up to address them is an essential component.

chandlerc , to random

C'mon folks, let's get another 1000 here!!!

https://pony.social/@thephd/113670623814142421

thephd , to random

The damn survey has 950+ responses. If I get another burst it'll hit 1,000.

There's no way I can manually verify every single entry, lol. Guess I'll have to use some basic general-checking techniques in a python script.

https://pony.social/@thephd/113659160563714508

chandlerc ,

@thephd So, uh, does that mean I shouldn't re-boost this? ;]

chandlerc , to random

It has been zero days since I saw a generated code quality issue due to emplace_back instead of push_back.

Increasingly confident that methods forwarding arguments to a constructor were an API design mistake in C++.

chandlerc OP ,

@michaelc I tried, but simple examples just fall apart in different ways, especially with the standard library and all the complexity of EH required in the containers...

Here is roughly a best case scenario: https://cpp.compiler-explorer.com/z/zTxGMdKe5

Main loop is the same, but "grow" requires its own bespoke template unlike with push_back.

chandlerc OP ,

@michaelc

The way I think about it is to break it down into a few core things.

First, emplace-style APIs capture each arg as a reference, so we spill them to the stack and use up a reg for every argument address. Then rinse through 42+ layers of std::forward, and only then pass each to the constructor.

Second, to ensure no copy or move, emplace has a much more complex growth scenario.

Third, the construction can't be separated out so that it might occur at a different level of inlining.
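
A minimal sketch of the contrast, using a made-up Widget type rather than the case from the compiler explorer link above:

#include <cstdint>
#include <vector>

struct Widget {
  Widget(std::int64_t a_in, std::int64_t b_in) : a(a_in), b(b_in) {}
  std::int64_t a;
  std::int64_t b;
};

void PushIt(std::vector<Widget>& v, std::int64_t a, std::int64_t b) {
  // The Widget is constructed here in the caller and handed to push_back as
  // a single object to move into place.
  v.push_back(Widget{a, b});
}

void EmplaceIt(std::vector<Widget>& v, std::int64_t a, std::int64_t b) {
  // Each argument is bound to a forwarding reference, so an address for it
  // has to exist; the constructor only runs inside the library, after the
  // forwarding layers, at whatever inlining depth that ends up being.
  v.emplace_back(a, b);
}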

chandlerc , to random

The battle to defeat dynamic relocations continues...

Also, we should have tips or blog posts or something to help steer people away from building ginormous global arrays full of pointers. :grumble:
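
The shape of the problem and the classic fix, with invented names (this is not code from the actual Clang/LLVM patches):

#include <cstdint>

// Every element is an absolute address, so in a PIE the dynamic loader has
// to patch this whole array with relative relocations before main() runs.
const char* const kNamePtrs[] = {"add", "sub", "mul", "div"};

// Relocation-free alternative: one string blob plus integer offsets into it.
// Nothing here needs patching, so it can stay in read-only, shared pages.
inline constexpr char kNameBlob[] = "add\0sub\0mul\0div";
inline constexpr std::uint16_t kNameOffsets[] = {0, 4, 8, 12};

inline const char* GetName(int i) { return kNameBlob + kNameOffsets[i]; }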

chandlerc OP ,

@pervognsen @zwarich My primary use case is actually a nearly-static PIE binary. So I don't have any COW pages to really care about, my problem is that the entire large binary image is being relocated to some dynamic offset, and all these pointers need to be updated before I even hit main... My understanding is even in a separate section that'll have to happen?

chandlerc OP ,

@pervognsen @zwarich I think this is largely already the case? They're all in .data.rel.ro rather than .data.ro, separating the pages of non-relocated data from the relocated ones from what I understand?

Certainly, the data structures themselves intermingle tons, but that's not easily fixed...

chandlerc , to random

Woot! Removed 25% of the dynamic relocations in Clang, speeding up PIE-built binaries on my Linux system by 10% for tiny source files!

https://github.com/llvm/llvm-project/pull/118734

chandlerc , to random

This was a fun PR description to write: https://github.com/llvm/llvm-project/pull/118736

chandlerc , to random

So, just for the sake of curiosity -- how bad is it to create a 563486 character long string literal?

Clang very helpfully tells me that it exceeds the maximum length guaranteed to work -- 64k -- but c'mon, do any compilers really care?

I'm feeling lazy about building the compiler-explorer example that checks this...

chandlerc OP ,

@shafik https://cpp.compiler-explorer.com/z/MWv1nxTWd

Clang and GCC are solid.

MSVC is fine from Visual Studio 16.8 onward, but fails at 16.7 and earlier it seems. =[

pervognsen , to random

matklad's old post on how to stay on top of compile time is still relevant: https://matklad.github.io/2021/09/04/fast-rust-builds.html. He mentions that at the time rust-analyzer plus its transitive dependencies was 1.2 million lines of code and built in 8 minutes on GitHub Actions (notoriously slow). I just did a clean debug build of the latest rust-analyzer and it's 44s for a debug build on my laptop. A clean release build is 70s. I assume it's quite a bit more than 1.2 million lines now, too.

chandlerc ,

@pervognsen Is an M1 running Linux useful? (very good stability of timings, etc, but maybe not what you're looking for...)

chandlerc ,

@pervognsen Here we go:

Executed in 51.55 secs fish external  
 usr time 211.05 secs 1.01 millis 211.05 secs  
 sys time 9.43 secs 0.03 millis 9.43 secs  

chandlerc ,

@pervognsen Here's the release build (assuming I got the -r flag right):

Executed in 113.17 secs fish external  
 usr time 830.80 secs 264.00 micros 830.80 secs  
 sys time 9.63 secs 775.00 micros 9.63 secs  

I'm waiting to see if there is an M4 Max or Ultra that comes out in a non-screen form factor around the time that Asahi has M4 support, otherwise I would have nabbed one already.

chandlerc ,

@dougall @pervognsen I wonder if the fact that I'm running Asahi Linux (and performance power mgmt strategy) on the M1 is part of the difference...

chandlerc , to random

Had a bunch of thoughts about the recent safety stuff, way more than fit in social media post... Blog post story time! (It's a bit of a ramble, sorry about that...)

https://chandlerc.blog/posts/2024/11/story-time-bounds-checking/

chandlerc , to random

I ... really don't understand how the std::filesystem::path API is usable. Do others not find this terribly hard to read and understand?

From https://en.cppreference.com/w/cpp/filesystem/path/append:

path("foo") / "";     // the result is "foo/" (appends)
path("foo") / "/bar"; // the result is "/bar" (replaces)

And yes, TIL that this is how it works.
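
A runnable version of the cppreference snippet, in case anyone wants to poke at it (output shown is for a POSIX-style implementation):

#include <filesystem>
#include <iostream>

int main() {
  namespace fs = std::filesystem;
  std::cout << (fs::path("foo") / "")     << "\n";  // prints "foo/"    -- appends a separator
  std::cout << (fs::path("foo") / "/bar") << "\n";  // prints "/bar"    -- the absolute rhs replaces the lhs
  std::cout << (fs::path("foo") / "bar")  << "\n";  // prints "foo/bar" -- the "expected" case
}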

chandlerc , to random

(Distraction fun coding)

Got a working prototype of a nifty new compile time benchmark.

Takes Carbon's fancy random-source-with-consistent-timing generator, combines it with our standard GoogleBenchmark running harness, but then subprocesses to a separate compiler process.

With this we can:
a) Measure build (as opposed to library) perf, just like make / ninja
b) Compare across languages / compilers, even w/o a library API

Have Carbon vs C++-with-Clang.

Next: Go? Rust? Other language?
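
Not the actual harness from the linked PR, just a minimal sketch of the subprocess idea on top of GoogleBenchmark; the command lines and file names are placeholders:

#include <cstdlib>
#include <string>

#include <benchmark/benchmark.h>

// Each iteration shells out to a full compiler process, so we measure build
// performance (process startup, flag parsing, I/O, and all) rather than
// calling an in-process library API.
static void BM_Compile(benchmark::State& state, std::string command) {
  for (auto _ : state) {
    if (std::system(command.c_str()) != 0) {
      state.SkipWithError("compiler subprocess failed");
      break;
    }
  }
}
BENCHMARK_CAPTURE(BM_Compile, cpp_clang, std::string("clang++ -c gen.cpp -o /dev/null"));
BENCHMARK_CAPTURE(BM_Compile, carbon, std::string("carbon compile gen.carbon"));
BENCHMARK_MAIN();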

chandlerc OP ,

Already some fun observations...

For short files, 10% of Carbon's compile time is ... initializing LLVM's internal commandline flags ... that we don't use...

This is going to be fun.

chandlerc , to random

Anyone know why LLVM doesn't use cmovs on x86 here the way GCC does?

https://cpp.compiler-explorer.com/z/sTM6axW15

Or why it generates an analogous separate subtract and comparison on Arm?

Is there a reason this is actually better despite the larger code?

Best theory I can come up with is that it lets the comparison proceed prior to the immediate being materialized, but that seems very improbable to be a benefit....

chandlerc OP ,

@dougall Yeah, GCC is doing exactly what I'd expect here.

The extra frustrating thing is that at least for x86 I know LLVM models the flags sufficiently to get this. I'm really baffled why it doesn't... Guess bug filing time...

zwarich , to random

Does anyone have a favorite strategy for mismatched bracket error recovery in recursive descent parsers?

chandlerc ,

@zwarich My favorite strategy is technically before the parser because I find it really hard to do well there:

Detect during lexing, and then post-process the token stream, ideally using indent to guide fixes.

Some of this is implemented, but not the indentation bit:
https://github.com/carbon-language/carbon-lang/blob/trunk/toolchain/lex/lex.cpp#L1464

There is a TODO, and we have the indent data; we just need someone to write the code to peek at the indent and select good fixes until we run out, and then run the greedy algorithm to fix anything left.

SonnyBonds , to random

Is there any non-UB way in C++ of storing an offset to a member (potentially in a subobject, e.g. contained struct or array) and then retrieving a pointer to that member given the base object and this stored offset?

offsetof can get the offset, but using that for any pointer arithmetic is UB isn't it? What can that offset be used for in a standard way, really?

Pointer-to-member can't handle subobjects, I think? Also it's very strongly tied to the base object type which may be a problem.

chandlerc ,

@pervognsen @SonnyBonds AFAIU, not quite standard conformant in the latest C++...

Because you're round-tripping through the underlying storage pointer and a raw offset, you don't have valid pointer provenance for the subobject. But you can fix that, because this is an important use case: std::launder(reinterpret_cast<float*>(reinterpret_cast<char*>(&obj) + offset))
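
Spelled out as a tiny self-contained sketch (Outer, Inner, and FieldAt are made up; the cast chain is the one above):

#include <cstddef>
#include <new>

struct Inner { float x; };
struct Outer { int header; Inner inner; };

float* FieldAt(Outer& obj, std::size_t offset) {
  // Round-trip: object pointer -> raw bytes -> byte offset -> subobject
  // pointer, with std::launder re-establishing provenance for the result.
  return std::launder(
      reinterpret_cast<float*>(reinterpret_cast<char*>(&obj) + offset));
}

// Usage: FieldAt(o, offsetof(Outer, inner) + offsetof(Inner, x)) points at o.inner.x.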

chandlerc , to random

For folks seeing the (bad) analysis from the hateful jerk on twitter, and are comfortable reading there, here is a superb breakdown of that analysis by a literal world expert:

https://x.com/taviso/status/1814762302337654829

chandlerc OP ,

also @thephd , this provides details that mean you may see a "happy"[1] ending to this story after all.

[1]: happy exclusively in the sense of whether it is yet-another-mem-safety-fail-parade entry or not...