Programming language & compiler enthusiast, computer architect. Creator of Rosetta 2.


@zwarich@hachyderm.io avatar zwarich , to random

@joe I started writing some code in the "flow-sensitive explicit tag" style of struct/variant types, and the main feature I was missing was the ability to force downstream consumers to consider new fields. Obviously, you could still have both explicit tags and pattern matching that desugars into them, but it hurts the case a bit.

The other thing I noticed with the explicit tag approach is a bit more syntactic: you are forced to define an enum for the tag and then define the variant type and repeat all of the tag members again. You could probably just add some syntax to generate the tag type from the list of variants in the struct.
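
For concreteness, here is roughly what that duplication looks like, sketched in plain Rust with invented Shape names (and with the payloads stored side by side rather than overlapped, to keep the sketch safe):

#[derive(Clone, Copy, PartialEq, Eq)]
enum ShapeTag { Circle, Rect }

struct Circle { radius: f32 }
struct Rect { width: f32, height: f32 }

// The "variant type" has to repeat the constructors the tag already listed.
struct Shape {
    tag: ShapeTag,
    circle: Circle, // meaningful only when tag == ShapeTag::Circle
    rect: Rect,     // meaningful only when tag == ShapeTag::Rect
}

// Flow-sensitive use: test the tag explicitly, then touch the matching fields.
fn area(s: &Shape) -> f32 {
    if s.tag == ShapeTag::Circle {
        std::f32::consts::PI * s.circle.radius * s.circle.radius
    } else {
        s.rect.width * s.rect.height
    }
}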

zwarich OP ,
@zwarich@hachyderm.io avatar

@joe Yeah, I had similar thoughts. If you go this route, you probably want user-defined patterns, want to implement patterns for variants using this functionality, and want all disjointness/exhaustiveness of patterns to be inherited from the same for enums or integer ranges.

User-defined patterns do expose the additional complication of needing something that looks like a reference, although maybe it’s possible to do it all in the coroutine/handler get/set style that you mention and avoid real references.

zwarich OP ,
@zwarich@hachyderm.io avatar

@joe Wow, “I really need to read the Swift forums today” wasn’t how I expected to begin my week.

I guess my slight reluctance about this path (despite liking many aspects of it) is that it feels like you could be very deep into it and discover some feature you want that just won’t work with the coroutine encodings. I guess on the plus side, since you’re generally relying on algebraic properties of effect handlers, you have more assurance that your individual solutions will compose with each other.

I really wish that the Mutable Value Semantics team would partner with someone a bit more PLT-minded and would do a real PL paper with surface syntax and elaboration into a core language with a lot fewer features.

zwarich OP ,
@zwarich@hachyderm.io avatar

@joe My dumb test example for this stuff is always "return a hash table consisting of strings borrowed from some context object". If you want to go further, you could do "add more strings borrowed from some context object to my inout param hash table".

Even if you come up with some extension of subscripts or remote parts for this, is it really going to be more ergonomic than how you'd write it in Rust, nevermind a Rust with the lifetime syntax cleaned up a bit to let you do 'param when that lifetime is unambiguous?
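
For reference, here is the test example written in today's Rust (the Context type and its field are invented): the returning version gets by with elided lifetimes, while the inout version needs a single explicit 'a to tie the borrowed strings to the context rather than to the map.

use std::collections::HashMap;

struct Context { names: Vec<String> }

// "Return a hash table consisting of strings borrowed from some context object."
fn collect(ctx: &Context) -> HashMap<usize, &str> {
    ctx.names.iter().enumerate().map(|(i, s)| (i, s.as_str())).collect()
}

// "Add more strings borrowed from some context object to my inout param hash table."
fn extend<'a>(ctx: &'a Context, out: &mut HashMap<usize, &'a str>) {
    for (i, s) in ctx.names.iter().enumerate() {
        out.insert(i, s.as_str());
    }
}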

@regehr@mastodon.social avatar regehr , to random

ClangIR seems to be landing in the LLVM tree. it feels like there must be a backlog of applied compiler research projects that never got done because of a lack of open source C++-level IRs in production compilers. I wonder what they are?

https://blog.llvm.org/posts/2025-gsoc-clangir-upstreaming/

zwarich ,
@zwarich@hachyderm.io avatar

@regehr Even without the C++ allocator semantics, any potential optimizer would have problems with hash tables. I don't think any known decidable theory of arrays captures the logic of open addressing probe chains for get/set, nevermind growing and rehashing the table.

@zwarich@hachyderm.io avatar zwarich , to random

For everyone visiting their parents this week:

Haskell calls them “type classes”
Rust calls them “traits”
Swift calls them “protocols”
Go calls them “interfaces”
Raku calls them “roles”

@zwarich@hachyderm.io avatar zwarich , to random

@joe Is there anything you would change about Swift’s gaggle of if let-related features if you could?

zwarich OP ,
@zwarich@hachyderm.io avatar

@joe I liked the idea of the Ultimate Conditional Syntax paper (https://dl.acm.org/doi/10.1145/3689746), but when I tried to implement it I found a few problems:

  • It doesn’t work as well for a Rust-like language where bare if has an implicit else { () }, because you then need some way of marking the conditionals that are actually supposed to be exhaustive.
  • I want a postfix match like Scala, but I couldn’t think of a nice equivalent in terms of a unified conditional syntax.
  • If you require explicit marking of bindings like Swift, you end up writing a lot of is let …. On some level this isn’t much different than case let … (and even shorter), but it does somehow feel weirder.
  • The various forms of curly brace syntax that you can come up with for the feature do seem a bit strange, although that might again be due to their novelty?
@graydon@types.pl avatar graydon , to random

pointing out that rust does not fix all classes of errors is just not the epic dunk people seem to think it is

more like a reheated PL-world version of the "thanks, obama" meme or something? like ok I guess if you really need to get that out of your system

zwarich ,
@zwarich@hachyderm.io avatar

@uep @graydon I'm curious what problems were preventing them from figuring out that this was due to a panic. Did they not have working logging for panic messages? Could the logging services not handle the load?

The detection/reaction of the issue aside, I do think it is very questionable to specifically write code designed to work with bounded resources (which is not the normal situation for userspace Rust), but then use a bare .unwrap() in a situation that could very easily be triggered by those resource limits being hit.

zwarich ,
@zwarich@hachyderm.io avatar

@dotstdy @graydon @uep It seems very odd that a networking company would not have a robust solution to this interview-level problem.

zwarich ,
@zwarich@hachyderm.io avatar

@dotstdy @pkhuong @graydon @uep Honestly, as far as post-mortems go, I’d rather read about how they missed that it was panicking than this questionable usage of unwrap().

I also don’t think it’s a trivial thing to do. I just meant that Cloudflare is a networking company (and one of their main products is preventing unwanted floods of traffic), so they should be aware of the issues involved and would probably ask toy versions of these things in interviews.

pkhuong , to random

If I had to bootstrap (in-memory) data structures on a desert island, here's what I think I would start with @zwarich

https://docs.google.com/document/d/1kLdm1owqwxtE581CUy3-EWxAe_lzWHB6hMIHbJahq2s/edit?usp=drivesdk

zwarich ,
@zwarich@hachyderm.io avatar

@rygorous @zeux @pkhuong Yeah, but what data structures would the GPS receiver and satellite phone use?

@zwarich@hachyderm.io avatar zwarich , to random

New artistic PL idea: focus on memory safety and nothing else. Make a type system whose only goal is preventing non-pointer data from being confused as a pointer by tracking pointer/data distinctions (recursively through pointees, of course) and using shape-segregated allocation when required. A language with no clear vision of where it wants to go, only a potential outcome from which it wishes to run far, far away.

zwarich OP ,
@zwarich@hachyderm.io avatar

@sbrunthaler If you use shape-segregated allocation you can avoid ever reusing memory with a different shape. For union types, you can lay out all of the union members so that they agree bytewise on shape.

I think some of the details depend on your opinions regarding dereferencing null pointers.
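
As a rough illustration of "agreeing bytewise on shape" (the Node types below are invented, and "shape" here just means which words hold pointers):

// Both variants put their pointer word first and a plain data word second, so a
// shape-segregated allocator can reuse memory between them without a data word
// ever being reinterpreted as a pointer.
#[repr(C)]
struct NodeA { next: *const NodeA, len: usize }     // shape: [ptr, data]
#[repr(C)]
struct NodeB { parent: *const NodeB, flags: usize } // shape: [ptr, data]
// A variant laid out as { tag: usize, next: *const NodeA } would have shape
// [data, ptr] and would need its own allocation class.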

    @zwarich@hachyderm.io avatar zwarich , to random

    @graydon Someone linked your "things Rust shipped 1.0 without" post and I noticed it included "a compilation model that relies on the order of declarations". What about the traditional ML style of ordered declarations did you dislike?

    zwarich OP ,
    @zwarich@hachyderm.io avatar

    @pervognsen @graydon I once tried to come up with good syntax for this in a braces language, but the obvious choice of mutual with a block seemed a bit awkward. If you indent it, then you have top-level decls that are indented, and if you don’t indent it, then the closing } can be a bit confusing.

    @zwarich@hachyderm.io avatar zwarich , to random

    I'd give up memory safety if life goes back to being like this again: https://www.youtube.com/watch?v=dHbN2xGI4yk

    @pervognsen@mastodon.social avatar pervognsen , to random

    A funny quirk (which I must have mentioned before) is that the trivial kind of type inference you see in 'var x = e' is literally cheaper and easier for the compiler to do than the alternative 'var x: t = e'. You're already computing subexpression types as part of type checking so it's strictly more work for the compiler to process the type-annotated version.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen I actually found that I needed to add the : <type> in Rust way more than expected, and in fact adding new code would make me have to go back and add it in places where I didn't need it before, even though I only gave the compiler more information.

    IIRC, this would mostly happen with return type overloading (e.g. .into() or the like), where the original code just returned the resulting value whereas the new code inserted uses of the value before the return.

    Lean is a lot smarter in this regard, as it will generally defer type class resolution until more information is available, but I believe I've still seen similar behavior on more rare occasions.
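
    A minimal reproduction of that annotation creep, with invented types: in direct the return type alone pins down the target of into, but inserting a use of the value before the return means method resolution needs the receiver type at that point, so the binding wants an annotation even though the compiler has strictly more information overall.

    fn direct(x: u32) -> u64 {
        x.into() // the return type determines what `into` targets
    }

    fn with_use(x: u32) -> u64 {
        let y: u64 = x.into(); // deleting `: u64` here gives "type annotations needed"
        println!("low byte: {}", y.to_le_bytes()[0]); // the newly inserted use
        y
    }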

    @pervognsen@mastodon.social avatar pervognsen , (edited ) to random

    Random musing: If T is an immutable (non-interior-mutable), copyable type then it would be nice if r: &T and let t = *r; &t were indistinguishable. But you can cast references to pointers and observe their pointer identity (via address equality), so they are distinguishable. That seems like a tangible way in which the value semantics picture is nicer. Allowing a shared reference's observable identity to change non-deterministically when copied is wrong (copying forces bytewise equality in Rust).

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen I agree. Nothing should expose a pointer address unless it's actually needed for something (and Rust discourages you from actually doing something with the pointer address at both a language and ecosystem level).

    I would have to dig up the thread again, but I believe this issue has come up in practice, where unsafe code that had sufficient usage in the wild would break when the exact sequence of memcpys on the return path would change. I hope I'm not hallucinating this LLM-style.
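
    A small demonstration of the distinguishability in question, in plain Rust:

    fn main() {
        let x: u64 = 42;
        let r: &u64 = &x;
        let t = *r; // copy out of the shared reference
        assert_eq!(*r, t);             // the values are indistinguishable...
        assert!(!std::ptr::eq(r, &t)); // ...but address identity tells them apart
    }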

    @zwarich@hachyderm.io avatar zwarich , to random

    Has anyone ever seen a name for the pattern of a type and its borrowed/mutably borrowed versions, e.g. Vec<T>, &[T], and &mut [T]? It comes up again and again but I've never really seen it discussed, nor have I seen a language that reifies it.

    zwarich OP ,
    @zwarich@hachyderm.io avatar

    @pervognsen Yeah, Borrow is broken because it requires you to produce an actual &. I guess there's also ToOwned in the reverse direction? I'm kind of surprised they haven't added the Equivalent trait to the standard library, given that there are multiple independent implementations of it in the wider ecosystem.

    zwarich OP ,
    @zwarich@hachyderm.io avatar

    @pervognsen I must’ve been only thinking about this on a surface level, because once you mentioned GATs I got a flood of memories of trying to fight this stuff on a project, and resisting using unstable Rust to get more GAT support.
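
    A rough sketch of what reifying the triple could look like with today's GATs (the trait and method names are made up, not a std or crate API):

    trait Views {
        type Ref<'a> where Self: 'a;
        type RefMut<'a> where Self: 'a;
        fn view(&self) -> Self::Ref<'_>;
        fn view_mut(&mut self) -> Self::RefMut<'_>;
    }

    impl<T> Views for Vec<T> {
        type Ref<'a> = &'a [T] where Self: 'a;
        type RefMut<'a> = &'a mut [T] where Self: 'a;
        fn view(&self) -> &[T] { self }
        fn view_mut(&mut self) -> &mut [T] { self }
    }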

    @joe@f.duriansoftware.com avatar joe , to random

    C and C++ should let you use another enum as an enum's underlying type

    ALT
    zwarich ,
    @zwarich@hachyderm.io avatar

    @joe This reminded me that I really wish there was a systems language that separated a type's logical structure from its underlying representation.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @joe In my mind there are two different forms of this, which I guess roughly correspond to the difference in vibes between C and C++/Rust (or data vs. codata, intensional vs. extensional, etc.).

    zwarich ,
    @zwarich@hachyderm.io avatar

    @joe In the C-like instantiation, you can write functions that operate on a single representation, but can be used polymorphically for all types sharing that representation. There are a few interesting consequences of this basic idea.

    One is that undefined padding is discouraged (because it leads to more types having distinct representations) and it makes much more sense to define padding as zero. Another is that you can make a proper distinction between N byte representations and pointers to N byte representations, whereas C famously blurs this line. This also plays well into exposing different aspects of calling conventions directly in the language, e.g. that a value is being returned on the stack (and thus can be accessed via a pointer) as opposed to in registers. Instead of just having a primitive dynamically sized slice representation, you can provide the primitives for users to define representations containing dynamically sized content, potentially in multiple regions with different sizes.

    Where I'm less certain is to what extent you would actually want representation polymorphism. In a way, it seems less C-like, but there are a lot of cases where it would obviously be useful. I'm also not quite sure how you would make generic initialization, copying, equality, hashing, etc. work. Being polymorphic for types with the same representation would obviously avoid a lot of code duplication, but you might still want types that have nonstandard comparison, or types that require D-style post-blit constructors, etc.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @joe The Rust-like instantiation is probably more clear, partially because as languages like Rust crank up the degree of abstraction the places where they forgot to abstract things become more and more apparent.

    One basic thing I've complained about a lot is that &/&mut are unnecessarily coupled with pointer identity, which stands out even more when everything else about the language prevents you from ergonomically taking advantage of the pointer identity that was forced upon you. When you adopt this idea of borrowing as a fiction, you can create some interesting interfaces. My favorite toy example is bit stealing. You can steal bits from a pointer, expose both the pointer and tag as distinct fields with borrowing, and then have the restrictions on combining borrows reflect that you can't update them independently of each other. The underlying model of the existing borrow checker already supports much more than is exposed by the Rust language, and I imagine there are even more extensions that could be had.
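
    The toy example, approximated in today's Rust with invented names: the tag lives in the low bits of an aligned pointer, but the accessors can only hand the pointer and the tag back by value, rather than exposing them as two borrowable fields whose update restrictions the borrow checker understands.

    use std::marker::PhantomData;

    struct Tagged<T> {
        raw: usize,
        _marker: PhantomData<*mut T>,
    }

    impl<T> Tagged<T> {
        const MASK: usize = std::mem::align_of::<T>() - 1;

        fn new(ptr: *mut T, tag: usize) -> Self {
            debug_assert_eq!(ptr as usize & Self::MASK, 0); // pointer must be aligned
            debug_assert!(tag <= Self::MASK);               // tag must fit the stolen bits
            Tagged { raw: ptr as usize | tag, _marker: PhantomData }
        }

        fn ptr(&self) -> *mut T { (self.raw & !Self::MASK) as *mut T }
        fn tag(&self) -> usize { self.raw & Self::MASK }
        fn set_tag(&mut self, tag: usize) {
            self.raw = (self.raw & !Self::MASK) | (tag & Self::MASK);
        }
    }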

    zwarich ,
    @zwarich@hachyderm.io avatar

    @joe Do you have a link to a discussion with examples, or is it not on the forums? On a related note, I find it amusing that these languages generally assume that there is a uniform destructor interface that can be implemented. I guess this assumption is fairly successful in practice, but it makes the cases that don't fall under this assumption much more annoying.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @joe Maybe this is another situation like bytewise equality, zero initialization, etc. where making it easy to be generic encourages proliferation of wasteful alternatives, rather than adding backpressure to encourage use of a more efficient approach. Individual deallocation is expensive (despite heroic attempts to improve its performance, which themselves have tradeoffs that people often ignore), and it's probably better to consider using arenas or some other form of bulk deallocation instead.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @joe @graydon If you start from scratch with linear permissions, you need to add some form of borrowing by breaking permissions up into pieces and recombining them. The two most popular ways of doing this are fractional permissions (start out with 1, get two permissions of 1/2, each of which can be split again) or counting permissions (split into N up front). The programmer is then responsible for all of the permission accounting, which is not very ergonomic.

    Rust's system avoids the need for permission accounting (and for immutable borrows to be affine) by instead checking that all borrows become unusable. The borrow checker is actually more of an escape analysis, checking whether borrowed pointers ever escape.

    The use of subsumption on top of this scheme makes borrowed pointers more interchangeable in APIs (you don't need separate permissions for every param, or even worse, for every entry in a container), but it also integrates into the escape analysis, allowing borrows to "escape" as long as they stay within the appropriate region determined by subtyping.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @joe @graydon This escape-based approach does exclude a number of idioms, e.g. Rust can't express a struct consisting of a buffer and a borrowed span from the buffer. While you could imagine some extension of Rust that would allow this for simple cases (by exploiting field-sensitivity of the borrow checker), it would be even more difficult to make this work for a Vec of such structs.
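
    Concretely, the excluded idiom is something like the commented-out struct below; the usual workaround stores positions instead of the borrow and reborrows on demand:

    // Not expressible directly: a struct whose `span` borrows from its own `buf`.
    //
    //     struct Parsed<'a> { buf: Vec<u8>, span: &'a [u8] /* points into buf */ }
    //
    // The declaration is accepted, but no value of it can be built (and moved
    // around) with `span` borrowing `buf`, so in practice you store indices:
    struct Parsed {
        buf: Vec<u8>,
        span: std::ops::Range<usize>,
    }

    impl Parsed {
        fn span(&self) -> &[u8] {
            &self.buf[self.span.clone()]
        }
    }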

    So I guess there are a few interesting subquestions here:

    • Is the Rust model of escape checking for borrows still helpful without lifetime subsumption?
    • Can you make tracked fractional permissions ergonomic enough that programmers will want to use them?
    • Does the answer to the previous question change if you encourage very coarse-grained permissions, e.g. to arenas?
    • Can you come up with a single ownership system that works well for arenas tracked with coarse permissions as well as dynamically sized containers on the stack containing references into arenas?
    zwarich ,
    @zwarich@hachyderm.io avatar

    @joe @graydon Here's a simple example I was thinking about while brushing my teeth. Imagine you have some read-only permission to an arena or data structure, and you want a function that gives you a dictionary of borrowed pointers into the arena. You have two basic ways to represent this with fractional permissions:

    • You can represent the derived pointers in a way that they require the original permission to use. This means that either the derived pointer type refers to the original arena/permission, or that there is some way to externally state that the permission also applies to the derived pointers.
    • You can represent the derived pointers in a way that bundles the (fractional) read-only permission. This seems closer to the usual Rust way of expressing this, except now you need to (logically) return all of those fractional permissions to get back the original permission at some point, and track the relationship between the two until the point that it is actually returned.

    I'm not aware of any type system that is capable of checking the latter (where the number of summands in the decomposition of the fractional permission depends on a runtime value), and it seems like a pretty difficult problem. This suggests that the former (where the number of summands is determined statically) is the more realistic option. Also note that in Rust, both functions would be the same!
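
    For comparison, the single Rust signature that both encodings collapse into (the Arena type is invented): one lifetime plays the role of both the original permission and every per-pointer piece derived from it.

    use std::collections::HashMap;

    struct Arena { entries: Vec<(String, String)> }

    fn index<'a>(arena: &'a Arena) -> HashMap<&'a str, &'a str> {
        arena.entries.iter().map(|(k, v)| (k.as_str(), v.as_str())).collect()
    }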

    @zwarich@hachyderm.io avatar zwarich , to random
    @pervognsen@mastodon.social avatar pervognsen , to random

    For some reason CompCert decided to call their jump threading pass 'branch tunneling' which I admit does have more sci-fi appeal: https://compcert.org/doc/html/compcert.backend.Tunneling.html

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen The first instance of this terminology I could find is from Xavier Leroy's own paper from 2002: https://xavierleroy.org/publi/oncard-verifier-spe.pdf

    @pervognsen@mastodon.social avatar pervognsen , (edited ) to random

    Aside from the fact that dynamically typed interpreters are easier and most off-the-shelf embeddable runtimes fall in that category, I think an underrated reason why they've been predominant for game scripting is that the late binding makes it much easier to have loosely coupled asset editing on teams. E.g. Unreal Engine's approach with Blueprint requires load-time recompiles if expectations mismatch, and it needs to resolve mismatches at a finer granularity to support a fail-forward workflow.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen This has strong compatibility with my hot take that the only way to deliver a great debugging experience for optimized code is to have a runtime that supports dynamic deoptimization.

    @pervognsen@mastodon.social avatar pervognsen , to random

    Sometimes The Algorithm delivers: https://www.youtube.com/@ShenmueFaces/videos

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen If Shenmue and Xenogears coming out in the same 11 month span isn't proof that the 90s were the peak of human civilization, I don't know what is.

    @zwarich@hachyderm.io avatar zwarich , to random

    My coworker Joachim Breitner wrote this blog post a while back about optimizing the representation of Option in languages with ostensibly uniform representation of data: https://www.joachim-breitner.de/blog/787-A_mostly_allocation-free_optional_type
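
    For comparison, Rust (which doesn't insist on a uniform representation) guarantees the null-pointer niche for Option<&T>, so the optional costs no extra word at all:

    fn main() {
        // None is represented by the null pointer, so the sizes coincide.
        assert_eq!(
            std::mem::size_of::<Option<&u8>>(),
            std::mem::size_of::<&u8>()
        );
    }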

    @zwarich@hachyderm.io avatar zwarich , to random

    If he used reds instead, his next book would be The End of Hysteresis.

    @pervognsen@mastodon.social avatar pervognsen , (edited ) to random

    Fun (horrifying) fact about const generics as mentioned in boxy's talk at Rust Week, but not at all Rust specific. If you want to support const values as parameters for generic types, you'd expect to lift value-level equality to type-level equality, e.g. the array types [T; M] and [T; N] are equal if and only if M == N. Now imagine what happens if you try to generalize that lifting rule to floats, where NaNs mess up the value-level equality.
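
    The lifting rule in question, illustrated in plain Rust: two array types are the same type exactly when their const arguments are equal as values, which is part of why stable Rust limits const parameters to integers, bool, and char and keeps floats (where NaN != NaN) off the type level.

    fn first<T, const N: usize>(xs: &[T; N]) -> &T {
        &xs[0]
    }

    fn main() {
        let a: [u8; 4] = [1, 2, 3, 4];
        let b: [u8; 2 + 2] = a; // [u8; 4] and [u8; 2 + 2] are equal types
        println!("{}", first(&b));
    }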

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen Working on a dependently typed language has made me realize that the solutions to these problems proposed for languages like Rust etc. are more ad-hoc and ultimately more complex than just collapsing the distinction between terms and types.

    Of course, this isn't completely fair, because it's more difficult to accommodate the other features that people desire from such languages, but maybe we'll see some more progress there in the near future.

    @pervognsen@mastodon.social avatar pervognsen , to random

    Maybe it's asking too much to get LLVM to optimize i-1 < n to i <= n in a context where it knows that i > 0, but does anyone know if it's supposed to have this optimization and something is blocking it? https://godbolt.org/z/Tj59qnTeh

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen Unrelated to actually solving your problem, but this question made me realize that it would be possible to implement "correlated value propagation" (or whatever you call these sorts of transformations) using saturation for difference logic. This was floating around in my head because such a procedure is being added to Lean's grind tactic:
    https://github.com/leanprover/lean4/blob/master/src/Lean/Meta/Tactic/Grind/Arith/Offset/Main.lean
    I'm curious if this would be sufficiently efficient in practice.
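
    A minimal sketch of the saturation idea (not how LLVM or Lean's grind actually implement it): difference constraints x - y <= c become weighted edges, and Bellman-Ford-style relaxation either reaches a fixed point of implied bounds or exposes a negative cycle, i.e. an inconsistency.

    /// Each constraint (x, y, c) encodes x - y <= c over variables 0..n_vars.
    /// Returns per-variable upper bounds relative to an implicit zero source,
    /// or None if the constraints are unsatisfiable. (Overflow is ignored here.)
    fn saturate(n_vars: usize, constraints: &[(usize, usize, i64)]) -> Option<Vec<i64>> {
        let mut bound = vec![0i64; n_vars];
        for _ in 0..n_vars {
            let mut changed = false;
            for &(x, y, c) in constraints {
                if bound[y] + c < bound[x] {
                    bound[x] = bound[y] + c; // tighten x through y
                    changed = true;
                }
            }
            if !changed {
                return Some(bound); // fixed point: consistent, bounds are tightest
            }
        }
        // Any tightening still possible after n_vars rounds implies a negative cycle.
        for &(x, y, c) in constraints {
            if bound[y] + c < bound[x] {
                return None;
            }
        }
        Some(bound)
    }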

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen Yeah, they're the same basic algorithm with different semirings. 2SAT uses the boolean semiring, and difference logic (or UTVPI) uses the tropical semiring. I wonder if there are any other logical decision procedures that arise from this. Searching just indicates probabilistic problems, routing, scheduling, etc.

    @pervognsen@mastodon.social avatar pervognsen , (edited ) to random

    An interesting thing about proving program correctness via loop invariants is that 'while i != n' yields a better postcondition for the loop (namely i = n) than 'while i < n'. That only helps with partial correctness, so you still have to prove termination for total correctness. But it organizes the proof better: it's clear the termination measure should be n - i, which means you need to prove i <= n as a loop invariant and that i increases each iteration.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen Amusingly, Lean will prove termination of the latter by default but not the former (since all local recursive definitions are implicitly lifted to the top level and thus need to terminate for all inputs):

    def f1 (n : Nat) : Nat :=
      loop 0 0
    where
      loop accum i :=
        if i < n then
          loop (accum + 1) (i + 1)
        else
          accum

    def f2 (n : Nat) : Nat :=
      loop 0 0
    where
      loop accum i :=
        if i != n then
          loop (accum + 1) (i + 1)
        else
          accum

    If we add the necessary precondition, it does work though:

    def f3 (n : Nat) : Nat :=
      loop 0 0 (by omega)
    where
      loop accum i (_ : i ≤ n) :=
        if _ : i ≠ n then
          loop (accum + 1) (i + 1) (by omega)
        else
          accum

    @pervognsen@mastodon.social avatar pervognsen , to random

    There are two reasonable justifications for encapsulation. The fundamental justification is preserving data type invariants, especially when safety is involved. So it's very strange to me that the OOP people hardly talk about invariants and only talk about reducing coupling to make it easier to change the implementation.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen I think all of the standard OOP contract systems (Eiffel, Spec#, C++ contracts, etc.) have class invariants in addition to method pre/postconditions. I'm kind of surprised these never made it mainstream, although I guess C++ contracts might still achieve mainstream status, albeit in the dying days of the language.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @foonathan @pervognsen That's sad. I could've sworn I saw them in an example a while ago.

    @pervognsen@mastodon.social avatar pervognsen , (edited ) to random

    I have a hunch Miri and MiniRust's approach to pointer provenance via byte-granularity shadow memory at the abstract machine level could be used to support static initialization in a very general way. Namely, you'd like to be able to dynamically allocate memory during compile-time evaluation and then use that as a static initializer for static runtime data. In theory the allocation metadata from the pointer provenance in the compile-time shadow memory should let you generate relocations.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen I saw this post in isolation from its grandparent and was about to reply "well clearly you should be able to rely on a provenance model to do this", and then I scrolled up.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen @pkhuong This comment made me look into the actual plan:
    https://github.com/rust-lang/const-eval/issues/20
    In the end, it’s not actually all that different than the SBCL image dumper. They check whether any allocations are still reachable, throw an error if any of those were deallocated, and then make new static objects for any that remain.

    Unfortunately, it seems like there are a lot of annoying problems to solve in order to make this real.

    @pervognsen@mastodon.social avatar pervognsen , (edited ) to random

    Ralf Jung's talk on MiniRust: a core language for specifying Rust. I've been following MiniRust on and off and it seems to have gotten quite far now. The talk might be interesting to folks who are interested in how to precisely specify low-level languages with UB in an accessible style, cc @regehr . https://www.youtube.com/watch?v=UKCUZtr378s&t=18360s

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen @regehr I guess the main message of the talk is that for (single-threaded) PL semantics, (executable) operational semantics are preferable to axiomatic semantics?

    I'm curious what Ralf would suggest for a multi-threaded semantics that includes memory consistency. I guess in a way there's not as much of a benefit to specifying that, since practically speaking Rust is stuck with C++'s memory model (as interpreted by LLVM), despite there being some interesting tweaks that you could attempt in a higher-level language like Rust.

    zwarich ,
    @zwarich@hachyderm.io avatar

    @pervognsen @regehr Interesting that they already have a model for sequentially consistent accesses. Due to my biases and past work, I honestly sometimes forget that people care about non-relaxed shared memory.

    I meant that executable semantics become a bit less useful (or at least a bit less complete) when the execution becomes nondeterministic, although thinking about it some more I guess model checking a non-deterministic operational semantics might result in a more meaningful trace than an axiomatic semantics.

    A good reminder that I need to actually understand promising semantics, though.