Independent. Previously: TF@Roblox

meshoptimizer, pugixml, volk, calm, niagara, qgrep, Luau


@zeux@mastodon.gamedev.place avatar zeux , to random

I was hoping Lobsters would be "HN but without half the front page filled with AI".

Instead Lobsters is "HN and half the front page is filled with AI"; moreover, instead of a mix of "posts about AI slop" and "posts about AI model innovations" it's just "posts about AI slop".

zeux OP ,
@zeux@mastodon.gamedev.place avatar

@pervognsen Ah I was assuming filtering only works for people with an account but looks like it should save settings to cookies otherwise.

zeux OP ,
@zeux@mastodon.gamedev.place avatar

@dotstdy @pervognsen Honestly part of this is that filtering settings are at the very bottom of the page; I expected some menu at the top available for logged-in users, so I hadn’t noticed them until now.

@zeux@mastodon.gamedev.place avatar zeux , (edited ) to random

Yesterday on stream we sketched out most of the support code for descriptor heaps. If you want to see the rest of the owl, I've committed the remaining pieces after the stream. For now all of the previous descriptor code is still required to support current production drivers; hopefully in a few months we can do a cleanup pass and drop support for pre-descriptor-heap drivers (and remove samplerhack).

https://github.com/zeux/niagara/compare/f25b318a924a70ad98cb43bd6bbd5b1dd04c7780...8cf95feee3e7f37b2864795dbcbbf0acc375860c

@pervognsen@mastodon.social avatar pervognsen , (edited ) to random

Someone on Lobsters wondered "how a modern compiler would fare against hand-optimized asm" in reference to Abrash's TransformVector (3x3 matrix-vector multiply) hand-written x87 routine in Quake. Oh my sweet summer child. Has there ever been a compiler that did an amazing job for x87 in-order dual-pipe scheduling? The entangling of register allocation, instruction selection and instruction scheduling is like nightmare difficulty mode for compiler backends.
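
For reference, the operation under discussion is tiny when written in C. The sketch below is a generic row-major 3x3 matrix-vector multiply, not Quake's actual source; the function name and the row-major layout are assumptions made for illustration. It is 9 multiplies and 6 adds, so the hard part for a compiler is scheduling and register allocation on the x87 stack rather than the arithmetic itself.

```c
/* Illustrative sketch only: mat3_mul_vec3 and the row-major layout are
   assumptions, not the Quake routine. */
void mat3_mul_vec3(const float m[9], const float in[3], float out[3])
{
    /* Each output component is the dot product of one matrix row with `in`. */
    for (int r = 0; r < 3; ++r)
        out[r] = m[r * 3 + 0] * in[0]
               + m[r * 3 + 1] * in[1]
               + m[r * 3 + 2] * in[2];
}
```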

zeux ,
@zeux@mastodon.gamedev.place avatar

@pervognsen A question I'm more curious about is what is the delta on a modern OOO CPU?

zeux , (edited )
@zeux@mastodon.gamedev.place avatar

@rygorous @wolf480pl @pervognsen ARM64 kinda looks like huffman coding sometimes, only without variable length output you’re reduced to truncating inputs arbitrarily.

zeux ,
@zeux@mastodon.gamedev.place avatar

@rygorous @TomF @wolf480pl @pervognsen I think I posted stats for some random programs a little while back and it was amusing to see x64 average close to 4 bytes per instruction. Lots of 1-2 byte instructions but also lots of 5-8+ so it all blends. Although I didn’t do instruction counts which might be a little smaller vs fixed length instruction sets.

@zeux@mastodon.gamedev.place avatar zeux , to random

“Shortly before beginning the GNU Project, I heard about the Free University Compiler Kit, also known as VUCK. (The Dutch word for “free” is written with a v.)”

@zeux@mastodon.gamedev.place avatar zeux , (edited ) to random

It's funny to see some folks advocate ECS but if you ask them what they think of GoF design patterns they'd bring out the pitchforks.

zeux OP ,
@zeux@mastodon.gamedev.place avatar

@pervognsen So to me GoF patterns are about:

  • a core of the idea that is sensible
  • wrapped into restrictive language specific monstrosity
  • blindly applied to problems regardless of the fit
  • replacing problem solving and thinking with "pattern" matching
  • with a pointless taxonomy nobody asked for.

... wait, but ECS is all of these too.

@zeux@mastodon.gamedev.place avatar zeux , to random

Upcoming Niagara stream! This Sunday (Feb 15), at 11 AM PST (7 PM GMT), we will embark on a journey to replace all descriptor uses within the renderer with descriptor heaps. https://www.youtube.com/live/VXN4Gewjk4k

zeux OP ,
@zeux@mastodon.gamedev.place avatar

Starting in 30 minutes!

@el0j@mastodon.gamedev.place avatar el0j , to random

@zeux FYI seeing a validation error on niagara after upgrading to SDK 1.4.341.1, where creating the 'clusterPipeline' raises:

"ERROR: vkCreateGraphicsPipelines(): pCreateInfos[0] The Mesh Shader has a TaskPayloadWorkgroupEXT variable but there is no Task Shader to set the payload values."

Ref: VUID-RuntimeSpirv-MeshEXT-10883

zeux ,
@zeux@mastodon.gamedev.place avatar

@el0j Yes I saw it today too; I’ll push a fix later. I filed an issue in Vulkan-Docs on this - it’s quite unclear to me if it was okay for the spec to break valid usage retroactively.

@zeux@mastodon.gamedev.place avatar zeux , to random

Would it kill appliance manufacturers to ship like 16 bytes of non-volatile RAM with every microwave, oven, etc., so that my settings don't randomly reset when power surges/outages happen?

zeux OP ,
@zeux@mastodon.gamedev.place avatar

@cmik ... and a CMOS battery.

@aras@mastodon.gamedev.place avatar aras , to random

Unity's transformation into where everyone at top level comes from Zynga is complete! https://www.businesswire.com/news/home/20260210709281/en/Unity-Appoints-Bernard-Kim-to-its-Board-of-Directors-and-Announces-Board-Transitions :unity:

(David is no longer on the board... oof)

zeux ,
@zeux@mastodon.gamedev.place avatar

@aras Even the company trajectory matches!

@zeux@mastodon.gamedev.place avatar zeux , to random

Hmm this might be just the thing

@zeux@mastodon.gamedev.place avatar zeux , to random

alright time to stop thinking about meshlets and meshes for a week

@dougall@mastodon.social avatar dougall , (edited ) to random

Three instruction NEON float prefix sum. I'd wanted to abuse FCMLA (floating-point complex multiply accumulate) for non-complex arithmetic for so long, and I finally came up with something :)

With two unnecessary multiplies to save one instruction, this may only work out on Apple CPUs, but it's a bit of fun.

(For loops you can broadcast the carried value with vfmaq_laneq_f32(scan, ones, prev, 3) for three multiplies saving two instructions. LLVM fights you on that, though.)

[oops, see reply]

zeux ,
@zeux@mastodon.gamedev.place avatar

@dougall This is awesome. How low can integers go?

zeux , (edited )
@zeux@mastodon.gamedev.place avatar

@dougall I was hopeful for the SWAR trick with uint16 on x64 but in addition to range issues it was just plain slower than the standard log2 construction. I’m mainly interested in uint32x4 now but that needs u64 multiply…

It’s pretty disappointing that you need 4 instructions to do 6 scalar adds…
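
A minimal sketch of one common SWAR formulation of a uint16 prefix sum, assuming this is the kind of trick meant above (the function name is hypothetical). Multiplying a 64-bit word that holds four 16-bit lanes by 0x0001000100010001 adds the word to itself shifted by 16, 32 and 48 bits, which leaves an inclusive prefix sum in each lane. The range issue is visible in the code: nothing stops a carry out of one lane from spilling into the next, so every partial sum has to fit in 16 bits. Doing the same for four 32-bit lanes would need a 128-bit word and a wider multiply.

```c
#include <stdint.h>

/* Inclusive prefix sum over four uint16 lanes packed into a uint64
   (lane 0 in the low 16 bits). Only correct when every partial sum
   fits in 16 bits; otherwise carries corrupt the higher lanes. */
uint64_t prefix_sum_u16x4(uint64_t x)
{
    return x * 0x0001000100010001ull;
}
```

For lanes (1, 2, 3, 4), packed as 0x0004000300020001, this yields lanes (1, 3, 6, 10).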

zeux ,
@zeux@mastodon.gamedev.place avatar

@rygorous @dougall Yeah I meant even if you aren’t reusing any results. I assume it’s more work to do dependent adds (addv vs add) but still would have hoped it’d be less.

zeux ,
@zeux@mastodon.gamedev.place avatar

@dougall @rygorous Yeah this version is worse for me vs the naive version (add/ext/add/ext + add/dup). Need to see if I can make USRA work...
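
For readers following along, here is a portable C sketch of the shape of that naive log-step version, written with plain arrays instead of NEON intrinsics. The helper names are hypothetical, and mapping each step to a specific instruction (EXT against a zero vector for the lane shift, DUP of the last lane for the carry) is my reading of the post rather than the code from the thread.

```c
#include <stdint.h>

/* Shift lanes toward higher indices by n, filling with zeros
   (the role EXT against a zero vector would play). */
static void shift_lanes(const uint32_t in[4], int n, uint32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (i >= n) ? in[i - n] : 0;
}

/* Inclusive prefix sum of 4 lanes in log2(4) = 2 shift+add steps, plus a
   carry from the previous block. Returns the carry for the next block. */
uint32_t prefix_sum4(const uint32_t in[4], uint32_t carry, uint32_t out[4])
{
    uint32_t t[4], s[4];

    shift_lanes(in, 1, t);              /* [0, a, b, c] */
    for (int i = 0; i < 4; ++i)
        s[i] = in[i] + t[i];            /* [a, a+b, b+c, c+d] */

    shift_lanes(s, 2, t);               /* [0, 0, a, a+b] */
    for (int i = 0; i < 4; ++i)
        out[i] = s[i] + t[i] + carry;   /* running sums plus carry */

    return out[3];                      /* broadcast to all lanes in SIMD code */
}
```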

zeux ,
@zeux@mastodon.gamedev.place avatar

@dougall @rygorous The single-mlaq variant is also worse, if the final addition is done via add/dup. There of course mla is still on the critical path. Fixing that by using VADDV seems to regress perf further :( I might be doing something wrong.

And yeah for USRA to work things can't overflow which I can't guarantee. A reverse order might have worked but the overflow needs to work; this is one of the issues I ran into with SWAR attempts and had to hack around it with extra ops to mask bits off.

zeux , (edited )
@zeux@mastodon.gamedev.place avatar

@dougall @rygorous Yeah this is maddening. You need c0011 to be a constant so that compiler can hoist the load out of the loop; but you need it to be opaque to the compiler! I'm tired of LLVM just constantly making wrong decisions around intrinsics.

Anyway if I propagate the value manually but load the initial state out of a const volatile, I get maybe a little faster with a single-mla variant? I can measure ~2% delta on a larger algorithm so probably 5+% on this loop. two-mla variant is ~same.

zeux ,
@zeux@mastodon.gamedev.place avatar

@dougall @rygorous Oh the asm trick actually works inside the loop? That kinda defies expectations. But maybe with that I can actually ship this...
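
A hedged sketch of what such an asm trick usually looks like, assuming it refers to the familiar empty inline asm statement used as an optimization barrier in GCC and Clang. The function names are hypothetical, and the scalar "+r" constraint stands in for the vector register constraint a NEON version would need. Because the asm statement is not volatile, the compiler can still treat it as a pure function of its input and hoist it out of the loop, but it can no longer see the constant's value.

```c
#include <stdint.h>

/* Returns x unchanged while hiding its value from the optimizer.
   The empty asm is a GNU extension; the volatile fallback is a rough
   equivalent for other compilers. */
static inline uint32_t opaque_u32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    __asm__("" : "+r"(x));
#else
    volatile uint32_t v = x;
    x = v;
#endif
    return x;
}

/* Example use: k is a constant in the source, but the compiler cannot
   fold it into the multiply or pick a different instruction sequence
   based on its known bits. */
uint32_t sum_scaled(const uint32_t *data, int n)
{
    uint32_t k = opaque_u32(3);
    uint32_t sum = 0;
    for (int i = 0; i < n; ++i)
        sum += data[i] * k;
    return sum;
}
```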

@zeux@mastodon.gamedev.place avatar zeux , to random

instruction at an offset 0xbad can't possibly be good

zeux OP ,
@zeux@mastodon.gamedev.place avatar

brought to you by yet another suboptimal LLVM codegen that is due to LLVM knowing too much (bits) I think.

zeux OP ,
@zeux@mastodon.gamedev.place avatar
@zeux@mastodon.gamedev.place avatar zeux , to random

MSVC in a nutshell

@zeux@mastodon.gamedev.place avatar zeux , to random

Code reuse is all fun and games until the compiler decides you've called the function one time too many to inline it.

@zeux@mastodon.gamedev.place avatar zeux , to random

I don't have a problem
I can stop any time I want

@zeux@mastodon.gamedev.place avatar zeux , to random

I've been using niri+DMS for the last few weeks as my daily driver and I like it a lot. There were paper cuts that I mostly resolved with a few config tweaks. Still using some Gnome apps for stuff because I don't want to hunt down alternatives and they work fine; this is still on Ubuntu, niri+DMS is like two apt install commands away (plus some config tweaks).

For the first two weeks I haven't even rebooted, just switched compositor/shell setup live and kept going. Love that this is possible.

@zeux@mastodon.gamedev.place avatar zeux , to random

Still some bits to sand off but getting closer.

zeux OP ,
@zeux@mastodon.gamedev.place avatar

and closer

@zeux@mastodon.gamedev.place avatar zeux , to random

The random differences in assembly mnemonics are so annoying. There's almost zero chance that you can copy assembly out of tool A and give it to tool B that consumes assembly without having to patch things.

Things like hex literals, allowed label characters, differences in instruction names, differences in memory operands, etc.

That's after you account for completely fundamental changes in argument order and register syntax (Intel vs AT&T).

@zeux@mastodon.gamedev.place avatar zeux , to random

quints are out
trits are in

Where we're going we don't need no stinkin' bits.

I'm sure I won't regret this.

@zeux@mastodon.gamedev.place avatar zeux , to random

Seriously considering some combination of trits and quints.
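
Some context on why trits and quints are tempting: a trit is a base-3 digit and a quint is a base-5 digit, and both pack into bits with very little waste because 3^5 = 243 fits in 8 bits and 5^3 = 125 fits in 7 bits. The sketch below packs five trits into one byte. It is a generic illustration with hypothetical function names, not the encoding being worked on in these posts.

```c
#include <stdint.h>

/* Pack five base-3 digits (each 0..2) into one byte; t[0] is the least
   significant digit. The largest possible value is 242. */
uint8_t pack_trits5(const uint8_t t[5])
{
    unsigned v = 0;
    for (int i = 4; i >= 0; --i)
        v = v * 3 + t[i];
    return (uint8_t)v;
}

/* Inverse of pack_trits5: peel off base-3 digits from least significant. */
void unpack_trits5(uint8_t b, uint8_t t[5])
{
    unsigned v = b;
    for (int i = 0; i < 5; ++i) {
        t[i] = (uint8_t)(v % 3);
        v /= 3;
    }
}
```

Five trits stored this way cost 1.6 bits each, against 2 bits each when stored separately.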

@zeux@mastodon.gamedev.place avatar zeux , to random

Since Panther Lake reviews (previews?) seem quite positive, it would be quite funny if Intel looked at their results, decided power consumption is VERY VERY VERY important, and removed AVX10 from their upcoming NVL based laptops because you can't have nice things.

@zeux@mastodon.gamedev.place avatar zeux , to random

They weren't kidding about the ultimate sin huh.


@zeux@mastodon.gamedev.place avatar zeux , to random

Every time I read a post about Taylor series that goes “actually it’s at 0 so it’s Maclaurin” it feels like a GNU/Linux moment.

@zeux@mastodon.gamedev.place avatar zeux , to random

tfw you can not afford one byte

@zeux@mastodon.gamedev.place avatar zeux , to random

This is quite welcome and seemingly quite drastic. Issues, at least normal-sized, load pretty much instantly for me - and it's a very visible and stark contrast next to any other page (e.g. issue list is slow, PRs are slow, individual issues are super quick).

I hope this is the first of many similar changes to restore GitHub to its former glory.

https://github.blog/changelog/2026-01-22-faster-loading-for-github-issues/

@zeux@mastodon.gamedev.place avatar zeux , to random

good news: my great idea from a little while ago is still paying dividends

this time there's no bad news
yet

@zeux@mastodon.gamedev.place avatar zeux , to random

good news: i got a cool idea
bad news: cool idea is not that good

bleh

@zeux@mastodon.gamedev.place avatar zeux , to random

pareto frontier breach imminent

@zeux@mastodon.gamedev.place avatar zeux , to random

Whoever came up with push-and-turn or squeeze-and-turn bottle cap designs is a monster.

@zeux@mastodon.gamedev.place avatar zeux , to random

Macros are obviously superior to functions in every possible way:

  • no runtime cost
  • no need to parse body until invocation
  • no possibility of dangerous recursive calls
  • can redefine or undefine to avoid symbol conflicts
  • automatically adapts to argument types without expensive templates
  • supports first class lambda expressions without ugly C++ lambda syntax
  • promotes open source code

The only downside I can see is you need at least one function in your program: main() can’t be a macro :(

@zeux@mastodon.gamedev.place avatar zeux , to random

Heard the term “fan fiction” in relation to programming languages and I love it.