I used Mistral to do some small coding changes and fix my userscript for a piracy site, and it was damn good compared with ChatGPT, which couldn’t seem to figure out how to do what I wanted.
I initially used ChatGPT to create a Firefox extension that would grab the Steam review summary and post the link to the Steam page when I clicked on a game on a specific pirate website. It took several hours of broken attempt after broken attempt, always getting so close but not quite right, until finally it worked. All of that only to realize I couldn’t actually use the stupid extension, because Firefox won’t let you load an unsigned extension unless you jump through hoops each and every time you start Firefox. It kinda pissed me off that I went through all that only for ChatGPT to tell me in the end, “oh yeah, you didn’t know that? too bad.”
So then I spent another couple of hours with it converting that simple extension into a userscript for Tampermonkey. It never quite got it right. It absolutely could not figure out how to get the Steam review summary. It would sometimes come close, but in the end all I really had was a link to the Steam page, which I guess was the main thing I wanted anyway. For all that effort it was ridiculous, and I probably could have figured it out myself in the same amount of time.
Then just a couple of days ago it bothered me again that such a simple thing was seemingly so difficult. I had been reading about opencode, so I installed it, pointed it at Mistral through OpenRouter, and within 15 minutes it completely fixed the script: a couple of bad regex iterations, some tweaks to ensure it only ran on the game pages, and a cleanup of the dozens of bad versions from ChatGPT. It actually made the script work much better than I had expected, creating a nice little summary overlay at the top of the page, color coding the review summary, and providing the numbers I wanted. All using just the free model.
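For anyone curious what the core of a script like that boils down to, here’s a minimal sketch (not the actual script; the color choices, overlay styling, and hardcoded app ID are my own assumptions). It leans on Steam’s public `appreviews` JSON endpoint, which returns a `query_summary` with `review_score_desc`, `total_positive`, and `total_reviews`:

```javascript
// Sketch only: a real Tampermonkey script would scrape the app ID off the game
// page and use GM_xmlhttpRequest (with @connect store.steampowered.com) instead
// of fetch, to sidestep cross-origin restrictions.

// Map Steam's review_score_desc to a rough traffic-light color (my own picks).
function summaryColor(desc) {
  if (/positive/i.test(desc)) return "#6c6"; // green-ish for the Positive tiers
  if (/mixed/i.test(desc))    return "#c96"; // orange for Mixed
  if (/negative/i.test(desc)) return "#c66"; // red-ish for the Negative tiers
  return "#999";                             // unknown / not enough reviews
}

// Build the one-line summary text from the appreviews query_summary object.
function formatSummary(q) {
  const pct = q.total_reviews
    ? Math.round((100 * q.total_positive) / q.total_reviews)
    : 0;
  return `${q.review_score_desc} (${pct}% of ${q.total_reviews} reviews)`;
}

// Browser-only part: fetch the summary and pin an overlay to the top of the page.
if (typeof document !== "undefined" && typeof fetch !== "undefined") {
  const appId = "620"; // hypothetical: the real script derives this from the page
  fetch(`https://store.steampowered.com/appreviews/${appId}?json=1&num_per_page=0`)
    .then(r => r.json())
    .then(data => {
      const q = data.query_summary;
      const bar = document.createElement("div");
      bar.textContent = formatSummary(q);
      bar.style.cssText =
        `position:fixed;top:0;left:0;right:0;z-index:9999;padding:6px;` +
        `text-align:center;color:#fff;background:${summaryColor(q.review_score_desc)}`;
      const link = document.createElement("a");
      link.href = `https://store.steampowered.com/app/${appId}/`;
      link.textContent = " → Steam page";
      bar.appendChild(link);
      document.body.prepend(bar);
    });
}
```

The two pure functions at the top are the whole trick; everything else is just plumbing the result into the DOM.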
ChatGPT kept telling me it might not even be possible, which seemed insane considering we’d already had the extension working close to what I wanted.
I’m going to keep using non-US models, because I swear the US models are tuned to get you 95% of the way there and then go down bullshit rabbit holes, wasting time and energy going in circles only to ultimately fail at the task. This has been a repeated experience for me with both Claude and ChatGPT.
I think it’s very plausible that within a year or two we’ll have models that run locally and do serious coding comparable to what Claude can do right now. There are two parallel tracks here, too: one is the models themselves improving; the other is the tooling and the workflow. The thing I made for storing state outside the context is a good example of what’s possible and is still relatively unexplored. The less context a model needs to do its work, the more viable small models become.
My biggest issue with AI is that it commits to an answer, and no matter what you tell it, it’ll keep giving the same exact wrong answer phrased differently: “sorry, here’s the correct one…” and it’s the same exact shit that was wrong before.
You can’t even trial-and-error properly without changing the initial prompt.
Just FYI, if you wanna run homemade extensions in Firefox, use the Developer Edition. I went through a similar struggle.
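To my understanding (worth double-checking against current Mozilla docs): the release build hard-enforces extension signing, but Developer Edition and Nightly expose a pref that disables it permanently, unlike the temporary loading via `about:debugging` that resets on every restart:

```
about:config → xpinstall.signatures.required → false
```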
Yeah… that was a suggestion from GPT later on too. I was so pissed at that point that I just kinda gave up on the whole idea and figured maybe a userscript would be better anyway.
Why is explaining transformers, instead of a concept unlikely to be anywhere in the training data in copyable/memorizable form, the benchmark of interest here?