Father, Hacker (Information Security Professional), Open Source Software Developer, Inventor, and 3D printing enthusiast

  • 41 Posts
  • 2.27K Comments
Joined 3 years ago
Cake day: June 23rd, 2023

  • The assumption here is that the AI-generated code wasn’t reviewed and polished before submission. I’ve written stuff with AI and sometimes it does a fantastic job. Other times it generates code that’s so bad it’s a horror show.

    Over time, it’s getting a little bit better. Years from now it’ll be at that 99% “good enough” threshold and no one will care that code was AI-generated anymore.

    The key is that code is code: As long as someone is manually reviewing and testing it, you can save a great deal of time and produce good results. It’s useful.







  • You can functionally reproduce Shakespeare with enough randomly generated words. That’s the argument you’re making here.

    If you prompt an LLM to finish sentences enough times (like the researchers did, referenced in the article) you can get it to output whatever TF you want.

    Wait: Did you think the researchers got these results on the first try? You do realize they passed zillions of prompts into these LLMs until it matched the output they were looking for, right?

    It’s not like they said, “spit out Harry Potter” and it did so. They gave the LLM partial sentences and just kept retrying until it generated the matching output. The output that didn’t match was discarded and then the final batch of matching outputs were thrown together in order to say, “aha! See? It can regurgitate text!”

    Try it yourself: Take some sentences from any popular book, cut them in half, and tell Claude to finish them. You’ll be surprised. Or maybe not if you remember that RNG is at the core of all LLMs.


  • To be fair, the big AI companies are just applying the science in order to profit from it. The science behind LLMs is innocent enough. It’s some very specific, money-making applications of that science that are pissing people off.

    Reading all these replies… Ugh. It’s so obvious none of these people understand how LLMs work, nor how the training happens.

    Somehow people got it into their heads that LLMs are “plagiarism machines” and that image stuck. LLMs aren’t copying anything when they generate output! If they do, that’s a flaw in their training and AI researchers are always trying to spot and fix things like that. Why? Because it’s those same flaws that allow 3rd parties to understand and copy how their models work (and can create security issues).




  • Riskable@programming.dev to Memes@lemmy.ml · Sign check · 3 days ago

    Generally speaking, communism usually starts off great for the majority of people: it brings people out of poverty and whatnot. It’s very, very bad for the rich and upper middle classes, but overall the public benefits.

    Then authoritarianism kicks in and everything goes to shit really fast. People very quickly lose equality and equal treatment as a result.

    Corruption is the biggest and most inevitable problem, because people naturally want to improve their position relative to their peers. Since that’s incredibly difficult under communism, you end up with lots of quid pro quo. Underground black markets for anything and everything take hold and become just as important as the main economy.

    Basically, it never works out. The end result is authoritarianism and deep corruption every time. Just like other forms of government! Except with communism, the pressures of the system force these sorts of problems to arise much faster.



  • I’ve been writing ideas down for years and the other day I pasted a whole bunch of them into Gemini, asking it which would make for the best novel (I’ve written a novel before). Whether or not I write any of it, it was still a fun experience.

    It was neat to see what an LLM thought would make for the funniest story, most marketable, most likely to become a cult classic (haha). It also refused to process a bunch of them for being too spicy (haha).

    It’s a fun exercise if you don’t hate AI 🤷



  • You’re missing the boat entirely. Think about how an AI model is trained: It reads a section of text (one context size at a time), converts it into tokens, then nudges floating point values up or down a little bit based on what’s already associated with the previous tokens.

    It does this trillions of times on zillions of books, articles, artificially-created training text (more and more, this), and other similar things. After all of that, you get a great big stream of floating point values you write out into a file. This file represents bazillions of statistical probabilities, so that when you give it a stream of tokens, it can predict the next one.

    That’s all it is. It’s not a database! It hasn’t memorized anything. It hasn’t encoded anything. You can’t decode it at all because it’s a one-way process.

    Let me make an analogy: Let’s say you had a collection of dice. You roll them each, individually, 1 trillion times and record the results. Except you’re not just rolling them, you’re leaving them in their current state and tossing them up into a domed ceiling (like one of those dice popper things). After that’s all done you’ll find out that die #1 is slightly imbalanced and wants to land on the number two more than any other number. Except when the starting position is two, then it’s likely to roll a six.

    With this amount of data, you could predict the next roll of any die based on its starting position and be right a lot of the time. Not 100% of the time. Just more often than would be possible if it was truly random.

    That is how an AI model works. It’s a multi-gigabyte file (note: not terabytes or petabytes, which is what it would take to contain a “memorized” collection of millions of books) containing loads of statistical probabilities.

    To suggest it’s just a shitty form of encoding is to say that a record of 100 trillion random dice rolls can be used to reproduce reality.
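    The dice analogy above can be sketched in a few lines of Python (a toy with made-up bias numbers, not a real trainer): observe lots of (starting face, next face) pairs, keep only the counts, and you can predict the next roll better than chance — without having “stored” any of the rolls themselves.

    ```python
    import random
    from collections import Counter, defaultdict

    random.seed(0)

    # Toy stand-in for training: a biased "die" whose next face depends on
    # its current face. We observe many (state, next) pairs and keep only
    # aggregate counts -- statistics, not the rolls themselves.
    counts = defaultdict(Counter)
    state = 1
    for _ in range(100_000):
        # Hypothetical bias: from face 2 the die tends to land on 6;
        # from any other face it favors 2.
        weights = [1, 1, 1, 1, 1, 5] if state == 2 else [1, 4, 1, 1, 1, 1]
        nxt = random.choices(range(1, 7), weights=weights)[0]
        counts[state][nxt] += 1
        state = nxt

    # "Inference": predict the most frequent next face for a starting face.
    def predict(face):
        return counts[face].most_common(1)[0][0]

    print(predict(2))  # the counts recover the bias toward 6
    print(predict(1))  # from other faces, 2 is most likely
    ```

    Note that `counts` can’t reproduce any individual roll from the 100,000 that were observed — it can only tell you what’s statistically likely next, which is the whole point.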


  • A .safetensors file (an AI model) is literally just an array of arrays of floating point values. They’re not “encoded tokens” or words or anything like that. They’re absolute nonsense until an inference step converts a prompt into something you can pass through it.

    It’s not like a .mp3 file for words. You can’t convert it back into anything remotely resembling human-readable text without inference and a whole lot of matrix multiplication.

    If you understand how the RNG is used to pick the next token you’ll understand why it’s not a database or anything like it. There’s no ACID compliance. You can’t query it. It’s just a great big collection of statistical probabilities.
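    A minimal sketch of what that inference step looks like (with a hypothetical three-word vocabulary and made-up weights — real models use enormous matrices): the “model file” is just floats, and only a score lookup, a softmax, and an RNG draw turn them into a token.

    ```python
    import math
    import random

    # Conceptually, a weights file is just nested arrays of floats like
    # this -- no stored text, nothing queryable. (Hypothetical toy values.)
    vocab = ["the", "cat", "sat"]
    weights = [[0.2, 1.5, -0.3],   # row = current token, col = next-token score
               [-1.0, 0.1, 2.0],
               [0.5, 0.5, 0.5]]

    def softmax(scores):
        # Turn raw scores into probabilities that sum to 1.
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def next_token(current):
        # "Inference": look up the score row for the current token,
        # convert to probabilities, then let the RNG pick. The RNG draw
        # is why output varies from run to run.
        probs = softmax(weights[vocab.index(current)])
        return random.choices(vocab, weights=probs)[0]

    print(next_token("cat"))
    ```

    With these toy weights, “cat” is followed by “sat” about 83% of the time — likely, not guaranteed, which is exactly why this is nothing like querying a database.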



  • By asking models to complete sentences from a book, Gemini 2.5 regurgitated 76.8 percent of Harry Potter and the Philosopher’s Stone with high levels of accuracy, while Grok 3 generated 70.3 percent.

    Ugh. We’re back to this nonsense? “Finishing sentences” != “Memorizing entire books”

    Finish this sentence: “We could have been killed—or worse, _______”

    Turns out that if you take every sentence from a popular book like Harry Potter and the Sorcerer’s Stone, remove a few words at the end, and then ask an LLM to finish it, it’ll get it right most of the time.

    This is true even for LLMs that have not been trained on that book.

    Why is this, then? How is it possible that an LLM could complete sentences so effectively? Even when it hasn’t been trained on that specific novel?

    Human works aren’t as unique as you think they are.

    The only reason LLMs work in the first place is that human writing is so easy to predict: throw an RNG at any given prompt, plug it into a statistical model of the most likely next word, and you get a result that sounds legit. That’s also why they hallucinate all the time! It’s just a word prediction machine.

    An AI model is not a database. It doesn’t store books. It doesn’t even really memorize anything. It’s literally just an array of arrays of floating point values that predict tokens.

    It’s also wickedly complicated and seems like magic. If you don’t understand how it works, it’s easy to fall into the “it’s plagiarism!” belief. It’s not. If you believe that, you have been fooled! You’re believing that it’s actually intelligent in some way and not just a statistical representation of human output.

    There’s all kinds of things bad about commercial LLMs but “memorization” isn’t one of them. That’s an illusion.
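    You can demonstrate how predictable sentence completion is with an absolutely tiny statistical model — here a bigram counter trained on two hypothetical lines of text (standing in for a corpus), which happily “finishes” a truncated sentence it has statistics for:

    ```python
    from collections import Counter, defaultdict

    # Hypothetical mini-corpus standing in for training text.
    corpus = [
        "we could have been killed or worse expelled",
        "we could have been seen or worse caught",
    ]

    # "Training": count which word follows which. Only statistics are kept;
    # the original lines are never stored as lines.
    bigrams = defaultdict(Counter)
    for line in corpus:
        words = line.split()
        for a, b in zip(words, words[1:]):
            bigrams[a][b] += 1

    def complete(prefix, n=1):
        # Finish a truncated sentence by repeatedly appending the
        # statistically most likely next word.
        words = prefix.split()
        for _ in range(n):
            words.append(bigrams[words[-1]].most_common(1)[0][0])
        return " ".join(words)

    print(complete("we could have been killed or"))  # appends "worse"
    ```

    Scale the corpus up to the whole internet and the context up from one word to thousands of tokens, and “finish this famous sentence” stops being surprising at all.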



  • Within the browser, it’ll work to “protect” your traffic (including DNS) from prying eyes locally. As in, someone on the same network as you or your ISP or whatever networks your traffic passes through to its destination.

    Instead, it sends it all to Microsoft Central Data Collection™! By passing all your traffic through Microsoft’s central servers, you can rest easy, knowing precisely who is inspecting everything you do (including the US government and the other countries in the Five Eyes network).

    Let’s be honest: It’s yet another unfair transfer of power from local criminals to international ones, increasing the wealth of billionaire pedophiles. Give the locals a chance to rise up, would ya?