Father, Hacker (Information Security Professional), Open Source Software Developer, Inventor, and 3D printing enthusiast

  • 41 Posts
  • 2.27K Comments
Joined 3 years ago
Cake day: June 23rd, 2023

  • The assumption here is that the AI-generated code wasn’t reviewed and polished before submission. I’ve written stuff with AI and sometimes it does a fantastic job. Other times it generates code that’s so bad it’s a horror show.

    Over time, it’s getting a little bit better. Years from now it’ll be at that 99% “good enough” threshold and no one will care that code was AI-generated anymore.

    The key is that code is code: As long as someone is manually reviewing and testing it, you can save a great deal of time and produce good results. It’s useful.







  • You can functionally reproduce Shakespeare with enough randomly generated words. That’s the argument you’re making here.

    If you prompt an LLM to finish sentences enough times (like the researchers did, referenced in the article) you can get it to output whatever TF you want.

    Wait: Did you think the researchers got these results on the first try? You do realize they passed zillions of prompts into these LLMs until it matched the output they were looking for, right?

    It’s not like they said, “spit out Harry Potter” and it did so. They gave the LLM partial sentences and just kept retrying until it generated the matching output. The output that didn’t match was discarded and then the final batch of matching outputs were thrown together in order to say, “aha! See? It can regurgitate text!”

    Try it yourself: Take some sentences from any popular book, cut them in half, and tell Claude to finish them. You’ll be surprised. Or maybe not if you remember that RNG is at the core of all LLMs.


  • To be fair, the big AI companies are just applying the science in order to profit from it. The science behind LLMs is innocent enough. It’s some very specific, money-making applications of that science that are pissing people off.

    Reading all these replies… Ugh. It’s so obvious none of these people understand how LLMs work, nor how the training happens.

    Somehow people got it into their heads that LLMs are “plagiarism machines” and that image stuck. LLMs aren’t copying anything when they generate output! If they do, that’s a flaw in their training and AI researchers are always trying to spot and fix things like that. Why? Because it’s those same flaws that allow 3rd parties to understand and copy how their models work (and can create security issues).




  • Riskable@programming.dev to Memes@lemmy.ml · Sign check · 3 days ago

    Generally speaking, communism usually starts off great for the majority of people: it brings people out of poverty and whatnot. It’s very, very bad for the rich and upper middle classes, but overall the public benefits.

    Then authoritarianism kicks in and everything goes to shit really fast. People very quickly lose equality and equal treatment as a result.

    Corruption is the biggest and most inevitable problem, because people naturally want to improve their position relative to their peers. Since that’s incredibly difficult under communism, you end up with lots of quid pro quo. Underground black markets for anything and everything take hold and become just as important as the main economy.

    Basically, it never works out. The end result is authoritarianism and deep corruption every time. Just like other forms of government! Except with communism, the pressures of the system force these sorts of problems to arise much faster.



  • I’ve been writing ideas down for years and the other day I pasted a whole bunch of them into Gemini, asking it which would make for the best novel (I’ve written a novel before). Whether or not I write any of it, it was still a fun experience.

    It was neat to see what an LLM thought would make for the funniest story, most marketable, most likely to become a cult classic (haha). It also refused to process a bunch of them for being too spicy (haha).

    It’s a fun exercise if you don’t hate AI 🤷



  • You’re missing the boat entirely. Think about how an AI model is trained: It reads a section of text (one context size at a time), converts it into tokens, then nudges floating point values up or down a little bit based on what’s already associated with the previous tokens.

    It does this trillions of times on zillions of books, articles, artificially-created training text (more and more, this), and other similar things. After all of that, you get a great big stream of floating point values you write out into a file. This file represents bazillions of statistical probabilities, so that when you give it a stream of tokens, it can predict the next one.

    That’s all it is. It’s not a database! It hasn’t memorized anything. It hasn’t encoded anything. You can’t decode it at all because it’s a one-way process.

    Let me make an analogy: Let’s say you had a collection of dice. You roll them each, individually, 1 trillion times and record the results. Except you’re not just rolling them, you’re leaving them in their current state and tossing them up into a domed ceiling (like one of those dice popper things). After that’s all done you’ll find out that die #1 is slightly imbalanced and wants to land on the number two more than any other number. Except when the starting position is two, then it’s likely to roll a six.

    With this amount of data, you could predict the next roll of any die based on its starting position and be right a lot of the time. Not 100% of the time. Just more often than would be possible if it was truly random.

    That is how an AI model works. It’s a multi-gigabyte file (note: not terabytes or petabytes, which is what it would take to contain a “memorized” collection of millions of books) containing loads of statistical probabilities.

    To suggest it’s just a shitty form of encoding is to say that a record of 100 trillion random dice rolls can be used to reproduce reality.
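    The dice analogy above can be sketched in a few lines of Python (a toy with made-up bias numbers, not a real trainer): observe lots of (starting face, next face) pairs, keep only the counts, and you can predict the next roll better than chance — without having “stored” any of the rolls themselves.

    ```python
    import random
    from collections import Counter, defaultdict

    random.seed(0)

    # Toy stand-in for training: a biased "die" whose next face depends on
    # its current face. We observe many (state, next) pairs and keep only
    # aggregate counts -- statistics, not the rolls themselves.
    counts = defaultdict(Counter)
    state = 1
    for _ in range(100_000):
        # Hypothetical bias: from face 2 the die tends to land on 6;
        # from any other face it favors 2.
        weights = [1, 1, 1, 1, 1, 5] if state == 2 else [1, 4, 1, 1, 1, 1]
        nxt = random.choices(range(1, 7), weights=weights)[0]
        counts[state][nxt] += 1
        state = nxt

    # "Inference": predict the most frequent next face for a starting face.
    def predict(face):
        return counts[face].most_common(1)[0][0]

    print(predict(2))  # the counts recover the bias toward 6
    print(predict(1))  # from other faces, 2 is most likely
    ```

    Note that `counts` can’t reproduce any individual roll from the 100,000 that were observed — it can only tell you what’s statistically likely next, which is the whole point.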


  • A .safetensors file (an AI model) is literally just an array of arrays of floating point values. They’re not “encoded tokens” or words or anything like that. They’re absolute nonsense until an inference step converts a prompt into something you can pass through it.

    It’s not like a .mp3 file for words. You can’t convert it back into anything remotely resembling human-readable text without inference and a whole lot of matrix multiplication.

    If you understand how the RNG is used to pick the next token you’ll understand why it’s not a database or anything like it. There’s no ACID compliance. You can’t query it. It’s just a great big collection of statistical probabilities.
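    A minimal sketch of what that inference step looks like (with a hypothetical three-word vocabulary and made-up weights — real models use enormous matrices): the “model file” is just floats, and only a score lookup, a softmax, and an RNG draw turn them into a token.

    ```python
    import math
    import random

    # Conceptually, a weights file is just nested arrays of floats like
    # this -- no stored text, nothing queryable. (Hypothetical toy values.)
    vocab = ["the", "cat", "sat"]
    weights = [[0.2, 1.5, -0.3],   # row = current token, col = next-token score
               [-1.0, 0.1, 2.0],
               [0.5, 0.5, 0.5]]

    def softmax(scores):
        # Turn raw scores into probabilities that sum to 1.
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def next_token(current):
        # "Inference": look up the score row for the current token,
        # convert to probabilities, then let the RNG pick. The RNG draw
        # is why output varies from run to run.
        probs = softmax(weights[vocab.index(current)])
        return random.choices(vocab, weights=probs)[0]

    print(next_token("cat"))
    ```

    With these toy weights, “cat” is followed by “sat” about 83% of the time — likely, not guaranteed, which is exactly why this is nothing like querying a database.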



  • By asking models to complete sentences from a book, Gemini 2.5 regurgitated 76.8 percent of Harry Potter and the Philosopher’s Stone with high levels of accuracy, while Grok 3 generated 70.3 percent.

    Ugh. We’re back to this nonsense? “Finishing sentences” != “Memorizing entire books”

    Finish this sentence: “We could have been killed—or worse, _______”

    Turns out that if you take every sentence from a popular book like Harry Potter and the Sorcerer’s Stone, remove a few words at the end, and then ask an LLM to finish it, it’ll get it right most of the time.

    This is true even for LLMs that have not been trained on that book.

    Why is this, then? How is it possible that an LLM could complete sentences so effectively? Even when it hasn’t been trained on that specific novel?

    Human works aren’t as unique as you think they are.

    The only reason LLMs work in the first place is that human writing is so easy to predict: throw an RNG at any given prompt, plug it into a statistical model of the most likely next word, and you get a result that sounds legit. That’s also why they hallucinate all the time! It’s just a word prediction machine.

    An AI model is not a database. It doesn’t store books. It doesn’t even really memorize anything. It’s literally just an array of arrays of floating point values that predict tokens.

    It’s also wickedly complicated and seems like magic. If you don’t understand how it works, it’s easy to fall into the “it’s plagiarism!” belief. It’s not. If you believe that, you have been fooled! You’re believing that it’s actually intelligent in some way and not just a statistical representation of human output.

    There’s all kinds of things bad about commercial LLMs but “memorization” isn’t one of them. That’s an illusion.
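    You can demonstrate how predictable sentence completion is with an absolutely tiny statistical model — here a bigram counter trained on two hypothetical lines of text (standing in for a corpus), which happily “finishes” a truncated sentence it has statistics for:

    ```python
    from collections import Counter, defaultdict

    # Hypothetical mini-corpus standing in for training text.
    corpus = [
        "we could have been killed or worse expelled",
        "we could have been seen or worse caught",
    ]

    # "Training": count which word follows which. Only statistics are kept;
    # the original lines are never stored as lines.
    bigrams = defaultdict(Counter)
    for line in corpus:
        words = line.split()
        for a, b in zip(words, words[1:]):
            bigrams[a][b] += 1

    def complete(prefix, n=1):
        # Finish a truncated sentence by repeatedly appending the
        # statistically most likely next word.
        words = prefix.split()
        for _ in range(n):
            words.append(bigrams[words[-1]].most_common(1)[0][0])
        return " ".join(words)

    print(complete("we could have been killed or"))  # appends "worse"
    ```

    Scale the corpus up to the whole internet and the context up from one word to thousands of tokens, and “finish this famous sentence” stops being surprising at all.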



  • Within the browser, it’ll work to “protect” your traffic (including DNS) from prying eyes locally. As in, someone on the same network as you or your ISP or whatever networks your traffic passes through to its destination.

    Instead, it sends it all to Microsoft Central Data Collection™! By passing all your traffic through Microsoft’s central servers, you can rest easy, knowing precisely who is inspecting everything you do (including the US government and the other countries in the Five Eyes network).

    Let’s be honest: It’s yet another unfair transfer of power from local criminals to international ones, increasing the wealth of billionaire pedophiles. Give the locals a chance to rise up, would ya?