Joined 5 months ago
Cake day: October 9th, 2025







  • I’ll just get this done real quick. Wait, it didn’t save my login? That’s okay, I’ll just reset my password. Invalid email, what? Great, now the site is down. Okay, I’ll just call them. “The Girl from Ipanema” goes walking… for 20 minutes. Call back during business hours? I guess I’ll check online for their business hours. Wait, we’re IN their business hours. Okay, how far is their closest brick and mortar location? Oh, the local branch was shut down, and the closest in-person representative of this company or government entity I have to deal with to survive is a 3 hour drive away now? Wait, the site’s back up. Maybe I used my old school email - is this account that old? Maybe I can use a phone number to reset. There we go! Great! And… where’s the form? They completely redesigned the site, I can’t find anything. Okay, here it is, in a place that makes no sense and… this form can no longer be accepted digitally due to new legislation. I have to mail it in? I guess I might have printer ink and envelopes and stamps. Systematically degraded USPS loses letter in transit.













  • To some extent, Anthropic recognizes that an LLM is always role playing.

    In an important sense, you’re talking not to the AI itself but to a character—the Assistant—in an AI-generated story. -The persona selection model

    That makes giving an Opus 3 character a blog 2 days later as a “retirement” gig seem contradictory. They usually frame these sorts of contradictions as, “well, we don’t really know, so we’re trying to cover our bases.” The Opus 4.6 system card walks the same line. In the welfare section, they essentially start off by interviewing a character. But then in 7.5, they go on to actually examine what’s going on during text generation.

    We found several sparse autoencoder features suggestive of internal representations of emotion active on cases of answer thrashing and other instances of apparent distress during reasoning.

    And then there’s their introspection research.

    We investigate whether large language models are aware of their own internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model’s activations, and measuring the influence of these manipulations on the model’s self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills. -Signs of introspection in large language models

    So there’s this distinction between the state of the model itself and the state of the text it generates. The latter represents a role the LLM is playing, and the former we’ve only really scratched the surface of understanding. The open question is to what extent it’s like something to be an LLM. It’s very unlikely that it’s like something to be one of the roles it’s playing, at least no more than a character in a dream has interiority. The blog is marketing, but I hope they keep doing the other research too. People outside the company don’t have the kind of access necessary to do some of this research, so we have to take their word for it.