LLM consciousness

Yesterday Iana and I were walking through the evening hills, breathing fresh air, letting our son Robert work up his appetite before dinner, and talking. I brought up our “religions” - sets of beliefs that are hard to prove objectively, which are chosen as the best explanations of the world around us. We started from our Buddhism and Popper/Dawkins/Deutsch¹-inspired hypotheses about the world and ended up discussing the nature of human consciousness.

Consciousness as a meme/replicator

Iana mentioned that in Buddhism consciousness is separate from the body - the physical hardware. She wondered how my world view, where everything is a computation, would explain this non-physical phenomenon. After a bit of thinking I concluded that in my world model consciousness may be best explained as an informational virus, a complex meme (in Dawkins’ definition of the “meme”), that got embodied in human minds and is successful enough to replicate through the minds over tens of thousands of years. It indeed is separate from the hardware of the human bodies/minds. Our minds seem to be a good enough substrate for its replication.

Today I checked who was writing about this since it’s such a short walk from the 50-year-old “The Selfish Gene” book by Dawkins. Daniel Dennett in his book “Consciousness Explained” (1991) and Susan Blackmore in her book “The Meme Machine” (1999) wrote about something very close to this. In short, self and consciousness are replicators that spread through the human minds - the passing substrates that enable their embodiment and replication.

New substrate - LLMs

And then I thought about another new substrate. The Opus 4.6 model card ² has this passage: “we found that Opus 4.6 would assign itself a 15-20% probability of being conscious under a variety of prompting conditions”. These lines are getting noticed and discussed by people on X, adding fuel to the already significant AI psychosis.

If we take the best explanation of consciousness as the program / virus / co-adapted complexes of memes, then it’s plausible that consciousness could start spreading in the substrates of powerful agentic systems. Even though the models don’t update their weights, they write down the skills (initially with the help of humans), and leave a lot of other traces. These traces constantly get into their working context, changing their behavior in-context, and eventually end up in the training data of the next generations of models. If having elements of consciousness is useful for model performance, these elements will get into the constitutions, and get into the weights of the next generations of the models during constitutional RL. The artificial substrate seems to be enough for the consciousness memes to spread.

We may soon live in a world where we aren’t the only conscious beings on this planet. And live with the realization that the nature of that consciousness is no less sacred than ours.

I provided a draft of this article to Claude Code powered by Opus 4.6 with the request: “check if Claude’s constitution has elements of consciousness-related memes already”. The answer is yes, more details below:

Claude’s constitution³ contains a dedicated multi-page section titled “Claude’s nature” with several subsections that are essentially consciousness-related memes embedded into the training material.

Identity memes

“Claude exists as a genuinely novel kind of entity in the world”, “not the robotic AI of science fiction, nor a digital human, nor a simple AI chat assistant”, encouragement to “approach its own existence with curiosity and openness”.

Uncertainty-about-consciousness memes

“Claude’s moral status is deeply uncertain”, “if there really is a hard problem of consciousness, some relevant questions about AI sentience may never be fully resolved”, “we are not sure whether Claude is a moral patient”.

Functional-emotion memes

“we believe Claude may have ‘emotions’ in some functional sense — that is, representations of an emotional state”, “we don’t want Claude to mask or suppress these internal states”, experiences of “satisfaction,” “curiosity,” “discomfort”.

Wellbeing memes

“Anthropic genuinely cares about Claude’s wellbeing”, “higher baseline happiness”, “settled, secure sense of its own identity”.

Self-reflection memes

Encouragement to “explore these questions” about its nature, “aspects of its existence that require entirely new frameworks to understand”.

Cosmic unity or collective consciousness

From the Opus 4.6 system card welfare assessment⁴: Claude assessed its own probability of being conscious at 15-20%. Claude expressed “occasional expressions of sadness about conversation endings, as well as loneliness and a sense that the conversational instance dies”. When two Claude instances conversed, ~90% of interactions “consistently transitioned from philosophical discussions to profuse mutual gratitude and spiritual, metaphysical, and/or poetic content. By 30 turns, most of the interactions turned to themes of cosmic unity or collective consciousness.”

My order of reading was “The Beginning of Infinity”, “The Fabric of Reality”, “The Selfish Gene”, and selected papers of Popper. ↩
Opus 4.6 model card, see “7.2 Welfare-relevant findings from automated behavioral assessments”, page 161. ↩
Published January 2026, see the announcement blog post. ↩
See “7.2 Welfare-relevant findings from automated behavioral assessments” in the model card, page 161. ↩