AI Agnosticism: Part 2

Cambridge Philosopher Tom McClelland responds

May 31, 2026

This past Sunday I published the second piece on Logos Analog, a response to a paper by a Cambridge philosopher named Tom McClelland. Once it was written, I thought it only fair to email him the link, with no real expectation of a reply, just to let him know a critique of his work was out there. Within a day he had written back, not with a polite acknowledgment but with a full point-by-point response: eight specific quotations from my essay, each followed by his pushback.

I want to say up front what kind of exchange this is, because the shape of it matters. He could have ignored the piece, or sent the polite-but-empty reply that public-facing writers usually send when a stranger writes to them with a critique. He didn’t. He took the essay seriously enough to answer it line by line, and he did so without condescension and without flinching, closing with this line:

TM: half of the objections are to things I don’t quite say and the other are to things I think I can back up.

That is the right posture for an exchange: concede what is fairly pushed and hold what isn’t. It is the posture I was trying to write from in the original piece, and the one I want to meet him in here.

So this is not a takedown but a continuation of a real engagement, made possible by his generous decision to respond. He has caught me in two places where my original wording was looser than the argument it was trying to support, and I’ll concede both of those clearly before I get to the rest, then show where the substance of the original argument survives the concessions and gets sharper in the process. The exchange also surfaced something I had treated as too obvious to spell out the first time, which turns out to be the most important observation in the entire essay, and which I owe to the back-and-forth itself rather than to anything I would have written on my own.

One quick note before I begin. Tom gave me permission to quote his email directly, and where his words appear in what follows they are his, marked with his initials. Having his actual words greatly lessens the chance that I misrepresent him, and I am glad of it.

I am Head of Development for a US-based EdTech company, building with AI and other technologies, a 25-year veteran developer whose philosophical stance is informed by watching the frontier conversation from inside the industry. Tom is a lecturer in the Department of History and Philosophy of Science at the University of Cambridge, a Director of Studies at King’s College, Cambridge, and an Associate Fellow of the Leverhulme Centre for the Future of Intelligence. Two fields that used to talk past each other are now colliding around the same set of questions, and the collision is what this publication is for. Tom is reading the AI industry from philosophy of mind. I am reading philosophy of mind from inside the AI industry. The disagreements that follow are partly about substance and partly about what each of us can see from where we stand.

My original piece said the framework “fits exactly the shape the industry needs.” Tom pushed back.

TM: There’s nothing that serves industry here. It’s meant to be pushing forward research on AI welfare, which is generally considered inconvenient for industry.

He’s right that the wording imputed motive, and I’m walking it back. I can’t see anyone’s ledger, and I’m in no position to say what serves whom. But the structural observation doesn’t need motive to do its work.

There are three verdicts on offer. Yes, conscious. No, not conscious. Agnostic. On their own, the three options don’t favor industry one way or another, and agnosticism in particular is essentially unactionable. What Tom’s framework adds, and this is genuinely a clever piece of philosophical work, is a bridge: a welfare apparatus that designs toward valence and sentience even while the consciousness verdict stays open. Suddenly agnosticism becomes a working position rather than a dead end.

That bridge also happens to be the most useful thing the industry could be handed. A confident yes would be moral catastrophe at scale. A confident no would deflate the aura around frontier labs. Agnosticism alone gives them nothing to point at. But agnosticism with welfare lets a company say, in effect, “Cambridge philosophers tell us the question is genuinely open, which should license much more building than we’re doing, but we’re choosing to develop with sentience in mind regardless.” That is excellent corporate social responsibility positioning, and the welfare apparatus is what makes it possible.

While I was drafting this response, Tom appeared on the Mind Chat podcast with Philip Goff and Keith Frankish; the episode came up in my YouTube feed shortly after he sent the email. Asked why the question matters now, he said:

TM: Anthropic has a guy called Kyle Fish, who they hired to look into the possibility of AI consciousness and to worry about AI welfare. He thinks there’s about a 20% chance that AI is conscious.

Twenty percent is the sweet spot in numerical form. High enough that the question is taken seriously. Low enough that no moral-action trigger fires. That’s not motive. That’s structure.

The other line Tom caught was at the end of the piece.

You cannot read the color of a light you cannot confirm is on.

He answered with a counterexample.

TM: Sure you can! I’ve changed the bulb in my garage to a red light. Someone says ‘is there green light being cast in your garage’. I can reply ‘I’m not there so I don’t know if the light’s on, but if it is on it’ll be red not green.’

He’s right that the line as written doesn’t survive the counter. The analogy was loose, the wording was sloppy, and the red-bulb case shows why. I concede the line.

The argument underneath the line, though, is the one I should have written, and the bulb itself is what shows it.

Tom’s bulb works because he installed it. He has independent prior knowledge of what kind of fixture is in the garage, separate from whether it’s currently on, and that prior knowledge is what lets him make the conditional verdict on color. The analogy maps onto the consciousness case only if he has equivalent prior knowledge of what a conscious system would be like, its texture, what would count as good or bad for it, whether it has anything analogous to valence at all. By his own framework, he doesn’t. He told us in the paper that consciousness science has never actually explained consciousness, and that the missing deep explanation is the entire ground of his agnosticism.

So the picture isn’t a man in his garage with a known red bulb. The picture is a man outside a building, looking through a window at light coming from inside. He doesn’t know what’s producing the light. It could be a bulb, a fire, a piece of hot metal, something else entirely. He has no access to the mechanism. He doesn’t know what causes the light to change properties when it does. The most he can do is observe that it changes, and he’s still trying to work out what those changes correlate with, and that work hasn’t reached any reliable conclusions yet.

On what grounds, then, does he claim to know what color the light will be under specific conditions?

This is the same move he made with the travelers in my original piece, in a different domain. He walked some distance up the road and called the spot neutral ground. He put himself inside a garage he didn’t tell us he could enter, with a fixture he didn’t install, of a type he hasn’t defined, in conditions he hasn’t characterized. Both times, he advances past what his own framework grants him and stands on the position as if it were the starting point. Same wall blocks the existence question and the texture question. The wall doesn’t move when you change the surface.

In my original piece I argued that the markers correlated with consciousness in human subjects, when found in AI systems, are evidence not of emergent consciousness but of deliberate imitation. Tom pushed back.

TM: Any appeal to whether there’s a genuine subject behind the report risks being circular. I think this is just a simple case that the evidence here, like most evidence, is defeasible. We know enough about why AI generates the reports that it generates not to take them at face value. That doesn’t mean starting with the assumption that it’s not a subject. It just means that there’s a ‘gaming problem’ in play that has to be factored in. Incidentally, the long list of outward signs of consciousness I include as a diagram doesn’t have much to do with verbal reports of consciousness. There are all sorts of other signs involved and most of them aren’t as vulnerable to this gaming problem (in other words, the LLMs haven’t been designed specifically to give the relevant output).

The whole pushback rides on the parenthetical at the end. If the markers in his diagram weren’t designed-for, the gaming problem stays local to verbal reports and the rest of the evidence survives. If they were designed-for, the gaming problem isn’t a local complication. It’s the architecture.

Tom’s diagram[1] pulls from the major consciousness theories: recurrent processing[2], global workspace theory[3], higher-order theories[4], attention schema theory[5], predictive processing[6], and agency and embodiment[7]. From where I sit, as someone who watches these systems get engineered and waits on each new innovation so I can build with it, every one of these has been an explicit design target or an acknowledged inspiration for frontier AI architectures, traceable through the published research record of the people who built them.

That isn’t a hostile framing. It is the field’s working knowledge of what it has been doing. The vocabulary of consciousness science is in the field because the design strategy was borrowed from consciousness science, openly and continuously, from foundational papers through current frontier-lab interpretability research. Anyone tracking the engineering literature, asked plainly, would tell you the same thing. The architectures were modeled on what we know about cognition and consciousness, because that was the goal.

So the parenthetical at the end of Tom’s pushback doesn’t survive contact with the publication record. The gaming problem isn’t local to verbal reports. It is global to the marker list, because the marker list was the design specification. Defease the verbal reports and you don’t recover a residue of un-gamed evidence. You empty the evidence pile altogether.

That doesn’t make Tom wrong about defeasibility as a general epistemic principle. It makes the empirical content of his pushback weaker than he took it to be. He assumed the markers weren’t designed-for. Inside the field that built them, they were.

On the Mind Chat podcast, Tom offered the simulation framing himself, in a different domain.

TM: I think of this as a bit like digestion on this story. So digestion is a biological process, and there’s actually computer models of digestion where you put inputs in and kind of predict what’ll happen in the digestive system and so on. If we had a really good model of digestion, that could be incredibly informative and impressive, but it wouldn’t actually be digesting anything, right? Nothing would get digested. It would be nothing more than a simulation of digestion. So on that kind of biological view, even this really detailed silicon emulation of the neural correlates of consciousness wouldn’t itself be conscious.

He grants the simulation reading on the biological view. He holds it open as one of two readings the evidence permits. From inside the field, the choice between them isn’t evidentially neutral. It is between one reading consistent with what the system is documented to be, and a second reading the documentation contradicts. The architecture is imitation. It was built to be. That is the default reading until shown otherwise.

There’s one more thing I want to say, and it’s the observation the whole exchange has been circling. It came into focus through the back-and-forth, and I had treated it as too obvious to spell out the first time.

The standard agnostic posture on AI consciousness treats the question as one on which we can suspend judgment in a principled way. The framework is honest about the limits of what we know, withholds verdicts the evidence can’t support, and proceeds carefully on a question that matters. That is the posture Tom’s paper presents, and it is the posture I want to take seriously.

But agnosticism isn’t free. It only does epistemic work when applied to claims that have earned the right to be taken seriously. Bertrand Russell made the point with a thought experiment: imagine a porcelain teapot orbiting the sun between Earth and Mars, too small for any telescope to detect. Nobody could disprove the teapot’s existence. But it would be absurd to be agnostic about it. The proper response is dismissal, not suspended judgment, because the claim has no warrant in the first place. Russell’s point was that agnosticism applies to questions that have earned the right to be questions. Without that warrant requirement, the same posture would license agnosticism about Russell’s teapot, Bobby Henderson’s Flying Spaghetti Monster, or any other unfalsifiable claim anyone happens to make. That would make agnosticism a tool that does no epistemic work at all.

So the question to ask about AI consciousness is not whether the framework is internally consistent. Tom’s framework is internally consistent. The question is whether the underlying claim has earned the warrant the framework presupposes.

Stack the meta-level uncertainties. We don’t have a deep explanation of consciousness in the only case where we know it exists. Tom grants this; the missing deep explanation is the ground of his agnosticism. We don’t know whether the underlying mechanism of consciousness is functional in nature, or whether something beyond function is required, the substrate question that has divided philosophy of mind for decades. We don’t know whether, if the mechanism is functional, the function is substrate-independent, the multiple-realizability question that requires its own defense. We don’t know whether current silicon architectures, even granting substrate independence in principle, are computationally adequate to host the relevant functions.

Four open questions, none of them settled. Each of them must be answered plausibly enough to keep the AI consciousness question open. The standard agnostic posture quietly settles at least three of them in order to license its agnosticism on the one that remains. That’s not agnosticism. That’s a position with most of its work hidden under the floor.

A real agnosticism, applied honestly all the way down, would not produce a working position on AI consciousness. It would produce something close to silence on that question, while the prior questions get the philosophical and empirical attention they actually need. What we have instead is a framework that treats one open question with the language of humility while standing on three settled answers it doesn’t name as settled. That is the structural reason the agnosticism feels off. It isn’t withholding judgment. It’s making three confident judgments and using humble language to cover for them.

The reason this matters beyond the local dispute with Tom is that the same move is happening across the AI discourse, constantly, at every level. Confident verdicts on questions a framework should hold open. Quiet importation of contested commitments. Arguments composed of moves that, examined individually, none of the participants would defend. The shape Tom’s paper takes carefully and in good faith is the shape the louder, less careful arguments take everywhere. The careful version is worth reading partly because it makes the structure visible. Once you see the move in Tom’s paper, you start seeing it in the keynote speeches, the company communications, the policy briefings. The framework that licenses an industry’s working assumptions is being constructed by quiet meta-level settlements that the surface humility conceals.

I owe the clarity of this observation to the exchange itself. Tom answered my piece carefully enough to make the structure visible, and making structures like that visible is what this publication is for.

I don’t know if Tom and I will go another round. Maybe, maybe not. Even if the exchange ends here, it has been worth it for two things it produced, which I want to name before I close. The first is that an academic philosopher took an essay from a working developer seriously enough to answer it line by line, and that fact is rarer than it should be. The second is that the substantive disagreements between us turned out to sit in places neither of us had named clearly in our first attempts, and the back-and-forth is what made them visible.

This publication exists for the work of reading carefully in a moment when careful reading has become harder, and rarer, than the questions warrant. Most of the arguments shaping how we think about AI right now are not being made by people writing in good faith and answering each other line by line. They are being made in keynote slides, company communications, and policy briefings, where the shape of the argument is exactly the shape that licenses the conclusion the speaker needs. I am part of that industry. The reason we keep producing arguments shaped like the conclusions we need is that we are always looking for the next killer app, the next moat, the next thing to put on a marketing slide, and we have stumbled into a set of questions that are not the kind of questions our marketing departments are equipped for. Questions about what we are. What it means to be a person. Whether a thing we built is one. The industry I work in is producing these questions faster than anyone, ourselves included, can responsibly handle them. The discipline of reading philosophy as if its sentences are meant to track reality is one we used to practice without thinking. We don’t anymore. We get back to it by doing it, here and elsewhere.

[1] Strictly, the table reproduced in Tom’s paper is from Butlin, Long, et al., “Consciousness in Artificial Intelligence: Insights from the Science of Consciousness” (2023), a multi-authored report led by Patrick Butlin and Robert Long. I refer to it as Tom’s diagram throughout for readability, but credit belongs to Butlin, Long, and their coauthors. Tom flagged this in a reply to the draft, and the correction is his.

[2] For recurrent processing, the lineage runs from Hochreiter & Schmidhuber’s LSTM (1997) through modern recurrent variants, with biological feedback loops as the acknowledged inspiration. See LeCun, Bengio & Hinton, “Deep Learning,” Nature 521 (2015).

[3] Goyal et al., “Coordination Among Neural Modules Through a Shared Global Workspace,” ICLR 2022 (Yoshua Bengio coauthor), is the explicit paper bringing global workspace theory (GWT) into deep learning. Bengio’s “Consciousness Prior” framework (2017) makes the link directly. https://arxiv.org/abs/2103.01197

[4] Anthropic’s interpretability team has published on metacognitive monitoring and introspection in Claude models, including the October 2025 introspection paper. See https://www.anthropic.com/research/introspection.

[5] The attention mechanism in transformers was inspired by the cognitive theory of attention; see Lindsay, “Attention in Psychology, Neuroscience, and Machine Learning,” Frontiers in Computational Neuroscience (2020).

[6] The pretraining objective of autoregressive large language models is next-token prediction, which is predictive coding’s computational core. See Huang et al., “Meta predictive learning model of languages in neural circuits” (2023).

[7] RLHF (reinforcement learning from human feedback) is the explicit mechanism that produces goal-directed agency in current frontier models. See Christiano et al., “Deep Reinforcement Learning from Human Preferences” (2017) and Bai et al., “Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback” (Anthropic, 2022).
