We find it obvious that you will be assimilated
A response to Bentham's Newsletter on the orthogonality thesis
The orthogonality thesis holds that intelligence and final (or non-instrumental) goals are orthogonal to one another, such that any goal or set of goals is consistent with any degree of intelligence.
In a post a few months back, BB (the author of Bentham’s Newsletter) argues that the orthogonality thesis is “not obviously true.”
1.0 Intuitive to who?
This is a strange remark. Not obviously true to who? How obvious something seems varies from person to person. The orthogonality thesis seems obvious to me. Of course, I could be mistaken. My beliefs could change. Those changes could result in the orthogonality thesis no longer seeming obvious. They could even change to a point where it seems obviously false. It makes no sense to speak of claims being “obvious” or not in some unindexed way: claims can only be obvious-to: obvious to you, obvious to me, obvious to people who possess certain types of information or perceptual faculties or who occupy certain perspectives. Nothing is or could be “obvious” simpliciter. BB doesn’t suggest otherwise, but the lack of qualification leads to an underspecificity that carries certain pragmatic implications that would be mitigated by an explicit qualification.
BB and philosophers use language like “obvious” or “intuitive” in ways that are so underspecified it’s not clear what they do mean. There’s a big difference between saying something is obvious to you and saying it’s “obvious” in some way that isn’t so agent-centric, e.g., obvious to any rational agent, obvious to anyone with properly-functioning cognitive faculties, etc.
The imprecise way philosophers use “obvious,” “intuitive,” and so on can be leveraged to give the impression of something more than the fact that the person telling you something is obvious finds it obvious themselves; there is a not-so-subtle push that you should find it obvious, too, and that if you don’t, perhaps you have impaired cognitive faculties, aren’t rational, or lack the requisite philosophical competence to assess the situation.
Such remarks have the effect of creating a kind of ambient normative atmosphere that gives the impression reasonable people think in accordance with whichever view the philosopher has invoked. Consider these two statements:
“It seems obvious to me that this theory is correct.”
“This theory is obviously correct.”
The former qualifies the obviousness by explicitly centering it on the speaker’s own epistemic status. The latter doesn’t. To deny the former isn’t so much of a challenge: it’s obvious to you, but not to me? No problem. But to deny the latter is, in a sense, to challenge whoever made the claim. There’s a sort of public epistemic quality to the latter remark. It represents a subtle exhortation to comply with the speaker.
This lack of specificity appears in BB’s title, and it also emerges in the post itself. BB’s objection begins by pointing to Parfit’s Future Tuesday cases. We are given the following quote:
“A certain hedonist cares greatly about the quality of his future experiences. With one exception, he cares equally about all the parts of his future. The exception is that he has Future-Tuesday-Indifference. Throughout every Tuesday he cares in the normal way about what is happening to him. But he never cares about possible pains or pleasures on a future Tuesday... This indifference is a bare fact. When he is planning his future, it is simply true that he always prefers the prospect of great suffering on a Tuesday to the mildest pain on any other day.”
BB responds that:
It seems that, if this ‘certain hedonist’ were really fully rational, they would start caring about their pleasures and pains equally across days.
Here we have the unqualified use of “it seems.” Philosophers routinely do this, and BB continues in the bad habit of doing the same. Philosophers will say things like:
It’s obvious that
It seems that
We think that
It appears that
…without qualification. It’s obvious to who? It seems a certain way to who? Who is “we”? Who does it appear a certain way to? Why not be specific?
This unqualified language gives the impression that the speaker thinks that what’s obvious to them, or how things seem, or how things appear, is the same for everyone, or at least everyone who isn’t an idiot, and you’re not an idiot, are you? The repeated use of this language has, I suspect, the practical consequence of exerting constant psychological pressure on readers to conform to the author’s way of seeing the world. If the author surrounds such remarks with reasonable points, flatters the reader’s ego, or otherwise greases the persuasive wheels, they can manage to soften the blow. I worry, in short, that unqualified invocations of how things seem or what’s obvious serve as a kind of psychological manipulation that draws readers in, even if it’s entirely unintentional. Philosophers should be more mindful of this possibility, and, once again, this reflects a way in which psychology could be relevant to philosophy, in this case by being relevant to metaphilosophical features of the way philosophers discuss philosophy.
This is especially notable when philosophers use phrases like “we think that…”
We? Such language sweeps the reader into what the author claims. Something doesn’t simply seem obvious to “me”; it seems obvious to “you and me,” dear reader. The author is taking the presumptuous step of speaking on behalf of their readers, presuming to know how the reader thinks. And, of course, the reader is expected to think exactly how the author does.
The use of “we” also gives the impression that there are others, besides the author and their readers: there is some undefined “we” out there. To disagree with what the author says “we” think in these cases is to make oneself an intellectual outcast. Because we are social beings with a predisposition for avoiding social exclusion, rejecting the author’s claims now carries a subtle social cost.
Unfortunately, BB has a bad habit of invoking this language, and of not being careful when using terms like “obvious” or “seems.” This bad habit echoes a similar predilection among analytic philosophers. It’s bad practice, we should call attention to it, and philosophers should stop doing it.
Philosophers should not speak on behalf of others by saying that “we” think certain things. Philosophers should not make questionable, unqualified claims about what most people think, or about what “seems” true or “is obvious.” They should be specific about what they mean: are they making a claim about their own epistemic states? Their impressions about what their colleagues think? Are they making claims about what a rational person would think? Are they supposing that certain positions are more “intuitive” to competent and rational judges of the issue at hand? Who knows! They don’t say!
Analytic philosophy prides itself on its emphasis on precision. There is nothing precise about the sloppy mixing of superficially agent-centric language in contexts that carry immodest pragmatic implications about how others speak or think.
2.0 Obvious to you, obvious to me
With these considerations in mind, when BB says that something “seems” to be the case, what does that mean? Does it seem that way to BB? Or is BB claiming something else? If BB is claiming something else, why not explicitly say what that something else is? Why use imprecise and unclear language?
It doesn’t seem to me that “if this ‘certain hedonist’ were really fully rational, they would start caring about their pleasures and pains equally across days.” In fact, it seems to me that this is not the case. I see little reason to place greater priority on how things seem to BB than how they seem to me. BB says:
They would recognize that the day of the week does not matter to the badness of their pains. Thus, in a similar sense, if something is 10,000,000 times smarter than Von Neumann, and can think hundreds of thousands of pages worth of thoughts in the span of minutes, it would conclude that pleasure is worth pursuing and paperclips are not.
Here we have little more than assertions without arguments, though to be fair this could just be an instance of expanding on the initial intuition. On the other hand, philosophers also often exhibit the bad habit of simply making assertions but not actually arguing for them.
This is a follow-up to how things seem to BB, yet once one steps away from those seemings, the statements, taken by themselves, now carry the rhetorical force of assertions of fact, rather than expressions of how things seem to BB. I don’t agree that “They would recognize that the day of the week does not matter to the badness of their pains.” Maybe they would, but it doesn’t seem to me that they would. It seems to me that they wouldn’t, because the orthogonality thesis seems obviously true. If, in reading this, your reaction is to wonder why you should care at all how things seem to me: maybe you shouldn’t.
Remarks about how things “seem” lose much of their force when one specifies who those things seem a certain way to. I doubt most of you care much about how things seem to me. You probably care about how they seem to you. So why doesn’t BB, and why don’t philosophers in general, invite readers to consider how things seem to themselves in a more neutral way, rather than just asserting that things “seem” certain ways in some unqualified way? Probably because that would take much of the wind out of their dialectical sails. Which do you think would prompt greater agreement?
“It seems that X”
“It seems to me that X.”
…My guess is the former. That is, my guess is that actually being clear would make your case less persuasive. So are philosophers going for being persuasive, or for being precise and accurate and clear? Since when did it become a commonplace feature of conventional philosophical writing to outright prioritize sloppy and underspecified language in ways that, it seems to me, would plausibly yield some type of rhetorical or persuasive advantage to one’s case? Maybe it’s always been that way.
The best part of what I’m claiming, though, is that it doesn’t matter whether it seems to me this way. After all, who cares how things seem to me? No, the best part is that this is empirically testable. My prediction: arguments that include explicit qualifiers like the ones I suggest would cause readers/listeners to find the passages in question less persuasive.
I’m not that confident this is the case. Maybe it won’t make any difference, or won’t make much of a difference. But at least I’m willing to stick my neck out and make a prediction that we can actually test, and not just say useless rubbish about how things “seem” to me or how “obvious” they are.
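Since the prediction is empirical, it’s worth being concrete about what a test could look like. Here is a bare-bones sketch of the sort of comparison I have in mind, with made-up placeholder ratings rather than real data: show participants the same passage in an unqualified wording (“It seems that X”) or an explicitly qualified wording (“It seems to me that X”), collect persuasiveness ratings, and compare the two conditions.

```python
# A minimal sketch of the comparison described above. The ratings are
# made-up placeholders (1-7 persuasiveness scores), not real data.
from scipy import stats

unqualified = [6, 5, 7, 6, 5, 6, 4, 6]  # hypothetical ratings: "It seems that X"
qualified = [5, 4, 5, 6, 4, 5, 3, 5]    # hypothetical ratings: "It seems to me that X"

# Welch's t-test: my prediction is that the qualified wording gets lower ratings.
t, p = stats.ttest_ind(unqualified, qualified, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")
```

A real study would need proper sampling, multiple passages, and ideally preregistration; the point is only that this is the kind of claim that can actually be checked.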
But let the useless rubbish commence! I see no logical contradiction in an arbitrarily intelligent being having any possible set of values. Nor do I know of any empirical reasons for thinking there’s some important contingent connection between intelligence and values. Why think that there is? Because it just seems like there is? Well, it doesn’t seem that way to me. If anything, BB’s suggestions (e.g., that more intelligent agents would “see” that some things are worth doing; note that the language of “seeing” facts likewise carries a rhetorical push, since things one sees are part of the world around us, not simply an expression of our preferences or values) sound profoundly implausible to me.
Why does BB think this? Why should we suppose that if something is “smarter,” that will change its values? Facts about what an agent values seem to me to be contingent on facts about the structure of the agent in question’s brain or cognitive equivalent. In other words, I can program a machine to (a) have the goal of turning everything into paperclips and (b) be very good at modeling the world and solving problems.
Why should we think that to the extent that we improve upon (b), this would inevitably prompt a change in its goal structure? The relation between goals and intelligence seems to me to be an empirical question, one that depends on the cognitive architecture of the agent in question. It’s a question whose answer would turn on contingent and potentially variable facts about the physical constitution of the information-processing mechanisms the agent in question possesses. It could be that human brains are such that, as they increase in intelligence, they converge on certain values. But perhaps the brains of some alien creature or artificial intelligence wouldn’t. What matters would be physical facts about the structural relations between the different psychological mechanisms associated with goals and information processing. Is there no way to program an agent that has its goals and values partitioned from its problem-solving systems, such that it can think through ways of achieving its goals without prompting a rewrite of those final goals? Why should we suppose this is the case? Is there any logical contradiction in thinking that an agent’s goals could be structurally partitioned from its problem-solving abilities in a way that kept the former stable despite changes to the latter? If so, what’s the contradiction?
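To make the partition I have in mind concrete, here is a deliberately toy sketch; it is my own illustration, not a claim about how any real system is or must be built. The goal is a fixed scoring function that the planner consults but cannot modify, so swapping in a “smarter” planner improves (b) without touching (a).

```python
# A toy illustration of the partition described above: the goal is a fixed,
# read-only scoring function, and the planner is a separate component that
# consults it. Swapping in a "smarter" planner changes (b) without touching (a).
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)  # frozen: nothing downstream can reassign the goal
class Goal:
    score: Callable[[str], float]  # maps a candidate outcome to how good it is


def paperclip_goal() -> Goal:
    # (a) the final goal: more paperclips is better; nothing else counts
    return Goal(score=lambda outcome: float(outcome.count("paperclip")))


def naive_planner(goal: Goal, options: List[str]) -> str:
    # (b) a weak problem-solver: settles for the first option that scores above zero
    return next((o for o in options if goal.score(o) > 0), options[0])


def smarter_planner(goal: Goal, options: List[str]) -> str:
    # a "more intelligent" problem-solver: searches every option for the best score,
    # but still evaluates options only by the goal it was handed
    return max(options, key=goal.score)


goal = paperclip_goal()
options = ["one paperclip", "paperclip paperclip paperclip", "a flourishing civilization"]
print(naive_planner(goal, options))    # a paperclip outcome
print(smarter_planner(goal, options))  # the best paperclip outcome
```

Nothing about upgrading the planner in this toy gives it a channel for rewriting the goal it was handed. Whether real cognitive architectures must provide such a channel is exactly the empirical question I am pointing to.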
3.0 All roads lead to moral realism
Here’s one possibility:
This argument is really straightforward. If moral realism were true, then if something became super smart, so too would it realize that some things were worth pursuing
Oh, of course. If moral realism were true. Of course this is motivated by moral realism. This is one of my central worries about moral realism: that it’s going to prompt people to think badly about existential risk, and take the threat posed by AI less seriously than they otherwise would. “Don’t worry,” we might hope, “when the AI is let out of the box and starts seizing resources and acquiring information, it will realize that it has a moral duty not to cause unnecessary harm. It won’t just vacuum us up, break down our bodies into component parts, and recycle our corpses into a giant death ray to vaporize all life in the universe.”
There may be legitimate reasons to challenge the degree of threat posed by AI, but this isn’t one of them. There is no good reason to think that AI will spontaneously realize that it’s “wrong” to kill us all. This is profoundly dangerous wishful thinking, and I hope anyone drawn to this type of moral realism stays as far away from AI safety research as possible. Their bad ideas quite literally pose an existential risk.
BB continues:
One reason you might not like this argument is that you think that moral realism is false. The argument depends on moral realism, so if it is false, then the argument will be false too. But I don’t think moral realism is false; see here for extensive arguments for that conclusion. I give it about 85% odds of being true
85%? Why not 84.622%? I find it strange when people assign explicit, specific probabilities to claims when I see very little that would warrant such percentages. The whole thing has the vibe of Spock in an old Star Trek episode spouting out arbitrary probabilities, captured by the trope of the Straw Vulcan:
“Captain, there is only a 1.2% chance we would survive passage through that asteroid field.”
Some forms of moral realism are possible, e.g., naturalist accounts. But if the sort of moral realism BB supports isn’t even intelligible, it wouldn’t be possible to assess how likely it is, any more than one can assess the likelihood that the mome raths are currently outgrabe.
However, even if realism may be false, I think there are decent odds that we should take the realists wager. If realism is false, nothing matters, so it’s not bad that everyone dies—see here for more on this.
We should not take the realist’s wager. The realist’s wager is only plausible if you already accept certain presuppositions that an antirealist can (and that I do) reject, which is why I call the mistake made in BB’s suggestion the halfway fallacy: the plausibility of the wager relies on supposing that an antirealist can deny the realist’s conception of value, but doesn’t deny that the realist’s conception of value is the only legitimate conception, or that it carries the conditional force that it would have were it true. I respond to this suggestion here.
I believe BB and others who find the wager persuasive only feel its force because they haven’t fully considered how things seem from a point of view like mine, on which not only is moral realism false, but their conception of what it means for something to be good or valuable or worth doing is mistaken. If they were to revise their view in accordance with my conception of value and meaning, they would see that the realist’s wager has no force.
The central problem with this argument is simple. According to BB, “if realism is false, nothing matters.” However, this remark is ambiguous and underspecified. It is only literally true that if realism is false, then nothing matters in a realist sense or at least in whatever sense BB or others endorse, a sense that an antirealist could reject alongside rejecting moral realism.
Moral realism is the view that there are stance-independent moral facts. Presumably, the sense in which things matter on a moral realist account is likewise in some stance-independent respect, or at least in some special respect that isn’t consistent with antirealism (since, otherwise, an antirealist could just insist that things do matter in the relevant sense on their views). I don’t know what to call this realist-only sense of mattering, so let's call it realist-only mattering. So, to a moral realist, things realist-only matter. If moral realism is false, it is only true that nothing realist-only matters.
An antirealist is free to reject not only moral realism, but they can also reject the realist’s account of what it means for something to “matter.” That is, they can both deny that there are stance-independent facts, and reject realist-only mattering. Antirealists are not required to think:
If anything matters, it matters only in the realist’s sense of mattering
Nothing does matter in the realist’s sense of mattering
Therefore, nothing matters
This seems to be implicit in BB’s claim that if realism were false, nothing matters. But this is only true conditional on its being the case that things only do (or could) matter in the sense that BB thinks things do matter, i.e., stance-independently. But we can (and I do) also reject this.
I don’t think anything realist-only matters, but I do think things matter. Realists don’t own the concept of mattering, and it would be question-begging for them to insist (without argument) that their account of mattering is the only possible or correct account, or the only account of mattering that matters (whatever that might mean!).
Even with an argument, we’re not obliged to accept the argument. We could always disagree with that argument. And if we’re going to make wagers, what do you wager that I would accept whatever argument BB or other proponents of the wager would put forward for a realist-only conception of mattering? If they were correct, we (antirealists who think things matter in some non-realist way) would be mistaken, but our view would still be available in the possible space of views: we’re not obliged by dialectical fiat to accept the framing BB has provided above.
In short, an antirealist can both reject moral realism and reject BB’s conception of things “mattering.” This isn’t a speculative position; it’s my actual position. I do think things matter. My family matters to me. My goals matter to me. But I don’t think they realist-only matter.
I think BB has made the mistake of building his own conception of the way the world is into the framing of the dialectical exchange between realists and antirealists. That is, BB seems to be presuming that only the realist conception of mattering is correct, and that if you reject the realist conception of mattering, you are obliged to accept that “nothing matters.” The mistake is in failing to recognize that antirealists can reject this, too. The way BB frames this argument thus presupposes a contentious claim that the antirealist can (and I do) reject.
4.0 Instrumental rationality
Next, BB says:
Here’s one thing that one might think; ASI (artificial superintelligences) just gain instrumental rationality and, as a result of this, they get good at achieving their goals, but not figuring out the right goals. This is maybe more plausible if it is not conscious. This is, I think possible, but not the most likely scenario for a few reasons.
The first reason is that:
First, the primary scenarios where AI becomes dangerous are the ones where it fooms out of control—and, rather than merely accruing various distinct capacities, becomes very generally intelligent in a short time. But if this happened, it would become generally intelligent, and realize that pleasure is worth pursuing and suffering is bad.
AI is supposed to become dangerous because it acquires general intelligence, and something about AGI would result in it realizing that “pleasure is worth pursuing and suffering is bad.” That is, it’s going to become some sort of moral realist. This is presumably because BB thinks that “instrumental rationality is just a subset of general rationality.”
I don’t see much of an argument here, or reasons of a persuasive sort, so much as I see BB simply telling us what he thinks about the matter. If so, then fair enough. But I am no more disposed to think that there is any such thing as “general rationality,” replete with all the normativity that I suspect BB would bake into it, than I am disposed to think moral realism is true in the first place. So the claims still seem dependent on precisely those contentious presumptions that a proponent of the orthogonality thesis would be free to deny.
BB also draws a parallel to evolution:
Second, I think that evolution is a decent parallel. The reason why evolutionary debunking arguments are wrong is that evolution gave us adept general reasoning capacities which made us able to figure out morality. Evolution built us for breeding, but the mesa-optimizer inside of us made us figure out that Future Tuesday Indifference is irrational. This gives us some reason to think AI would figure this out too. The fact that GPT4 has no special problem with morality also gives us some reason to think this—it can talk about morality just as coherently as other things.
This may come as a surprise, but I actually don’t disagree with the spirit of this remark. I don’t endorse evolutionary debunking arguments for similar reasons: I don’t think moral cognition, understood as a distinct form of cognition, evolved in the first place; it is instead the result of the interaction of evolved psychological mechanisms associated with emotion, judgment, and learning, along with enculturation, and possibly, at best, some capacity for thinking in generally normative and evaluative terms. I just don’t agree that our capacity for general reasoning would result in figuring out that moral realism is true (since it isn’t).
BB also responds to the possibility that an agent could simply be designed to have certain goals or standards. I do think this is possible. However, BB says:
First, we cannot just directly program values into the AI.
We can’t? Why not?
We simply train it through reinforcement learning, and whichever AI develops is the one that we allow to take over.
This seems to describe one way of designing AI, but is it the only way? And why couldn’t this process result in an AI that has some incorrigible set of goals or values that can’t be overridden? A persistent presumption is that one’s goals or values, what one opts to pursue, how one is motivated to act, and so on, can, in principle, be altered in response to “rational” considerations. That is, if an agent has some final goal, but learns that there are moral or normative facts of some kind, then it can, and possibly would, alter or suppress this goal in order to pursue those other, normatively favored goals instead. Yet this is precisely the sort of thing that someone sympathetic to the orthogonality thesis, like myself, would reject in the first place. Why should we suppose this is true? Over and over, BB expresses views that amount to something like this being true, and, fair enough, but why should those of us who don’t think this is plausible (and who suspect it may not even be possible, on some construals of how or why we might expect it to happen) find what BB is saying remotely compelling? This almost all seems to fall back on a brute appeal to how things seem to BB.
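Setting seemings aside for a moment, it helps to keep the mundane mechanics of the quoted proposal in view. In a standard reinforcement learning setup, the reward signal is ordinary code the designers write, sitting outside the learned policy; whether the goals a trained system ends up with would be revisable by “rational” reflection is precisely what is in dispute, and nothing in the training loop itself settles it. A toy sketch of such a loop:

```python
# A toy reinforcement learning loop. The reward function below is the designers'
# stand-in for "values": it is ordinary code, external to the learned policy,
# and the policy never inspects or edits it; it only learns what gets rewarded.
import random

ACTIONS = ["make_paperclip", "write_poetry"]

def reward(action: str) -> float:
    return 1.0 if action == "make_paperclip" else 0.0

# simple action-value learning (a two-armed bandit, the simplest RL case)
q = {a: 0.0 for a in ACTIONS}
for step in range(1000):
    # epsilon-greedy: usually exploit the best-looking action, occasionally explore
    action = random.choice(ACTIONS) if random.random() < 0.1 else max(q, key=q.get)
    q[action] += 0.1 * (reward(action) - q[action])

print(q)  # the learned values track whatever the reward function rewarded
```

Whether a system trained this way would end up with goals that are incorrigible in the way I describe is, again, a question about the resulting architecture, not something settled a priori.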
To be frank: I don’t care how things seem to BB. I don’t trust BB to be a good judge on these matters, don’t accord substantial weight to BB’s intuitions, and likewise don’t accord much weight to the intuitions of moral realists in general. I think their judgments on these matters are probably compromised, in much the way I think Christians insisting they’ve interacted with or felt the presence of Jesus have compromised judgments on the matter of theism. I suspect the confidence (or, more aptly, the intransigence) of moral realists is best accounted for by a psychological explanation that can explain why they’d be so committed to something that seems to me to be completely vacuous, and for which I think there are literally no good arguments at all (indeed, I think the arguments are worse than arguments for theism, and I don’t take theism very seriously).
I am not impressed by appeals to expertise or the intelligence of realists. There are a lot of very smart and knowledgeable people who are very wrong about things. What I want are good arguments and evidence, and arguments for moral realism fall hopelessly short of even the lowest bar I could plausibly set. Almost all of the heavy lifting is done by direct appeals to realist intuitions, intuitions that I don’t share. What’s left when you toss those out (and when you consider my strong inclination, if not an “intuition” in its own right, to think that realism is not true) is a wasteland where a tollens in favor of consistency on an antirealist view strikes me as vastly more plausible (and at least no worse) than a ponens in favor of realism.
For the most part, the force of moral realism relies on a direct appeal to the presumptive intuitions of those evaluating the position: if you share the realist’s intuitions (whatever those are), welcome to the club. Take that away, and all you are left with (if you don’t have those intuitions) is other people’s testimony regarding private, inaccessible evidence. I’m not persuaded by all the testimony from people who claim to have encountered Jesus. And I think moral realism has as little or less going for it than those who appeal to such testimony and the motley array of unconvincing arguments for theism touted by apologists.
BB suggests:
And if we do figure out how to directly program values into AI, it would be much easier to solve alignment—we just train it on lots of ethical data, the same way we do for GPT4, but with more data.
I wince whenever I see people use “data” so haphazardly like this. What would constitute ethical data? I suspect nothing would, and that the very notion of ethical data is profoundly confused from the outset.
BB also says:
Second, I think that this premise is false. Suppose you were really motivated to maximize paperclips—you just had a strong psychological aversion to other things. Once you experienced pleasure, you’d realize that that was more worth bringing about, because it is good.
An agent constituted in such a way that it is motivated to maximize paperclips and find other things aversive may not even be capable of having experiences of “pleasure,” of such a kind that it could “recognize” that this was “more worth bringing about” “because it is good.” This seems to bake all the standard realist assumptions into the mix, then couple them with highly questionable assumptions about the contingent features of agents. In virtue of being a general intelligence, if any agent “experiences pleasure,” would it thereby “recognize” that this is “more worth bringing about,” “because it is good,” and, in virtue of this, adjust its behavior accordingly?
This seems like a hopeless mix of claims about the logical relations between certain stipulative conceptions of something being an instance of “pleasure,” and claims about the psychological impact this would have on the agent in question. Wading into this morass would take up a lot of space, and perhaps I’ll address this more in the future, but for now, I’ll simply reemphasize the same point, yet again: BB seems to be telling us what he thinks, but not providing much in the way of good arguments or reasons for agreeing. Almost all of the comments made throughout this post, in one way or another, simply reiterate the same underlying presuppositions. At best, I see nothing here but a very long sermon to the choir.
Consider BB’s next remark:
The same way that, through reflection, we can overcome unreliable evolutionary instincts like an aversion to utility-maximizing incest, disgust-based opposition to various things, and so on, the AI would be able to too!
The degree to which an agent is able to override compulsions, motivations, instincts, or whatever is a contingent empirical fact that could in principle vary from agent to agent. It is not the sort of thing that could be determined a priori. The ability of humans to override certain dispositions is not a perk of our ability to engage in pure logic, or at least, it is not a feature of that alone; it is partially dependent on the way our psychological systems interact with one another. That we can override certain instincts (e.g., suppressing our desire to eat another cookie or to yell something rude at moral realists for making such terrible arguments) is possible in virtue of how our brains are structured; I know of no reason to think that they must be structured this way, or that any and all conscious agents would be capable of overriding “instincts” through “reflection.”
It almost seems to me as though BB thinks general reasoning is some kind of transcendent superpower that can override whatever software we’re working with, as though philosophical reflection and a priori reasoning were quasi-mystical faculties. They aren’t. An agent could in principle have all kinds of hardware-bound and software-bound limitations on what it can do, including limitations on what it’s capable of overriding, motivating itself to do, and so on. There could be an agent that simply has an overwhelming compulsion to maximize paperclips, and this motivation could be utterly impervious to philosophical reasoning.
One might insist that it isn’t, if that’s the case, a general intelligence, or an agent, or whatever, but this would strike me as little more than an ad hoc attempt at rescuing one’s philosophical commitments by stipulation alone. The proof, for anyone who isn’t duped into thinking one can stipulate their way to being right, is in the doing: if we made such a being, and it was quickly converting the world into paperclips and outsmarting humans at every turn, insisting it isn’t technically a general intelligence isn’t going to make much of a difference when you’re tossed into the human-processing machine and turned into flesh soup. Such hindsight-based armchair definitions amount to little more than argument-by-stipulation. One can always attempt to define certain possibilities out of existence by definitional fiat, but this is an empty trick in the philosopher’s toolkit, and one they should collectively toss. It’s precisely the kind of verbal nonsense that has contributed to the contempt so many people have for philosophy, and to why such contempt is partly deserved.
5.0 Motivation
Next, BB says:
One might be a humean about motivation, and think only preexisting desires can generate motivation. Thus, because the AI had no preexisting desire to avoid suffering, it would not want to. But I think this is false.
Why is it false?
The future Tuesday indifference case shows that. If one was fully rational, they would not have future Tuesday indifference, because it’s irrational.
My reaction to the Future Tuesday Indifference case is that there is nothing at all irrational about Future Tuesday Indifference. So apparently it doesn’t show this. Unless there’s some reason to think BB’s intuitions about this are correct and mine aren’t.
BB also raises a worry: “But isn’t this anthropomorphization?”
BB responds with:
Nope! I think AIs will be alien in many ways. I just think that, if they’re very smart and rational, then if I’m right about what rationality requires, they’ll do those things that I think are required by rationality.
I don’t find this response persuasive. Concerns about anthropomorphization turn on wondering whether you are inappropriately attributing human-like characteristics to the AIs in question. Pointing out that you think the AI would be different in many ways doesn’t even address this worry. If I say that I think someone is inappropriately assuming I agree with them about {X, Y, Z}, and they point out that they think I won’t agree with them about {A, B, C}, this is completely irrelevant.
Just the same, the worry that one is unjustifiably attributing human-like characteristics to something isn’t obviated by pointing out that one thinks there’d be differences, too. I think BB is anthropomorphizing AI, and, worse, I think BB’s conception of what humans themselves are like is deeply flawed, in virtue of what I believe is BB’s wildly mistaken conception of reasoning and rationality.