10 Comments

The basic problem with this (that I can see) is that you begin with a highly abstract proof of the existence of an AIXI-style mathematical formalism (representing an idealised superintelligence), and then shift to empirical, heuristic arguments based on extrapolations from current technology and social systems. These are quite different kinds of argument, and the former does not necessarily tell us much about the latter.

On your AIXI-style proof: as far as I can tell the core idea is that the universe is finite, and that therefore any kind of intelligent agent within the universe could in principle be replaced with a ginormous lookup table. (Presumably this includes agents that receive some inputs S1...St, take some action At, receive new inputs St+1...St+n, take another action, etc.) [1]
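(To make that concrete, here is a toy sketch of my own -- not your formalism -- of an agent that is literally just a finite map from observation histories to actions; all names and types here are purely illustrative.)

```python
# Toy sketch: an "agent" that is nothing but a finite lookup table.
# Keys are complete observation histories; values are the next action.

from typing import Dict, Tuple

Observation = str
Action = str
History = Tuple[Observation, ...]

class LookupTableAgent:
    def __init__(self, table: Dict[History, Action], default: Action):
        self.table = table          # one entry per possible observation history
        self.default = default      # fallback for histories not enumerated
        self.history: History = ()

    def act(self, observation: Observation) -> Action:
        # Append the new observation, then look up the prescribed action.
        self.history = self.history + (observation,)
        return self.table.get(self.history, self.default)

# In a finite universe the table is finite, but it must contain an entry
# for every observation history the agent could ever encounter.
agent = LookupTableAgent(
    table={
        ("reactor temp high",): "reduce power",
        ("reactor temp high", "reactor temp normal"): "hold steady",
    },
    default="do nothing",
)
print(agent.act("reactor temp high"))  # -> "reduce power"
```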

I do not think anyone familiar with mathematics and computer science would argue with this. The problem is that it does not shed much light on the empirical, heuristic arguments you make in this article. I think there are practical or philosophical criticisms that could be made of the empirical arguments you make above, but by framing the argument in terms of mathematics you preempt most such criticisms, since you can simply say the critics don't understand, or haven't addressed, the mathematical arguments you make.

[1] Regarding modelling an intelligent agent as a lookup table, there is a tangential problem, which is separate from my points above, but which I think is relevant. Take any intelligent agent -- superintelligent or not -- which updates on new information. To model such an agent -- one that begins not knowing everything -- the lookup table or equivalent basically has to incorporate all information in the universe, or at least all information the agent would act on once it possesses it.

E.g., suppose you are modelling a strictly limited tool AI that performs chemistry experiments, and uses the results of those experiments to plan new experiments. Assume the AI's decisions are determined by the data it has received up to a given point, s.t. it can be modelled by a lookup table or equivalent structure such as you describe. The lookup table therefore *has to incorporate* (in some form) a model of all of the aspects of chemistry the AI might conceivably interact with. If the AI is able to synthesise new molecules, the model has to be even more complicated.

For a superintelligent agent, the same argument would hold, but the lookup table in essence has to incorporate a model of the entire universe -- including the complete laws of physics and the physical properties of every entity that the SAI might interact with. By "incorporate a model" I mean that the astronomically vast network of inputs and outputs constituting the lookup table would in some sense have to correspond to a complete model of the universe (amongst other things).
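(To put a rough number on how vast, even for the toy chemistry AI: the table needs one entry per possible history of experimental results, and every one of those entries has to encode a chemically sensible decision. A back-of-the-envelope count, with figures I have simply made up:)

```python
# Back-of-the-envelope count of lookup-table entries for the chemistry tool AI.
# Both figures below are invented purely for illustration.

outcomes_per_experiment = 1000   # distinct results a single experiment might return
experiments_per_run = 50         # length of one experimental campaign

# One entry for each observation history of length 1..50:
# 1000^1 + 1000^2 + ... + 1000^50
entries = sum(outcomes_per_experiment ** k for k in range(1, experiments_per_run + 1))
print(f"{entries:.3e} table entries")  # ~1e150, each one a chemically sensible choice
```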


> there is a tangential problem

I don't really see why this is a problem.


The problem is that your giant lookup-table superintelligence has to include a model of the entire universe, and therefore requires omniscience to build.

So your argument boils down to: "if we knew everything, we'd know exactly what to do to build utopia (or achieve any ends we wanted)". True, but not insightful.

AIXI, in contrast, though an abstract formalism, does incorporate the idea of the agent dealing with a stochastic, partly-unknown environment. This is much closer to any superintelligent agent that is likely to actually be built. (Even a galaxy-brain SAI would almost certainly not possess all the information in the universe.)

This is difficult for me to write about because I think building a recursively self-improving superintelligence is probably just a bad idea, overall, and neither alignment nor control is likely to work. A better approach is to build a cryptographically secure network of 'tool' agents with a precisely-defined permissions architecture, such that each agent is given a specific task with specific resources (e.g., build a fusion reactor on this area of Mars), and (ideally) has zero incentive to break out of its permissions. 'Agents' don't even necessarily need to be minds: the 'fusion-reactor building agent' above could be decomposed into a knowledge agent that only accumulates knowledge about fusion power, a planning agent that creates a plan (but doesn't execute it), and a management agent that executes the plan (but doesn't modify it), all the way down to agents that control individual robots within strict parameters set by the management agent. None of the agents needs to be fully autonomous or even 'intelligent' outside the bounds of its delimited problem.

I need to write this out in detail somewhere, but Drexler's CAIS proposal is the same basic idea.
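(A very rough sketch of the kind of decomposition and permissions architecture I mean -- all the names are purely illustrative, and a real system would need cryptographic/infrastructural enforcement rather than an in-process check like this:)

```python
# Illustrative sketch of narrow 'tool' agents with explicit, delimited permissions.
# Purely a toy: real enforcement would be cryptographic, not an in-process check.

from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Permissions:
    topics: FrozenSet[str]      # what the agent may learn about
    actions: FrozenSet[str]     # what the agent may do
    resources: FrozenSet[str]   # physical/compute resources it may use

@dataclass
class ToolAgent:
    name: str
    permissions: Permissions

    def request(self, action: str, resource: str) -> bool:
        # Every request is checked against the agent's delimited permissions.
        allowed = (action in self.permissions.actions
                   and resource in self.permissions.resources)
        print(f"{self.name}: {action} on {resource} -> {'OK' if allowed else 'DENIED'}")
        return allowed

# Decomposing the 'build a fusion reactor on Mars' task:
knowledge_agent = ToolAgent("fusion-knowledge", Permissions(
    frozenset({"fusion physics"}), frozenset({"read"}), frozenset({"research archive"})))
planning_agent = ToolAgent("fusion-planner", Permissions(
    frozenset({"fusion physics"}), frozenset({"write plan"}), frozenset({"plan store"})))
management_agent = ToolAgent("fusion-manager", Permissions(
    frozenset(), frozenset({"execute plan step"}), frozenset({"mars site robots"})))

management_agent.request("execute plan step", "mars site robots")  # OK
management_agent.request("execute plan step", "earth datacenter")  # DENIED
```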


> the idea of the agent dealing with a stochastic, partly-unknown environment.

Sure. But it's fairly easy to extend my formalism to that. Maybe I should.


> AIXI, in contrast, though an abstract formalism, does incorporate the idea of the agent dealing with a stochastic, partly-unknown environment. This is much closer to any superintelligent agent that is likely to actually be built.

AIXI definitely can't be built. It requires hypercomputation (a Halting Oracle).
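(For reference, Hutter's AIXI action-selection rule, roughly stated: the innermost sum ranges over all programs q for a universal Turing machine U that reproduce the interaction history, and deciding which programs do so is where the halting problem -- and hence the need for an oracle -- comes in.)

```latex
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
  \big( r_k + \cdots + r_m \big)
  \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

where \ell(q) is the length of program q.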

LT:BGRO() doesn't require exotic models of computation; it's just a finitely large lookup table.


> True, but not insightful.

It is insightful, because people are claiming that building aligned superintelligence is mathematically impossible, in the same way that 2+2=5 is impossible.

This is a falsification of those claims.


OK, I see it now.

I gave your two articles a more careful re-read and I think your logic holds up, given your starting assumptions.

(By 'starting assumptions' I mean the general framework of thinking about AI in the LW-Bayesian-rationalist sphere -- the various ideas developed by Yudkowsky, Bostrom, et al.

I don't personally share all of those assumptions, but that's a deeper philosophical disagreement that should be written up somewhere else. For one simple example, I'm not sure that it's possible to define a universal human value function for the AI utopia, partly because simply building the utopia would amputate many human values. I do think this problem is soluble with a different approach, however.)


The universal human value function for the AI is covered in my next post:

https://www.transhumanaxiology.com/p/the-elysium-proposal


Thanks.

Perhaps I should add something to the post making clear why I think the empirical jump is justified.
