Discussion about this post

Roko:

I think these types of methods can be extended significantly to show the following (a toy formalization of the necessity/sufficiency claims appears after the list):

- an existence proof of an aligned superintelligence that is not just logically possible but also physically realizable

- proof that ML interpretability/mech interp cannot possibly be logically necessary for aligned superintelligence

- proof that ML interpretability/mech interp cannot possibly be logically sufficient for aligned superintelligence

- a proof that, given a certain minimal ability of AI to emulate humans (e.g. it understands common-sense morality and cause and effect) and of humans to emulate AI (e.g. humans can do multiplication by hand), the internal details of AI models cannot possibly make a difference to the set of realizable good outcomes
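
The necessity and sufficiency claims have a simple logical shape: a single counterexample refutes each. Below is a minimal sketch in Lean 4 of that shape only; `System`, `Aligned`, and `Interp` are hypothetical placeholders rather than definitions from the post, and the hard part, actually exhibiting the witness systems, is taken as a hypothesis.

```lean
-- A toy formalization of the necessity/sufficiency claims above.
-- `System`, `Aligned`, and `Interp` are hypothetical placeholders,
-- not definitions from the post.
variable {System : Type} (Aligned Interp : System → Prop)

-- One aligned-but-uninterpretable system refutes logical necessity.
theorem interp_not_necessary (h : ∃ s, Aligned s ∧ ¬ Interp s) :
    ¬ ∀ s, Aligned s → Interp s :=
  fun hnec => match h with
  | ⟨s, hAligned, hNotInterp⟩ => hNotInterp (hnec s hAligned)

-- One interpretable-but-misaligned system refutes logical sufficiency.
theorem interp_not_sufficient (h : ∃ s, Interp s ∧ ¬ Aligned s) :
    ¬ ∀ s, Interp s → Aligned s :=
  fun hsuf => match h with
  | ⟨s, hInterp, hNotAligned⟩ => hNotAligned (hsuf s hInterp)
```

All of the substantive work in the proposed proofs would go into constructing the existential witnesses, not into this propositional step.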

Randall Randall:

If I understand your proposal correctly, LT:BGROW isn't an intelligence at all, super- or otherwise. It's a plan of action that would require a superintelligence to develop. A plan of action can indeed be fully "aligned", "friendly", or whatever, if you judge by outcomes, but that doesn't imply it's possible to construct an unfettered intelligence that always does what a given person thinks is best. That holds even for the *same* person upgraded with more intelligence or information, as we arguably see all the time in humans.

