Gerald Monroe:
The AI safety field was not evidence-based. They started with a bunch of ideas and assumptions, and on this shaky foundation piled up a bunch more. That's why it's mostly useless and unhelpful. Actual AI safety seems to be basically manual and automated software testing, the same way it's been done for decades.

That's the insight: you can be really smart, but if your reasoning is based on unproven, low-quality information, you cannot produce a good result. Garbage in, garbage out. Information theory says it's The Law.

Hopefully your idea of negative-sum games turns out to be pessimistic. For one thing, if you are the king, you need humans who are able to keep AI honest, and those humans need to be smart. Similarly, aging is going to kill you: you need a few million humans as test subjects for the medicine that could save you, and human doctors to keep the AI honest for that as well.

Mitchell Porter:

It's definitely the case that these quasi-human AIs turn out to give a much better ethical simulacrum than one might have expected. They sometimes get confused and think, e.g., that protecting people's feelings is more important than preventing the destruction of the world, but they do OK most of the time.

If I have understood you correctly, I agree that developing a complete self-sufficient value system is an appropriate response to an era in which one has to worry about AI enhancement of familiar political risks like war and dictatorship. Part of the problem is that the meaning of alignment has been diluted from anything like "achieve CEV", down to "make the AI do what I want". I sometimes call the former, "civilizational alignment", since it is about imbuing the AI with enough values that an entire benevolent civilization could be reconstructed from them.

I agree very much that it's desirable to have comprehensive proposals for civilization-setting values. At the same time, the old lore of alignment is full of warnings for us that are still relevant. What if you need six core values and you only happened to write down five? What even is the methodology for arriving at a correct civilizational value system? So maybe we can say that any serious proposal needs to have a section explaining how the details of the proposal were arrived at.

Also, the problem of alignment in the simpler sense of "do what I want" is not thoroughly "solved", and especially not in a way that knowably and safely scales to superintelligence. After all, AIs still give surprising wrong answers, and that's exactly what you don't want once an AI reaches the point of being able to escape human control.

