The AI safety field was not evidence based. They started with a bunch of ideas and assumptions, and on this shakey foundation piled up a bunch more. That's why it's mostly useless and unhelpful. Actual AI safety seems to just be basically manual and automated software testing, same way it's been done for decades.
That's the insight - you can be really smart but if your reasoning is based on unproven, low quality information, you cannot produce a good result. Garbage in/out. Information theory says it's The Law.
Hopefully your ideas of negative sum games turns out to be pessimistic. For one thing you need humans who are able to keep AI honest if you are the king. Those humans need to be smart. Similarly aging is going to kill you - you need a few million humans as test subjects for the medicine that could save you. And human doctors to keep the AI honest for that as well.
Definitely it's the case that these quasihuman AIs turn out to give a much better ethical simulacrum than one might have expected. They sometimes get confused and think e.g. that protecting people's feelings is more important than preventing the destruction of the world, but they do OK most of the time.
If I have understood you correctly, I agree that developing a complete self-sufficient value system is an appropriate response to an era in which one has to worry about AI enhancement of familiar political risks like war and dictatorship. Part of the problem is that the meaning of alignment has been diluted from anything like "achieve CEV", down to "make the AI do what I want". I sometimes call the former, "civilizational alignment", since it is about imbuing the AI with enough values that an entire benevolent civilization could be reconstructed from them.
I agree very much that it's desirable to have comprehensive proposals for civilization-setting values. At the same time, the old lore of alignment is full of warnings for us that are still relevant. What if you need six core values and you only happened to write down five? What even is the methodology for arriving at a correct civilizational value system? So maybe we can say that any serious proposal needs to have a section explaining how the details of the proposal were arrived at.
Also, the problem of alignment in the simpler sense of "do what I want" is not thoroughly "solved", and especially not in a way that knowably safely scales to superintelligence. After all, AIs still give surprising wrong answers, and that's exactly what you don't want when it reaches the point of escaping human control.
One reason I believe that the HUVA assumption became false is people did not realize how much data mattered to shape AI capabilities and values, and a lot of AI safety assumed that a lot of the capabilities and values essentially developed independently of the data, which is almost certainly false for realistic AI systems.
Quintin Pope uses a stronger claim than I'd endorse, but it's directionally correct here:
"- Underestimated the centrality of data in determining ~every aspect of an AI's cognition. Obviously, past frameworks acknowledged that data was important for determining things like an AI's beliefs, but things like the AI's goals, its learning algorithm, its planning process, etc were more often imagined to be independent of the data it was trained on. Different people made this mistake to different degrees, and there were those who e.g., argued that AIs should infer human values from data. However, even such proposals tended to imagine a lot of hand-engineering going into specifying e.g., the process by which the AIs made inferences about human preferences from the data. Few people talked about ~every aspect of an AI's cognition being determined by a mostly values-neutral process of function approximation over a huge dataset that specified all of beliefs and values / planning / etc simultaneously. Again, I think this was mostly due to over-indexing on the implementations of AI / goals / etc that were having the most emphasis at the time."
From the beginning, the whole talk about God-tier AIs coming to rule us or exterminate us seemed like wish casting. It's the same as saying humanity will be invaded by an alien species. In the end, one is hoping the greatest threat to humanity is not ourselves, and doubly with AI, that we will actually invent something so powerful and brilliant.
"Oh, what if we were so amazing we made Gods! Oh, how terrible that would be!"
The AI safety field was not evidence based. They started with a bunch of ideas and assumptions, and on this shakey foundation piled up a bunch more. That's why it's mostly useless and unhelpful. Actual AI safety seems to just be basically manual and automated software testing, same way it's been done for decades.
That's the insight - you can be really smart but if your reasoning is based on unproven, low quality information, you cannot produce a good result. Garbage in/out. Information theory says it's The Law.
Hopefully your ideas of negative sum games turns out to be pessimistic. For one thing you need humans who are able to keep AI honest if you are the king. Those humans need to be smart. Similarly aging is going to kill you - you need a few million humans as test subjects for the medicine that could save you. And human doctors to keep the AI honest for that as well.
Definitely it's the case that these quasihuman AIs turn out to give a much better ethical simulacrum than one might have expected. They sometimes get confused and think e.g. that protecting people's feelings is more important than preventing the destruction of the world, but they do OK most of the time.
If I have understood you correctly, I agree that developing a complete self-sufficient value system is an appropriate response to an era in which one has to worry about AI enhancement of familiar political risks like war and dictatorship. Part of the problem is that the meaning of alignment has been diluted from anything like "achieve CEV", down to "make the AI do what I want". I sometimes call the former, "civilizational alignment", since it is about imbuing the AI with enough values that an entire benevolent civilization could be reconstructed from them.
I agree very much that it's desirable to have comprehensive proposals for civilization-setting values. At the same time, the old lore of alignment is full of warnings for us that are still relevant. What if you need six core values and you only happened to write down five? What even is the methodology for arriving at a correct civilizational value system? So maybe we can say that any serious proposal needs to have a section explaining how the details of the proposal were arrived at.
Also, the problem of alignment in the simpler sense of "do what I want" is not thoroughly "solved", and especially not in a way that knowably safely scales to superintelligence. After all, AIs still give surprising wrong answers, and that's exactly what you don't want when it reaches the point of escaping human control.
> What even is the methodology for arriving at a correct civilizational value system?
See: https://www.transhumanaxiology.com/p/the-elysium-proposal
One reason I believe that the HUVA assumption became false is people did not realize how much data mattered to shape AI capabilities and values, and a lot of AI safety assumed that a lot of the capabilities and values essentially developed independently of the data, which is almost certainly false for realistic AI systems.
Quintin Pope uses a stronger claim than I'd endorse, but it's directionally correct here:
"- Underestimated the centrality of data in determining ~every aspect of an AI's cognition. Obviously, past frameworks acknowledged that data was important for determining things like an AI's beliefs, but things like the AI's goals, its learning algorithm, its planning process, etc were more often imagined to be independent of the data it was trained on. Different people made this mistake to different degrees, and there were those who e.g., argued that AIs should infer human values from data. However, even such proposals tended to imagine a lot of hand-engineering going into specifying e.g., the process by which the AIs made inferences about human preferences from the data. Few people talked about ~every aspect of an AI's cognition being determined by a mostly values-neutral process of function approximation over a huge dataset that specified all of beliefs and values / planning / etc simultaneously. Again, I think this was mostly due to over-indexing on the implementations of AI / goals / etc that were having the most emphasis at the time."
Link to full tweet:
https://x.com/QuintinPope5/status/1709363036849618983
From the beginning, the whole talk about God-tier AIs coming to rule us or exterminate us seemed like wish casting. It's the same as saying humanity will be invaded by an alien species. In the end, one is hoping the greatest threat to humanity is not ourselves, and doubly with AI, that we will actually invent something so powerful and brilliant.
"Oh, what if we were so amazing we made Gods! Oh, how terrible that would be!"