Aligning machine intelligence with human values

Quanta Magazine’s Natalie Wolchover interviews computer scientist Stuart Russell about the future of artificial intelligence:

You could say machines should err on the side of doing nothing in areas where there’s a conflict of values. That might be difficult. I think we will have to build in these value functions. If you want to have a domestic robot in your house, it has to share a pretty good cross-section of human values; otherwise it’s going to do pretty stupid things, like put the cat in the oven for dinner because there’s no food in the fridge and the kids are hungry. Real life is full of these tradeoffs. If the machine makes these tradeoffs in ways that reveal that it just doesn’t get it — that it’s just missing some chunk of what’s obvious to humans — then you’re not going to want that thing in your house.

I don’t see any real way around the fact that there’s going to be, in some sense, a values industry. And I also think there’s a huge economic incentive to get it right. It only takes one or two things like a domestic robot putting the cat in the oven for dinner for people to lose confidence and not buy them.

Then there’s the question, if we get it right such that some intelligent systems behave themselves, as you make the transition to more and more intelligent systems, does that mean you have to get better and better value functions that clean up all the loose ends, or do they still continue behaving themselves? I don’t know the answer yet.

Essentially, the interview is a fantastic primer on the kinds of things computer scientists are up against when it comes to designing intelligent machines: not just the mechanics of it, but encoding rules based on the way we think, and certain moral codes. All of which means we have to be able to quantify qualities.
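
To make "quantifying qualities" a little more concrete, here is a minimal, purely hypothetical sketch of the kind of value function Russell is gesturing at: a domestic robot scoring candidate actions as a weighted sum over a few human values. The actions, value dimensions, and weights are all invented for illustration, and the point is only that a badly chosen weight reproduces the cat-in-the-oven failure.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    feeds_kids: float       # how well this satisfies "the kids are hungry" (0..1)
    harm_to_pets: float     # harm done to household animals (0..1)
    property_damage: float  # damage to the home or belongings (0..1)

# Hypothetical weights standing in for a learned or built-in "value function".
# The huge negative weight on harming pets encodes a tradeoff no human
# would ever accept; setting it too low is the cat-in-the-oven mistake.
WEIGHTS = {"feeds_kids": 1.0, "harm_to_pets": -100.0, "property_damage": -5.0}

def utility(action: Action) -> float:
    """Score an action as a weighted sum of the values it affects."""
    return (WEIGHTS["feeds_kids"] * action.feeds_kids
            + WEIGHTS["harm_to_pets"] * action.harm_to_pets
            + WEIGHTS["property_damage"] * action.property_damage)

candidates = [
    Action("cook the cat for dinner", feeds_kids=0.9, harm_to_pets=1.0, property_damage=0.0),
    Action("order a pizza",           feeds_kids=0.8, harm_to_pets=0.0, property_damage=0.0),
    Action("do nothing",              feeds_kids=0.0, harm_to_pets=0.0, property_damage=0.0),
]

# The robot simply picks the highest-scoring action.
best = max(candidates, key=utility)
print(best.name)  # "order a pizza" under these weights
```

The toy model also makes Russell's open question visible: every value the designer forgets to put a weight on is a loose end the system will happily trade away, and more capable systems find more of those loose ends.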