It hit me how much of a workout my cerebral cortex would get this weekend when I boarded the bus and overheard a half-American, half-Australian accent saying, “What complex numbers are saying is up plus down is different from up minus down. If you treat everything as a wave, waves are not phases.”
It really sank in when, at the opening dinner, a young man with a beard and fedora sat down next to me, stuck his hand out, and said, “Hi, I’m Derek, and I’m a philosopher” (no shade; Derek and I had a delightful chat about the episode of The Good Place where Michael puts Chidi through the trolley problem). I didn’t bother commenting to anyone about Steph Curry saying, “game, blouses” to poor, poor James Harden.
So, yeah. The conference about keeping humans safe from artificial general intelligence (AGI), or human-level AI, is weird. With AGI, you’re dealing with a concept so revolutionary in its logical endpoint that it’s almost mystical. Hardcore AI people can seem cult-like.
But it’s weird in a way that is ultimately very fulfilling, despite all the incomprehensible math. It reminds me of eating local food in remote parts of China. The first time I put ma la jiao in my mouth and felt it numb up, it was a totally alien experience. Yet I loved it; it was both rich and expansive, and discovering such an unexplored corner of the global flavor matrix made me see all food in a new light. Kind of like how talking about AI safety has made me reflect on this whole project of being human.
Consider something called “inverse reinforcement learning,” or IRL for short. In most AI systems, humans define an objective, then train an algorithm to meet it. With IRL, the algorithm observes human behavior and infers its own objective from a model of human intent and well-being.
IRL is potentially very useful because humans are terrible at specifying our objectives. Even in the extremely narrow contexts where software is useful today, computers often act like complete idiots, and it’s on us to make sure we’re inputting commands on the program’s terms. In complex contexts – say, your living room and kitchen – defining all the objectives and their relative priorities for a home-assistance robot would be like reconstructing a sliced onion. It would be much easier to let the robot observe our behavior and learn, say, which windows we want open in which months and at what times of day.
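If you want to see the shape of the idea, here’s a toy sketch in Python. Every detail is my own illustration, not anything presented at the conference: I pretend the human is “Boltzmann-rational” (they mostly pick what they like, with some noise), that their preferences are a linear score over a few made-up window-and-heater features, and that the robot only ever sees which action got picked. The robot then works backward to the preference weights that best explain those choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate actions, each described by hand-made features:
# [open_kitchen_window, open_bedroom_window, run_heater]
options = np.array([
    [1.0, 0.0, 0.0],   # open the kitchen window
    [0.0, 1.0, 0.0],   # open the bedroom window
    [0.0, 0.0, 1.0],   # run the heater
    [0.0, 0.0, 0.0],   # do nothing
])

# Hidden preferences the robot never gets to see (invented for this sketch):
# loves fresh kitchen air, mildly likes the bedroom window, hates the heater bill.
true_w = np.array([1.5, 0.5, -2.0])

def choice_probs(w, feats):
    """Boltzmann-rational choice distribution: higher-scoring actions get picked more often."""
    logits = feats @ w
    logits -= logits.max()          # for numerical stability
    expd = np.exp(logits)
    return expd / expd.sum()

# Step 1: observe the human. Sample which action they pick over 500 evenings.
observed = rng.choice(len(options), size=500, p=choice_probs(true_w, options))

# Step 2: infer reward weights by maximizing the likelihood of those observed choices.
w_hat = np.zeros(3)
for _ in range(2000):
    p = choice_probs(w_hat, options)
    # Gradient of the average log-likelihood:
    # (features of what the human actually did) minus (features the model expected).
    grad = options[observed].mean(axis=0) - p @ options
    w_hat += 0.1 * grad

print("true weights     :", true_w)
print("inferred weights :", np.round(w_hat, 2))
```

Real IRL work deals with whole sequences of actions and far messier behavior models, but the basic inversion is the same: behavior goes in, an inferred objective comes out.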
The advantage of IRL is magnified infinitely when considering the possibility of AGI or superintelligence. If we build something smarter than us, we need it to act in line with our values. But what are our values? Getting world governments to endorse a comprehensive statement of values is the most hopeless task imaginable. Wouldn’t it be great if a superintelligence could learn our values by watching us interact over time instead?
So I clapped when Stuart Russell talked about this in his keynote. But then I was spooked. This would mean outsourcing all of philosophy to machines. It’s a perversion of the human instinct to seek safety when faced with a threat: to flail for something certain to grip onto.
Instead, we’d intentionally hand over decisions about humanity’s direction to computers, because we can’t trust ourselves. It would be the most basic, panic-inducing admission, the ultimate source of all fearful, sleepless nights: that nothing about our existence is certain or absolute. Or if it is, it’s beyond humanity’s dimension to define it. For a control-oriented species, that’s a heavy, chalky, bitter pill. But maybe robots can figure it out. And in an unnerving debate, it might be the least unnerving option.
The thing about eating out in western China is that after three days, you’re desperate for some good old puffy carbs with cheese on top. Looking forward to moving on from the ma la jiao and enjoying some intellectual pizza when I get home.