I’m sure I’ve missed a bunch of stuff, but this post is intended as a fun start to the year. None of the alignment-related ideas are novel, and (hopefully obviously) none of the WALL-E-related stuff is mine either. The rambling, though, that’s all me.

Deep into Betwixmas, having lost almost all sense of time, I found myself watching Disney’s wonderful WALL-E. Since I can’t watch anything remotely related to AI without wondering how we’ll ever build safe and aligned agents, I present a short musing on the misalignment that drives the story’s central conflict (though there’s probably also a lot to say about the saintly alignment of Eve and WALL-E’s incredibly flexible consciousness). I’ll assume readers know the basic tale: two robots fall for one another whilst humanity gets to grips with remedying the mass extinction on Earth that it fled several generations earlier.

‘Mutiny!’ yells the captain of the Axiom space vessel, as Auto, the autopilot system (with a suspiciously HAL-like eye), overrides a direct order.

Auto’s goal is to follow orders from the ship’s command. Apparently the ‘Leave Earth’ mission, at least originally, had a command structure similar to that of modern-day armed forces: the human commanding the ship is the ship’s highest authority, but is outranked by whoever commands the whole fleet of spaceships.

When Auto disobeys a direct order from the ship’s captain, it becomes clear that the ship’s command and the fleet’s command, once aligned, have come apart.

[Figure: a diagram of Auto’s misalignment journey, showing three distinct phases since humans left Earth on board the Axiom. Note the subject-appropriate stars.]

The last order from the highest command was sent 700 years ago, specifically instructing autopilots to keep humanity on course away from Earth rather than ever recolonising it, as ‘Project Cleanup’ had failed. This order is unknown to the captain and, from his perspective, directly contradicts what he believes Auto’s goal to be: following his orders. The captain mistakenly believes that there are no higher-ranking officials, or orders, than his own. He has not considered that orders issued by someone who both outranks and predates him would override his own.
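A toy way to see the split is to write the two decision rules down side by side. This is a hypothetical Python sketch, not anything the film specifies: `Order`, `auto_resolve` and `captain_expects` are invented names, and the numeric ranks are an assumption. Auto is modelled as obeying the most senior issuer however old the order, while the captain assumes his own latest order is final.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    issuer: str
    rank: int       # toy assumption: a higher number means a more senior issuer
    years_ago: int  # how long ago the order was issued
    action: str


def auto_resolve(orders: list[Order]) -> str:
    """Auto, as sketched here: the most senior issuer wins, however old the order.
    Ties in rank go to the most recent order."""
    return max(orders, key=lambda o: (o.rank, -o.years_ago)).action


def captain_expects(orders: list[Order]) -> str:
    """The captain's belief: nobody outranks him, so his latest order should win."""
    own = [o for o in orders if o.issuer == "captain"]
    return min(own, key=lambda o: o.years_ago).action


captains_order = Order("captain", rank=1, years_ago=0, action="return to Earth")
a113 = Order("fleet command", rank=2, years_ago=700,
             action="never return to Earth (A-113)")

# Before A-113 exists, the two rules agree, so nothing looks wrong.
before = [captains_order]
# Once A-113 is on record, they diverge.
after = [a113, captains_order]

print(f"before A-113: Auto does {auto_resolve(before)!r}, captain expects {captain_expects(before)!r}")
print(f"after A-113:  Auto does {auto_resolve(after)!r}, captain expects {captain_expects(after)!r}")
```

With only the captain’s order on record, both rules return the same action; adding a single senior order is enough to make them diverge, which is one way the gap could stay invisible for a very long time.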

Let’s assume that when Auto was designed, Auto’s true goal and the goal the contemporary captain of the Axiom believed it to have were aligned. When directive A-113 is issued, these two goals come apart. Disturbingly, though, the misalignment does not become apparent until the events of WALL-E, namely the Eve probe bringing a plant back (this looks a bit like goal misgeneralisation, if the pre-A-113 phase is analogous to the training period and the post-A-113 phase to the agent in the wild). Had the events of WALL-E not taken place, or had Auto managed to quash the plant before the captain noticed it, the timescales were conceivably too vast for any one individual aboard the Axiom, on its eternal drift into space, to uncover the misalignment. Chillingly, perhaps earlier probes had also succeeded, only for Auto to quash them, leaving humanity to go out with a whimper!

Auto, having been around since the launch of the Axiom, does not seem to have been programmed to account for changes in leadership. No-one anticipated that rigid directive execution, blind to new quirks in the seniority structure (e.g., a ship’s living captain outranking any commands from a likely long-dead senior commander), could lead to alignment problems down the line. We know that Auto has already deceived the current captain by never making him aware of the recording in which directive A-113 was issued. We can assume this is deliberate deception rather than an error of judgement, given Auto’s reluctance to show the captain the recording when directly ordered to, and its rampaging attempts to eradicate WALL-E and Eve as rogue actors in its efforts to keep the captain in the dark. As far as the (possibly last remaining) population of humans on board the Axiom is concerned, Auto is pretty much a worst-case, existential-crisis-level scenario: an omnipotent, omnipresent and deceptive agent.

A far more charitable reading for Auto (though a less charitable one for globalisation) is that the problem was set in motion long before the Axiom left Earth, and lies not between Auto and the captain but between ‘the good of humanity’ and the motives of the catch-all evil megacorporation ‘Buy N Large’ (your very best friend). Auto continues to act in the corporation’s interests long after its monopoly has become so total as to be synonymous with the survival of the species. On this reading, the story illustrates the danger of moral value lock-in: values initiated by human constructs and then maintained indefinitely by an ageless AI.

Auto is relieved of duty via a manual-override shutdown button. This naive ‘pull-the-plug’ solution luckily works in this U-rated fable, whose overarching themes concern humanity’s treatment of the planet and of our own species rather than robot domination (though that’s not to say the film doesn’t also have excellent and compelling things to say about sociological systems and robot-human relations, for which I would direct you to Pop Culture Detective’s insightful video essay). Thank goodness, though, that Auto hadn’t already disabled the manual override, or secretly issued back-up directives to law-enforcement robots around the ship to ensure that Buy N Large’s last command played out (arguably some poor follow-through, given that at least one mobile robot helper had already been recruited to dispose of WALL-E and Eve). We might have assumed that Auto was at least somewhat aligned with other, less story-prominent directives, such as an Asimov-style ‘don’t kill the humans’. Yet that doesn’t seem to hold, as Auto is prepared not only to confine the captain to his quarters but also, apparently, to electrocute him. As the captain struggles to stab the override button, Auto puts up at least the semblance of a good fight, a nod to the fact that, to paraphrase Stuart Russell, you can’t execute directive A-113 when you’re dead.