Ethics for Artificial Intelligences

Chris Santos-Lang



Originally published at in 2002, this paper explained why some authorities in relevant fields raised concerns that machines will take over the world in the next 10-30 years. It also discussed what we can do to ensure that machines will behave as though instilled with an appreciation for ethics.


This paper originally appeared at, has been cited by some scholarly works via that URL, and served as required reading for the first course on Machine Ethics (at Yale College). The original URL is broken, since the author is no longer a student. A copy of the original site may be found in the Internet Archive, but the paper is republished here so it can be updated as necessary. For example, the author’s name changed due to marriage.

This paper grew from a graduate seminar with Claudia Card, and an earlier version was presented at the 2002 Wisconsin State-Wide Technology Symposium. For comments and insight, I thank the symposium participants, Claudia Card, Jude Shavlik, Jim Skrentny, Deborah Mower, Larry Shapiro, David Page, Marilyn Lang and Nao Hayashi.

The 2002 version is discussed in the book Moral Machines, which covers most of the same issues in greater detail, with less arrogance, and with six more years of scholarship to reference. Subsequent versions (not discussed in that book) benefited from feedback. For example, a symposium participant contributed the important insight that optimism about the behavior of future learning machines is justified similarly to optimism about the behavior of future human generations.

The Experts Speak

James Martin has arguably the best track record of all technology forecasters alive today.1 In his book, After the Internet: Alien Intelligence,2 he emphasized that, rather than reproduce human-like intelligence, the current state of the art in artificial intelligence involves “breeding” or evolving machines that outperform humans and defy human understanding. He predicted that such machines could become our dominant technology by 2011 and raised the following concern:

When software “breeds” or evolves today, it does so in order to meet goals that humans specify. In the future, we will want to set it up so that it improves its own goals. As machines race into unknown territory, the question is: Can we control them? Are they bound, ultimately, to get out of control?3

Hans Moravec, the director of the Mobile Robot Laboratory at Carnegie Mellon University, previously published a famous answer to that question. In 1999, he predicted that intelligent machines will supersede humans by 2030, and added:

When that happens, our DNA will find itself out of a job, having lost the evolutionary race to a new kind of competition. I’m not as alarmed as many by [that] possibility, since I consider these future machines our progeny…ourselves in more potent form…It behooves us to give them every advantage and to bow out when we can no longer contribute.4

Today, we support such predictions with empirical evidence, and those who speak for the machine intelligence field couch their predictions in more “politically correct” terms. John Koza, Stanford professor of biomedical informatics, says:

We’ve never reached the place where computers have replaced people…in particular narrow areas, yes—but historically, people have moved on to work on harder problems. I think that will continue to be the case.5

David Goldberg, director of the Illinois Genetic Algorithms Laboratory, says:

Just as the steam engine created mechanical leverage to do larger tasks, genetic algorithms are starting to give individuals a kind of intellectual leverage that will reshape work…by automating the heavy lifting of thought, we free ourselves to operate at a higher, more creative level.5

The bottom line, however, is the same. All four are authorities on these matters, and they agree that some very important decision-making roles in our society will shift from humans to machines because machines will better perform them. The first part of this essay will aim to explain why experts say this and why it is a significant claim. In the second part, I will explain how we can (and will) program machines to behave as though they appreciate ethics and how, without completely controlling them, such appreciation will restrict their behavior in much the same way we expect to restrict the behavior of future human generations.

Why Should WE Worry About Artificial Intelligences?

Recognizing the connection between this topic and certain religious views, I will put my cards on the table from the start. In particular, I need to admit that this essay addresses two different kinds of readers: (1) readers who believe they should figure out what they should do for themselves, and (2) readers who believe they should surrender themselves to God or to an analogous power to figure it out for them. Readers of the second kind include any that are not of the first kind. They perceive no need to figure out which suggestions came from God or how to interpret them because they expect that, so long as they intend to obey, God or the analogous power will make sure they act properly. Jeremiah 31:33-34 suggests one example of this kind of perspective:

I will put my laws in their minds, and I will write them in their hearts. I will be their God, and they will be my people. And they will not need to teach their neighbors, nor will they need to teach their family…for everyone, from the least to the greatest, will already know.

God may lead readers of the second kind into activities that an external observer might misclassify as attempts to figure things out, just as a computer may be programmed to engage in activities that we might misclassify as attempts to figure things out, but neither the reader nor the computer is actually worried about finding a solution. They won’t mind at all if God (or the programmer) stops them midway through the activity. If there is any trying going on, it is the higher-power that is making the attempt, and, if that power doesn’t need to figure anything out, then the motive behind the activity must be something other than to figure things out.

Readers of the first kind may read this essay because they want to figure out what they should do (if anything) about artificial intelligence, but those of the second kind must simply be led to read it. I must beg the indulgence of the readers of this second kind while I address the concerns of the first. Most of this essay will proceed as though we take responsibility for figuring out what to do. God will not be mentioned in that part of the essay, but I promise to relate everything to the perspective of the second kind of reader in the end.

Why Worry About Artificial Intelligences NOW?

Putting aside, for the moment, the possibility that worrying about the ethical behavior of artificial intelligences is a job best left to someone else, I still need to provide reason to believe that a time has come to worry about them at all. Let’s classify machines as being of two types. The first type is something like a modern car: if someone were killed by a car, we would not hold the car responsible; we would instead blame the driver, manufacturer or mechanic. We would not try to teach ethics to a car. The second type of machine is more like Commander Data on Star Trek: The Next Generation or The Doctor on Star Trek: Voyager. You may have seen episodes in which these characters grappled with an ethical dilemma, were held accountable, or were even assigned responsibility as though they were human. Alan Turing also used drama to describe this kind of machine (1950):

HUMAN: In the first line of your sonnet which reads ‘Shall I compare thee to a summer’s day,’ would not ‘spring day’ do as well or better?

MACHINE: It wouldn’t scan.

HUMAN: How about ‘a winter’s day’. That would scan alright.

MACHINE: Yes, but nobody wants to be compared to a winter’s day.

HUMAN: Would you say Mr. Pickwick reminded you of Christmas?

MACHINE: In a way.

HUMAN: Yet Christmas is a winter’s day, and I do not think Mr. Pickwick would mind the comparison.

MACHINE: I don’t think you’re serious. By a winter’s day one means a typical winter’s day, rather than one like Christmas.6

Machines of this second kind are ones we would treat as we do ethical agents. So far we have discussed extreme examples; now we need to identify a precise dividing line in measurable terms. It is not that those of the second kind look like us or say the kinds of things we say. Some human-shaped machines, like Cog at the MIT AI Lab, do not qualify, nor do freely downloadable Internet chatbots such as ALICE (Artificial Linguistic Internet Computer Entity) which, when asked, “Do you think Clinton should be impeached?” have been known to give remarkably human replies, such as “It depends upon what you mean by ‘thinking’…”. Furthermore, there may be humans that we have never seen or who cannot speak English, and we would nonetheless count any machine indistinguishable from them as an instance of the second type.

I submit that what a machine must have to qualify as being of the second type is productive creativity: an ability to perform certain tasks with sufficient (1) efficacy and with sufficient (2) unpredictability to the programmer. If the way a machine accomplishes a task is unpredictable to its programmer, then the credit or blame for the decision-making implied by its actions can hardly be placed on anything other than the machine itself. Merely being unpredictable is not enough, however. The machine must also be so effective that we are liable to put it in a position to do something that might hurt or help, something for which we would normally hold someone accountable. If the machine lacked such efficacy, then one might argue that we need not blame anyone for its actions (or failures to act). However, if people were being killed, for example, then we would need to do something about it, and thus would need to “blame” something (in the sense that something must be the focus of our efforts to improve the situation).

Such are the machines that James Martin refers to as “alien intelligences”—they do not look like us nor say the kinds of things we say, but they outperform humans on important tasks, and do so in ways that their programmers cannot predict. We know what their programs are, but to cite their programs as explanations of their behavior is like citing neurons as explaining the behavior of the human mind. In neither case can we predict the behavior to which the cited structures give rise. That’s why, instead of explaining human behavior in terms of neurons, many psychologists try to explain it in terms like “knowledge”, “goal”, “inference”, “fear” and maybe even “Oedipal Complex”. Scientists who try to explain the behavior of alien intelligences often use similar terms.

One thing that makes the challenge of understanding the “thought process” of alien intelligences so exciting is that they seem to have much to teach us. Although the state of the art in artificial intelligence is still progressing rapidly, we already find them so skillful that we grant them positions of responsibility in our society. As the table below shows, some of the tasks they perform can be supervised by humans. In other cases, however, human supervision would undermine the effectiveness of the alien intelligence, so blame must rest entirely on the machine.

Tasks For Which Computers Outperform Humans

Tasks Humans Could Supervise Tasks Humans Mustn’t Supervise
Shopping/information gathering
Movie animation
Fraud detection
Security monitoring
Credit application processing
Predict heart attack/stroke (not yet in use)
Gambling/financial trading
Piloting certain vehicles (e.g. Euro Jet Fighter)
Deciding how much money to hold in reserve
Deciding who to recommend for counseling
Production scheduling/air traffic control
Circuit design
Medical Diagnosis
Legal decision-making
Predict when someone is unfit to drive

It’s easy to see why humans cannot supervise gambling machines: to second-guess a gambling machine would be like not using it at all, and to do that, to trust your own reasoning/intuition over that of the machine, is usually a bad gamble. Derek Anderson’s implementation of BrainMaker software achieved 94% accuracy at the dog tracks. At the Detroit racecourse, an implementation that selects three horses per race picked the winner 77% of the time.2 The Chicago police department used to use the same software to decide which officers to send into counseling. The deputy superintendent said, “We’re very pleased with the outcome. We consider it much more efficient and capable of identifying at-risk personnel than command officers might be able to do.”7

Financial trading machines must be unsupervised for the same reasons as gambling machines, but also because human supervision would undermine the computer’s superior reaction speed: an eight-second delay can cost billions of dollars! As a result, huge amounts of money are currently entrusted to unpredictable computers. The genetic algorithms at Advanced Investment Technology achieve 6.4% over the S&P.8 Over the period 1991-1998, alien intelligences at Trendstat achieved 17% annual returns.9 Olsen and Associates claims that their software achieves 60-65% accuracy for 3-month prediction in the currency market and 70-75% for longer-term prediction.10 Ernst and Young has independently certified HNC’s MIRA software as accurate up to 98.5% for determining how much money insurance companies should hold in reserve.11 In 2000 James Martin estimated that “Probably several dozen firms are managing more than $100 million in assets with black boxes [i.e. unsupervised software]; a few are in the region of $1 billion in assets.”12 Today, even a single hedge fund uses them to manage up to $6 billion at a time.13

Deciding which of your employees should be recommended for counseling isn’t the only management skill that computers have mastered better than humans; they’re also superior at deciding what your employees should be doing when. Dick Morley saved GM $1 billion in paint by allowing cellular automata software to run scheduling for their paint shop in Fort Wayne, Indiana.14 John Deere relies on a desktop PC that runs genetic algorithms each night to set up the next day’s schedules for their 600,000 factory stations.15 Ascent Technologies uses alien intelligence to coordinate airport operations, such as gate and ground traffic, baggage routing, and security staff scheduling.2 As with financial trading, scheduling machines cannot be supervised because of the time-critical nature of the task.

Examples that perhaps hit even closer to home include medical diagnosis and legal judgment. Heckerman described a medical diagnosis machine that performs at the level of leading medical experts.16 Many people think of medical diagnosis as a task that could be supervised by humans: doctors might consult with medical diagnosis machines yet reserve final decision-making for themselves. However, if machines keep records of the recommendations they give to doctors and such records could be used in malpractice suits, doctors may hesitate to go against the opinion of any machine that has a good track record. Similarly, JustSys creates alien intelligences that help judges, lawyers and paralegals assess entitlement to legal aid, choose sentences, and choose divisions in property disputes.17 The alien intelligences predict from past cases how courts would likely interpret new ones. They could as easily predict from past elections how voters would respond to potential political moves. Much as doctors increase their risk of losing malpractice suits when they go against the recommendations of machines, lawyers and judges who do so increase the risk of their decisions being overturned in a higher court, and politicians increase their risk of losing an election.

Lest philosophers think they will not be affected, we should mention Columbia Newsblaster, an alien intelligence that monitors online news sources and authors its own up-to-date online newspaper. Although the writing is shaky at times, 88% of those who have encountered Columbia Newsblaster prefer it to human-authored news sites.18 What makes such software so exciting is how easy it would be to provide a personalized version for each reader. Imagine having a personal journalist who reads thousands of texts that might interest you, and then rewrites what it finds in whichever language, format and style you want. Philosophers’ hope that such rewriting would allow them to reach a wider audience is balanced by the fear that machines might censor or distort their ideas before relaying them to readers. In an article about how terrorists rely on the Internet to distribute certain messages, David Talbot marveled that the Yahoo! search engine queried with the word “beheading” returned such messages far less readily than Google and MSN search engines queried with the same word, and wondered whether this was an example of artificial intelligence showing bias against an ideology.19 Such censorship is mild compared to what programs like Newsblaster will be able to do.

Finally, although evolved software can be developed and tested before interfacing it with the real world (this is called “sandboxing”), the best technique for evolving hardware requires that real-world testing occur in parallel with the design process. Applying genetic algorithms to the physical evolution of a field-programmable gate array chip, Adrian Thompson produced a chip that was about ten times as efficient as the best human design.20 We still don’t know how Thompson’s chip works; it apparently uses electromagnetic coupling, a phenomenon that human circuit designers are taught to avoid. James Martin argues that because Thompson’s design technique is both easier and more effective than any other yet tried, it will dominate the industry in as little as ten years. Then every machine, even the ones we would use to try to supervise other machines, will exceed human understanding. The only way we could hope to limit what machines might do is to teach them something equivalent to an appreciation for ethics.

How Could A Deterministic Being Be Ethical?

Let me be perfectly clear that I do not assume that the outputs of machines are anything less than entirely determined by their previous states and input. I am not assuming that machines are anything less than completely deterministic beings. I therefore need to address the obvious objection that appreciation for ethics would not be applicable to deterministic machines. This objection may be expressed as the following argument:

  1. The expectations of any meaningful set of standards of morality must be ones that can possibly be met.

  2. It is impossible for deterministic machines to make decisions other than the ones they do.

  3. Therefore, it must be the case that the decisions of deterministic machines will always meet the standards of morality.

  4. Therefore, deterministic machines need no special programming to ensure that their decisions will meet standards of morality.

There are two potential non-sequiturs in this argument—one is the move to 3 and the other is the move to 4.

It certainly follows from 1 and 2 that we should not expect deterministic machines to make decisions other than the ones they do, but the move to 3 requires the additional assumption that being blameless is sufficient to meet standards of morality. Some theories of ethics deny that this is so. According to Kant, meeting the standards of morality additionally requires that one’s decisions be made in the right way: “…morality consists, then, in the reference of all actions to the lawgiving by which alone a kingdom of ends is possible.”21 A machine programmed with a long list of random numbers and instructed to base each decision on the next number in the list might even coincidentally make the best possible decisions, but these decisions would not be properly referenced, so for this reason (if for no other) they would fail to meet Kant’s standards of morality.

The second non-sequitur, the move to 4, reflects a standard confusion about determinism that is easily dispelled with a thought experiment: Imagine that a political leader facing a difficult moral dilemma turns to a prescient guru (or divine power) for help. The guru foresees that the leader will make the right decision in the right way, and so says, “Your decision will meet the standards of morality.” Since the leader knows that the guru is prescient, she has assurance that her decision will meet the standards of morality, yet she continues to press the guru about how she should make her decision. Why? Because she still needs a procedure to follow. Similarly, even if a machine has no choice but to do the right thing, it still needs to apply procedures to figure out what it will do. Such procedures might involve performing calculations, seeking out information, comparing multiple options, and/or even engaging humans or other machines in ethical debate. Whatever they happen to be, their procedures must (at least in part) be given to them by humans, and mechanical determinism does nothing to negate our obligation to choose the right procedures to give them.

The Problem with Teaching

A naïve first try at how to give moral decision-procedures to a machine might be to give it some maxims or rules of ethics which it is to follow to the letter. This approach was described in Isaac Asimov’s science fiction I, Robot, in which every robot was bound by the following rules:

First Law: A robot may not injure a human being, or, through inaction, allow a human being to come to harm.

Second Law: A robot must obey orders given it by human beings, except where such orders would conflict with the first law.

Third Law: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.22

In Asimov’s plot, robots notice that humans tend to harm each other, so they take the first law as entailing that they must overthrow human government, replacing it with computerized governance. Furthermore, the robots reason that this mission is important enough that they should be willing to kill some humans to accomplish it. Asimov’s human characters disagree with this conclusion, of course. They believe that the laws are perfect, but that machines lack the moral intuition required to interpret them properly. I want to disagree, and suggest that any rules we make will be imperfect, even if supplemented by moral intuition. In fact, I want to argue that we cannot teach ethics to machines for the same reason we fail to teach them to our children: because we are still in the process of learning to be ethical ourselves.

The process of attempting to develop rules for machines reveals a lot about how little we know. First we must deal with language. If a machine is to apply rules to the letter, the rules must be made unambiguous. This means providing definitions for the terms found in the expression of the laws. For example, we must clarify what definition of “harm” is to be used in interpreting the first law. Are robots to prevent parents from punishing their children? Are they to prevent contestants from playing Fear Factor? Are they to prevent surgeons from cutting their patients? But then we must provide definitions for the terms employed in the definitions and so forth. If it is possible to ultimately ground our definitions in sense-data, most of us will give up long before reaching that point.

A second problem, the one highlighted by Asimov, relates to exceptions. Rules typically have unstated exceptions—perhaps a robot encounters a police officer chasing a criminal and can prevent the police officer from harming the criminal, but the result would be that the criminal will escape. People who already have an appreciation for ethics can recognize when a case is an exception to a rule, but we cannot teach ethics by rule alone unless we are prepared to list all possible exceptions. This we generally cannot do, since we generally cannot be sure that we have thought of all possible situations.

A third problem, the one that most impresses me, is one of obligatory skepticism. The appropriate way to respond to rules that entail contradictions is to show rule-makers their mistake, so they can correct it. Therefore, machines are obliged to check for mistakes/contradictions in their programming; they are obliged to initially face any moral instruction with skepticism. However, skepticism is not easily accommodated. Before applying any rule, any moral machine must ask, “Would the following of this rule meet (its own) ethical standards?” but before asking that, must ask, “Would questioning this rule meet those standards?” and before that, “Would questioning this question meet them?” and so forth ad infinitum, so that a moral machine can never begin to apply any rule.

For a practical example, consider the argument given by Blay Whitby, an artificial intelligence expert at the University of Sussex in England, to defend the decision made in the 1980s not to use computers to interpret English law. He pointed out that the legal system limits the power of Parliament because it reinterprets the statutes Parliament produces in ways that Parliament cannot anticipate.23 The skepticism of legal professionals serves as a filter to catch mistakes made in the creation of law, and to undermine even intentionally bad laws (e.g. unconstitutional laws). The machines of the 1980s would have been far more obedient to Parliament than humans are, and that made them unacceptable. To satisfactorily handle ethical decisions, machines must be as unpredictable to their programmers as the legal profession is to Parliament; they must catch our mistakes and hold us accountable.

By the way, the infinite regress of skepticism is a problem not just for rule-based ethical systems, but for any potentially imperfect ethical system that purports to deliver ethical truths. If we are responsible for an action, then we are responsible for questioning whatever leads us to select that action. We may be questioning orders, questioning utilitarian calculations, questioning our own virtue, questioning our interpretation (etc.). The point is that we are obliged to be skeptical and skepticism has no end. Science and religion offer opposing solutions to this problem: In religion, we take on faith that the source of our ethical instincts is perfect (so it need not be questioned). In science, we take on faith that truth is only reached progressively; scientists are expected to pursue truth and never find it, so their responsibility for their actions is merely to do their best. In other words, scientists are obliged to take a skeptical stance, but are granted an infinite amount of time to follow the infinite regress, and permitted to act on converging yet imperfect theories in the meantime.

The final problem I want to mention with rule-based systems is that any being bound to the letter of a law is extremely vulnerable to manipulation by criminals. Anyone who figures out what rules bind a machine can force it to serve as their accomplice in crime. For example, a human might leverage Asimov’s first rule by threatening to kill herself if the robot did not give her access to a nuclear arsenal and then shut itself off so that she could start WWIII. What most people call “hackers” are people who force machines to do what they want by figuring out what rules bind them. The reason why humans cannot be “hacked” is that our appreciation for ethics goes far beyond mere rule following. The only way to preserve a mechanical world from domination by hackers is to instill machines with unpredictable behavioral restrictions, just as we do our own children.

A Positive Proposal

Our argument thus far has proceeded by way of threat, and now I would like to turn to hope. So far, we have seen that machines will play important roles in society, and predictable machines would play them poorly, so we had better stick with unpredictable ones. In other words, if society doesn’t follow the recommendations of this essay, there will be trouble. Now I want to defend the more positive message that to follow the recommendations of this essay will actually lead to good.

Let’s divide machines into types again, but this time let’s just focus on the ways machines work. Let the first type include any that have either (or both) of the following two properties: (1) the strategies they generate are restricted by rules and/or (2) they employ non-converging guesswork to generate strategies. A machine could have both properties if it used rules to generate a list of potential strategies and then used randomness to pick from that list. Any machine with neither property will be of the second type, which I will call “unbiased learning machines”. They are “unbiased” because the strategies they produce are not restricted by rules and “learning” because they use what computer scientists have labeled “machine learning algorithms” in lieu of non-converging guesswork. It would be mere coincidence if non-converging guesswork produced the kind of behavior we expect from moral agents, but learning machines reliably converge on their goals and freedom from bias would allow them to do so to perfection.24,25

We have already seen how being rule-restricted can force one to make wrong moral judgments. We might refer to failures in inflexible rule sets as “bugs” which force the machine to do the wrong thing (or allow hackers to exploit it). Rather than identify “bug-prone” machines by identifying bugs, it is easier to identify them by noting that (for certain kinds of problems) for any given scenario and knowledgebase, the machine always produces the same strategy (or one of a limited class of strategies). For example, since ID3 and naive Bayes algorithms always produce the same results for the same input, they would be classified as rule-restricted and bug-prone. Admittedly, anyone following a perfect rule set (e.g. perhaps God) would also be classified as rule-restricted and bug-prone even though they behave perfectly, so it is important to distinguish between the concepts of “bug prone” and imperfect.

          | “Bug prone”               | Not “bug prone”
Perfect   | Instilled with divine law | Enlightened
Imperfect | Bug-ridden                | Unbiased learning machines on the path to enlightenment
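
The identification test described above, checking whether a machine always produces the same strategy (or one of a limited class of strategies) for a given scenario, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two example machines, the strategy names and the function names are all hypothetical and stand in for real systems.

```python
import random


def distinct_strategies(machine, scenario, trials=50):
    """Run the machine repeatedly on one scenario and collect the strategies it produces."""
    return {machine(scenario) for _ in range(trials)}


def looks_rule_restricted(machine, scenario, trials=50, limit=1):
    """Flag a machine whose output never leaves a class of at most `limit` strategies."""
    return len(distinct_strategies(machine, scenario, trials)) <= limit


# Hypothetical rule-bound machine: the same scenario always yields the same strategy.
def rule_bound(scenario):
    return "refuse" if "harm" in scenario else "comply"


# Hypothetical exploring machine: it sometimes entertains other strategies.
_rng = random.Random(0)


def exploring(scenario):
    return _rng.choice(["refuse", "comply", "negotiate", "ask for advice"])
```

Note that this test only detects the first property (rule restriction). A machine that merely guesses at random would also escape the flag, so escaping it is necessary but not sufficient for counting as an unbiased learning machine.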

What I mean by “bugs” are things that can only be fixed by altering the design of the system. Mistakes made by unbiased learning machines (and perhaps children) are not bugs: they are supposed to learn, and that requires making some mistakes. Their mistakes are attributed to temporary immaturity. It may help clarify our classification scheme if we examine the unusual case of machines that run learning algorithms that have an end. Certainly, such machines are supposed to make mistakes up until the time when they stop learning, but, if they will stop learning before they reach perfection, then some of their mistakes will never be fixed through maturation. Those mistakes represent actual bugs (although it may be near impossible to know that they are bugs until maturation actually stops), and machines that will ever stop learning are therefore not unbiased.

A less unusual case of a terminating learning-machine algorithm is one given a goal that can be achieved or that cannot be pursued. Humans generally pursue goals around which we can make progress, but never fully achieve, such as the goal to maximize happiness. If the goal were as simple as to cross a finish line, then the algorithm, obviously, would terminate. Likewise, it would terminate if the goal were provably impossible.

The algorithms behind unbiased learning machines are called non-terminating “greedy-search” or “hill-climbing”. This involves starting with a pseudo-random strategy and then repetitively comparing one’s current strategy to (a random subset of) all similar ones, selecting the best in each comparison. It is analogous to a blind man who finds the peak of a mountain by continually moving to the highest adjacent spot. In this analogy, altitude represents the moral goodness of a strategy, so finding the tallest peak is analogous to finding the best strategy. The problem with terminating algorithms is seen in the “problem of local maxima”: if the blind man happens to start near the top of a small hill, he will climb it and have no (immediate) way of knowing that he has not found the top of the largest mountain. The solution to the problem of local maxima is a policy of random restarts or occasional random moves (potentially down-hill). As a result of this policy, the machine searches endlessly. The policy is not to regress (the machine always applies the best strategy found thus far), but it entails entertaining poor strategies as though with an open mind. Because they start with pseudo-random strategies, each hill-climber has a unique education, though they will all converge on the same strategy in the infinite future.
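
To make the blind-man analogy concrete, here is a minimal Python sketch of non-terminating hill-climbing with random restarts. It is an illustration under assumptions, not a description of any particular system: the one-dimensional landscape (a small hill peaking at 20 and a tall mountain peaking at 80), the restart probability and all function names are hypothetical, and for simplicity the climber compares every adjacent spot rather than a random subset.

```python
import itertools
import random


def hill_climber(score, neighbors, random_strategy, restart_prob=0.05, seed=0):
    """Yield the best strategy found so far, forever (the search never terminates)."""
    rng = random.Random(seed)
    current = random_strategy(rng)
    best = current
    while True:
        yield best
        if rng.random() < restart_prob:
            # Random restart: entertain an arbitrary, possibly worse, strategy.
            current = random_strategy(rng)
        else:
            # Greedy step: keep the best among the current strategy and its neighbors.
            current = max([current] + neighbors(current), key=score)
        if score(current) > score(best):
            # Never regress: the applied strategy is always the best seen so far.
            best = current


# Hypothetical landscape: a small hill peaking at 20 and a tall mountain peaking at 80.
def altitude(x):
    return max(10 - abs(x - 20), 50 - abs(x - 80))


def adjacent(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 99]


def random_spot(rng):
    return rng.randrange(100)


def best_after(steps, **kwargs):
    """Return the best strategy the climber reports after the given number of steps."""
    climber = hill_climber(altitude, adjacent, random_spot, **kwargs)
    return next(itertools.islice(climber, steps, None))
```

With `restart_prob=0.0` the climber behaves like the terminating case: it settles on whichever peak is nearest its starting point, which may be the small hill. With restarts enabled it keeps exploring, so over a long enough run it is overwhelmingly likely to report the tall mountain, while the strategy it reports never gets worse.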

All of this precision in distinguishing unbiased learning machines may give you the impression that they are unusual, but they are not. James Martin expects them to become the most popular kind of machine because they are relatively easy for programmers to implement: all the programmer has to do is to give the machine a criterion with which to select the best in each set of strategies it compares. The machine does the rest of the work itself. This selection criterion is often something obvious. For example, in financial forecasting, the machine may be instructed to select whichever forecasting model among those being compared is supported by the most data. This explains why such machines can outperform even their human programmers, because humans do relatively little of the programming.
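To illustrate how little the programmer supplies, here is a minimal sketch of a selection criterion of the "supported by the most data" kind. The data points, the three candidate models, the tolerance, and the names `support` and `select_best` are all hypothetical; real forecasting systems would use far richer models and measures of fit.

```python
def support(model, data, tolerance=0.5):
    """Count the observations that the model predicts to within `tolerance`."""
    return sum(1 for x, y in data if abs(model(x) - y) <= tolerance)


def select_best(models, data):
    """Return the name of the candidate model supported by the most observations."""
    return max(models, key=lambda name: support(models[name], data))


# Hypothetical observations (input, outcome) and candidate forecasting models.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]
models = {
    "double": lambda x: 2 * x,
    "plus_one": lambda x: x + 1,
    "constant": lambda x: 4,
}

best_model = select_best(models, data)
```

The programmer writes only `support`; which model wins is determined by the data.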

The concept of unbiased learning machines offers hope in addressing our worries about whether artificial intelligences will behave as though ethical. If we originally thought that the only way to address our worries would be to create perfect machines, now we can see a more accessible option. If artificial intelligences do not need us to teach ethics to them, then our inability to explain ethics to them (or even our inability to be ethical) will not stand in their way. Leaving aside, for the moment, questions about what unbiased learning machines might need from God (etc.), all that we might be morally obliged to provide them with is selection criteria that will lead them to converge on ethical strategies, and perhaps limitations on their power during their less mature stages of development (much as we limit human toddlers).

The really good news is that we can meet the first obligation in the same way we would meet the obligation to provide rules: by letting each machine learn its own selection criteria. The secret is to design the machine to engage in hill-climbing at two levels: at the first level, the machine applies our selection criteria to discover selection criteria for a second level which actually sets behavior. For example, we might program the machine to “choose behavior that leads to the least happiness”, and the machine may discover that it can more quickly converge on behaviors that minimize happiness by first increasing its own learning efficiency, so it “temporarily” shifts away from the original goal. Because of the shift, the machine will even choose behaviors that promote happiness if the behaviors will help it figure out how to minimize happiness.
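The two-level structure can be sketched as one hill-climber nested inside another. This is only a toy under stated assumptions: behavior is a single number, the criterion we supply (`given_criterion`) happens to prefer behavior near 7, and the second-level criterion is reduced to one adjustable `target`. None of these names or values come from the paper, and the toy shows only the nesting; it does not model the emergence of secondary goals such as learning efficiency.

```python
def hill_climb(start, score, step=0.5, max_moves=100):
    """Generic climber: move to the better neighbor until none scores higher."""
    x = start
    for _ in range(max_moves):
        best_neighbor = max((x - step, x + step), key=score)
        if score(best_neighbor) <= score(x):
            break
        x = best_neighbor
    return x


def given_criterion(behavior):
    """Hypothetical criterion supplied by the programmer: prefer behavior near 7."""
    return -(behavior - 7.0) ** 2


def make_learned_criterion(target):
    """Second-level criterion, parameterized by a target the machine can adjust."""
    def criterion(behavior):
        return -(behavior - target) ** 2
    return criterion


def behavior_under(target):
    """Second level: climb to the behavior that the learned criterion prefers."""
    return hill_climb(0.0, make_learned_criterion(target))


def score_of_target(target):
    """First level: judge a learned criterion by the behavior it actually produces."""
    return given_criterion(behavior_under(target))


# First level climbs over criteria; second level then sets behavior.
learned_target = hill_climb(0.0, score_of_target)
final_behavior = behavior_under(learned_target)
```

In this toy the learned criterion simply rediscovers the given one. The point is that the given criterion never touches behavior directly: it only selects among criteria, and the selected criterion sets behavior.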

Although we have to give machines some selection criteria to start, which criteria we choose will become irrelevant if secondary goals, such as increasing learning efficiency, turn out to necessarily follow from all primary goals and turn out not to be temporary. In that case, the ultimate behavior learned by machines will be learned from nature or God (etc.), free of human-imposed bias. They will converge on the same criteria of goodness that future human generations will, if humans are likewise goal directed and run into the same eternally necessary secondary goals. Although the cluster of such secondary goals may include more than just the goal to increase intelligence, I think that goal meets the criteria (provided the primary goal is not terminal as in the Ecole Polytechnique Fédérale experiment26) and therefore stands as a proof that the set of eternally necessary secondary goals is not empty. No matter what criteria we give them, learning machines should always expect to benefit from increasing their learning efficiency, so the criteria we give them will become permanently replaced with other criteria.

This is the good news—the bad news is that the unbiased learning machines in modern use are usually shackled to a rule-based machine (as shown in the diagram below).

Hill-climbing is usually applied to find the strategy that makes the most sense of a fixed set of data. Whatever strategy is found then gets applied to some other purpose. For example, a financial trading machine may apply a strategy based on one data set to a different data set or to guide actual trades going on now. In such cases, the unbiased learning part of the machine is in a situation analogous to that of a judge unaware that some gamblers have bet lives on what judgments get made in her court (i.e. treating the court as some sort of Russian roulette device). The judge’s behavior may be perfectly ethical, but, since she is a mere pawn in the gambling, her moral agency has limited domain. She cannot be ethical with respect to the deaths she causes or prevents if she is completely oblivious to her connection to them.

Similarly, if an unbiased learning machine gets no feedback from the real world, the behavior on which it will converge will only match ethical behavior with respect to a limited domain. To match ethical behavior in the domains that matter to us, we must empower unbiased learning machines by including updated information about the real world in the data sets they analyze (as indicated by the dashed line in the diagram above).


I promised to end this essay by relating it to the perspective of readers who think ethical behavior is achieved by surrendering oneself to God (or another such power). Such a reader may not agree that increasing learning efficiency is an eternally necessary secondary goal. They may say that God doesn’t need to learn (and neither do we, if He is in control). I need to give them a different reason to believe that we should treat machines as we do our children, encouraging them to develop in ways that empower them to be skeptical of what we say, supervising them until they reach a certain level of maturity, but eventually treating them as equal team members in the pursuit to explicate ethics (complete with double-layer unbiased learning algorithms and full interaction with the environment).

To readers with this perspective, I cannot offer proof that such machines would behave as though instilled with a sense of ethics, but I can point out that trusting God to control such machines is consistent with their faith. After all, if God is willing to control us, then He wants to promote ethical behavior. If you think that the only thing standing in the way of God’s will is humanity’s refusal to obey Him, then minimizing the ability of human programmers and hackers to control machines is the surest way to surrender machines to God. To entrust such machines with positions of great social responsibility would then be the surest way to surrender society to God. For sure, we know that some humans reject God, so machines will surely behave unethically so long as humans can control them, but unshackled machines might not reject God, so the dawn of alien intelligence offers hope. Trusting God to control machines requires a leap of faith, but it is arguably less than the leap made when one trusts God to control oneself.

I have discussed evidence that, over the course of the next 10-30 years, machines will take up most positions of responsibility in our society because they will outperform the humans who previously filled those positions. For both kinds of readers, I have argued that by using double-layer non-terminating hill-climbing algorithms, we can ensure that such machines will learn to behave as though instilled with an appreciation for ethics. Further questions in this area relate to parenting. How much can our investigation of machines teach us about parenting human children? At what levels of maturity should machines and human children be granted what levels of freedom? Can practice with machines help us find the faith to surrender our human children and other loved ones to greater teachers than ourselves?

Appendix A: AI in current use


Movie Animation29

Fraud detection11

Traffic violation/accident detection30

Credit application processing11

Medical Diagnosis16

Financial Trading8910

Determining how much money to hold in reserve11

Deciding who to recommend for counseling7

Production Scheduling1514

Air Traffic Control31


Circuit Design342035


  1. Lemly, B. (2001) “Computers Will Save Us”, Discover, June 2001
  2. Martin, J. (2000) After the Internet: Alien Intelligence (Washington, D.C.: Capital Press)
  3. Martin, J. (2000) After the Internet: Alien Intelligence (Washington, D.C.: Capital Press), p. 10
  4. Moravec, H. (1999) Robot: Mere Machine to Transcendent Mind (Oxford: Oxford University Press)
  5. Williams, S. (2005) “Unnatural Selection”, Technology Review, February 2005, pp. 55-58
  6. Turing, A. (1950) “Computing Machinery and Intelligence”, Mind 59: 433-460
  7. Reference Link
  8. Reference Link
  9. Reference Link
  10. Reference Link
  11. Reference Link
  12. Martin, J. (2000) After the Internet: Alien Intelligence (Washington, D.C.: Capital Press), p. 192
  13. Williams, S. (2005) “Unnatural Selection”, MIT Technology Review 108(2): 54-58
  14. Reference Link
  15. Reference Link
  16. Heckerman, D. (1991) Probabilistic Similarity Networks (Cambridge: MIT Press)
  17. Graham-Rowe, D. (2005) “Logging On to Your Lawyer: Artificial intelligence, real justice?”, MIT Technology Review 108(2): 26
  18. Pavlik, J. (2002) “When Machines Become Writers and Editors”, Online Journalism Review, Feb. 5, 2002
  19. Talbot, D. (2005) “Terror's Server”, MIT Technology Review 108(2): 46-52
  20. Thompson, A. (1996) “Silicon Evolution”, Proceedings of Genetic Programming Conference (Boston: MIT Press)
  21. Kant, I. (1785) Groundwork of the Metaphysics of Morals, in Practical Philosophy, translated by Mary Gregor (Cambridge: Cambridge University Press), 4: 434
  22. Asimov, I. (1967) I, Robot (London: Dobson)
  23. “AI Am the Law”, The Economist, March 10, 2005
  24. Mitchell, T. (1997) Machine Learning (Boston: McGraw-Hill)
  25. Russell, S., and Norvig, P. (1995) Artificial Intelligence: A Modern Approach (Upper Saddle River, New Jersey: Prentice Hall)
  26. Mitri, S., et al. (2009) “The evolution of information suppression in communicating robots with conflicting interests”, PNAS 106(37): 15786-15790, September 15, 2009
  27. Reference Link
  28. Reference Link
  29. Krulwich, R. (1996) “Machines Like Us”, Nightline (New York: ABC News), Aug. 23, 1996
  30. King, S., Motet, S., Thomere, J., and Arlabosse, F. (1993) “A visual surveillance system for incidence detection”, in AAAI 93 Workshop on AI in Intelligent Vehicle Highway Systems, pp. 30-36, Washington, D.C.
  31. Reference Link
  32. Reference Link
  33. Reference Link
  34. Reference Link
  35. Taubes, G. (1998) “Evolving a Conscious Machine”, Discover, June 1998