ARTIFICIAL INTELLIGENCE, THE SYNTHESIS OF HUMANS EVOLUTION.

Evolutionary computing


Samuel’s checkers program was also notable for being one of the first efforts at evolutionary computing. (His program “evolved” by pitting a modified copy against the current best version of his program, with the winner becoming the new standard.) Evolutionary computing typically involves the use of some automatic method of generating and evaluating successive “generations” of a program, until a highly proficient solution evolves.

A leading proponent of evolutionary computing, John Holland, also wrote test software for the prototype of the IBM 701 computer. In particular, he helped design a neural-network “virtual” rat that could be trained to navigate through a maze. This work convinced Holland of the efficacy of the bottom-up approach. While continuing to consult for IBM, Holland moved to the University of Michigan in 1952 to pursue a doctorate in mathematics. He soon switched, however, to a new interdisciplinary program in computers and information processing (later known as communications science) created by Arthur Burks, one of the builders of ENIAC and its successor EDVAC. In his 1959 dissertation, for most likely the world’s first computer science Ph.D., Holland proposed a new type of computer—a multiprocessor computer—that would assign each artificial neuron in a network to a separate processor. (In 1985 Daniel Hillis solved the engineering difficulties to build the first such computer, the 65,536-processor Thinking Machines Corporation supercomputer.)

Holland joined the faculty at Michigan after graduation and over the next four decades directed much of the research into methods of automating evolutionary computing, a process now known by the term genetic algorithms. Systems implemented in Holland’s laboratory included a chess program, models of single-cell biological organisms, and a classifier system for controlling a simulated gas-pipeline network. Genetic algorithms are no longer restricted to “academic” demonstrations, however; in one important practical application, a genetic algorithm cooperates with a witness to a crime in order to generate a portrait of the criminal.


Logical reasoning and problem solving


The ability to reason logically is an important aspect of intelligence and has always been a major focus of AI research. An important landmark in this area was a theorem-proving program written in 1955–56 by Allen Newell and J. Clifford Shaw of the RAND Corporation and Herbert Simon of the Carnegie Mellon University. The Logic Theorist, as the program became known, was designed to prove theorems from Principia Mathematica (1910–13), a three-volume work by the British philosopher-mathematicians Alfred North Whitehead and Bertrand Russell. In one instance, a proof devised by the program was more elegant than the proof given in the books.

Newell, Simon, and Shaw went on to write a more powerful program, the General Problem Solver, or GPS. The first version of GPS ran in 1957, and work continued on the project for about a decade. GPS could solve an impressive variety of puzzles using a trial and error approach. However, one criticism of GPS, and similar programs that lack any learning capability, is that the program’s intelligence is entirely secondhand, coming from whatever information the programmer explicitly includes.


English dialogue


Two of the best-known early AI programs, Eliza and Parry, gave an eerie semblance of intelligent conversation. (Details of both were first published in 1966.) Eliza, written by Joseph Weizenbaum of MIT’s AI Laboratory, simulated a human therapist. Parry, written by Stanford University psychiatrist Kenneth Colby, simulated a human paranoiac. Psychiatrists who were asked to decide whether they were communicating with Parry or a human paranoiac were often unable to tell. Nevertheless, neither Parry nor Eliza could reasonably be described as intelligent. Parry’s contributions to the conversation were canned—constructed in advance by the programmer and stored away in the computer’s memory. Eliza, too, relied on canned sentences and simple programming tricks.


AI programming languages


In the course of their work on the Logic Theorist and GPS, Newell, Simon, and Shaw developed their Information Processing Language (IPL), a computer language tailored for AI programming. At the heart of IPL was a highly flexible data structure that they called a list. A list is simply an ordered sequence of items of data. Some or all of the items in a list may themselves be listed. This scheme leads to richly branching structures.

In 1960 John McCarthy combined elements of IPL with the lambda calculus (a formal mathematical-logical system) to produce the programming language LISP (List Processor), which remains the principal language for AI work in the United States. (The lambda calculus itself was invented in 1936 by the Princeton logician Alonzo Church while he was investigating the abstract Entscheidungs problem, or “decision problem,” for predicate logic—the same problem that Turing had been attacking when he invented the universal Turing machine.

The logic programming language PROLOG (Programming en Logique) was conceived by Alain Colmerauer at the University of Aix-Marseille, France, where the language was first implemented in 1973. PROLOG was further developed by the logician Robert Kowalski, a member of the AI group at the University of Edinburgh. This language makes use of a powerful theorem-proving technique known as resolution, invented in 1963 at the U.S. Atomic Energy Commission’s Argonne National Laboratory in Illinois by the British logician Alan Robinson. PROLOG can determine whether or not a given statement follows logically from other given statements. For example, given the statements “All logicians are rational” and “Robinson is a logician,” a PROLOG program responds in the affirmative to the query “Robinson is rational?” PROLOG is widely used for AI work, especially in Europe and Japan.

Researchers at the Institute for New Generation Computer Technology in Tokyo have used PROLOG as the basis for sophisticated logic programming languages. Known as fifth-generation languages, these are in use on nonnumerical parallel computers developed at the Institute.

Other recent work includes the development of languages for reasoning about time-dependent data such as “the account was paid yesterday These languages are based on tense logic, which permits statements to be located in the flow of time. (Tense logic was invented in 1953 by the philosopher Arthur Prior at the University of Canterbury, Christchurch, New Zealand.)


Microworld programs


To cope with the bewildering complexity of the real world, scientists often ignore less relevant details; for instance, physicists often ignore friction and elasticity in their models. In 1970 Marvin Minsky and Seymour Papert of the MIT AI Laboratory proposed that likewise AI research should focus on developing programs capable of intelligent behaviour in simpler artificial environments known as micro-worlds. Much research has focused on the so-called blocks world, which consists of coloured blocks of various shapes and sizes arrayed on a flat surface.

An early success of the micro-world approach was SHRDLU, written by Terry Winograd of MIT. (Details of the program were published in 1972.) SHRDLU controlled a robot arm that operated above a flat surface strewn with play blocks. Both the arm and the blocks were virtual. SHRDLU would respond to commands typed in natural English, such as “Will you please stack up both of the red blocks and either a green cube or a pyramid.” The program could also answer questions about its own actions.Although SHRDLU was initially hailed as a major breakthrough, Winograd soon announced that the program was, in fact, a dead end. The techniques pioneered in the program proved unsuitable for application in wider, more interesting worlds. Moreover, the appearance that SHRDLU gave of understanding the blocks of the micro-world, and English statements concerning it, was in fact an illusion. SHRDLU had no idea what a green block was.

Another product of the micro-world approach was Shakey, a mobile robot developed at the Stanford Research Institute by Bertram Raphael, Nils Nilsson, and others during the period 1968–72. The robot occupied a specially built micro-world consisting of walls, doorways, and a few simply shaped wooden blocks. Each wall had a carefully painted baseboard to enable the robot to “see” where the wall met the floor (a simplification of reality that is typical of the micro-world approach). Shakey had about a dozen basic abilities, such as TURN, PUSH, and CLIMB-RAMP.


Shakey
Shakey
Shakey, a mobile robot, was developed (1966–72) at the Stanford Research Institute, Menlo Park, California. The robot is equipped with a television camera, a range finder, and collision sensors that enable a minicomputer to control its actions remotely. Shakey can perform a few basic actions, such as go forward, turn, and push, albeit at a very slow pace. Contrasting colours, particularly the dark baseboard on each wall, help the robot to distinguish separate surfaces.
Courtesy of SRI International
Shakey
Shakey, a mobile robot, was developed (1966–72) at the Stanford Research Institute, Menlo Park, California. The robot is equipped with a television camera, a range finder, and collision sensors that enable a minicomputer to control its actions remotely.
Shakey can perform a few basic actions, such as go forward, turn, and push, albeit at a very slow pace. Contrasting colours, particularly the dark baseboard on each wall, help the robot to distinguish separate surfaces.
image :Courtesy of SRI International

Critics pointed out the highly simplified nature of Shakey’s environment and emphasized that, despite these simplifications, Shakey operated excruciatingly slowly; a series of actions that a human could plan out and execute in minutes took Shakey days.

The greatest success of the microworld approach is a type of program known as an expert system, described in the next section.


EXPERT SYSTEMS


Expert systems occupy a type of micro-world—for example, a model of a ship’s hold and its cargo—that is self-contained and relatively uncomplicated. For such AI systems every effort is made to incorporate all the information about some narrow field that an expert (or group of experts) would know, so that a good expert system can often outperform any single human expert. There are many commercial expert systems, including programs for medical diagnosis, chemical analysis, credit authorization, financial management, corporate planning, financial document routing, oil and mineral prospecting, genetic engineering, automobile design and manufacture, camera lens design, computer installation design, airline scheduling, cargo placement, and automatic help services for home computer owners.

Knowledge and inference

The basic components of an expert system are a knowledge base, or KB, and an inference engine. The information to be stored in the KB is obtained by interviewing people who are experts in the area in question. The interviewer, or knowledge engineer, organize the information elicited from the experts into a collection of rules, typically of an “if-then” structure. Rules of this type are called production rules. The inference engine enables the expert system to draw deductions from the rules in the KB. For example, if the KB contains the production rules “if x, then y” and “if y, then z,” the inference engine is able to deduce “if x, then z.” The expert system might then query its user, “Is x true in the situation that we are considering?” If the answer is affirmative, the system will proceed to infer z.

Some expert systems use fuzzy logic. In standard logic there are only two truth values, true and false. This absolute precision makes vague attributes or situations difficult to characterize. (When, precisely, does a thinning head of hair become a bald head?) Often the rules that human experts use contain vague expressions, and so it is useful for an expert system’s inference engine to employ fuzzy logic.


DENDRAL
In 1965 the AI researcher Edward Feigenbaum and the geneticist Joshua Lederberg, both of Stanford University, began work on Heuristic DENDRAL (later shortened to DENDRAL), a chemical-analysis expert system. The substance to be analyzed might, for example, be a complicated compound of carbon, hydrogen, and nitrogen. Starting from spectrographic data obtained from the substance, DENDRAL would hypothesize the substance’s molecular structure. DENDRAL’s performance rivaled that of chemists expert at this task, and the program was used in industry and in academia.

MYCIN
Work on MYCIN, an expert system for treating blood infections, began at Stanford University in 1972. MYCIN would attempt to diagnose patients based on reported symptoms and medical test results. The program could request further information concerning the patient, as well as suggest additional laboratory tests, to arrive at a probable diagnosis, after which it would recommend a course of treatment. If requested, MYCIN would explain the reasoning that led to its diagnosis and recommendation. Using about 500 production rules, MYCIN operated at roughly the same level of competence as human specialists in blood infections and rather better than general practitioners

Nevertheless, expert systems have no common sense or understanding of the limits of their expertise. For instance, if MYCIN were told that a patient who had received a gunshot wound was bleeding to death, the program would attempt to diagnose a bacterial cause for the patient’s symptoms. Expert systems can also act on absurd clerical errors, such as prescribing an obviously incorrect dosage of a drug for a patient whose weight and age data were accidentally transposed.

THE CYC PROJECT
CYC is a large experiment in symbolic AI. The project began in 1984 under the auspices of the Microelectronics and Computer Technology Corporation, a consortium of computer, semiconductor, and electronics manufacturers. In 1995 Douglas Lenat, the CYC project director, spun off the project as Cycorp, Inc., based in Austin, Texas. The most ambitious goal of Cycorp was to build a KB containing a significant percentage of the commonsense knowledge of a human being. Millions of commonsense assertions, or rules, were coded into CYC. The expectation was that this “critical mass” would allow the system itself to extract further rules directly from ordinary prose and eventually serve as the foundation for future generations of expert systems.

With only a fraction of its commonsense KB compiled, CYC could draw inferences that would defeat simpler systems. For example, CYC could infer, “Garcia is wet,” from the statement, “Garcia is finishing a marathon run,” by employing its rules that running a marathon entails high exertion, that people sweat at high levels of exertion, and that when something sweats it is wet. Among the outstanding remaining problems are issues in searching and problem solving—for example, how to search the KB automatically for information that is relevant to a given problem. AI researchers call the problem of updating, searching, and otherwise manipulating a large structure of symbols in realistic amounts of time the frame problem. Some critics of symbolic AI believe that the frame problem is largely unsolvable and so maintain that the symbolic approach will never yield genuinely intelligent systems. It is possible that CYC, for example, will succumb to the frame problem long before the system achieves human levels of knowledge.


CONNECTIONISM


Connectionism, or neurons like computing, developed out of attempts to understand how the human brain works at the neural level and, in particular, how people learn and remember. In 1943 the neurophysiologist Warren McCulloch of the University of Illinois and the mathematician Walter Pitts of the University of Chicago published an influential treatise on neural nets and automatons, according to which each neuron in the brain is a simple digital processor and the brain as a whole is a form of computing machine. As McCulloch put it subsequently, “What we thought we were doing (and I think we succeeded fairly well) was treating the brain as a Turing machine.”

Creating an artificial neural network

It was not until 1954, however, that Belmont Farley and Wesley Clark of MIT succeeded in running the first artificial neural network—albeit limited by computer memory to no more than 128 neurons. They were able to train their networks to recognise simple patterns. In addition, they discovered that the random destruction of up to 10 percent of the neurons in a trained network did not affect the network’s performance—a feature that is reminiscent of the brain’s ability to tolerate limited damage inflicted by surgery, accident, or disease.

The simple neural network depicted in the figure illustrates the central ideas of connectionism. Four of the network’s five neurons are for input, and the fifth—to which each of the others is connected—is for output. Each of the neurons is either firing (1) or not firing (0). Each connection leading to N, the output neuron, has a “weight.” What is called the total weighted input into N is calculated by adding up the weights of all the connections leading to N from neurons that are firing.

For example, suppose that only two of the input neurons, X and Y, are firing. Since the weight of the connection from X to N is 1.5 and the weight of the connection from Y to N is 2, it follows that the total weighted input to N is 3.5. As shown in the figure, N has a firing threshold of 4. That is to say, if N’s total weighted input equals or exceeds 4, then N fires; otherwise, N does not fire. So, for example, N does not fire if the only input neurons to fire are X and Y, but N does fire if X, Y, and Z all fire.

section of an artificial neural network
section of an artificial neural network
In the artificial neural network section shown in the figure, the weight, or strength, of each input is indicated by the relative size of its connection. The firing threshold for the output neuron, N, is 4 in this example. Hence, N is quiescent unless a combination of input signals is received from W, X, Y, and Z that exceeds a weight of 4.
Encyclopædia Britannica, Inc.

Training the network involves two steps. First, the external agent inputs a pattern and observes the behaviour of N. Second, the agent adjusts the connection weights in accordance with the rules:

If the actual output is 0 and the desired output is 1, increase by a small fixed amount the weight of each connection leading to N from neurons that are firing (thus making it more likely that N will fire the next time the network is given the same pattern) ;

If the actual output is 1 and the desired output is 0, decrease by that same small amount the weight of each connection leading to the output neuron from neurons that are firing (thus making it less likely that the output neuron will fire the next time the network is given that pattern as input).
The external agent—actually a computer program—goes through this two-step procedure with each pattern in a training sample, which is then repeated a number of times. During these many repetitions, a pattern of connection weights is forged that enables the network to respond correctly to each pattern. The striking thing is that the learning process is entirely mechanical and requires no human intervention or adjustment.

The connection weights are increased or decreased automatically by a constant amount, and exactly the same learning procedure applies to different tasks.


Perceptrons


In 1957 Frank Rosenblatt of the Cornell Aeronautical Laboratory at Cornell University in Ithaca, New York, began investigating artificial neural networks that he called perceptrons. He made major contributions to the field of AI, both through experimental investigations of the properties of neural networks (using computer simulations) and through detailed mathematical analysis. Rosenblatt was a charismatic communicator, and there were soon many research groups in the United States studying perceptrons. Rosenblatt and his followers called their approach connectionist to emphasize the importance in learning of the creation and modification of connections between neurons. Modern researchers have adopted this term.

One of Rosenblatt’s contributions was to generalize the training procedure that Farley and Clark had applied to only two-layer networks so that the procedure could be applied to multilayer networks. Rosenblatt used the phrase “back-propagating error correction” to describe his method. The method, with substantial improvements and extensions by numerous scientists, and the term back-propagation are now in everyday use in connectionism.

Conjugating verbs

In one famous connectionist experiment conducted at the University of California at San Diego (published in 1986), David Rumelhart and James McClelland trained a network of 920 artificial neurons, arranged in two layers of 460 neurons, to form the past tenses of English verbs. Root forms of verbs—such as come, look, and sleep—were presented to one layer of neurons, the input layer. A supervisory computer program observed the difference between the actual response at the layer of output neurons and the desired response—came, say—and then mechanically adjusted the connections throughout the network in accordance with the procedure described above to give the network a slight push in the direction of the correct response. About 400 different verbs were presented one by one to the network, and the connections were adjusted after each presentation.

This whole procedure was repeated about 200 times using the same verbs, after which the network could correctly form the past tense of many unfamiliar verbs as well as of the original verbs. For example, when presented for the first time with guard, the network responded guarded; with weep, wept; with cling, clung; and with drip, dripped (complete with double p). This is a striking example of learning involving generalization. (Sometimes, though, the peculiarities of English were too much for the network, and it formed squawked from squat, shipped from shape, and membled from mail.)

Another name for connectionism is parallel distributed processing, which emphasizes two important features. First, a large number of relatively simple processors—the neurons—operate in parallel. Second, neural networks store information in a distributed fashion, with each individual connection participating in the storage of many different items of information. The know-how that enabled the past-tense network to form wept from weep, for example, was not stored in one specific location in the network but was spread throughout the entire pattern of connection weights that was forged during training. The human brain also appears to store information in a distributed fashion, and connectionist research is contributing to attempts to understand how it does so.

Pages: 1 2 3 4 5


Leave a comment

Design a site like this with WordPress.com
Get started