• A brief discussion of the sequence of distinct growth periods of increasing rapidity in human history.
• The history of artificial intelligence.
• The field’s current capabilities, the opinions of experts, and a contemplation of our own ignorance.
The agricultural and industrial revolutions brought about increases in the human population and its density. More importantly, they increased the rate of growth of economic productivity, bringing us to the current regime of exponential growth. A new growth regime would be entered if another comparable increase in the rate of economic growth occurred; the world economy would then double in size roughly every two weeks, an incredible pace.
Contemplating such fast growth has naturally led to the idea of a singularity, in our case a technological singularity. Since the term has ‘accreted an unholy aura of techno-utopian connotation’, the book dispenses with it in favor of ‘intelligence explosion’. This is meant to refer to ‘the prospect of machine superintelligence, i.e. the creation of minds that are much faster and more efficient than the familiar biological kind.’
Machines matching humans in general intelligence (that is, possessing common sense and an effective ability to learn, reason, and plan to meet complex information-processing challenges across a wide range of natural and abstract domains) have been expected since the invention of computers in the 1940s. However, the pioneers of artificial intelligence gave little serious thought to the possibility of greater-than-human AI [1] [2], or to any related safety concerns or ethical qualms.
In the summer of 1956 at Dartmouth College, ten scientists sharing an interest in neural nets, automata theory, and the study of intelligence convened with high optimism for a six-week workshop (see Proposal [3]). This Dartmouth Summer Project is often regarded as the cockcrow of artificial intelligence as a field of research and the ensuing period was later described by John McCarthy (the event’s main organizer) as the “Look, Ma, no hands!” era.
During these early days, researchers built systems designed to refute claims of the form “No machine could ever do X!” They created small systems that achieved X in a “microworld” (a well-defined, limited domain that enabled a pared-down version of the performance to be demonstrated), providing a proof of concept. [4] However, the methods that produced successes in the early demonstration systems often proved difficult to extend to a wider variety of problems or to harder problem instances. One reason for this is the “combinatorial explosion” of possibilities that must be explored by methods that rely on something like exhaustive search. [5]
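To make the combinatorial explosion concrete, here is a small illustrative sketch of my own (not from the book): with a branching factor b and search depth d, an exhaustive search must examine on the order of b^d nodes, a number that quickly outruns any conceivable hardware.

```python
def brute_force_nodes(branching_factor, depth):
    """Total number of nodes in a complete search tree of the given depth."""
    return sum(branching_factor ** level for level in range(depth + 1))

for depth in (5, 10, 20, 40):
    print(depth, brute_force_nodes(30, depth))
# With roughly 30 moves per position, the count grows from about 2.5e7 nodes
# at depth 5 to about 1.2e59 at depth 40: no plausible hardware improvement
# keeps pace with that kind of growth.
```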
To overcome the combinatorial explosion, one needs algorithms that exploit structure in the target domain and take advantage of prior knowledge by using heuristic search, planning, and flexible abstract representations—capabilities that were poorly developed in the early AI systems. The performance of these early systems also suffered because of poor methods for handling uncertainty, reliance on brittle and ungrounded symbolic representations, data scarcity, and severe hardware limitations on memory capacity and processor speed. By the mid-1970s, there was a growing awareness of these problems. This led to the onset of the first “AI winter”: a period of retrenchment, during which funding decreased and skepticism increased, and AI fell out of fashion.
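As a rough illustration of what exploiting structure buys (a sketch of mine, not taken from the book, with a made-up grid example), A* search uses a domain-specific heuristic, here the Manhattan distance to the goal, to prioritize promising nodes rather than expanding the whole tree:

```python
import heapq

def a_star(start, goal, walls, width, height):
    """Shortest path length on a grid, guided by the Manhattan-distance heuristic."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start)]          # (estimated total cost, cost so far, node)
    best_cost = {start: 0}
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in walls:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt))
    return None

# A 5x5 grid with a short wall: the heuristic steers the search around it.
print(a_star((0, 0), (4, 4), walls={(2, 1), (2, 2), (2, 3)}, width=5, height=5))  # 8
```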
A new springtime arrived in the early 1980s, when Japan launched its Fifth-Generation Computer Systems Project, a well-funded public–private partnership that aimed to leapfrog the state of the art by developing a massively parallel computing architecture that would serve as a platform for artificial intelligence. [6] The ensuing years saw hundreds of expert systems being built. Designed as support tools for decision makers, expert systems were rule-based programs that made simple inferences from a knowledge base of facts, which had been elicited from human domain experts and painstakingly hand-coded in a formal language. However, the smaller systems provided little benefit, and the larger ones proved expensive to develop, validate, and keep updated, and were generally cumbersome to use. By the late 1980s, this growth season, too, had run its course. The Fifth-Generation Project and its counterparts in the US and Europe failed to meet their objectives. [7] “AI” became an unwanted epithet among academics, their funders, and private investors alike.
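For a flavor of what an expert system’s inference looked like, here is a deliberately toy sketch of my own (not any historical system): hand-coded if-then rules applied to a small fact base by forward chaining.

```python
facts = {"engine_cranks", "no_spark"}                      # hand-coded observations
rules = [
    ({"engine_cranks", "no_spark"}, "ignition_fault"),     # IF conditions THEN conclusion
    ({"ignition_fault"}, "check_ignition_coil"),
]

changed = True
while changed:                          # forward-chain until no rule adds a new fact
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)   # now also contains 'ignition_fault' and 'check_ignition_coil'
```

Real expert systems differed mainly in scale: thousands of such rules, elicited and maintained by hand, which is exactly where the cost and brittleness came from.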
In the 1990s, the second AI winter gradually thawed. Optimism was rekindled by new techniques that seemed to offer alternatives to the traditional high-level symbol manipulation paradigm (often referred to as “Good Old-Fashioned Artificial Intelligence,” or “GOFAI” for short), which had reached its apogee in the expert systems of the 1980s. The new techniques, which included neural networks and genetic algorithms, promised to overcome some shortcomings of the GOFAI approach, in particular the “brittleness” that characterized classical AI programs (which typically produced complete nonsense if the programmers made even a single slightly erroneous assumption).
The new techniques boasted a more organic performance. Most importantly, neural networks could learn from experience, finding natural ways of generalizing from examples and discovering hidden statistical patterns in their input. This made the nets good at pattern recognition and classification problems. [8] Neural networks also exhibited the property of “graceful degradation”: a small amount of damage to a neural network typically resulted in a small degradation of its performance, rather than a total crash.
Simple neural networks had been known since the late 1950s, but they enjoyed a renaissance after the introduction of the backpropagation algorithm, which made it possible to train multilayered neural networks (networks with one or more intermediary (“hidden”) layers of neurons between the input and output layers) that were much better than their predecessors. In contrast to the ‘rigidly logic-chopping but brittle performance’ of GOFAI systems, neural nets emphasized the importance of massively parallel sub-symbolic processing. This even inspired a new “-ism”, connectionism.
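As a compact illustration (a sketch of my own, assuming NumPy, a tiny two-layer network, and the XOR task, with arbitrary learning rate and layer sizes), backpropagation lets a hidden layer learn a function that the single-layer networks of the 1950s could not represent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer parameters
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer parameters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10_000):
    h = sigmoid(X @ W1 + b1)                    # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)         # backward pass: chain rule at the output
    d_h = (d_out @ W2.T) * h * (1 - h)          # error propagated back to the hidden layer
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0]
```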
Another set of techniques that helped end the second AI winter comprised evolution-based methods such as genetic algorithms and genetic programming. These methods were widely popularized, though they perhaps had a smaller academic impact. In evolutionary models, a population of candidate solutions (which can be data structures or programs) is maintained, and new candidate solutions are generated randomly by mutating or recombining variants in the existing population. Periodically, the population is pruned by applying a selection criterion (a fitness function) that allows only the better candidates to survive into the next generation. Iterated over thousands of generations, the average quality of the solutions in the candidate pool gradually increases.
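A bare-bones sketch of that loop (a toy example of mine, evolving bit strings toward an all-ones target, with arbitrary population size and mutation rate) might look like this:

```python
import random

random.seed(1)
TARGET = [1] * 20                                  # the "ideal" bit string

def fitness(individual):
    return sum(a == b for a, b in zip(individual, TARGET))

def mutate(individual, rate=0.05):
    return [bit ^ (random.random() < rate) for bit in individual]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)     # selection: keep the fittest
    survivors = population[:10]
    offspring = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                 for _ in range(20)]
    population = survivors + offspring

print(max(fitness(ind) for ind in population))     # close to the maximum of 20
```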
This approach can produce efficient solutions to a very wide range of problems, solutions that can be novel and unintuitive, often looking more like natural structures than anything a human engineer would design. However, the methods also require skill and ingenuity, particularly in devising a good representational format. Without an efficient way to encode candidate solutions (a genetic language that matches latent structure in the target domain), evolutionary search tends to meander endlessly in a vast search space or get stuck at a local optimum. Even if a good representational format is found, evolution is computationally demanding and is often defeated by the combinatorial explosion.
A major theoretical development over the past twenty years has been a clearer realization of how superficially disparate techniques can be understood as special cases within a common mathematical framework. Many types of neural networks can be viewed as classifiers that perform a particular kind of statistical calculation (maximum likelihood estimation). This perspective allows neural nets to be compared with a larger class of algorithms for learning classifiers (decision trees, logistic regression models, support vector machines, naive Bayes, k-nearest-neighbors regression, among others). In a similar manner, genetic algorithms can be viewed as performing stochastic hill-climbing, which is again a subset of a wider class of algorithms for optimization. Each of these algorithms has its own profile of strengths and weaknesses which can be studied mathematically. [9]
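To illustrate the common-framework point (with a toy synthetic dataset and parameter choices of my own), here is logistic regression fitted by gradient ascent on the log-likelihood; viewed this way it belongs to the same statistical family as a simple neural-network classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # synthetic inputs
true_w = np.array([2.0, -1.0])
y = (X @ true_w + 0.3 * rng.normal(size=200) > 0).astype(float)   # noisy labels

w = np.zeros(2)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))              # predicted P(y = 1 | x)
    gradient = X.T @ (y - p) / len(y)               # gradient of the average log-likelihood
    w += 0.1 * gradient                             # gradient ascent step

print(w)   # points roughly in the direction of true_w = [2, -1]
```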
One can view artificial intelligence as a quest to find ways of tractably approximating the ideal Bayesian agent, one that makes probabilistically optimal use of available information. [10] This ideal is unattainable because it is too computationally demanding. [Box 1]
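A minimal sketch of the exact Bayesian updating such an ideal agent would perform (a toy example of mine, with only three hypotheses about a coin’s bias) shows both the update rule and why it does not scale:

```python
biases = [0.25, 0.50, 0.75]                 # three hypotheses about P(heads)
posterior = [1 / 3, 1 / 3, 1 / 3]           # start from a uniform prior

for flip in "HHTHH":                        # observed coin flips
    likelihoods = [b if flip == "H" else 1 - b for b in biases]
    posterior = [p * l for p, l in zip(posterior, likelihoods)]   # prior times likelihood
    total = sum(posterior)
    posterior = [p / total for p in posterior]                    # renormalize

print({b: round(p, 3) for b, p in zip(biases, posterior)})
# Probability mass shifts toward the 0.75 hypothesis. With continuous or
# combinatorial hypothesis spaces, this exact computation becomes intractable.
```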
An advantage of relating domain-specific learning problems to the general problem of Bayesian inference is that new algorithms which make Bayesian inference more efficient may yield immediate improvements across many different areas. Advances in Monte Carlo approximation techniques, for example, are applied directly in computer vision, robotics, and computational genetics.
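As a rough sketch of the Monte Carlo idea (an illustration of mine, not a state-of-the-art method), one can approximate a posterior expectation by drawing samples from the prior and weighting them by the likelihood of the observed data, echoing the coin example above:

```python
import random

random.seed(0)
heads, flips = 4, 5                      # same data pattern as the coin example: 4 heads in 5 flips

samples = [random.random() for _ in range(100_000)]                   # draw bias ~ Uniform(0, 1)
weights = [b ** heads * (1 - b) ** (flips - heads) for b in samples]  # likelihood of the data
posterior_mean = sum(b * w for b, w in zip(samples, weights)) / sum(weights)

print(round(posterior_mean, 3))          # close to the exact answer 5/7, about 0.714
```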