Untangling Complex Systems, page 91
the complement of the first name of the end city (GGCT, for London). The first primer induced
DNA polymerase to copy complements of sequences that had the right start city, and the second
primer initiated the duplication of strands that had the correct end city. PCR proceeds through
thermocycling: high temperatures denature double-stranded DNA, whereas cooler conditions favor the duplication of the individual strands by DNA polymerase. The result was
that strands with both the right start and end compositions were duplicated at an exponential rate.
On the other hand, strands that had the right start city but not the right end city, or vice versa,
were duplicated much more slowly. DNA sequences that had neither the right start nor the right
end were not duplicated at all. In this manner, the first stage of the flowchart in Figure 13.8 was accomplished. Then, using gel electrophoresis, a technique that separates DNA molecules of different lengths in an electric field, Adleman identified the strands of the right length (24 bases in the example of Figure 13.10). This operation completed the second step of the flowchart. To check whether the remaining sequences passed through all the intermediary cities, Adleman used a procedure known as affinity separation. Many copies of the sequence
that is the complementary name of Berlin were attached to microscopic iron beads. These tagged
beads were suspended in the test tube containing the remaining strands, under conditions favorable
for the hybridization reaction. Only those strands that contained Berlin's sequence annealed
to the segments bound to the beads. With a magnet placed in contact with the wall of the test tube, Adleman attracted the beads and could discard the liquid containing the strands that did not have the desired city's name. He then added new solvent and removed the magnet to resuspend the beads, and by heating he induced the strands to detach from them. Next, he reapplied the magnet
to attract the beads to the wall of the test tube. The liquid, which now contained the desired DNA
strands, was recovered for further screening. The process was repeated for the other intermediary
cities, such as Madrid in the example of Figure 13.10. After the affinity separations, the final step of the flowchart of Figure 13.8 was complete, and Adleman could be confident that the DNA strands left in the test tube encoded the solution of the Hamiltonian Path
Problem. This result was confirmed by an additional PCR, followed by another gel-electrophoresis
separation. Although the DNA found the solution in just one second, the entire procedure of separation and recognition of the right solution required seven days in the laboratory. Adleman's idea is brilliant, but the procedure is impractical for larger Hamiltonian Path Problems (Linial and Linial 1995). In fact, the mass of DNA needed for the computation would become too large, and the exhaustive search for the right solution would be like finding a
small needle in an enormous haystack. Moreover, the search may be hindered by the unavoidable
artifacts that PCR can introduce in the amplification stages.
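The molecular filtering pipeline just described can be mimicked in silico. The following Python sketch is a toy simulation of Adleman's screening steps; the edge list is an assumption chosen for illustration, not the actual network of Figure 13.10:

```python
import random

random.seed(0)

# Toy directed graph; this edge list is a hypothetical example,
# not the actual network of Figure 13.10.
edges = [("London", "Berlin"), ("London", "Madrid"),
         ("Berlin", "Madrid"), ("Berlin", "Rome"),
         ("Madrid", "Rome")]
cities = {"London", "Berlin", "Madrid", "Rome"}
start, end = "London", "Rome"

def random_paths(n, max_len):
    """Mimic the random ligation step: start from a random edge and
    extend the path with compatible edges chosen at random."""
    paths = []
    for _ in range(n):
        path = list(random.choice(edges))
        while len(path) < max_len and random.random() < 0.7:
            next_cities = [b for (a, b) in edges if a == path[-1]]
            if not next_cities:
                break
            path.append(random.choice(next_cities))
        paths.append(path)
    return paths

pool = random_paths(10_000, max_len=6)

# Step 1 (PCR analogue): keep strands with the right start and end city.
pool = [p for p in pool if p[0] == start and p[-1] == end]

# Step 2 (gel electrophoresis analogue): keep strands of the right
# length, i.e. paths visiting as many cities as the graph contains.
pool = [p for p in pool if len(p) == len(cities)]

# Step 3 (affinity separation analogue): retain, city by city, only the
# strands that contain each intermediary city.
for city in cities - {start, end}:
    pool = [p for p in pool if city in p]

solutions = {tuple(p) for p in pool}
print(solutions)   # the Hamiltonian path(s) from start to end
```

With this toy graph, the massively parallel generate-and-filter strategy leaves exactly the Hamiltonian path, just as the wet-lab procedure did.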
Probably, DNA computing is more promising in vivo, inside living cells. The dream is to implement a "chemical robot," or what Ehud Shapiro from the Weizmann Institute has also called a "Doctor in a cell." The "Doctor in a cell" is an autonomous molecular computing device that, after analyzing the surrounding cellular microenvironment, decides whether or not to release a specific drug (Benenson et al. 2004). For example, an oligonucleotide AND gate has been engineered to respond to specific microRNA inputs, which are indicators of cancer, with a fluorescent output. The DNA AND gate has been introduced into living mammalian cells (Hemphill and Deiters
2013). In other examples, chemical robots have been implemented by exploiting the molecular self-assembly of DNA, which can generate arbitrary two- and three-dimensional shapes at the nanometer scale through "scaffolded DNA origami" (Rothemund 2006). For example, nanoscale robots,
based on DNA origami, are capable of dynamically interacting with each other in the cells of a
living cockroach (Amir et al. 2014). The interactions generate logic outputs, which are relayed to
switch molecular payloads on or off. The payload can be a molecule such as a short DNA sequence,
an antibody, or an enzyme. If the payload can activate or inactivate another robot, this will create a
circuit inside a living cell.
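The decision logic of such intracellular gates, whether fluorescing or switching a payload, can be stated compactly in code. The sketch below is a purely logical model of the oligonucleotide AND gate described above; the microRNA names are placeholders, not those used by Hemphill and Deiters (2013):

```python
def and_gate(mirna_inputs, required=("miR-X", "miR-Y")):
    """Fluoresce (True) only if every required microRNA input is
    present. The miRNA names are hypothetical placeholders."""
    return all(m in mirna_inputs for m in required)

assert and_gate({"miR-X", "miR-Y"})    # both cancer markers -> output
assert not and_gate({"miR-X"})         # a single marker -> no output
assert not and_gate(set())
```

The molecular implementation is, of course, far subtler: the "inputs" are hybridization events and the "output" is a conformational change, but the Boolean behavior is the same.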
RNA is another promising material for computing within living beings. In fact, RNA is involved
in many regulatory networks. Moreover, RNA is structurally more versatile and more thermody-
namically stable than DNA. It displays a structural flexibility and functional diversity similar to
that of proteins, including enzymatic activities. The discovery of diverse RNA functions and the
methods to produce fluorogenic RNA in a cell suggest that small RNAs can work cooperatively,
synergistically, or antagonistically to implement computational logic circuits. With all these advantages, RNA computation is promising, but it is still in its infancy (Qiu et al. 2013).
DNA has another important application: it could be used as a memory for data, perhaps even for the overwhelming Big Data. Life on Earth teaches us that DNA is an alluring material for information storage. In fact, a whole genome that fits into a microscopic cell contains all the instructions required by a living being to survive. Moreover, if a genome is held in dry and dark conditions, it can remain almost intact for hundreds, thousands, and even millions of years (Pääbo et al. 2004). So, DNA is an ultracompact and long-lasting storage material. Scientists have been storing digital data in
DNA since 2012. That was when George Church and his colleagues encoded a 53,426-word book
using less than a picogram of DNA (Church et al. 2012). To encode the digital file, the researchers divided it into tiny blocks of data and converted these blocks into DNA's four-letter alphabet. They encoded one bit per base: A or C for zero, G or T for one. Each DNA fragment was replicated thousands of times. Since errors in synthesis and sequencing are rarely coincident, each molecular copy
corrects errors in the other copies. Moreover, each DNA fragment contained a digital “barcode” that
recorded its location in the original file. Reading the data required a DNA sequencer and a computer
to reassemble all of the fragments in order and convert them back into digital format. Density, stabil-
ity, and energy efficiency are all potential advantages of DNA storage (Extance 2016), while costs
and times for writing and reading are currently impractical. So, the DNA storage approach is not suitable when data are needed instantly, but it is well suited to archival applications.
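The encoding just described can be sketched in a few lines of Python. The one-bit-per-base alphabet (A or C for zero, G or T for one) and the 19-bit address barcode on 96-bit data blocks follow the scheme of Church et al. (2012) as summarized above; the remaining framing details (fixed-width barcode, no primer sites, no redundancy beyond replication) are simplifying assumptions:

```python
import random

BIT_TO_BASES = {"0": "AC", "1": "GT"}  # A or C encodes 0; G or T encodes 1
BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}

def encode(data: bytes, block_bits: int = 96):
    """Split a file into small blocks and convert each block to bases,
    prefixing a 19-bit address barcode giving its position in the file."""
    bits = "".join(f"{byte:08b}" for byte in data)
    fragments = []
    for index in range(0, len(bits), block_bits):
        payload = f"{index // block_bits:019b}" + bits[index:index + block_bits]
        # each bit maps to one of two possible bases, chosen at random,
        # which helps avoid long single-letter runs
        fragments.append("".join(random.choice(BIT_TO_BASES[b])
                                 for b in payload))
    return fragments

def decode(fragments):
    blocks = {}
    for fragment in fragments:
        bits = "".join(BASE_TO_BIT[base] for base in fragment)
        blocks[int(bits[:19], 2)] = bits[19:]       # barcode -> data block
    bitstream = "".join(blocks[i] for i in sorted(blocks))
    return bytes(int(bitstream[i:i + 8], 2)
                 for i in range(0, len(bitstream), 8))

message = b"DNA stores data"
fragments = encode(message)
random.shuffle(fragments)      # molecules in a test tube carry no order
assert decode(fragments) == message
```

The shuffle before decoding underlines the role of the barcode: since the fragments in a test tube are unordered, the address is what lets the computer reassemble them into the original file.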
TRY EXERCISE 13.1
13.3.1.4 Evolutionary Computing
Biological evolution is the change in life forms that has occurred over the last four billion years and that continues to happen on Earth. Darwin (1859) supposed that the changes were gradual, but the
fossil record also suggests that long periods of no change in the morphology of organisms have been
punctuated by relatively short periods of significant change, with the disappearance of some spe-
cies and the emergence of new ones (Eldredge and Gould 1972). The evolution of biological forms
[Figure 13.12 (flowchart): self-organizing living matter in its original form evolves, under slow changes of the environmental conditions (which induce epigenetic events), global historical events (e.g., the impact of large meteorites), and local historical events (e.g., random genetic mutations), into self-organizing living matter in its new form; through natural selection, it either survives in the offspring or it dies.]
FIGURE 13.12 The main factors contributing to the evolution of biological forms.
is both gradual and sudden, and it depends on more than one factor (see Figure 13.12). First, every living being has the power of self-organizing, of creating order within itself, with a tendency to become ever more complex (Kauffman 1993). The evolution towards new forms may be
induced by the slow changes of the environmental conditions that trigger epigenetic events: some
genetic switches are turned entirely or partly off, and some others are completely or partially turned
on. However, the evolution may also be induced by either global historical events (such as the impact
of large meteorites on Earth) that provoke significant changes in the environment, or local histori-
cal events such as “random” genetic mutations that are sporadic and impossible to predict. The new
biological forms are continuously tested, through natural selection, on how well they fit their ecosystem: those that succeed in surviving and reproducing continue to exist through their offspring. Sexual reproduction favors variability because the offspring have a genetic code that is a mixture of those of the parents.
The idea of imitating the basic features that govern the natural evolutionary processes in com-
puter science came onto the scene by the early 1960s. Such research is known as Evolutionary
Computation. There are different mainstreams, but the best known is the Genetic Algorithm (GA) proposed by Holland (1975). The input to a GA consists of a population of candidate programs. A candidate program is written as a string of bits, numbers, or symbols to accomplish a task. The initial programs can be generated randomly. Each program is tested on how well it performs the desired task, which determines its fitness value. The programs with the highest fitness values are selected to be parents of the next generation. Offspring are generated by merging parts of the parents. The new programs are then tested for their fitness, and the entire procedure is repeated over and over again until the achieved fitness values are satisfactory. Evolutionary Computation and Genetic
Algorithms have been used to solve problems in many scientific and engineering areas, as well as in
art and architecture (Beasley 2000).
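The generate-evaluate-select-recombine loop described above can be sketched in a few lines. The example below evolves bit strings toward a toy fitness function (the classic "OneMax" task of maximizing the number of ones, chosen here only for illustration); the population size, mutation rate, and truncation selection scheme are arbitrary assumptions:

```python
import random

random.seed(1)
GENOME_LEN, POP_SIZE, GENERATIONS = 32, 40, 60

def fitness(genome):
    # Toy task ("OneMax"): the ideal program is the all-ones string.
    return sum(genome)

def crossover(mum, dad):
    cut = random.randrange(1, GENOME_LEN)   # merge parts of the two parents
    return mum[:cut] + dad[cut:]

def mutate(genome, rate=0.01):
    # rare random bit flips play the role of genetic mutations
    return [bit ^ (random.random() < rate) for bit in genome]

# initial population of random candidate "programs" (bit strings)
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # the fittest half are selected as parents of the next generation
    parents = sorted(population, key=fitness, reverse=True)[:POP_SIZE // 2]
    population = [mutate(crossover(random.choice(parents),
                                   random.choice(parents)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(fitness(best))
```

After a few dozen generations the best fitness approaches the maximum, showing how selection, crossover, and mutation together climb the fitness landscape without any explicit search strategy.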
13.3.1.5 Artificial Immune Systems
Our immune system has an innate and an adaptive component (read Box 13.1 in Chapter 12 for more details), with a diverse set of cells, molecules, and organs that work in concert to protect our body. Our immune system has some attributes that are worth mimicking. First, there is the pattern
recognition ability of the receptors. A healthy immune system is very effective in distinguishing
harmful substances from the body’s own components and directing its actions accordingly. Second,
the immune subsystems, especially the adaptive one, exhibit the ability to learn and remember
previously encountered pathogens. When a new pathogen appears, the Immune System learns how
to defeat it (the primary response). The next time that pathogen is encountered, a faster and often
more aggressive response is elicited (the secondary response). Finally, the immune system is spread
throughout the body and acts collectively without any central controller.
Artificial immune systems are designed by following a procedure (de Castro and Timmis 2002)
that requires the definition of the (1) representation, (2) affinity measure, and (3) immune algorithm.
The first step is the selection of an abstract representation of the components of the system, consid-
ering the application domain. Any immune response requires the shape recognition of an antigen
by the cell receptors. For modeling this shape recognition process, the concept of shape-space has
been introduced (Perelson and Oster 1979). The shape-space approach assumes that the properties
of the receptors and antigens can be represented by a data structure. Most often, the data structure
is an attribute string, which can be a real-valued vector, a binary string, et cetera (de Castro 2007).
The affinity of a receptor towards either an antigen or an element of the organism is measured by
calculating the distance between the two corresponding attribute strings in the shape-space. Finally,
an immune-inspired algorithm is selected. Various types of immune-inspired algorithms exist, and the choice must be made according to the problem to be solved. For example, clonal selection-based algorithms attempt to capture mechanisms of the antigen-driven proliferation of B-cells,
which induce an improvement in their binding abilities. Clonal selection algorithms capture the
properties of learning, memory, adaptation, and pattern recognition (Timmis et al. 2008). Other
examples are the immune network algorithms (Jerne 1974) that view the immune system as a regu-
lated network of molecules and cells that recognize each other. Antibodies and cells are nodes of the
network, and the training algorithm determines the growth or pruning of edges between the nodes,
based on their mutual affinities. The immune network acts in a self-organizing manner and gener-
ates memory effects. Immune network algorithms have been used in clustering, data visualization,
control, and optimization domains. In general, the theory of artificial immune systems finds broad
applicability in machine-learning, pattern recognition, security of information systems, and data
analysis (de Castro 2007).
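The shape-space machinery and clonal selection can be illustrated with a small sketch. Here receptors and antigens are binary attribute strings, affinity is taken as the Hamming distance between them (complementary positions are assumed to bind), and the hypermutation rate decreases as affinity grows; all numerical parameters are arbitrary choices for illustration:

```python
import random

random.seed(2)
STRING_LEN = 16

def affinity(receptor, antigen):
    """Shape-space affinity for binary attribute strings: here, the
    Hamming distance, i.e. the number of complementary positions."""
    return sum(r != a for r, a in zip(receptor, antigen))

def mutate(receptor, rate):
    return [bit ^ (random.random() < rate) for bit in receptor]

antigen = [random.randint(0, 1) for _ in range(STRING_LEN)]
repertoire = [[random.randint(0, 1) for _ in range(STRING_LEN)]
              for _ in range(30)]

for _ in range(40):
    # clonal selection: the best-binding receptors proliferate...
    best = sorted(repertoire, key=lambda r: affinity(r, antigen),
                  reverse=True)[:5]
    clones = []
    for receptor in best:
        # ...and hypermutate at a rate that decreases as affinity grows
        rate = 1.0 - affinity(receptor, antigen) / STRING_LEN + 0.01
        clones += [mutate(receptor, rate) for _ in range(6)]
    repertoire = best + clones   # keeping the parents adds elitism

matured = max(repertoire, key=lambda r: affinity(r, antigen))
```

After the affinity maturation loop, the repertoire contains receptors nearly complementary to the antigen, capturing in miniature the learning and pattern-recognition properties discussed above.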
13.3.1.6 Cellular Automata
A cellular automaton is a mathematical model of a dynamical system that is discrete in both space
and time. In fact, a d-dimensional cellular automaton is a d-dimensional grid of cells, each of which can take on a value from a finite set of integers (most often, just two values, 1 or 0, on or off). The
value of each cell at time step t + 1 is a function of the values of a small local neighborhood of cells
at time t. The cells update their state simultaneously according to a given local rule. The idea of
cellular automata came to John von Neumann, back in the 1940s, based on a suggestion made by
his colleague, the mathematician and nuclear physicist Stanislaw Ulam (both of them participated in the Manhattan Project during the Second World War). von Neumann was trying to implement in machines the phenomenon of self-reproduction, which is peculiar to living matter, and he thought he could succeed by inventing cellular automata. In fact, cellular automata are examples of mathematical
models constructed from many simple identical components that, together, are capable of complex
behavior. This has been demonstrated by the physicist Stephen Wolfram (2002) using the simplest possible form, the one-dimensional, two-state cellular automaton. It consists of a line
of sites, with each site carrying a value a_i of 0 or 1 (white or black, dead or alive). The value a_i of the site at each position i is updated in discrete time steps according to an identical deterministic rule that is a function of the values of its neighbors:

a_i^{(t+1)} = f( a_{i-1}^{(t)}, a_{i+1}^{(t)} )    [13.3]
[Figure 13.13: four space-time plots, labeled Class 1, Class 2, Class 3, and Class 4, each showing about 50 time steps of a one-dimensional cellular automaton.]
FIGURE 13.13 Examples of the four classes of spatiotemporal patterns generated by one-dimensional cel-
lular automata following deterministic rules. These patterns have been achieved by using “ElementaryCAs”
model, Complexity Explorer project, http://complexityexplorer.org.
Different cellular automaton rules yield very different patterns. Different initial states with a par-
ticular cellular automaton rule yield patterns that differ in the details but are similar in form and
statistical features. All the possible patterns can be grouped into four classes (see Figure 13.13).
• Class 1: Independently of the initial configurations, all cells in the lattice switch to the off
state and stay that way, indefinitely. The final pattern is spatially homogeneous.
• Class 2: The final patterns are uniform or periodic, depending on the initial configuration.
• Class 3: The patterns are aperiodic and chaotic.
