Artificial Intelligence
United States Patent

Date of Patent: Pending

Reser

ARTIFICIAL INTELLIGENCE SOFTWARE PROGRAMED TO SIMULATE MENTAL CONTINUITY BETWEEN
PROCESSING STATES

Inventor: Dr. Jared Edward Reser, 16380 Meadow Ridge Road Encino, CA 91436

ABSTRACT
A modular, hierarchically organized, artificial intelligence (AI) system that features reciprocating
transformations between a working memory updating function and an imagery generation system is
presented here. This system features an imagery guidance process implemented by a multilayered neural
network of pattern recognizing nodes. Nodes low in the hierarchy are trained to recognize and represent
sensory features and are capable of combining individual features or patterns into composite, topographical
maps or images. Nodes high in the hierarchy are multimodal, module independent, and have a capacity for
sustained activity allowing the maintenance of pertinent, high-level features through elapsing time. The higher-
order nodes select new features from each mapping to add to the store of temporarily maintained features.
This updated set of features, which the higher-order nodes maintain, is fed back into lower-order sensory
nodes where it is continually used to guide the construction of successive topographic maps.



TECHNICAL FIELD
The present invention relates to the field of software for computers and related devices that aims to emulate
human intelligence using artificial intelligence and, in particular, artificial neural networks. The present
inventive method for simulating human intelligence involves the emulation of the mammalian cerebral cortex
utilizing a system of pattern recognizing nodes for selecting priority stimulus features, temporarily maintaining
these features in a limited-capacity working memory store, and allowing them to direct imagery generation as
long as they remain active. Properly integrated with existing AI technology, this method should enhance the
capabilities of problem solving agents with respect to pattern recognition, analytics, prediction, adaptive
control, decision making, and response to query. Overall, the invention is designed to provide an innovative
cognitive architecture for developing internal representations, updating these over time and utilizing them to
inform adaptive behavior.  



BACKGROUND OF THE INVENTION

Information Processing in the Mammalian Neocortex
The structure of the cerebral cortex is highly repetitive and is marked by the employment of millions of nearly
identical structures called cortical columns (Lansner, 2009). Columns are composed of closely connected
neural cell bodies and span the six layers of grey matter in the neocortex. These columns are virtually
interchangeable, as most columns share the same basic structure, and are thought to employ the same
cortical algorithm (Fuji et al., 1998). There are about 500,000 cortical columns in the average human cortex,
each occupying a space about two millimeters high and a half millimeter wide and each containing around
60,000 neurons (Lansner, 2009). Each column has its own inputs and outputs, and each performs neural
computation to determine if its inputs from other columns are sufficient to activate its outputs to other columns
(Rochester et al., 1956). Columns and other similar groups of neurons with the same tuning properties are
often referred to as cell assemblies, and this term will be used here.  Most neurons in an assembly share the
same receptive field, and thus even though they may play different roles within the assembly they contribute to
the assembly’s ability for encoding a unitary feature (Moscovich et al., 2007). Such assemblies of neurons are
thought to embody a stable microrepresentation or fragment of long term memory or previous experience. All
of the millions of pattern recognizers in the neocortex are simultaneously considering their inputs, and
continually determining whether or not to fire. In general, when a neuron or assembly fires, the pattern that it
represents has been recognized. Assemblies, like the neurons that compose them, function as “coincidence
detectors” or “pattern recognition nodes” (Fuji et al., 1998). The cortex is only one pattern recognizer high;
the hierarchy is instead structured horizontally across the surface of the cortex and is created by the
horizontal movement of information between cortical assemblies.

Assemblies in lower-order sensory areas identify sensory features from the environment and combine them
into composite representations that mirror the geometric and topographic orientations present in the sensory
input. The early visual system uses retinotopic maps that are organized with a geometry that is identical to that
used in the retina, and the auditory system uses tonotopic maps, where the mapping of stimuli is organized
by tone frequency (Moscovich, 2007). Early sensory areas create topographic mappings from patterns
recognized in the external environment, but also combine top-down inputs from higher association cortex into
internally-derived imagery as well (Damasio, 1989). This internally-derived imagery, such as that seen in the
“mind’s eye” is also topographically organized because it is created by the same lower-order networks. As you
move up the neocortical hierarchy, from posterior sensory areas to anterior association areas, assemblies
code for patterns that are more abstract. This is because higher-order assemblies have larger receptive
fields, retain features from larger spatial areas, and involve longer stretches of time (Fuster, 2009). Because
cortical assemblies are essentially pattern recognition nodes organized in a hierarchical system, it should
be possible to model them with computers. The best way to do this with modern technology is to use an artificial
neural network.

Information Processing in Artificial Neural Networks

An artificial neural network is an interconnected group of artificial neurons that uses a mathematical or
computational model for information processing based on a connectionist approach to computation. It is
generally an adaptive system, capable of complex global behavior, that alters its own structure based on the
nonlinear processing of either external or internal information that flows through the network. The concept of a
neural network seems to have been first proposed by Alan Turing in his 1948 paper “Intelligent Machinery.”
Neural networks are usually software, generally require a massively parallel, distributed computing
architecture, and are ordinarily run on conventional computers (Russel et al., 2003). The neural network
ordinarily achieves intelligent behavior through parallel computations, without employing formal rules or
logical structures, and thus can be used for pattern matching, classification, and other non-numeric,
nonmonotonic problems (Nilsson, 1998).

The traditional neural network is a multilayer system composed of computational elements (nodes) and
weighted links (arcs). These networks are based on the human brain where the nodes are analogous to
neurons, or neural assemblies, and the arcs are analogous to axons and dendrites. Each node receives
signals from specific other nodes, processes these signals and then decides whether to “fire” at the nodes
that it sends output to. The first basic artificial neuron was described by McCulloch and Pitts (1943) and has
a number of excitatory inputs whose weights can range between 0 and 1 and inhibitory inputs whose weights
range between -1 and 0. Each incoming input is multiplied by its corresponding network weight, and the products
are summed to give an activity level. If this activity level exceeds the neuron’s firing threshold, the neuron fires.
The neuron can be made to learn from its experience if its processing activity causes either the threshold or
the weights to be changed. Neural networks are typically defined by three types of parameters: 1) The
interconnection pattern between different layers of neurons; 2) The learning process for updating the weights
of the interconnections; 3) The activation function that converts a neuron’s weighted input to its output
activation.
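
By way of non-limiting illustration, the threshold behavior described above can be sketched in Python. The function name `threshold_neuron`, the example weights, and the threshold value are assumptions chosen for this sketch; they are not taken from McCulloch and Pitts or from the present disclosure.

```python
def threshold_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs exceeds the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Excitatory weights lie in [0, 1]; inhibitory weights lie in [-1, 0].
    activity = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activity > threshold else 0


# Two excitatory inputs and one inhibitory input (illustrative values only).
weights = [0.6, 0.5, -0.8]
fires = threshold_neuron([1, 1, 0], weights, threshold=1.0)      # activity near 1.1
inhibited = threshold_neuron([1, 1, 1], weights, threshold=1.0)  # activity near 0.3
```

In this sketch the neuron fires when only the excitatory inputs are active and stays silent once the inhibitory input is also active.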

Appropriate Neural Network Parameters For the Present Device

A network is “trained” to recognize a pattern by adjusting arc weights in a way that most efficiently leads to the
desired results. Arcs contributing to the recognition of a pattern are strengthened and those leading to
inefficient or incorrect outcomes are weakened. The network “remembers” individual patterns and uses them
when processing new data. Neural learning adjustments are driven by error or deviation in the performance of
the neuron from some set goal. The network is provided with training examples, which consist of a pattern of
activities for the input units, along with the desired pattern of activities for the output units. The actual output of
the network is contrasted with the desired output resulting in a measure of error. Connection weights are
altered so that the error is reduced and the network is better equipped to provide the correct output in the
future. Each weight must be changed by an amount that is proportional to the rate at which the error changes
as the weight is changed, an expression called the “error derivative for the weight.” In a network that features
backpropagation, the weights in the hidden layers are changed beginning with the layers closest to the output
layer and working backwards toward the input layer. Such backpropagating networks are commonly called
multilayer perceptrons (Rosenblatt, 1958). The present invention involves a number of multilayered neural
networks connected to each other, each using its own training criteria for backpropagated learning. For
instance, the visual perception module would be trained to recognize visual patterns, the auditory perception
module would be trained to recognize auditory patterns, and the PFC module would be trained to recognize
goal-related patterns.
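
As a minimal sketch of the error-derivative weight update described above, the following Python trains a single linear unit by gradient descent on squared error. This is a deliberate simplification: it has no hidden layers, so it shows the update rule rather than full backpropagation. The function name `train_linear_unit`, the learning rate, and the training examples are assumptions made for illustration.

```python
def train_linear_unit(examples, n_inputs, learning_rate=0.1, epochs=200):
    """Fit one linear unit by gradient descent on squared error."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in examples:
            output = sum(w * x for w, x in zip(weights, inputs))
            error = target - output
            # For E = 0.5 * error**2, the error derivative for weight i is
            # dE/dw_i = -error * x_i, so each weight moves against it.
            for i, x in enumerate(inputs):
                weights[i] -= learning_rate * (-error * x)
    return weights


# Illustrative training set generated by target = 2*x0 - 1*x1.
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
learned = train_linear_unit(examples, n_inputs=2)
```

With these examples the learned weights approach 2 and -1, the values used to generate the targets.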

The hierarchical multilayered network, the neocognitron, was first developed by K. Fukushima (1975). This
system and its descendants are based on the visual processing theories of Hubel and Wiesel and form a
solid archetype for the present device because they feature multiple types of cells and a cascading structure.
Popular neural network architectures with features that could be valuable in programming the present device
include the adaptive resonance theory network (Carpenter & Grossberg), the Hopfield network, the Neural
Representation Modeler, the restricted coulomb energy network, and the Kohonen network. Teuvo Kohonen
(2001) showed that matrix-like neural networks can create localized areas of firing for similar sensory
features, resulting in a map-like network in which similar features are localized in close proximity and
discrepant ones are distant. This type of network uses a neighborhood function to preserve the topological
properties of the input space, and has been called a self-organizing map. This kind of organization would be
necessary for the present device to accomplish imagery generation, and would contribute to the ability of the
lower-order nodes in the sensory modules to construct topographic maps.
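
The role of the neighborhood function can be illustrated with a small one-dimensional self-organizing map in Python. This is a sketch under stated assumptions: scalar inputs, a Gaussian neighborhood, and linearly decaying learning rate and radius. The names `train_som` and `best_matching_unit` and all parameter values are hypothetical and are not drawn from Kohonen (2001) or from the present disclosure.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map unit whose weight is closest to input x."""
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))


def train_som(samples, n_units=10, epochs=50, lr=0.5, radius=2.0, seed=0):
    """Train a one-dimensional self-organizing map on scalar samples."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_units)]
    for epoch in range(epochs):
        # Shrink the learning rate and neighborhood radius over time.
        decay = 1.0 - epoch / epochs
        sigma = radius * decay + 0.1
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for i in range(n_units):
                # Gaussian neighborhood: units near the winner on the map
                # move toward the input almost as much as the winner does.
                dist = i - bmu
                h = math.exp(-(dist * dist) / (2 * sigma ** 2))
                weights[i] += lr * decay * h * (x - weights[i])
    return weights


samples = [i / 20 for i in range(21)]  # illustrative inputs in [0, 1]
som = train_som(samples)
```

Because neighboring units are pulled toward the same inputs, units that sit close together on the map tend to end up coding for similar input values.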

A neural network that uses principal-components learning uses a subset of hidden units that cooperate in
representing the input pattern. Here, the hidden units work cooperatively and the representation of an input
pattern is distributed across many of them. In competitive learning, in contrast, a large number of hidden units
compete so that a single hidden unit is used to represent a particular input pattern. The hidden unit that is
selected is the one whose incoming weights most closely match the characteristics of the input pattern. The
optimal method for the present purposes lies somewhere between purely distributed and purely localized
representations. Each neural network node will code for a discrete, albeit abstract, pattern, and the nodes will
compete with one another for activation energy and the opportunity to contribute to the depiction of imagery. However,
multiple nodes will also work together cooperatively to create composite imagery.
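
One way to picture a middle ground between purely localized and purely distributed coding is a k-winner scheme, sketched below in Python. With k = 1 it reduces to ordinary competitive learning; with larger k several units share the representation. The use of k-winners here is an assumption made for illustration, as are the names `match_error`, `k_winners`, and `competitive_update` and the example weights.

```python
def match_error(weights, pattern):
    """Squared distance between a unit's incoming weights and the input."""
    return sum((w - p) ** 2 for w, p in zip(weights, pattern))


def k_winners(unit_weights, pattern, k=1):
    """Indices of the k units whose weights most closely match the pattern."""
    ranked = sorted(range(len(unit_weights)),
                    key=lambda i: match_error(unit_weights[i], pattern))
    return ranked[:k]


def competitive_update(unit_weights, pattern, k=1, learning_rate=0.2):
    """Move only the winning units' weights toward the input pattern."""
    winners = k_winners(unit_weights, pattern, k)
    for i in winners:
        unit_weights[i] = [w + learning_rate * (p - w)
                           for w, p in zip(unit_weights[i], pattern)]
    return winners


units = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.4]]  # three hypothetical hidden units
sole_winner = competitive_update(units, [1.0, 0.0], k=1)  # purely localized
shared_winners = k_winners(units, [1.0, 0.0], k=2)        # partly distributed
```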

When active high-level nodes signal each of the low-level nodes that they connect with, they are, in effect,
retroactivating them. They activate those that recently contributed to their activity, and they activate previously
dormant ones as well. This retroactivation of previously dormant nodes constitutes a form of anticipation or
prediction, indicating that there is a high likelihood that the pattern that these nodes code for will become
evident (prospective coding). This kind of prediction is best achieved by a hierarchical hidden Markov model,
so utilizing Markov models and their predictive properties will be necessary. This process is used in Ray
Kurzweil’s Pattern Recognition Theory of Mind (PRTM) model, which uses a hidden Markov model and a
plurality of pattern recognition nodes for its cognitive architecture (Kurzweil, 2012).  Hierarchical temporal
memory (HTM) is another cognitive architecture that models some of the structural and algorithmic properties
of the neocortex (Hawkins & Blakeslee, 2005). The hope with PRTM and HTM is that a neural network with
enough nodes and sufficient training should be able to model high-order human abstractions. However,
distilling such abstractions and utilizing them to make complex inferences may necessitate an imagery
guidance mechanism with a working memory updating function.
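
The predictive use of a Markov model mentioned above can be sketched with a single filtering step of a flat hidden Markov model. This is a simplification, since the text calls for a hierarchical model, and it is not a description of the PRTM or HTM implementations. The function name `forward_step` and the two-state transition and emission tables are hypothetical values chosen for illustration.

```python
def forward_step(belief, transition, emission, observation):
    """One HMM filtering step: predict the next hidden state, then correct."""
    n = len(belief)
    # Prediction: propagate the current belief through the transition matrix.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correction: weight each state by how well it explains the observation.
    unnormalized = [predicted[j] * emission[j][observation] for j in range(n)]
    total = sum(unnormalized)
    return predicted, [u / total for u in unnormalized]


transition = [[0.7, 0.3], [0.4, 0.6]]  # P(next state | current state)
emission = [[0.9, 0.1], [0.2, 0.8]]    # P(observation | state)
predicted, posterior = forward_step([0.5, 0.5], transition, emission, observation=0)
```

The `predicted` vector plays the role of anticipation: it assigns probability to states before any confirming evidence arrives, and the observation then sharpens that expectation.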

In some neural networks, the activation values for certain nodes are made to undergo a relaxation process
such that the network will evolve to a stable state where large scale changes are no longer necessary and
most meaningful learning can be accomplished through small scale changes. The capability to do this, or to
automatically prune connections below a certain connection weight, would be beneficial for the present
purposes. It is also important to preserve past training diversity so that the system does not become
overtrained by narrow inputs that are poorly representative.
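
A minimal sketch of weight-based pruning follows, assuming connections are stored as a mapping from (source, target) pairs to weights. The function name `prune_connections`, the data layout, and the cutoff value are assumptions made for illustration.

```python
def prune_connections(weights, min_strength):
    """Drop connections whose absolute weight falls below min_strength."""
    return {link: w for link, w in weights.items() if abs(w) >= min_strength}


# Hypothetical connection weights keyed by (source node, target node).
connections = {("a", "b"): 0.8, ("a", "c"): 0.02, ("b", "c"): -0.5, ("c", "d"): -0.01}
pruned = prune_connections(connections, min_strength=0.05)
```

The absolute value is used so that strong inhibitory connections survive pruning alongside strong excitatory ones.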

The present invention could be significantly refined through the implementation of genetic algorithms that
could help to select the optimal ways to fine-tune the model and set the parameters controlling the
mathematics of things such as the connectivity, the learning algorithms, and the extent of sustained activity.
Evolutionary algorithms or genetic algorithms expose a large group of programmed “candidate solutions” to
evolutionary or selective forces in order to refine them. They attempt to generate solutions to optimization
problems using techniques inspired by natural evolution, such as inheritance, mutation, selection and
crossover.
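
The four operations named above (inheritance, mutation, selection and crossover) can be sketched with a toy genetic algorithm in Python. Here each candidate solution is a bit string and the fitness function simply counts ones; in the present device the genome would instead encode model parameters, which this sketch does not attempt. The function name `evolve` and all parameter values are assumptions.

```python
import random


def evolve(fitness, genome_length, population_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Evolve bit-string genomes toward higher fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: the child inherits a prefix from one parent
            # and the remaining suffix from the other.
            cut = rng.randint(1, genome_length - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = evolve(fitness=sum, genome_length=12)
```

Because the parents are carried over unchanged into each new generation, the best fitness in the population never decreases from one generation to the next.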

It might also be beneficial to implement a rule-based approach, where a core set of reliable rules is coded
and used to influence decision making and goal prioritization. Many theorists agree that combining neural
network and more traditional symbolic approaches will better capture the mechanisms of the human mind. In
fact, implementing symbolic rules to instantiate processing priorities could help the higher-order nodes to
account for goal-relevance.


BACKGROUND ART

The field of AI research is involved in creating a computing system that is capable of emulating certain
functions that are traditionally associated with intelligent human behavior. Most early AI systems were only
capable of responding in the manner that human programmers had provided for when the program was
written.  It came to be recognized that it would be valuable to have a computer that does not respond in a
preprogrammed manner.  AI systems capable of adaptive learning have since become important. There are
many different methods through which to construct such an apparatus. For example, see U.S. Patent
8,346,699 by Czora and U.S. Patent 6,738,753 by Hogan.

Neural networks have attempted to get around the programming problem by using layers of artificial neurons
or nodes (programming constructs that mimic the properties of biological neurons). Neural networks that
have been developed to date are largely software-based. For example, see U.S. Patent 7,103,452 by Retsina
and U.S. Patent 7,069,256 by Campos. Neural networks and genetic algorithms are widely implemented in
research and industry for their capabilities involving adaptive learning and advanced pattern recognition.  
However, they are used for processing tasks that are narrowly constrained and highly specialized, and there
has not yet been any strong form of intelligence derived from them. There are currently no neural networks, or
AI systems whatsoever, that are structured to model an analogue of the primate prefrontal cortex in order to
guide the progressive generation of successive topographic maps. Because there is no software structured
around identifying potentially goal-relevant information and holding it online to inform reciprocal cycles of
imagery generation and feature extraction for the purpose of systemizing the environment, current AI is rather
limited in scope and utility.

The current objective is to create an agent that through supervised or unsupervised feedback can progress to
the point where it takes on emergent cognitive properties and becomes a general problem solver or inference
program capable of goal-directed reasoning, backwards chaining, and performing means-end analyses. The
present device should constitute a self-organizing cognitive architecture capable of dynamic knowledge
acquisition, inductive reasoning, dealing with uncertainty, high predictive ability and low generalization error. It
will be able to find meaningful patterns in complex data and improve its performance by learning. It should
also be capable of autoassociation (the ability to recognize a pattern even though the entire pattern is not
present) and perceptual invariance (generalizing over the style of presentation such as visual perspective or
font).


BACKGROUND OF THE INVENTION

The Artificial PFC: Continuity Through Sustained Activity

Typical AI systems are designed to perceive the environment, evaluate objects therein, select an action, act,
and record the action, along with its efficacy and the results thereof to memory. There are no forms of artificial
intelligence that do this using a succession of maps guided by a continually updating buffer of salient
features. The present invention will do this with a novel information processing approach based on the
architecture of the human brain, but implemented with available computer hardware and input/output devices.
To create a strong form of AI it is necessary to have an understanding of what is taking place that allows
intelligence, thought, cognition, consciousness or working memory to move through space and time, or, in
other words, to “propagate.” Such an understanding must be grounded in physics because it must explain
how the physical substrate of intelligence operates through space and time (Chalmers, 2010). The human
brain is just such an intelligent physical system that AI researchers have attempted to understand and
replicate using a biomimetic approach (Gurney, 2009). Features of the biological brain have been key in the
evolution of neural networks, but the brain holds other information processing principles that have not been
harnessed by AI efforts.

The human prefrontal cortex (PFC) is thought to be instrumental in cognitive control, and the ability to
orchestrate thought and action in accordance with internal goals. Cognitive control stems from the active
maintenance of patterns of activity in the PFC that represent goal-relevant features. These temporarily
maintained representations bias other processing in order to ensure that goals are properly attended to. The
mammalian brain, and especially the human prefrontal cortex (PFC), has neurons that are capable of
“sustained firing,” allowing them to generate action potentials at elevated rates for several seconds at a time
(generally 1-30 seconds) (Fuster, 2009). In the mammalian brain, prolonged firing of certain assemblies of
neurons in the PFC allows the maintenance of specific features, patterns, and goals (Baddeley, 2007). The
temporary persistence of these patterns ensures that they continue to transmit their effects on network
weights as long as they remain active. In contrast, neurons in most other brain areas, including sensory areas,
only remain active for a few milliseconds unless sustained PFC input makes their continued activity possible
(Fuster, 2009).

Because the activity in PFC cells is sustained, and does not fade away before the next instantiation of activity,
there is a temporally dynamic and overlapping pattern of neural activity that makes possible the psychological
juggling of information in working memory (Reser, 2012). Thus the human brain is an information processing
system that has the ability to maintain a large list of representations that is constantly in flux as new
representations are constantly being added, some are being removed and still others are being maintained. It
is this distinct pattern of activity that allows consciousness by creating both spatial and temporal continuity
between processing states. Continuity is defined as being uninterrupted in time. The pattern of activity in the
brain is constantly changing, but because some individual neurons persist during these changes, particular
features of the overall pattern will be uninterrupted or conserved over time. The present device will be
constructed to mimic this biological system.
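
The notion of continuity as partial conservation between successive states can be made concrete with a short Python sketch. It measures what fraction of the features active in one state are still active in the next. The function name `conserved_fraction` is an assumption, and the example states reuse the letters of FIG. 1 purely for illustration.

```python
def conserved_fraction(previous_state, next_state):
    """Fraction of previously active features that are still active."""
    if not previous_state:
        return 0.0
    return len(previous_state & next_state) / len(previous_state)


# Successive sets of coactive features, lettered as in FIG. 1.
states = [set("BCDE"), set("CDEF"), set("CEFG")]
overlaps = [conserved_fraction(a, b) for a, b in zip(states, states[1:])]
```

A value of 1.0 would mean nothing changed between states and a value of 0.0 would mean a complete break; the intermediate values here reflect a pattern that changes while most of it persists.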

Computational operations, that take place as a computer implements lines of code to transform input into
output, have discrete starting and stopping points. For this reason computers do not have temporal continuity
in their information processing. The sustained activity of prioritized features in the brain is staggered and
overlapping, ensuring that human thought features a continuous cascade of cognitive elements that persist
through time. Thus there is no objective stopping or starting point of thought. Instead, thought itself is
composed of the startings and stoppings of huge numbers of individual elements that, when combined,
create a dynamic and continuous whole (Reser, 2012). See FIG. 7 for an example. If this sustained firing did not
happen at a neural level, humans, like some lower animals, would have far less mental continuity over
elapsing time. Instantaneous mental states would be discrete and because information could not be carried
over to subsequent states, the ability to process or make associations between temporally distant stimuli
would be impaired. This is why the prefrontal cortex is associated with working memory, executive function,
mental modeling, planning and goal setting. The most enduring PFC neurons correspond to what the
individual is most focused on, the underlying theme or element that stays the same as other contextual
features fluctuate. It is currently not possible to engineer the human brain in a way that increases the number
and duration of active higher-order representations. However, in a biomimetic instantiation it would be fairly
easy to increase the number and duration of simultaneously active higher-order representations.
Accomplishing this would allow the imagery that is created to be informed by a larger number of concerns,
and would ensure that important features were not omitted simply due to the fact that their activity could not be
sustained due to biological limitations.

The Neocortex: Reciprocating Crosstalk between Association and Sensory Cortex

The higher-order features that are maintained over time by sustained neural firing are used to create and
guide the construction of mental imagery (Reser, 2012). The brain’s connectivity allows reciprocating cross-
talk between fleeting bottom-up imagery in early sensory cortex and lasting top-down priming in late
association cortex and the PFC. This process allows humans to have progressive sequences of related
thoughts, where thinking is based heavily on lower order sensory areas and the topographic mappings that
they generate in order to best represent a set of higher-order features. In a sense, the higher- and lower-order
areas are constantly interrogating each other and providing one another with their expert knowledge. For
instance, the higher-order areas have no capacity to foresee how the specifications that they hold will be
integrated into metric imagery. Also, the lower-order nodes must introduce other, unspecified features into the
imagery that they build, and this generally provides the new content for the stream
of thought. For example, if higher-order nodes come to hold features supporting the representations for “pink,”
“rabbit,” and “drum,” then the subsequent mappings in lower-order visual nodes may activate the
representations for batteries, and the auditory nodes may activate the representation for the words “Energizer
bunny.” The central executive (the PFC and other association areas) directs progressive sequences of mental
imagery in a number of topographic sensory and motor modules including the visuospatial sketchpad, the
phonological (articulatory) loop and the motor cortex. This model frames consciousness as a polyconceptual,
partially-conserved, progressive process, that performs its high-level computations through “reciprocating
transformations between buffers.” More specifically, it involves reciprocating transformations between a
partially conserved store of multiple conceptual specifications and another nonconserved store that integrates
these specifications into veridical, topographic representations.

SUMMARY OF THE INVENTION

It is an object of the present invention to simulate human intelligence by emulating the mammalian fashion for
selecting priority stimuli, holding these stimuli in a working memory store and allowing them to temporarily
direct imagery generation before their activity fades.
It is an object of the present invention to enhance AI data processing, decision making, and response to query.
Briefly, a known embodiment of the present invention is software that uses neural networks to model a large
set of programming constructs or nodes that work together to continually determine, in real time, which members of
their population should be newly activated, which should be deactivated and which should remain active over
elapsing time to form the “stream” or “train” of thought.
An advantage of the present invention is that a computer can be caused to develop a simulated intelligence.

Another advantage of the present invention is that it will be easier and more natural to use a computer or
computerized machine.

A third advantage of the present invention is that it will be readily implemented using available computer
hardware and input/output devices.

These and other objects and advantages of the present invention will become clear to those skilled in the art
in view of the invention, and the industrial applicability thereof, as described herein and as illustrated in the
several figures of the drawing. The objects and advantages listed are not an exhaustive list of all possible
advantages of the invention. Moreover, it will be possible to practice the invention even where one or more of
the intended objects and/or advantages might be absent or not required in the application. Further, those
skilled in the art will recognize that various embodiments of the present invention may achieve one or more,
but not necessarily all, of the above described objects and advantages. Accordingly, the listed advantages are
not essential elements of the present invention, and should not be construed as limitations.


BRIEF DESCRIPTION OF THE DRAWINGS




FIG. 1 is a diagram depicting how high-level features are displaced, newly activated, and coactivated in the
neural network to form a “stream” or “train” of thought. Each feature is represented by a letter. 1) Shows that
feature A has already been deactivated and that now B, C, D and E are coactivated. When coactivated, these
features spread their activation energy, resulting in the convergence of activity onto a new feature, F. Once F is
active, it immediately becomes a coactive feature, restarting the cycle. 2) Shows that feature B has been deactivated,
that C, D, E and F are coactivated, and that G is newly activated. 3) Shows that feature D, but not C, has been
deactivated. In other words, what is deactivated is not necessarily what entered first, but what has proven,
within the network, to receive the least converging activity. C, E, F and G coactivate and converge on H, causing
it to become active.
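
The updating step depicted in panel 1 of FIG. 1 can be sketched in Python as follows. The sketch assumes that converging activity is a simple sum of link weights and that the store has a fixed capacity; both are simplifying assumptions, and the names `support` and `update_working_memory` and the link weights are hypothetical values chosen only to reproduce the panel.

```python
def support(feature, sources, weights):
    """Total activity converging on feature from the given source features."""
    return sum(weights.get((src, feature), 0.0)
               for src in sources if src != feature)


def update_working_memory(active, weights, capacity=4):
    """Activate the best-supported inactive feature, then drop the weakest."""
    candidates = sorted({dst for (_, dst) in weights} - set(active))
    if not candidates:
        return set(active)
    newcomer = max(candidates, key=lambda f: support(f, active, weights))
    updated = set(active) | {newcomer}
    while len(updated) > capacity:
        # The newcomer is protected; among the older features, the one
        # receiving the least converging activity is deactivated.
        older = sorted(updated - {newcomer})
        weakest = min(older, key=lambda f: support(f, updated, weights))
        updated.remove(weakest)
    return updated


# Hypothetical link weights keyed by (source feature, target feature).
weights = {
    ("B", "F"): 0.2, ("C", "F"): 0.6, ("D", "F"): 0.5, ("E", "F"): 0.7,
    ("E", "G"): 0.1, ("C", "D"): 0.4, ("D", "E"): 0.5, ("E", "C"): 0.3,
}
next_state = update_working_memory({"B", "C", "D", "E"}, weights)
```

With these weights, activity from B, C, D and E converges most strongly on F, and B, which receives no support from the other coactive features, is the one dropped.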

FIG. 2 is a diagram depicting the reciprocal transformations of information between lower-order sensory
nodes and higher-order PFC nodes. Sensory areas can only create one sensory image at a time, whereas the
PFC is capable of holding the salient or goal-relevant features of several sequential images at the same time.

FIG. 3 is a diagram depicting the behavior of features that are held active in the PFC. 1) Shows that features B,
C, D and E, which are held active in the PFC, all spread their activation energy to lower-order sensory areas
where a composite image is built that is based on prior experience with these features. 2) Shows that
features involved in the retinotopic imagery from time sequence 1 converge on the PFC neurons responsible
for feature F. Feature B drops out of activation, and C, D, E and F remain active and diverge back onto visual
cortex. 3) Shows that this same process leads to G being activated and D being deactivated.

FIG. 4 is a list of processes involved in the central AI algorithm implemented by the present device.
1)        Either sensory information from the environment, or top-down internally held specifications, or both are
sent to low-order sensory neural network layers that contain feature extracting cells. This includes either
feedforward sensory information from sense receptors (experiential perception) or downstream
retroactivation from higher-level nodes (internally guided imagery).
2)        A topographic sensory map is made by each low-order, sensory neural network. These topographic
maps represent the network’s best attempt at integrating and reconciling the disparate stimulus and feature
specifications into a single composite, topographic depiction. The map that is created is based on prior
probability and training experience with these features.
3)        In order to integrate the disparate features into a meaningful image, the map making neurons will
usually be forced to introduce new features. The salient or goal-relevant features that have been introduced
are extracted through a perceptual process where active, lower-order nodes spread their activity to higher-
order nodes. As the new features pass through the neural networks, some are given priority and are used to
update the limited-capacity, working memory, storage buffer that is composed of active high-level nodes.
4)        Salient features that cohere with features that are already active in the higher-order nodes are added to
the active features there. The least relevant, least fired upon features in higher-order areas are dropped from
activation. The sustained firing of a subset of higher-order nodes allows the important features of the last few
maps to be maintained in an active state.
5)        At this point it is necessary for the system to implement a program that allows it to decide if it will
continue operating on the previously held nodes or redirect its attention to the newly introduced nodes. Each
time the new features garnered from the topographic maps are used to update the working memory store, the
agent must decide what percentage of previously active higher-order nodes should be deactivated in order to
reallocate processing resources to the newest set of salient features. Prior probability with saliency training
will determine the extent to which previously active nodes will continue to remain active.
6)        The updated subset of higher-order nodes will then spread its activity backwards toward lower-order
sensory nodes in order to activate a different set of low-order nodes, culminating in a different topographic
sensory map.
7)        A. The process repeats.
B. Salient sensory information from the actual environment interrupts the process. The lower-order nodes and
their imagery, as well as the higher-order nodes and their priorities, are refocused on the new incoming
stimuli.
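
The cycle in steps 2 through 6 can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the claimed implementation: the function name cycle, the parameters capacity and retain, the association table and the salience scores are all hypothetical, sets of feature names stand in for trained neural networks and topographic maps, and the environmental interruption of step 7B is omitted.

```python
def cycle(working_memory, associations, salience, capacity=4, retain=0.8):
    """Run one pass of steps 2 through 6 on a toy feature set.

    working_memory: dict mapping active high-order features to activity levels.
    associations:   dict mapping a feature to the low-order features it evokes.
    salience:       dict giving an assumed salience score for each feature.
    """
    # Steps 6 and 2: active high-order features spread activity downward,
    # and the resulting set of co-active features stands in for the map.
    image = set(working_memory)
    for feature in working_memory:
        image.update(associations.get(feature, ()))

    # Step 3: features the map had to introduce become update candidates.
    introduced = image - set(working_memory)

    # Steps 4 and 5: held features persist with decayed activity, new
    # features enter at their salience, and only the strongest are kept.
    scores = {f: a * retain for f, a in working_memory.items()}
    for f in introduced:
        scores[f] = salience.get(f, 0.0)
    kept = sorted(scores, key=lambda f: (-scores[f], f))[:capacity]
    return {f: scores[f] for f in kept}, image


# Hypothetical toy data, chosen only for illustration.
associations = {"glove": ["hand", "leather"], "blue": ["sky"], "hand": ["finger"]}
salience = {"hand": 0.9, "finger": 0.7, "sky": 0.5, "leather": 0.3}

wm0 = {"glove": 1.0, "blue": 1.0, "wrinkled": 1.0}
wm1, image1 = cycle(wm0, associations, salience)
wm2, image2 = cycle(wm1, associations, salience)
```

In this toy run the first pass adds "hand" while all three starting features persist, and the second pass adds "finger" and drops "wrinkled", so three of the four held features carry over from one state to the next.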



FIG 5. Demonstrates the architecture of the interfacing neural networks.

FIG 7. Depicts an octopus within a brain in an attempt to communicate how continuity is made possible in the
brain and in the present device. When an octopus exhibits seafloor walking, it places most of its arms on the
sand and gradually repositions arms in the direction of its movement. Similarly, the mental continuity exhibited
by the present device is made possible because even though some representations are constantly being
newly activated and others deactivated, a large number of representations remain active together. This
process allows the persistence of “cognitive” content over elapsing time, and thus over machine processing
states.


DETAILED DESCRIPTION OF THE INVENTION

The device is a modular, hierarchically organized, artificial intelligence (AI) system that features reciprocating
transformations between a working memory updating function and an imagery generation system. This device
features a recursive, algorithmic, imagery guidance process to be implemented by a multilayered neural
network of pattern recognizing nodes. The software models a large set of programming constructs or nodes
that work together to continually determine, in real time, which from their population should be newly activated,
which should be deactivated and which should remain active to best inform imagery generation.

The device necessitates a highly interconnected neural network that features a hierarchically organized
collection of pattern recognizers capable of both transient and sustained activity. These pattern recognition
nodes mimic assemblies (minicolumns) of cells in the mammalian neocortex and are arranged with a similar
connection geometry. Like neural assemblies, the nodes exhibit a continuous gradient from low-order nodes
that code for sensory features, to high-order nodes that code for temporally or spatially extended relationships
between such features. The lower order nodes are organized into modules by sensory modality. In each
module, nodes work both competitively and cooperatively to create topographic maps. Nodes are grouped
according to the feature they are being trained to recognize. These maps can be generated by external input,
by internal input from higher-order nodes, or a mix of the two. The architecture will feature backpropagation,
self-organizing maps, bidirectionality, Hebbian learning as well as a combination between principal-
components learning and competitive learning. The program will have an embedded processing hierarchy
composed of many content feature nodes between the input modalities and its output functions.

Nodes lower in the hierarchy are trained to recognize and represent sensory features and are capable of
combining individual features or patterns into metric, topographical maps or images. Lower-order nodes are
unimodal, and organized by sensory modality (visual, auditory, etc.) into individual modules. Nodes high in the
hierarchy are multimodal, module independent, and have a capacity for sustained activity allowing the
conservation of pertinent, high-level features through elapsing time. The higher nodes are integrated into the
architecture in a way that makes them capable of identifying a plurality of goal-relevant features from both
internal imagery and environmental input, and temporarily maintaining these as a form of prioritized information.
The system is structured to allow repetitive, reciprocal interactions between the lower, bottom-up, and higher,
top-down nodes. The features that the higher nodes encode are utilized as inputs that are fed back into lower-
order sensory nodes where they are continually used for the construction of successive topographic maps.
The higher nodes select new features from each mapping to add to the store of temporarily maintained
features. Thus the most salient or goal-relevant features from the last several mappings are maintained. The
group of active, higher-order nodes is constantly updated, where some nodes are newly added, some are
removed, yet a relatively large number are retained. This updated list is then used to construct the next
sensory image which will be necessarily similar, but not identical to, the previous image. The differential,
sustained activity of a subset of high-order nodes allows thematic continuity to persist over sequential
processing states.
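
One way to make the notion of thematic continuity concrete is to measure how much of the active set survives from one processing state to the next. The sketch below is an assumption-laden illustration rather than part of the described device: the function name overlap_fraction and the example states are hypothetical, and plain sets of labels stand in for populations of active higher-order nodes.

```python
def overlap_fraction(previous, current):
    """Fraction of previously active nodes that are still active now."""
    if not previous:
        return 0.0
    return len(previous & current) / len(previous)


# Hypothetical sequence of states: one node is added and one removed per step.
states = [
    {"a", "b", "c", "d"},
    {"b", "c", "d", "e"},
    {"c", "d", "e", "f"},
]

# Continuity between each pair of consecutive states.
continuity = [overlap_fraction(p, c) for p, c in zip(states, states[1:])]

# Content drifts gradually: the first and last states still share half their nodes.
drift = overlap_fraction(states[0], states[-1])
```

Here each state retains three quarters of its predecessor, while the first and third states share only half of their nodes, which is the gradual turnover the paragraph above describes.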

All of the nodes within the device function as a continuous whole and are highly interconnected, but can be
decomposed into separate but more closely connected neural networks. These various modules have a
specific connectional architecture that is depicted in Figure 5. These networks consist of a bottom layer of
input cells, succeeded by alternating layers of local-feature extracting cells, and a top layer of output cells.
Nodes belonging to an individual network are highly interconnected with each other. The individual neural
networks interface through the connections between top layer output cells and bottom layer input cells. Each
network is organized so that multiple lower-order nodes can converge on higher order nodes, and single
higher-order nodes can diverge upon multiple lower-order nodes. The neural networks range from unimodal,
feature representation nodes, to multimodal, concept representation nodes. The system will begin untrained
with random connection weights between nodes. Learning should be concentrated on the early sensory
networks first. This will follow the ontogenetic learning arc seen in mammals where the earliest sensory
areas myelinate in infancy and the late association areas such as the PFC do not finish myelinating until
young adulthood. Of course, this form of artificial intelligence would have to have a prolonged series of
developmental experiences, similar to a childhood, to learn which representations to keep active in which
scenarios. The network will act to consolidate or potentiate in memory the specific groupings of nodes that
have produced favorable outcomes, in order to more rapidly inform future decision making.
These bottom-up to top-down reciprocations are organized into very precise oscillations that propagate in
regularly timed intervals across the network so that they do not interfere with each other. The oscillations
reciprocate back and forth at just the right speed so that each area has the time to process its inputs and
send an output before the next complement of inputs arrives. It is important to carefully structure timing
mechanisms in the present device so that messaging is not muddled or noisy.
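
A simple way to keep the two directions of messaging from interfering is to alternate them on a shared clock, so that each area reads only a completed snapshot of the other. The following is a minimal sketch of that idea, assuming a strict two-phase schedule; the function run_clocked and the placeholder transforms up and down are hypothetical and are not drawn from the specification.

```python
def run_clocked(low, high, up, down, ticks):
    """Alternate bottom-up and top-down phases on a shared clock.

    Each phase reads a frozen copy of the other area's output, so an area
    never consumes inputs that are still being written.
    """
    history = []
    for t in range(ticks):
        if t % 2 == 0:
            high = up(tuple(low))    # bottom-up phase: low is frozen
            history.append("up")
        else:
            low = down(tuple(high))  # top-down phase: high is frozen
            history.append("down")
    return low, high, history


# Placeholder transforms, assumed purely for illustration.
up = lambda low: [sum(low)]
down = lambda high: [high[0], high[0] + 1]

final_low, final_high, phases = run_clocked([1, 2], [0], up, down, ticks=4)
```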

The agent discussed here would be capable of integrating multiple specialized AI programs into a larger
composite of coordinated systems. To do this, it would be necessary to interface these systems with the input
side of the imagery generation system. Existing AI technology could be integrated with the system that is
described here in order to more quickly expand its behavioral repertoire and knowledge base. For example,
databases and encyclopedic content could be used as sensory input and the functions of other AI, adaptive
control and robotics systems could be added to its repertoire of available motor outputs and premotor
representations. The system should have open access to a memory bank of text including dictionaries,
thesauri, newswire articles, literary works and encyclopedic entries. The system should be able to integrate
with multiple applications such as rule-based systems, expert systems, fuzzy logic systems, genetic
algorithms, and archived digital text. The present system would benefit from the integration of existing
programs for input and output e.g. visual perception programs, and robotic movement programs. This patent
does not laboriously describe these components because they already exist in well-developed forms.

Other Features

In a general sense the imagery generation protocol is allowing discrete features to be bound into composite
maps. If the higher-order nodes for the features blue, wrinkled and glove were made sufficiently active they
should be used to create a topographic map of a glove that is blue and wrinkled. Some of the features held by
the higher-order nodes will not be able to be worked into a topographic map if the neural network does not
have the previous experience to know how to corepresent them. At the beginning of their ontogenetic learning
arc, higher-order nodes will be activated arbitrarily due to their random connections to lower-order nodes. In
the program’s infancy, a specific object will activate all of the higher-order nodes that are connected to the
features associated with that object. With time, only the higher-order nodes used the most will survive and a
much smaller subset of nodes that respond to all of the features at once will come to be the only nodes
activated by the object.  

Repeated loops of conserved, higher-order features can be ended when attention is captured by an object or
concept that competes for attention. The ability to free up higher-order nodes to attend to a new stimulus will
be programmed by training. Before proper training is accomplished the system may not be able to reallocate
its resources properly when its attention shifts.  

Cell assemblies in the primate PFC hold tiny fragments of larger representations. Individual cell assemblies
work cooperatively to represent larger psychological units known as chunks. George Miller has hypothesized
that perhaps we can hold “7 plus or minus 2” chunks at a time. Cowan has demonstrated that 4 chunks may
be a more realistic number. If these chunks can be imitated by a neural network, then it should be relatively
simple to program the network to increase the number of chunks and the size of the network, effectively
increasing processing resources in a way that is impossible in humans.

The program software would translate natural language queries and other user entries such as audio, video
and still images, into instructions for the operating system to execute. This would involve transforming the
input into the appropriate form for the system’s first layer of neural nodes. Simulating a simple neural network
on von Neumann technology requires numerical database tables with many millions of rows to represent its
connections which can require vast amounts of computer memory. It is important to select a computing
platform or hardware architecture that will support this kind of software.

This system will eventually have to embrace a model of utility function. Generally these models allow the agent
to sense the current state of the world, predict the outcome of multiple potential actions available to it,
determine the expected utility of these actions, and execute the action that maximizes expected utility. These
decisions should be driven by probabilistic reasoning that chooses actions based on probability distributions
of possible outcomes. Furthermore the device should eventually assume a hybrid architecture between a
reflex agent (that bypasses use of the association areas) and a decision-theoretic agent. Every time a
problem is solved using explicit deliberation, a generalized version of the solution is saved for future use of
the reflex component. This will allow the device to construct a large common sense knowledge base of
both implicit and explicit behaviors.
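
The hybrid of a decision-theoretic agent and a reflex agent can be sketched as follows. This is a minimal illustration under simplifying assumptions: the class HybridAgent, the outcome model and its numbers are hypothetical, and the reflex store caches the exact state rather than the generalized version of the solution that the text calls for.

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)


class HybridAgent:
    def __init__(self, model):
        self.model = model    # state -> {action: [(probability, utility), ...]}
        self.reflexes = {}    # state -> action found earlier by deliberation

    def act(self, state):
        # Reflex path: reuse a previously deliberated solution if one exists.
        if state in self.reflexes:
            return self.reflexes[state], "reflex"
        # Deliberate path: choose the action with the highest expected utility.
        actions = self.model[state]
        best = max(actions, key=lambda a: expected_utility(actions[a]))
        self.reflexes[state] = best
        return best, "deliberate"


# Hypothetical outcome model with made-up probabilities and utilities.
model = {
    "obstacle": {
        "swerve": [(0.9, 10), (0.1, -50)],
        "brake": [(0.99, 5), (0.01, -50)],
    }
}
agent = HybridAgent(model)
first = agent.act("obstacle")
second = agent.act("obstacle")
```

The first call deliberates over both actions and the second call answers from the reflex store without recomputing expected utilities.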

Behavioral Output

To accomplish overt behavior, higher inputs are fed not only to the lower sensory nodes, but also in a similar,
top-down manner to a behavior module that will guide natural language output and other behaviors such as
robotic control. The final layer of nodes in this behavior module will be nodes that directly control movement
and verbalization and the higher nodes will be continuous with the higher-order PFC-like nodes. The software
functions in an endless loop of reciprocating transformations between sensory nodes, motor nodes and the
PFC-like buffer.

Knowledge representation and knowledge engineering are central to AI research. Strong AI necessitates
extensive knowledge of the world and must represent things such as: objects, properties, categories,
relations between objects, events, states, time, causes, effects and many more (Moravec, 1988). Many
problems in AI can be solved, in theory, by intelligently searching through many possible solutions. Logical
proof can be attained by searching for a path that leads from premises to conclusions, where each step is the
application of an inference rule. Planning algorithms search through trees of goals and subgoals, attempting
to find a path to a target goal, a process called means-end analysis.
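
As an illustration of proof by search, the sketch below runs a breadth-first search over sets of known facts, where each step applies one inference rule. It is a generic textbook-style example rather than a component of the present device; the function prove and the toy rules are assumptions made for illustration.

```python
from collections import deque


def prove(premises, goal, rules):
    """Breadth-first search from premises to a goal.

    rules: list of (frozenset_of_antecedents, consequent) pairs.
    Returns the consequents derived along the path, or None if no proof exists.
    """
    start = frozenset(premises)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        known, path = queue.popleft()
        if goal in known:
            return path
        for antecedents, consequent in rules:
            # A rule applies when all of its antecedents are already known.
            if antecedents <= known and consequent not in known:
                nxt = known | {consequent}
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [consequent]))
    return None


# Hypothetical rule set.
rules = [
    (frozenset({"rain"}), "wet_ground"),
    (frozenset({"wet_ground"}), "slippery"),
    (frozenset({"rain"}), "clouds"),
]
proof = prove({"rain"}, "slippery", rules)
no_proof = prove({"sun"}, "slippery", rules)
```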

In order to solve problems, AI systems generally must have a number of attributes: 1) a way of representing
knowledge with syntax and semantics, 2) an ability to search a problem set, 3) a capacity for propositional and
first order logic, 4) an ability to use knowledge to perform searches, accomplish constraint satisfaction, plan,
infer, perform probabilistic reasoning, maximize utility, and act under uncertainty. There are developed
computational systems for each one of these things. The present device will not have any of these attributes
before its training commences. These abilities will be emergent in its network provided that it has the proper
training examples. For instance, when it creates a topographic map from high-order specifications, it is
searching its knowledge base for the most probable way to codepict or propositionalize the specifications in a
logical, veridical fashion based on prior probability.   

The imagery that is created is either based on external input, or internal, top-down specifications. Imagery is
assessed using more imagery. Each image will be assessed for appetitive or aversive content. To this end, the
architecture depicted in Fig 5 will be copied onto two separate, yet nearly identical systems, one fine-tuned for
approach behaviors and the other for withdrawal behaviors.

Other Forms of Sustained Activity

It will be necessary for the PFC module to have a “gating” function, regulating access of prioritized contextual
representations into active memory. This gating will be integrally involved in the processing of task-relevant
information and the inhibition of task-irrelevant information. Information related to behavioral goals must be
actively organized and maintained, such that these representations can bias behavior in favor of goal-directed
activities over temporally-extended periods. Many studies have strongly suggested that the mammalian PFC
system is engaged when reward or punishment contingencies change (Braver and Cohen, 1999; Miller &
Cohen, 2001). Seamans and Robbins (2010) have greatly elaborated on a functional explanation for why this
is the case. They have stated that when reward or punishment contingencies change, the DA system is
phasically activated because it is adaptive for the animal to anchor upon and further process novel or
unpredicted events. Continued activation across time is necessary when the animal makes a prediction error,
is uncertain, feels pressured to better understand its situation, or increased cognitive effort is necessary to
process the significance of a novel event.  Sustained firing of PFC nodes in the present device will occur in
response to a variety of events including both appetitive and aversive ones (Seamans & Robbins, 2010). The
PFC representations of salient, uncertain, unpredicted, or novel events are kept active over time to aid in the
processing of their significance.
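
A gating function of this kind could be approximated by admitting a representation into active memory only when it is sufficiently unpredicted. The sketch below assumes a scalar prediction error per feature and a fixed threshold; the function gate, the threshold value and the example values are hypothetical and stand in for whatever trained mechanism the device would actually use.

```python
def gate(observed, expected, threshold=0.5):
    """Admit features whose prediction error exceeds the threshold.

    observed: dict of feature -> observed value.
    expected: dict of feature -> predicted value (missing means 0.0).
    Returns a dict of admitted features and their prediction errors.
    """
    admitted = {}
    for feature, value in observed.items():
        error = abs(value - expected.get(feature, 0.0))
        if error > threshold:
            admitted[feature] = error  # unpredicted event: keep it active
    return admitted


# Hypothetical values: only the reward is far from what was predicted.
observed = {"reward": 1.0, "tone": 0.2, "light": 0.9}
expected = {"reward": 0.1, "tone": 0.2, "light": 0.8}
admitted = gate(observed, expected)
```

Because the error is taken as an absolute value, unexpectedly good and unexpectedly bad events are both admitted, in keeping with the appetitive and aversive cases mentioned above.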

Aside from having a PFC analogue, the network could also have an analogue of cortical priming and an
analogue of the hippocampus. Humans have thoughts that carry continuity because changes in content are
gradual as more recent activations/representations are given a higher priority than older ones. Activity that
transpired minutes ago is given moderate priority, activity from seconds ago is given high priority and activity
from mere milliseconds ago is given the highest priority. This dynamic is made possible by the PFC
analogue, but could be accentuated by analogues of cortical priming. To allow for an analogue of cortical
priming, all recently active neurons would retain a moderate amount of increased, but subthreshold activity.
The activity level of recently used nodes in both the higher and lower-order areas would not quite fall back to
zero. This would ensure that recently used patterns and features would be given a different form of priority, yet
to a lesser and more general extent than that allowed by the PFC analogue. Regarding the network partitions
depicted in Figure 5, the sensory, motor and hippocampal neural networks would show the least priming, the
association area, and premotor neural networks would show moderate priming, and the PFC would show the
highest degree of priming. Functions for the parameters of priming could be fine-tuned by genetic algorithms.
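
The priming analogue can be illustrated by letting activity decay toward a small residual level instead of toward zero, with a different residual for each group of networks. In the sketch below the floor values, the decay rate and the firing threshold are assumed numbers chosen only to respect the ordering given above (least priming in sensory networks, most in the PFC), and the names PRIMING_FLOOR, decay and trace are hypothetical.

```python
# Assumed residual activity levels, ordered as described in the text.
PRIMING_FLOOR = {"sensory": 0.05, "association": 0.15, "pfc": 0.30}
FIRING_THRESHOLD = 0.5  # assumed level above which a node counts as active


def decay(activity, region, rate=0.5):
    """Move activity one step toward the region's priming floor."""
    floor = PRIMING_FLOOR[region]
    return floor + (activity - floor) * rate


def trace(region, steps, start=1.0, rate=0.5):
    """Activity of a node in the given region after several decay steps."""
    activity = start
    for _ in range(steps):
        activity = decay(activity, region, rate)
    return activity


# Residual activity after 20 steps for each group of networks.
residual = {region: trace(region, 20) for region in PRIMING_FLOOR}
```

After enough steps every node sits below the firing threshold but above zero, so recently used nodes need less input to be reactivated. These parameters are the kind of values that the text suggests could be tuned by genetic algorithms.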
Furthermore, the network could have an analogue of the hippocampus. A hippocampal analogue would keep
a record of contextual, or episodic clusters of previous node activation. Instead of keeping a serial record of
averaged activity, the hippocampus analogue would capture episodic constellations of node activity and save
these to be reactivated later. These episodic memory constellations would be activated when a large subset
of the constellation is present during processing. This means that when neural network activity closely
approximates an activity constellation that was present in the past, the hippocampal analogue is capable of
reactivating the original constellation.  The activity of the hippocampal analogue should be informed by actual
hippocampal anatomy and the “pattern completion” hypothesis of hippocampal function. To build an analogue
into a neural net it would be necessary to have a different form of memory that can be cued by constellations of
activity that closely resemble a past (autobiographical or episodic) occurrence. This memory system would
then be responsible for “completing the pattern,” or passing activation energy to the entire set of nodes that
were initially involved in the original experience, allowing the system a form of episodic recall. As with the
actual brain (Amaral, 1987), in the present device, the hippocampus should be reciprocally connected with the
PFC and association areas but not with primary sensory or motor areas.
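
Pattern completion by a hippocampal analogue can be sketched as a store of past activity constellations that is cued whenever a large enough fraction of one constellation is currently active. This is a simplified illustration, assuming set overlap as the similarity measure and a fixed cue threshold; the class EpisodicStore, its methods and the 0.75 threshold are hypothetical.

```python
class EpisodicStore:
    def __init__(self, threshold=0.75):
        self.episodes = []          # saved constellations of node activity
        self.threshold = threshold  # fraction of an episode needed to cue it

    def store(self, active_nodes):
        """Save the current constellation of active nodes as an episode."""
        nodes = frozenset(active_nodes)
        if nodes:
            self.episodes.append(nodes)

    def complete(self, active_nodes):
        """Reactivate the best-matching episode if enough of it is present."""
        active = set(active_nodes)
        best, best_fraction = None, 0.0
        for episode in self.episodes:
            fraction = len(episode & active) / len(episode)
            if fraction > best_fraction:
                best, best_fraction = episode, fraction
        if best is not None and best_fraction >= self.threshold:
            return active | set(best)  # pass activation to the whole episode
        return active


# Hypothetical episode and cues.
memory = EpisodicStore()
memory.store({"beach", "sun", "dog", "ball"})
recalled = memory.complete({"beach", "sun", "dog"})
not_recalled = memory.complete({"beach"})
```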

DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENTS
The computer programming code (whether software or firmware) will typically be stored in one or more
machine readable storage devices. The article of manufacture containing the computer programming code to
run the neural network is used either by executing the code directly from the storage device, by copying the
code from the storage device into another storage device such as a hard disk, RAM, etc., or by transmitting the
code on a network for remote execution.

The scope of the invention should be determined not by the embodiment(s) illustrated but by the appended
claims and their legal equivalents:
1.        An artificial intelligence system for solving complex problems, comprising:
A)        a computer apparatus including:
i)        interface means of accepting computer-readable data input,
ii)        memory means for storing computer-readable data,
iii)        processor means for manipulating computer-readable data,
iv)        interface means for communicating computer-readable data output
v)        a temporary storage buffer for each sense modality
vi)        a long-term memory used by said AI program
B)        A plurality of modular intelligent states, similar in structure, each comprising:
i)        Means of accepting sense data
ii)        Means of accepting policy instructions
iii)        Algorithmic AI means of evaluating and making decisions and implementing actions
C)        A means for evaluation of success and reinforcement of the algorithmic artificial intelligence processes
D)        A means to execute two search functions, one using a breadth search algorithm and the other a depth
search algorithm.



CLAIMS
1.        A modular, hierarchically organized, artificial intelligence (AI) system that features a working memory
updating function and the capacity for imagery generation. The system comprises an algorithmic, imagery
guidance process to be implemented by neural network software that will simulate the neurocognitive
functioning of the mammalian prefrontal cortex.
2.        Cognitive control stems from the active maintenance of features/patterns in the PFC module that allow
the orchestration of processing in accordance with internally selected priorities.
3.        The network contains nodes that are capable of “sustained firing,” allowing them to bias network activity,
transmit their weights, or otherwise contribute to network processing for several seconds at a time (generally
1-30 seconds).
4.        The network is an information processing system that has the ability to maintain a large list of
representations that is constantly in flux as new representations are constantly being added, some are being
removed and still others are being maintained. This distinct pattern of activity, where some individual nodes
persist during processing makes it so that particular features of the overall pattern will be uninterrupted or
conserved over time.
5.        Because nodes in the PFC network are sustained, and do not fade away before the next instantiation of
topographic imagery, there is a continuous and temporally overlapping pattern of features that mimics
consciousness and the psychological juggling of information in working memory. This also allows
consecutive topographic maps to have related and progressive content.
6.        If this sustained firing is programmed to happen at even longer intervals, in even larger numbers of
nodes, the system will exhibit even more mental continuity over elapsing time. This would increase the ability
of the network to make associations between temporally distant stimuli and allow its actions to be informed by
more temporally distant features.
7.        The network’s connectivity allows reciprocating cross-talk between fleeting bottom-up imagery in early
sensory networks and lasting top-down priming in association and PFC networks. The features that are
maintained over time by sustained neural firing are used to create and guide the construction of mental
imagery. The PFC and other association areas direct progressive sequences of mental imagery in the visual,
auditory and somatosensory networks.
8.        The network involves reciprocating transformations between a partially conserved store of multiple
conceptual specifications and another nonconserved store that integrates these specifications into veridical,
topographic representations.


References Cited:

Amaral DG. 1987. Memory: Anatomical organization of candidate brain regions. In: Handbook of Physiology;
Nervous System, Vol V: Higher Function of the Brain, Part 1, Edited by Plum F. Bethesda: Amer. Physiol Soc.
211-294.

Baars, Bernard J. (2002) The conscious access hypothesis: Origins and recent evidence. Trends in Cognitive
Sciences, 6 (1), 47-52.

Baddeley, A.D. (2007). Working memory, thought and action. Oxford: Oxford University Press.

Carpenter, G.A. & Grossberg, S. (2003), Adaptive Resonance Theory, In Michael A. Arbib (Ed.), The Handbook
of Brain Theory and Neural Networks, Second Edition (pp. 87-90). Cambridge, MA: MIT Press

Chalmers, D.J. 2010. The Character of Consciousness. Oxford University Press.

Crick F, Koch C. A framework for consciousness. Nature Neuroscience. 6(2): 119-126.

Damasio AR. Time-locked multiregional retroactivation: A systems level proposal for the neural substrates of
recall and recognition. Cognition, 33: 25–62, 1989.

Edelman, G. Neural Darwinism: The Theory of Neuronal Group Selection (Basic Books, New York 1987).

Fuji H, Ito H, Aihara K, Ichinose N, Tsukada M. (1998). Dynamical Cell Assembly Hypothesis – Theoretical
possibility of spatio-temporal coding in the cortex. Neural Networks. 9(8):1303-1350.

Fukushima, Kunihiko (1975). "Cognitron: A self-organizing multilayered neural network". Biological
Cybernetics 20 (3–4): 121–136. doi:10.1007/BF00342633. PMID 1203338.

Fuster JM. 2009. Cortex and Memory: Emergence of a new paradigm. Journal of Cognitive Neuroscience. 21
(11): 2047-2072.

Gurney, KN. 2009. Reverse engineering the vertebrate brain: Methodological principles for a biologically
grounded programme of cognitive modeling. Cognitive Computation. 1(1) 29-41.

Hawkins, Jeff w/ Sandra Blakeslee (2005). On Intelligence, Times Books, Henry Holt and Co.

Hebb, Donald (1949). The Organization of Behavior. New York: Wiley.

Teuvo Kohonen. 2001. Self Organizing Maps. Springer-Verlag Berlin Heidelberg: New York.

Klimesch W, Freunberger R, Sauseng P. Oscillatory mechanisms of process binding in memory.
Neuroscience and Biobehavioral Reviews. 34(7): 1002-1014.

Kurzweil, R. (2012). How to Create a Mind: The Secret of Human Thought Revealed. Viking Adult.

Kurzweil, Ray (2005). The Singularity is Near. Penguin Books. ISBN 0-670-03384-7.

Lansner A. 2009. Associative memory models: From the cell-assembly theory to biophysically detailed cortex
simulations. Trends in Neurosciences. 32(3):179-186.

Luger, George; Stubblefield, William (2004). Artificial Intelligence: Structures and Strategies for Complex
Problem Solving (5th ed.). The Benjamin/Cummings Publishing Company, Inc.. ISBN 0-8053-4780-1.

M Riesenhuber, T Poggio. Hierarchical models of object recognition in cortex. Nature neuroscience, 1999.

McCarthy, John; Hayes, P. J. (1969). "Some philosophical problems from the standpoint of artificial
intelligence". Machine Intelligence 4: 463–502.

McCulloch, Warren; Pitts, Walter, "A Logical Calculus of Ideas Immanent in Nervous Activity", 1943, Bulletin of
Mathematical Biophysics 5:115-133.

Meyer K, Damasio A. Convergence and divergence in a neural architecture for recognition and memory. Trends
in Neurosciences, vol. 32, no. 7, 376–382, 2009.

Minsky, Marvin (2006). The Emotion Machine. New York, NY: Simon & Schuster. ISBN 0-7432-7663-9.

Moravec, Hans (1988). Mind Children. Harvard University Press. ISBN 0-674-57616-0.

Moscovitch M. Memory and Working-with-memory: A component process model based on modules and
central systems. Journal of Cognitive Neuroscience. 4(3):257-267.

Moscovitch M, Chein JM, Talmi D & Cohn M. Learning and memory. In Cognition, brain, and consciousness:  
Introduction to cognitive neuroscience. Edited by BJ Baars & NM Gage. London, UK: Academic Press; 2007, p.
234.

Nilsson, Nils (1998). Artificial Intelligence: A New Synthesis. Morgan Kaufmann Publishers. ISBN 978-1-
55860-467-4.

Reser, Jared Edward. (2012). Assessing the psychological correlates of belief strength: Contributing factors
and role in behavior. Doctoral Dissertation. University of Southern California.

Rochester, N.; J.H. Holland, L.H. Haibt, and W.L. Duda (1956). "Tests on a cell assembly theory of the action of
the brain, using a large digital computer". IRE Transactions on Information Theory 2 (3): 80–93.

Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model For Information Storage And Organization In The
Brain". Psychological Review 65 (6): 386–408. doi:10.1037/h0042519. PMID 13602029.

Rumelhart, D.E; James McClelland (1986). Parallel Distributed Processing: Explorations in the Microstructure
of Cognition. Cambridge: MIT Press.

Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle
River, New Jersey: Prentice Hall, ISBN 0-13-790395-2

Turing, Alan (1950), "Computing Machinery and Intelligence", Mind LIX (236): 433–460,