Breaking the curse of overexploitation
“This phase arrives towards the end of a subject’s life-span, and is characterized by a learning investment of 0. As the end approaches, it is suboptimal to continue investing in gaining new information and the subject should invest its time only in exploiting the knowledge it had already accumulated, temporarily increasing its intake rate of resources.”
Oded Berger-Tal et al., 2015, The Exploration-Exploitation Dilemma: A Multidisciplinary Framework.
When we look at the current stage of Earth’s exploitation by humans, we intuitively realise that the above observation of Oded Berger-Tal et al. can be relevant to humanity as a whole. We can also estimate the current phase of the lifespan of our civilization based on the balance of its exploration-exploitation activities. The belief in the veridicality of our model of the world is itself a clear indication of the knowledge exploitation phase. The world is constantly changing. We can’t catch up with it without constant exploration. A learning investment of 0 can be justified only by the belief that we already know the truth. As soon as we start believing we know the truth, the truth escapes from us.
Given the recent advances in supervised deep learning and its broadening acceptance, it is easy to conclude that deep learning is bringing the end of our civilization even closer, as it scales up the exploitation of old knowledge. It is less evident, but deep learning may in fact be our only hope, because it was created by nature predominantly for exploration and the establishment of new knowledge.
In this essay I share the findings of several teams of researchers suggesting that deep neural networks (DNNs) or similar ‘lattices of receptors’ placed between stimulus and response may represent one of the most universal and important building blocks of life, one that supports the entire knowledge establishment process.
I’ve also put together some examples which suggest that the nonlinear dynamics that spontaneously emerge in deep networks (of either neurons or receptors) under certain conditions may represent the most fundamental mechanism of real-life learning in real time, because their extreme sensitivity to initial conditions provides for the most rapid and vast increase of information.
Finally, I’ve collected some examples of frameworks, mathematical models and engineering approaches which can help to elaborate a solution that will refocus deep learning from exploitation to exploration. Such a solution may mark the emergence of an entirely new generation of AI systems capable of dynamic real-life learning in real time.
The exploration-exploitation dilemma
“The trade-off between the need to obtain new knowledge and the need to use that knowledge to improve performance is one of the most basic trade-offs in nature, and optimal performance usually requires some balance between exploratory and exploitative behaviors. Researchers in many disciplines have been searching for the optimal solution to this dilemma.” Several years ago, a multidisciplinary team of researchers from Israeli and US universities presented in a paper “a novel model in which the exploration strategy itself is dynamic and varies with time in order to optimize a definite goal, such as the acquisition of energy, money, or prestige.”
After giving credit to the reinforcement learning (RL) solutions “based on a Bayesian modeling approach where the agent’s decisions are the product of a weighted average of some prior knowledge regarding the environment and current sampling information, and the agent’s need to explore is directly based on its perception of the environment, growing whenever the environment changes… due to the fact that uncertainty should promote exploration in an attempt to reduce it, and indeed there is evidence that surprising events and changes to the environment promote animals to learn faster,” the authors conclude that RL solutions “are also very mechanistic in nature and are, in many cases, specifically tailored to solve certain tasks, such as passing through mazes, with no attention given to the general motivation and ecological background of the subject. In other words, the above mentioned models have concentrated on the how rather than on the why of the decision-making process.”
Their model “depicts a subject that can invest in energy acquisition (exploitation) or knowledge acquisition (exploration), according to a strategy that represents the proportion of time the subject invests in knowledge acquisition as a function of time along its lifetime.” While they “focus on the optimal exploration-exploitation strategies at different stages of a subject’s life-span,” we propose to explore whether a similar dynamical model can be applied to much shorter periods of time.
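The “weighted average of some prior knowledge … and current sampling information” can be made concrete with a standard Beta-Bernoulli update. The sketch below is only an illustration under my own assumptions (a single food patch that either pays off or not, and hypothetical function names); it is not the model used in any of the cited RL work.

```python
def posterior_mean(prior_successes, prior_failures, successes, failures):
    """Beta-Bernoulli posterior mean of a patch's success rate."""
    a = prior_successes + successes
    b = prior_failures + failures
    return a / (a + b)


def weighted_average(prior_successes, prior_failures, successes, failures):
    """The same estimate written as a weighted average of prior and sample means.

    Assumes at least one new observation (successes + failures > 0).
    """
    prior_strength = prior_successes + prior_failures
    n = successes + failures
    prior_mean = prior_successes / prior_strength
    sample_mean = successes / n
    # The weight of the prior shrinks as more samples are collected.
    w = prior_strength / (prior_strength + n)
    return w * prior_mean + (1 - w) * sample_mean


if __name__ == "__main__":
    # Prior belief: 2 successes, 2 failures. New samples: 6 successes, 2 failures.
    print(posterior_mean(2, 2, 6, 2))
    print(weighted_average(2, 2, 6, 2))
```

Both functions return the same number: the more the agent samples, the less its prior counts, which is one simple way to read the quoted description.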
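To get a feeling for what a strategy “as a function of time” means, here is a toy simulation. It is not the authors’ model: the linear growth of knowledge, the `learning_rate` value and the three strategies are my own assumptions, chosen only to show why investing 0 in learning near the end of a lifespan can pay off.

```python
def lifetime_intake(strategy, lifespan=100, learning_rate=0.1):
    """Total resources gathered over a lifespan.

    strategy(t) returns the fraction of step t spent exploring (0..1).
    Exploring adds to knowledge; intake in a step is the current
    knowledge times the fraction of the step spent exploiting.
    """
    knowledge = 1.0
    total = 0.0
    for t in range(lifespan):
        x = min(1.0, max(0.0, strategy(t)))
        total += (1.0 - x) * knowledge
        knowledge += learning_rate * x
    return total


def never_explore(t):
    return 0.0


def constant_exploration(t):
    return 0.3


def explore_then_exploit(t, switch=40):
    # Full learning investment early in life, zero investment afterwards.
    return 1.0 if t < switch else 0.0


if __name__ == "__main__":
    for strategy in (never_explore, constant_exploration, explore_then_exploit):
        print(strategy.__name__, round(lifetime_intake(strategy), 2))
```

Under these assumptions the three strategies collect roughly 100, 174 and 300 units. The late phase with zero learning investment is what makes the third strategy win, but only because it follows an early phase of intensive exploration.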
Let’s now see how bacteria, cockroaches and worms establish knowledge.
How do cockroaches explore?
When a cockroach senses a gust of wind with its cerci and abdominal ganglion, it immediately turns away from the wind and starts running. An article from Business Insider explains that this seemingly suboptimal reaction to a very coarse stimulus, one that may indicate the approach of a predator, has secured the survival of cockroaches over hundreds of millions of years.
“What is remarkable about the cockroach is not simply that it has survived so long but that it has done so with a singularly simple and seemingly suboptimal mechanism: It moves in the opposite direction of gusts of wind that might signal an approaching predator. This “risk management structure” is extremely coarse; it ignores a wide set of information about the environment — visual and olfactory cues, for example — which one would think an optimal risk-management system would take into account,” the Business Insider article states.
This example raises a simple question: what is optimal from the evolutionary point of view? The fittest survive, indeed, but do the fittest optimise their perceptions to have the most accurate model of the environment or do they optimise their policies to achieve the most desired outcomes?
First, single out a vital parameter
Cockroaches demonstrate that the most accurate model of the world is not required for evolution to design the optimal policy for survival. Instead of collecting as much information about the environment as possible, cockroaches are tuned to detect only one general but vital cue. Yet they became true champions in sensing that one particular cue. “The cerci, two posterior appendages, contain filiform hairs that detect air currents and are one of the most sensitive sensory receptors in biology,” as Claire A. McGorry et al. specify in their paper.
Cockroaches are not the only champions in sensing a particular vital cue. “Cyanobacteria in the oceans are among the world’s most important oxygen producers and carbon dioxide consumers. Synechocystis is a spherical single-celled cyanobacteria” that has been known for over a century for its ability to move towards light. But how a tiny bacterium can sense where to move remained unclear until Nils Schuergers et al. discovered that “Synechocystis cells do not respond to a spatiotemporal gradient in light intensity, but rather they directly and accurately sense the position of a light source.”
“We show,” they explain in their paper, “that directional light sensing is possible because Synechocystis cells act as spherical microlenses, allowing the cell to see a light source and move towards it. A high-resolution image of the light source is focused on the edge of the cell opposite to the source, triggering movement away from the focused spot. Spherical cyanobacteria are probably the world’s smallest and oldest example of a camera eye.”
The bacterium, of course, doesn’t recognise the high-resolution image on its edge. It only reacts to the spot of light, inferring the direction in which to move from the spot’s location. A very elegant solution, isn’t it?
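The geometry of this inference is simple enough to write down. The sketch below is a deliberately idealised picture under my own assumptions: a perfectly spherical cell seen in two dimensions, a distant light source described by a single angle, and hypothetical function names. It says nothing about the molecular machinery involved.

```python
import math


def focused_spot(light_angle):
    """Angle (radians) of the bright spot on the cell edge.

    The cell acts as a lens, so the spot lies on the side opposite
    to the light source.
    """
    return (light_angle + math.pi) % (2 * math.pi)


def movement_direction(spot_angle):
    """Unit vector pointing away from the focused spot."""
    away = spot_angle + math.pi
    return (math.cos(away), math.sin(away))


if __name__ == "__main__":
    light = math.radians(30)  # light source at 30 degrees from the x axis
    dx, dy = movement_direction(focused_spot(light))
    print(round(math.degrees(math.atan2(dy, dx)), 6))
```

Moving away from the spot is the same as moving towards the light, so the cell never needs to know where the light is; the location of the spot is enough.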
Utilitarian model of the world instead of objective model
The interface theory of perception proposed by Donald D. Hoffman expands on the idea that an optimal policy is more important for survival than optimal veridicality of perception. “We find that veridical perceptions — strategies tuned to the true structure of the world — are routinely dominated by nonveridical strategies tuned to fitness. Veridical perceptions escape extinction only if fitness varies monotonically with truth. Thus, a perceptual strategy favored by selection is best thought of not as a window on truth but as akin to a windows interface of a PC,” Hoffman and his co-authors boldly conclude in their paper.
All that we need in the phase of knowledge exploitation is, indeed, an interface. Using this interface, cockroaches should run straight in the direction opposite to the wind. They would all be dead by now if they did so, however. A single trajectory is too easy for a predator to predict. Cockroaches still live because they can behave unpredictably. They use a utilitarian model of the world optimised by policies instead of an objective model optimised by accuracy.
Hypersensitivity to vital parameters
It is very tempting to agree with Hoffman’s logic based on the examples of cockroaches and bacteria, but something is missing in it. Cockroaches and bacteria in our examples measure the single parameter of the environment that they use to optimise their policies with the highest possible sensitivity. They actually obtain the most accurate and veridical perception of the environment, although through a very narrow window on truth.
A bacterium senses a very small change in the environment. It detects the gradient of the change very rapidly. It upscales a tiny chemical impulse into a “chain reaction” that results in a much stronger action: moving in the direction in which a favorable stimulus increases or in which an unfavorable stimulus decreases. Normally it moves around in very short strolls. When it detects the gradient, it accelerates for longer periods of time in the desired direction.
Scientists from Cornell University discovered long ago that a bacterium creates a lattice of receptors on its surface to increase sensitivity and to amplify the signal. The description of how the lattice works reminds me very much of the description of how nonlinear dynamics emerge in a deep neural network with randomly assigned initial connection weights.
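The behaviour described above resembles what is usually called run-and-tumble motion. Here is a minimal one-dimensional caricature of it. The linear attractant profile, the two tumble probabilities and all names are my own assumptions for illustration; real chemotaxis is far richer.

```python
import random


def concentration(x):
    """Hypothetical attractant profile that increases linearly to the right."""
    return x


def run_and_tumble(steps=500, p_tumble_up=0.1, p_tumble_down=0.5, seed=0):
    """Final position of a cell that tumbles less often while things improve."""
    rng = random.Random(seed)
    x = 0.0
    direction = rng.choice([-1, 1])
    previous = concentration(x)
    for _ in range(steps):
        x += direction
        current = concentration(x)
        # Compare only two successive readings: a tiny temporal slice.
        improving = current > previous
        p_tumble = p_tumble_up if improving else p_tumble_down
        if rng.random() < p_tumble:
            direction = rng.choice([-1, 1])  # tumble: pick a new random heading
        previous = current
    return x


if __name__ == "__main__":
    print(run_and_tumble())
```

The cell has no map of the gradient. It only lengthens its runs when the last reading was better than the previous one, and that alone produces a steady drift towards the favorable stimulus.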
Vast variety of responses to tiny changes in vital parameters
Although it is generally accepted that “cockroaches respond to wind puffs, which may signal a predator attack, by making a swift turn followed by a forward acceleration,” a paper by Paolo Domenici et al. clearly demonstrates that cockroaches rotate to different degrees. Most of them, indeed, rotate away from the wind, but each of them selects a slightly different trajectory of escape. Some cockroaches (2.8–18% depending on the dataset) actually run towards the wind instead of escaping it. The researchers’ guess is that cockroaches use different escape routes because this confuses predators.
Might such deviations from the optimal policy have an evolutionary rationale as well?
In fact, they do. “In cockroaches, wind evokes strong terrestrial escape responses in Periplaneta americana and Blattella germanica, but only weak escape responses in Blaberus craniifer and no escape responses in Gromphadorhina portentosa,” Claire A. McGorry et al. state in their paper. Their research showed that all four cockroach species possess wind-sensitive interneurons which provide input to the premotor/motor neurons of the insects irrespective of their behavioral response to wind. Hence the reason for the different policies is not anatomical. Does it mean that different species have different response strategies to a threat, or do they classify wind gusts differently, as a big, moderate or non-existent threat?
Anyway, the variety of cockroaches’ responses to a single stimulus is huge. What is this variety for? Let’s hypothesize.
According to Ashby’s law of requisite variety, only “variety can destroy variety.” It means that an organism can survive in an environment only if its repertoire of responses is at least as wide as the repertoire of environmental challenges. Complexity of challenges requires complexity of responses.
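Ashby’s law can be checked by brute force in a tiny game. In the sketch below, which is entirely my own toy construction with hypothetical names, the environment picks one of `n` disturbances, the organism answers with one of its responses, and the outcome is `(disturbance + response) % n`. We then ask how few distinct outcomes the best possible response policy can achieve.

```python
from itertools import product
from math import ceil


def outcome(disturbance, response, n):
    """Toy outcome table: each response shifts the disturbance."""
    return (disturbance + response) % n


def best_regulation(n_disturbances, n_responses):
    """Smallest number of distinct outcomes over all response policies."""
    best = n_disturbances
    # A policy assigns one response to every possible disturbance.
    for policy in product(range(n_responses), repeat=n_disturbances):
        outcomes = {
            outcome(d, r, n_disturbances) for d, r in enumerate(policy)
        }
        best = min(best, len(outcomes))
    return best


if __name__ == "__main__":
    for n_responses in (1, 2, 3, 6):
        print(n_responses, best_regulation(6, n_responses), ceil(6 / n_responses))
```

In this toy game the outcomes can never be squeezed below the number of disturbances divided by the number of responses. Only when the organism has as many responses as the environment has disturbances can it hold the outcome to a single value.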
Motion picture, not a still-life
Cockroaches and bacteria in our examples model the dynamics of the world, not its static states. It’s a very narrow but deep model focused on only one stimulus. But the agent’s sensitivity to that stimulus is extremely high, like a chaotic system that is extremely sensitive to initial conditions. This sensitivity also allows an agent to maintain a temporally deep model: it slices the inflow of data into the simplest possible episodes (or scenes) along the timeline, or along the trajectory of phasic states of the sensory input, and assembles an animated model of the world from several tiny slices in order to establish the gradient of change and to react to it in real time. This animated narrative encoding process may manifest “itself in the form of sequential metastable spatio-temporal patterns,” as specified in the model of sequential episodic memory initiation. It may be using the hypersensitivity of chaos to initial conditions in order to rapidly amplify the received vital sensory signal.
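How quickly chaos amplifies a tiny difference can be seen in the simplest textbook example, the logistic map. The sketch below is only an analogy of my own choosing, not a model of any receptor lattice or neural circuit; the starting point and the size of the initial difference are arbitrary assumptions.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map; r = 4 puts it in the chaotic regime."""
    return r * x * (1.0 - x)


def separation_over_time(x0=0.2, delta=1e-10, steps=60):
    """Gap between two trajectories that start `delta` apart."""
    a, b = x0, x0 + delta
    gaps = [abs(a - b)]
    for _ in range(steps):
        a, b = logistic_map(a), logistic_map(b)
        gaps.append(abs(a - b))
    return gaps


if __name__ == "__main__":
    gaps = separation_over_time()
    for step in (0, 10, 20, 30, 40):
        print(step, gaps[step])
```

Two inputs that differ by one part in ten billion become macroscopically different within a few dozen iterations. A system poised in such a regime turns a barely detectable signal into a large, usable one, which is the kind of amplification hypothesized above.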
Chaos — learning emerges from sensitivity to initial conditions
“We re-purposed a neural circuit from the nervous system of the nematode C. elegans. It is responsible for generating a simple reflexive behavior — the touch-withdrawal,” says Mathias Lechner, who is now working at the Institute of Science and Technology (IST) Austria.
“In a standard RNN-model, there is a constant link between neuron one and neuron two, defining how strongly the activity of neuron one influences the activity of neuron two”, says Ramin Hasani. “In our novel RNN architecture, this link is a nonlinear function of time.”
Deep Temporal Models and Active Inference
“The deep temporal aspect of these models means that evidence is accumulated over nested time scales, enabling inferences about narratives (i.e., temporal scenes). We illustrate this behaviour with Bayesian belief updating — and neuronal process theories — to simulate the epistemic foraging seen in reading. These simulations reproduce perisaccadic delay period activity and local field potentials seen empirically.”
The Anatomy of Inference: Generative Models and Brain Structure
“Generative models that evolve continuous time or discrete time likely coexist in the brain, mirroring the processes generating sensory data. While, at the level of sensory receptors, data arrive in continuous time, they may be generated in a sequential, categorical manner at a deeper level of hierarchical structure. For example, a continuous model may be necessary for low level auditory processing, but language processing depends upon being able to infer discrete sequences of words (which may themselves make up discrete phrases or sentences).”
Active inference, communication and hermeneutics
“In our previous paper, we focused on the dynamical phenomena that emerge when two dynamical systems try to predict each other. Mathematically, this dynamical coupling is called generalised synchrony (aka synchronisation of chaos).”
Discrete Sequential Information Coding: Heteroclinic Cognitive Dynamics
“The hierarchical sequential segmentation of information into discrete events — patterns — is a fundamental intrinsic feature of brain dynamics. This concept has been used to design top-down explanations for brain activity on the view that the brain infers causes of its sensory input (Kiebel et al., 2009; Friston et al., 2011). In this setting, hierarchical sequential dynamics in general — and stable heteroclinic channels in particular — have been used as the basis of generative models for the Bayesian brain. We discuss here an adequate mathematical approach that is applicable for the description and prediction of consciousness, emotion, and human behavioral activity.”
“We would like to end with a remark on the popular view that brain computational models need to be extremely high dimensional to be predictive. This view is based on the fallacy that computational dimension is related to the complexity of the brain itself as a “hardware” system with different interacting spatial scales from which cognition emerge. Such modeling is unfeasible yet, as the brain remains only partially observable. However, we may not need it to explain key aspects of cognitive processes because we are talking about mind dynamics with finite resources, i.e., specific kinds of brain activity such as attention, memory retrieval, decision making, etc. A top-down mathematical model of such processes can be built using the following dynamical principles that we discussed above: (i) clusterization the neural activity in space and time and formation of information patterns; (ii) discrete sequential information coding; (iii) robust sequential coordinated dynamics based on heteroclinic chains of metastable clusters; and (iv) sensitivity of such sequential dynamics to intrinsic and external informational signals. These principles open a new direction for the understanding of the observed brain dynamics and the creation of the basis of a mathematical theory of consciousness.”
The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach
“…we propose a model-based RL method based on learning an approximate, factorized transition model. The approximate transition model involves discrete, abstract states acting as information bottlenecks, which mediate the transitions between successive full states. Once learned, the approximate transition model is then applied to learn the agent’s policy (for example, using Q-learning with rollout simulations). This method has several advantages. First, the factorized model has significantly fewer parameters compared to a non-factorized transition model, making it highly sample efficient. Second, by learning the abstract state representation with the specific goal of obtaining an optimal policy (as opposed to maximizing the transition model’s predictive accuracy), it may be possible to trade-off some of the transition model’s predictive power for an improvement in the policy’s performance. By grouping similar states together into the same discrete, abstract state, it may be possible to improve the performance of the policy learned with the approximate transition model.”
Mind-to-mind heteroclinic coordination: model of sequential episodic memory initiation
“… we present and study a low-dimensional model of mind-to-mind episodic memory interaction. We emphasize from the beginning that we intend not to model the brain itself as a system but to create a dynamical model for the activity of this system. Our ultimate goal is to describe, understand and make predictions of mind dynamics, obtaining, in particular, dynamical models of specific classes of such activities as cognition, creativity, and autobiographic memory.”
Developing Concepts with Children Who Are Deaf-Blind
“Each deaf-blind child develops their own unique concepts based on their personal experiences. Here are some ideas that make sense from the perspective of the deaf-blind people who had them, but that might seem “odd” to someone with sight and hearing:
- a boy thought “going home” meant the feel of a bumpy road and a series of turns in the car
- a boy experiencing snow for the first time thought it was ice cream and asked for chocolate
- a girl touched a wet leaf and signed “cry” (it felt like tears)
- a girl thought food came from a mysterious place up high (it was always set down on the table from above)
- a young man didn’t know, even after many years, that his family’s pet cat ate (he had never seen it or touched it as it ate, and no one had ever told him)”
“A deaf-blind child will have difficulty developing accurate ideas about the world unless she has at least one trusting, significant, meaningful relationship to serve as a center from which to explore the world in gradually widening circles. The process of developing concepts is a shared adventure between a child and the child’s communication partners. It involves the co-creation of meaning. The child does not make meaning by herself; she and her communication partners make meaning together (Nafstad & Rodbroe, 1999).”