The Freedom of Thought and The Slavery of Reward

Oct 6, 2019

“Humans think using their brain’s navigation system.” Only a couple of months ago I cited this idea, put forward by eminent scientists, in my post Thinking Spaces and called it groundbreaking. I’ve changed my mind since then.

We don’t use our brain’s navigation system for thinking. On the contrary, we use our brain’s thinking system for navigation. For our brains, thinking means predicting, planning and executing.

Mental time travel is much more important to the brain than traveling in physical spaces. The brain is constantly on the move in time, matching future conditions to past outcomes and vice versa. Memory of the past serves as the foundation for valid predictions of the future. The past and the future are just two parts of the same movie. It’s entirely logical that the brain uses the same device, the hippocampus, to process both of them.

Distance in time matters more for planning the future than distance in space. Spatial navigation is itself a kind of mental time travel. You want to know where to move before you move. A cognitive map allows us to predict what we should do next based on our model of the future. Models of the past and of the present are irrelevant to planning for the future. The accuracy of predictions and the distance in time to which we can extend them are the two crucial factors that define the fitness of our actions for survival.

Our intended actions and the evolution of the environment meet at some point in the future. There is always a time delay between input and output. It can be a fraction of a millisecond or a decade, but the action of an organism is always carefully planned in advance, as accurately as the available data and the time for planning and execution allow.

Our brain demonstrates the probability-estimation accuracy of the ideal Bayesian observer, but it's not necessarily using Bayesian statistics to achieve it. The brain may well be using chaotic dynamics to achieve the same result. We look for landmarks not because they are farther away from us than cues but because we can use them for a longer mental journey in time, and so plan our actions as well as possible.
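For readers unfamiliar with the term, an ideal Bayesian observer combines a prior expectation with noisy evidence and weights each by its reliability. Here is a minimal sketch of that benchmark in Python, assuming a Gaussian prior and Gaussian observation noise. The function name and the numbers are illustrative assumptions, and the sketch says nothing about how the brain actually arrives at such estimates.

```python
def bayes_gaussian_update(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a Gaussian prior and one noisy observation.

    Assumption: both the prior and the observation noise are Gaussian.
    """
    # Weight given to the new evidence: large when the prior is uncertain,
    # small when the observation is noisy.
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


# Equal uncertainty in prior and observation: the estimate lands halfway between them.
mean, var = bayes_gaussian_update(prior_mean=0.0, prior_var=1.0, obs=2.0, obs_var=1.0)
```

The posterior variance is always smaller than the prior variance, which is the formal sense in which each new sample sharpens the prediction.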

Bacteria travel in time for milliseconds when they sense the tiniest concentration differences in the environment and act on the dynamics of the concentration gradient. In doing so, a bacterium predicts the future consequences of the evolution of the environment, then plans and executes the actions required to secure a favorable development in the future. Its time travel horizon is milliseconds.

The human brain can be modelled as a colony of bacteria in a biofilm. Together our neurons can travel in time for years, but each neuron travels in time for milliseconds, like a bacterium. Yet in both cases we see the same cycle: sampling the environment to locate predictors as distant in time as possible (be they landmarks or chemical gradients); predicting the future on their basis; planning and executing the actions that will lead to the most favorable outcome. Selection of the most favorable (the most desired, I would say) outcome is based on predicting outcomes for different actions or for no action. The longer the time horizon for planning, the wider the variety of possible actions and combinations of actions that become available.
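The link between planning horizon and the variety of available plans can be made concrete with a toy example. The sketch below is only an illustration under assumed conditions: a one-dimensional world, a hypothetical set of three actions (with "stay" standing for no action), and a predicted outcome measured as the remaining distance to a goal.

```python
from itertools import product

# Hypothetical action set; "stay" plays the role of "no action".
MOVES = {"left": -1, "stay": 0, "right": 1}


def best_plan(position, goal, horizon):
    """Enumerate every action sequence of length `horizon` and pick the one
    whose predicted end position is closest to the goal."""
    plans = list(product(MOVES, repeat=horizon))

    def predicted_distance(plan):
        final = position + sum(MOVES[action] for action in plan)
        return abs(goal - final)

    chosen = min(plans, key=predicted_distance)
    return chosen, predicted_distance(chosen), len(plans)


# A longer horizon offers more candidate plans and a better predicted outcome.
short = best_plan(position=0, goal=2, horizon=1)   # 3 candidate plans
longer = best_plan(position=0, goal=2, horizon=3)  # 27 candidate plans
```

With three actions the number of candidate plans grows as 3 to the power of the horizon, so each extra step of foresight triples the combinations that can be compared.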

It goes without saying that the data available to the brain is always incomplete, and predictions and plans made on such a basis are far from accurate. Hence our planned and executed actions are suboptimal. Therefore, like bacteria in chemotaxis, we regularly take samples on the way, verify predictions and tune our plans and actions accordingly. Then we move on again in an adjusted but still suboptimal direction. We do it at several levels, from single neurons up to neural networks, up to the brain, up to consciousness. The integrated process involving all levels is thinking.
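The sample, verify and adjust loop described above can be sketched as a heavily simplified, one-dimensional "run and tumble" walk. Everything here is an assumption made for illustration: the shape of the attractant field, the step size, and the rule of picking a random new direction whenever the latest sample is worse than the previous one. Real chemotaxis is considerably more subtle.

```python
import random


def concentration(x, source=10.0):
    # Hypothetical attractant field: highest at the source, falling off with distance.
    return 1.0 / (1.0 + abs(x - source))


def run_and_tumble(start=0.0, steps=200, step_size=0.5, seed=0):
    """Move, sample, compare with the previous sample, and re-plan when things got worse."""
    rng = random.Random(seed)
    x = start
    direction = rng.choice([-1.0, 1.0])
    last = concentration(x)
    for _ in range(steps):
        x += direction * step_size                 # execute the current plan
        now = concentration(x)                     # take a new sample on the way
        if now < last:                             # the plan did not improve things
            direction = rng.choice([-1.0, 1.0])    # tumble: choose a direction at random
        last = now
    return x


final_position = run_and_tumble()
```

No step in the loop knows where the source is. The walker only compares the present sample with the previous one, yet it ends up hovering around the source, in an adjusted but never optimal direction.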

Humans can’t be human without the ability to think. We need to be able to predict the behavior of other humans, even perfect strangers, to cooperate effectively in human society. Our ability to think has been disabled to a great extent by the reinforcement principle that governs the lives of modern humans literally from cradle to grave.

As I explained before, the reinforcement principle is a human invention. Learning by association of an effect with an external reward is a product of human arts, hence it’s artificial. The intelligence based on that artificial principle is artificial as well. Bacteria can establish cause-effect relationships in their environment without any external reinforcement. Are humans more stupid than bacteria? Of course we are not.

Reinforcement learning is a quick and dirty hack that allows us to disable our self-control system, which plays a crucial role in thinking. Disabling the control system opens the way to addictions but suppresses agency at the same time. Once people have been hacked by reinforcement, they become easier to control, yet at the expense of their autonomy and their ability to think in general.

In the first place, reinforcement learning makes people less smart. In the second place, it enables even the least smart people to learn. It makes humans unfit for the natural environment, in which less smart people simply wouldn’t survive. Yet it makes even the least smart people fit for the artificial, machine-like environment of civilization.

It's just fine when the machine is stable and predictable. When the machine of civilization becomes unstable everyone gets in trouble. We’ve only recently entered a prolonged period of such instability.

Written by Yuri Barzov