In Karl Friston’s theory, the term inference means deriving an unknown cause from a known consequence. According to Friston, living beings guess at the cause hidden from them (by the Markov blanket) on the basis of the outcome, which surprises them precisely because it has no obvious cause.
“This surprise can be thought of as a prediction error, which can be used to update the best guess to provide a better prediction”, Friston clarified, which finally helped me understand that a strange event can also be classified as a prediction error even though no specific prediction about such an event was made beforehand. An outlier is indeed a prediction error!
When living beings find a suitable cause or explanation for their sensations, the prediction error is resolved. If they manage to explain away the prediction error or surprise, the new explanation or causal relationship (regularity, pattern) is added to their picture of the world (objective reality). This is known as Bayesian belief updating, a nod to the underlying rules of probability and inference.
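As a rough illustration of belief updating, here is a minimal Python sketch. The candidate causes, the numbers, and the helper name `bayes_update` are all invented for the example; none of them come from Friston. Surprise is taken to be the negative log probability of the outcome under the current beliefs, and reusing the updated beliefs as the starting point for the next similar event is a deliberate simplification.

```python
import math

# Hypothetical prior beliefs about the hidden cause of a noise at night.
prior = {"wind": 0.70, "cat": 0.25, "burglar": 0.05}

# Hypothetical likelihoods: how probable the observed outcome
# ("a vase falls off the shelf") is under each candidate cause.
likelihood = {"wind": 0.02, "cat": 0.40, "burglar": 0.30}


def bayes_update(prior, likelihood):
    """Return (posterior, evidence) for one observed outcome."""
    joint = {cause: prior[cause] * likelihood[cause] for cause in prior}
    evidence = sum(joint.values())  # p(outcome) under the current beliefs
    posterior = {cause: p / evidence for cause, p in joint.items()}
    return posterior, evidence


# First encounter: the outcome is improbable under the prior, so surprise is high.
posterior, evidence = bayes_update(prior, likelihood)
first_surprise = -math.log(evidence)  # in nats
best_guess = max(posterior, key=posterior.get)

# A similar event later: the updated beliefs now serve as the prior
# (a simplifying assumption), so the same outcome is less surprising.
_, evidence_again = bayes_update(posterior, likelihood)
second_surprise = -math.log(evidence_again)

print(best_guess, round(first_surprise, 2), round(second_surprise, 2))
```

With these made-up numbers the best guess shifts to the cat, and the second surprise is smaller than the first, which is the sense in which an explained event stops being surprising.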
Thus, as Friston put it, living things resolve uncertainty. Surprise is their driving motive, but what surprises them is not uncertainty as such but unexpectedness.
It is unexpected uncertainty, which I love so much, that both stimulates learning and provides an opportunity to learn something, because here surprise is equivalent to the unknown.
We do not know what caused the event, and we are curious to find out why it happened. We do not know, which means we can find out. This means that we can learn something. Once we find the reason for that inexplicable event, we will not be surprised if a similar event happens again. In short, minimizing surprise compels us to be curious creatures.
It seems to me that everything I have written so far about Friston’s theory should not only be easy to understand but should also seem painfully familiar to any person who has read the first section of this book about the natural method of learning. In my opinion, this is it, word for word.
Now for the details. We do not just read the coffee grounds to guess at the cause; we use Bayesian inference, applied in reverse. That is, we take the observed outcome in the form of raw data, pass it through the individual filter of our structured experience (subjective perception), and update our picture of the world (objective reality), either with a new pattern we have invented (a cause combined with an effect) or by marking the event as an outlier that falls outside this picture.
Naturally, I am not at all surprised that in this case the probability distribution according to Bayes’s formula is calculated only approximately. Friston actually uses the probabilistic Nalimov-Bayes syllogism, most probably without even realizing it.
Next, the meaning is unpacked into text by mathematical means in which I am not fluent. However, I would not be surprised if Friston’s mathematics resembles the mathematics of Kolmogorov complexity, which describes both the process of compressing text (a pattern of algorithms) into meaning (the shortest algorithm that can reproduce the entire pattern) and the reverse process of extracting text from meaning. “Indeed, this is exactly where the predictive coding gloss on free energy minimization started in the 1950s, where it was used to compress sound files. Compression, efficiency, and the minimization of complexity have been central and recurrent themes throughout”, Friston commented, further strengthening this link.
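The link between pattern and compression can be made tangible with a small Python sketch. Kolmogorov complexity itself is not computable, so the sketch uses the standard `zlib` compressor as a crude stand-in that only gives an upper bound; the sample strings and the helper name `compressed_size` are assumptions made for the illustration.

```python
import os
import zlib

# A highly regular "text": one short pattern repeated many times.
regular = b"cause->effect;" * 500

# An irregular "text" of the same length: random bytes offer no pattern to exploit.
irregular = os.urandom(len(regular))


def compressed_size(data: bytes) -> int:
    """Length in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


regular_ratio = compressed_size(regular) / len(regular)
irregular_ratio = compressed_size(irregular) / len(irregular)

# Decompression is the reverse step: the full text is recovered exactly.
restored = zlib.decompress(zlib.compress(regular, 9))

print(f"regular: {regular_ratio:.3f}, irregular: {irregular_ratio:.3f}")
```

The regular text shrinks to a small fraction of its length because a short description reproduces all of it, while the random bytes barely compress at all.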
Where does free energy come in? Friston’s free energy is a measure of surprise. It has no semantic relation to the thermodynamic free energy that we discussed in the previous chapters. Perhaps the similarity is only formal, in the mathematics. Therefore, I do not use this term in describing Friston’s theory. What interests me most about the theory is its meaning.
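For readers who want to see the formal side, the following Python sketch shows in what sense variational free energy measures surprise: it is never smaller than the surprise, and it equals the surprise when the belief matches the exact Bayesian posterior. The two-cause model, its numbers, and the function name `free_energy` are hypothetical and chosen only for the illustration.

```python
import math

# Hypothetical model: prior over causes and likelihood of one outcome ("wet grass").
prior = {"rain": 0.3, "sprinkler": 0.7}
likelihood = {"rain": 0.9, "sprinkler": 0.5}

joint = {c: prior[c] * likelihood[c] for c in prior}       # p(outcome, cause)
evidence = sum(joint.values())                              # p(outcome)
surprise = -math.log(evidence)                              # in nats
exact_posterior = {c: joint[c] / evidence for c in joint}   # p(cause | outcome)


def free_energy(q, joint):
    """Variational free energy F = sum_c q(c) * (ln q(c) - ln p(outcome, c))."""
    return sum(q[c] * (math.log(q[c]) - math.log(joint[c])) for c in q if q[c] > 0)


f_before = free_energy(prior, joint)            # belief not yet updated
f_after = free_energy(exact_posterior, joint)   # belief after exact updating

print(round(surprise, 3), round(f_before, 3), round(f_after, 3))
```

Updating the belief lowers free energy toward the surprise itself, which is why minimizing one serves as a proxy for minimizing the other.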
Friston’s theory, however, is at odds with the behavior of the wise squeaker-gudgeon in Mikhail Saltykov-Shchedrin’s fairy tale. He sat in his hole without sticking his head out because he was afraid of expected (known) uncertainty and, as a result, merely waited for the only unexpected certainty, death. When assessing the likelihood of being eaten by a pike or caught on bait, the wise squeaker-gudgeon always assumed the worst course of events, whose probability seemed to him simply enormous.
Well, God bless him. The wise squeaker-gudgeon is a fictional character, but people often behave no differently from that squeaker-gudgeon. What the hell kind of resolution of uncertainty are you talking about?! We just have to sit in our holes and tremble! “Lived trembled and died trembled”, as Saltykov-Shchedrin wrote.
Expected (known) uncertainty cannot be resolved because it is already the result of resolving unexpected uncertainty. We can learn nothing from exploring it further. You can flip a coin a million times, but that will not help you predict which side it will land on next time, even though the probability is known: 50 to 50. Neither heads nor tails will surprise us. We will be surprised if the coin lands on its edge. Only then will we start looking for the cause, and we will learn something if we find it (and even if we don’t).
Different events occur with varying degrees of probability. Some happen almost certainly, others only rarely. Knowing the probabilities, we can minimize bad outcomes (and their consequences) and maximize good ones. This is called knowledge exploitation. It is important, but it is not the subject of this book.
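The coin example can be put into numbers with a short Python sketch, under the assumption that surprise is measured as the negative base-2 logarithm of an outcome’s probability. The probability of the coin landing on its edge is an illustrative guess, not a measurement, and the function names are made up for the example.

```python
import math


def surprisal_bits(p: float) -> float:
    """Surprise of an outcome with probability p, in bits."""
    return -math.log2(p)


def entropy_bits(dist: dict) -> float:
    """Average surprise (Shannon entropy) of a distribution, in bits."""
    return sum(p * surprisal_bits(p) for p in dist.values() if p > 0)


# Assumed probability of landing on the edge: an illustrative guess only.
p_edge = 1e-4
coin = {"heads": (1 - p_edge) / 2, "tails": (1 - p_edge) / 2, "edge": p_edge}

heads_surprise = surprisal_bits(coin["heads"])  # about 1 bit
edge_surprise = surprisal_bits(coin["edge"])    # about 13.3 bits
average_surprise = entropy_bits(coin)           # barely above 1 bit

print(round(heads_surprise, 3), round(edge_surprise, 1), round(average_surprise, 4))
```

Heads and tails each carry about one bit of surprise no matter how many times the coin is flipped, which is the known uncertainty that cannot be reduced further. The edge landing is more than ten times as surprising, and it is the only outcome here that sends us looking for a cause.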
I really like the metaphor of knowledge as something like thermodynamic free energy. Knowledge can do work. However, in the course of work, part of the knowledge is dissipated in the same manner as part of the energy that produces work gets dissipated into heat. The level of informational entropy of the system is increasing, but we cannot export this entropy to the environment if the level of entropy there is higher than in our system.
In order to export entropy to the environment, we must lower the level of our internal entropy below the level of external entropy.
If entropy (i.e., average surprise) is a measure of ignorance, then we need to lower our ignorance. How can we do it?
This can be done by resolving unexpected uncertainty and thereby reducing entropy. Each correct solution along this path increases our knowledge only by a fraction but expands the space of our ignorance many times more. By reducing the entropy of knowledge in the field of knowledge exploitation, we simultaneously increase it in the field of knowledge creation.
I am not yet ready to describe what this process looks like, but I base my judgment on eyewitness accounts. Einstein, Planck, Feynman (and it seems to me that this list could be continued with many more names of great scientists) all said and wrote that each new discovery opens up a huge new field of the unknown.
In response to a correct answer, we get a whole dozen new questions. The prominent American theoretical physicist John Archibald Wheeler compared knowledge to an island in an ocean of ignorance. The larger the island becomes, the longer its shoreline with ignorance stretches.
By eliminating one reason for surprise, we simultaneously create many new ones. Someone will probably say that increasing entropy by decreasing it contradicts some important law of nature, but that does not mean it is unreal.
It seems to me now that Erwin Schrödinger wrote about this kind of negative entropy when he claimed that life feeds on it. Any dead matter is capable of following the laws of nature, but only life can invent these laws.
In an article about curiosity, exploration, and insight, Friston and his colleagues really described the natural learning method in great detail and, from my point of view, very precisely. However, in their description, it again boiled down to minimizing surprise.
Finding the cause does indeed turn an unexpected event into an expected one, but only within very narrow limits: only when we observe the event that we have identified as the cause. Friston sees this narrowness and introduces several more levels at which the unexpected is also minimized. He arrives at a fairly complete and consistent design, which is nevertheless based on nothing but the minimization of surprise.
A child is born into a world full of surprises. For him, everything happens for the first time. He not only looks for the causes of everything that happens around him but also constantly expands his space, meeting more new surprises.
I cannot get rid of the thought that the behavior of a living creature, according to Friston’s theory, is still a modification of the behavior of the wise squeaker-gudgeon.
“If you want to enjoy life, keep your eyes open,” the father instructed the squeaker-gudgeon. “The pike is in the pool so that the crucian carp won’t doze,” says a popular Russian proverb. This is, of course, completely anti-scientific, but let’s investigate what they are talking about. Why does the squeaker-gudgeon need to keep its eyes open? Is it watching so as not to miss an expected event or an unexpected one?
If the crucian carp already knows that the pike is sitting in the pool, then for him the attack of the pike will be an expected event that fits into his model of the world. However, he does not know when and from which direction the attack will come. How can he minimize this surprise with Friston’s model?
In my opinion, it turns out that he cannot do so in any way, unless he hides in a hole like the wise squeaker-gudgeon or jumps out of the water onto the shore. Death minimizes all surprises, as my regular Facebook opponent wrote recently.
It turns out that after birth we maximize surprises, and then we begin to minimize them until we reduce them to zero with the help of death. We will no longer know about this, though, because the model by which we predict the future, according to Friston, will cease to exist.
I could probably reconcile myself with Friston’s theory by replacing minimization with eating. Life feeds on unexpectedness. We absorb surprise because it feeds our minds. We minimize the surprise by eating it.
Of course, you can call lunch a process of minimizing food. But minimizing food is not the goal. Our goal is to get enough and enjoy the process.
I don’t know if Friston’s outstanding math can be adapted to the mind-feeding process instead of minimization. It would be great if it could. Friston found the mind-feeding analogy excellent, so now it will definitely be done.
- Friston, K. J., Lin, M., Frith, C. D., Pezzulo, G., Hobson, J. A., & Ondobaka, S. (2017). Active Inference, Curiosity and Insight. Neural Computation, 29(10), 2633–2683. https://doi.org/10.1162/neco_a_00999
- Saltykov-Shchedrin, M. (1883). Fairy Tales for the Grown out of Age Kids.
- Nalimov, V. V., Drogalina-Nalimov, J., & Zuyev, K. (2000). The Universe of Meanings.
- Shen, A., Uspensky, V. A., & Vereshchagin, N. (2017). Kolmogorov Complexity and Algorithmic Randomness. American Mathematical Society. ISBN 9781470431822. HAL: lirmm-01803620.
- “We live on an island surrounded by a sea of ignorance. As our island of knowledge grows, so does the shore of our ignorance.” John A. Wheeler. Quoted in The New Challenges. Scientific American (Dec. 1992)
- Schrödinger, E., & Penrose, R. (1992). What is Life?: With Mind and Matter and Autobiographical Sketches (Canto). Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139644129