Pavlov’s Little-Known Theory of Learning Without Reinforcement
In 1933 Ivan Petrovich Pavlov, who coined the term ‘reinforcement’ in 1906, wrote a paper that was not published until four decades later. In it Pavlov described a type of associative learning that does not require reinforcement. His most loyal apprentices could not believe their ears when they heard this description in one of his last lectures. They checked the stenographic record and, finding the same text there, accused the stenographer of making mistakes. Here is that description, from Pavlov’s paper “Psychology as a Science”:
“A case of great fundamental importance, first systematically investigated by Thorndike, consists of opening boxes in which animals were confined and from which they sought to get out either for the sake of freedom or to obtain food lying outside the box. The matter was reduced to a mass of varied movements of the animal, which finally led to the opening of the door, and the special movement that achieved the goal was performed by the animal more and more quickly and accurately with repetition of the experiment. This obviously meant that knowledge of the connection of material objects of the environment was acquired, and with it power over them.
In experiments with conditioned stimuli, the animal determines the relationship of individual objects of the environment to itself. In Thorndike’s experiments, the animal becomes acquainted with the relationship of external things to each other, with their connections. Consequently, this is knowledge of the surrounding world. This is the embryo, the germ of science. And the acquisition of this knowledge is accomplished by the same method by which the material of modern natural science is constantly accumulated, i.e., by trial and error. By the same method the entire grandiose so-called human empiricism of all mankind (and not of specialists) has been and is being collected. The difference between it and science is that in science the area of trials, experiments and errors is increasingly narrowed, since the new is sought by relying on the old. Thus, each new association concerning the relations of external things is an addition to knowledge, and the use of this knowledge is what is called understanding. It is impossible to imagine understanding anything otherwise. How can one understand something without knowing, without having various associations, i.e., connections of external objects!
Now the next important question: what is the significance of reinforcement in conditioned reflexes, and of the various impulses used in Thorndike’s experiments? Obviously, in these two series of experiments the state of affairs is significantly different. In conditioned reflexes the matter concerns, on the one hand, connecting in the cortex the points of the applied external stimuli with the points of the cortical representation of the corresponding unconditioned reflex of emotion, i.e., the formation of a certain association; and, on the other hand, the excitation by which a given emotion maintains the active state of the cortex of the hemispheres, that is, the high or sufficient tone of the cortex.
In Thorndike’s experiments, certain kinesthetic, tactile and visual stimuli from known external objects and their position are combined with other, also certain, visual and perhaps also kinesthetic stimuli from the same or other external objects. Instincts and emotions play a separate role, being the stimuli of the animal’s motor activity, which is sometimes chaotic (this is only at the very beginning of the animal’s orientation in the environment immediately after birth) and sometimes, almost constantly, directed to a certain extent by previously formed associations, by previous knowledge. Thus, here the formation of an association, on the one hand, and the maintenance of the necessary motor activity of the animal, as well as the necessary tone of the cortex, on the other, are separated from each other, whereas in the procedure of conditioned reflexes they are merged. That this analysis is correct is proven by the details of the above experiment of Podkopaev and Narbutovich. Experiments with the association of two indifferent stimuli when they coincided in time were conducted on animals several times, but without success.
At the same time, it could be noted that when they were repeated, the animal quickly developed an indifferent attitude towards them (extinction of the orienting reflex). Therefore, in new experiments with the same task, the goal was to maintain orienting reflexes in animals to the applied stimuli for as long as possible, so that a connection between the stimuli could be formed while the cortex was still active.
For this purpose, one [stimulus] was a tone, but one that was constantly and slightly changing, i.e., evoking an orienting reflex to novelty, and the other was a silently moving object, since movement in general is a physiologically longer-acting stimulus. Only then was an association formed.”
Published in: Unpublished and Little-Known Materials of I. P. Pavlov. Leningrad, 1975, pp. 99–103.