April 13, 2010

Working for nothing: the power of variable-ratio reinforcement

The ability to manipulate behaviour simply by changing schedules of reinforcement has always amazed me. Reinforcement is the presentation of a rewarding stimulus (e.g. food, money, praise etc.) or the removal of a negative stimulus (e.g. pain, anticipation etc.) in response to a given behaviour. Generally speaking, the more we reinforce, the greater the frequency of response and the less likely it is that the response will disappear from the behavioural repertoire (‘extinction’). This reflects much of the behaviour we see in animals and other people.

However, the relationship isn’t a linear one: by trying different schedules of reinforcement, we can demonstrate that the predictability of reinforcement has a huge bearing on the rate of response. It turns out that by presenting reinforcement at variable intervals (e.g. on average every 10 seconds) or variable ratios (e.g. on average every 10 lever presses), we can generate quite astonishing rates of response. This passage from Skinner (1961, pp. 106-7) gives some idea of just how powerful variable-ratio reinforcement can be in the case of a pigeon:

We are all familiar with this schedule because it is the heart of all gambling devices and systems. The confirmed or pathological gambler exemplifies the result: a very high rate of activity is generated by a relatively slight net reinforcement. Where the “cost” of a response can be estimated (in terms, say, of the food required to supply the energy needed, or of the money required to play the gambling device), it may be demonstrated that organisms will operate at a net loss.

When the food magazine is disconnected after intermittent reinforcement, many responses continue to occur in greater number and for a longer time than after continuous reinforcement … The potential responding built up by reinforcement may last a long time. We have obtained extinction curves six years after prolonged reinforcement on a variable-ratio schedule. Ratio schedules characteristically produce large numbers of responses in extinction. After prolonged exposure to a ratio of 900:1 … the bird was put in the apparatus with the magazine disconnected. During the first 4½ hours it emitted 73,000 responses.

“It may be demonstrated that organisms will operate at a net loss.” There’s that point again just in case you missed it. The idea that you can get an animal to work at an unsustainable rate – at a rate that would ultimately kill it – simply by varying when you present it with food is sobering indeed. The implications of this for our understanding of human behaviour are profound, but all too often ignored. A person’s reinforcement landscape can be as coercive and disempowering as a drug addiction, just as it can be nurturing and make manifest all of the best and most desirable of human characteristics. We would all do well to remember this before judging others.
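
For readers who like to see the mechanics, here is a minimal sketch of what "on average every 10 lever presses" can look like in practice. It is an illustration only, not Skinner's apparatus: I am assuming the simplest way of building a variable-ratio schedule, in which every response has the same 1-in-10 chance of being reinforced (sometimes called a random-ratio schedule), and the function name `responses_per_reinforcer` is my own invention. The last lines simply work out the response rate implied by the figures in the passage above.

```python
import random


def responses_per_reinforcer(mean_ratio, n_reinforcers, seed=0):
    """Simulate a variable-ratio schedule (assumed random-ratio form).

    Each response is reinforced with probability 1 / mean_ratio.
    Returns a list holding the number of responses made for each reinforcer.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    gaps = []
    count = 0
    while len(gaps) < n_reinforcers:
        count += 1  # one more lever press or key peck
        if rng.random() < 1 / mean_ratio:
            gaps.append(count)  # reinforcer delivered; record the cost
            count = 0
    return gaps


gaps = responses_per_reinforcer(mean_ratio=10, n_reinforcers=10_000)
mean_gap = sum(gaps) / len(gaps)
print(f"mean responses per reinforcer: {mean_gap:.2f}")
print(f"shortest run: {min(gaps)}, longest run: {max(gaps)}")

# Response rate implied by the quoted figures: 73,000 responses in 4.5 hours.
rate_per_second = 73_000 / (4.5 * 3600)
print(f"responses per second in extinction: {rate_per_second:.2f}")
```

The average cost per reinforcer comes out close to 10, but the individual runs vary widely, so the next response could always be the one that pays off. As for the quoted pigeon, 73,000 responses in 4½ hours works out at roughly 4.5 responses per second, sustained with no food at all.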

References

Skinner, B. F. (1961). Cumulative Record (Enlarged ed.). New York: Appleton-Century-Crofts. (Reprinted from Skinner, B. F. (1957). The experimental analysis of behavior. American Scientist, 45, 343-371.)
