In operant conditioning, reinforcement is any change in an animal's surroundings that (a) occurs after the animal behaves in a given way and (b) seems to make that behavior recur more often in the future, where (c) the recurrence is the result of the change.

For example: you drop a coin in a slot on an unlabeled, unfamiliar machine, and a potato chip immediately appears in an opening below. If you then drop coins into the slot more often than you would have if no potato chip had appeared, the appearance of the potato chip is reinforcement for the coin-dropping behavior.

Note that it is the coin-dropping behavior that is reinforced, not you. The potato chip serves as a reinforcer, reinforcing or strengthening that behavior, only to the extent that such coin-dropping subsequently occurs more often because of it.

The study of reinforcement has produced an enormous body of reproducible experimental results. Reinforcement is the central concept and procedure in the experimental analysis of behavior.

Schedules of reinforcement

[Figure: a chart demonstrating the different response rates of the schedules of reinforcement; each hatch mark designates a reinforcer being given.]

When enough of the variations in an animal's surroundings are reduced or "controlled," its behavior patterns after reinforcement are remarkably predictable. When rates of reinforcement are adjusted in particular ways, even very complex behavior patterns can be predicted. A schedule of reinforcement is the protocol for determining which responses (i.e., which individual occurrences of a given behavior) will be reinforced. The two extremes are continuous reinforcement, in which every response results in reinforcement, and extinction, in which no response is reinforced.

Other schedules include:

  1. Fixed ratio (FR), in which every nth response is reinforced.
  2. Fixed interval (FI), in which reinforcement occurs after the passage of a specified length of time from the beginning of training or from the last reinforcement, provided that at least one response occurred in that time period.
  3. Variable ratio (VR), in which the number of responses required between reinforcements varies, but on average equals a predetermined number.
  4. Variable interval (VI), in which reinforcement occurs after the passage of a varying length of time around an average, provided that at least one response occurred in that period.
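
The four schedules listed above can be read as simple rules that decide, response by response, whether a reinforcer is delivered. The following is a minimal sketch under stated assumptions, not a standard implementation: the class names, the `respond(t)` method, and the helper `count_reinforcers` are hypothetical, and the variable schedules draw their requirements from a uniform distribution purely for illustration (the text only says the values vary around an average). Interval schedules are modeled as reinforcing the first response made once the interval has elapsed.

```python
import random


class FixedRatio:
    """FR: every nth response is reinforced."""

    def __init__(self, n):
        self.n = n
        self.count = 0  # responses since the last reinforcer

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR: the number of responses required varies around a mean."""

    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 1..2*mean-1, which averages to `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """FI: the first response after a fixed time has elapsed is reinforced."""

    def __init__(self, interval):
        self.interval = interval
        self.start = 0.0  # start of training, then time of the last reinforcer

    def respond(self, t):
        if t - self.start >= self.interval:
            self.start = t
            return True
        return False


class VariableInterval:
    """VI: like FI, but the required wait varies around a mean."""

    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.start = 0.0
        self.wait = self._draw()

    def _draw(self):
        # Assumption: uniform on [0, 2*mean], which averages to `mean`.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t):
        if t - self.start >= self.wait:
            self.start = t
            self.wait = self._draw()
            return True
        return False


def count_reinforcers(schedule, response_times):
    """Feed response times to a schedule and count the reinforcers delivered."""
    return sum(schedule.respond(t) for t in response_times)


# One response per time unit, for 30 time units.
print(count_reinforcers(FixedRatio(5), range(1, 31)))      # 6
print(count_reinforcers(FixedInterval(10), range(1, 31)))  # 3
```

The sketch only models which responses are reinforced. It says nothing about how an animal's response rate changes under each schedule, which is an empirical matter.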

Ratio schedules produce higher rates of responding than interval schedules, and variable schedules produce higher rates than fixed schedules. The variable ratio schedule produces both the highest rate of responding and the greatest resistance to extinction (that is, resistance to "petering out"); gambling behavior is a notable example. On a fixed ratio schedule, responding pauses after each reinforcer is delivered. This is called a post-reinforcement pause. Fixed interval schedules also produce post-reinforcement pauses, but responding then resumes gradually, giving the record of responses a scalloped shape. Responses made before the interval has elapsed are not reinforced, so the subject learns to respond at a gradually increasing rate.

Positive vs. negative

Positive reinforcement changes the animal's surroundings by adding a stimulus: a physical object (like a food pellet or paycheck) or energy (like light from a lamp).

Negative reinforcement changes the surroundings by removing an aversive stimulus, such as turning off a painful electric current or removing a hated ex-spouse's picture. Speaking colloquially, an aversive stimulus is something the animal finds "bad"; its removal is thus a "good" thing from the animal's point of view.

           | some "bad" thing (aversive stimulus) | some "good" thing (reinforcing stimulus)
presented  | positive punishment                  | positive reinforcement
taken away | negative reinforcement               | negative punishment
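
The same four terms can also be derived from two observations: whether a stimulus was presented or taken away, and whether the behavior subsequently became more or less frequent. The lookup below is a minimal sketch of that naming rule. The function name `classify_consequence` and its string arguments are hypothetical, chosen only for illustration.

```python
def classify_consequence(operation, effect):
    """Name a consequence from what was done and what happened to the behavior.

    operation: "presented" or "taken away" (what happened to the stimulus)
    effect:    "increases" or "decreases" (change in the behavior's frequency)
    """
    polarity = {"presented": "positive", "taken away": "negative"}[operation]
    process = {"increases": "reinforcement", "decreases": "punishment"}[effect]
    return f"{polarity} {process}"


# Turning off a painful electric current, after which the behavior occurs more often:
print(classify_consequence("taken away", "increases"))  # negative reinforcement
```

Keying the lookup on the observed effect rather than on whether the stimulus looks "good" or "bad" follows the definitions in this article, in which a change counts as reinforcing or punishing only by its effect on the rate of the behavior.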

Distinguishing "positive" from "negative" in these cases is largely a matter of emphasis. For example, in a very warm room, a current of outside air that serves as reinforcement may be called positive because it adds relatively cool air, or negative because it removes the uncomfortably hot air. Furthermore, the distinction seems to have no real use in research or applied psychology, although one may some day be found. Until then, many behavioral psychologists simply refer to reinforcement or punishment, without polarity, to cover all consequent environmental changes.

Punishment

Punishment is any change in an animal's surroundings that occurs after a given behavior and seems to reduce the frequency of that behavior. As with reinforcement, it is the behavior, not the animal, that is punished. Whether a change is or is not punishing is only known by its effect on the rate of the behavior, not by any "hostile" features of the change. In positive punishment or type I punishment, an experimenter punishes a response by adding an aversive stimulus into the animal's surroundings (a brief electric shock, for example). In negative punishment or type II punishment, a positive reinforcer is removed (as in the removal of a feeding dish). As with reinforcement, it is not usually necessary to speak of positive and negative in regard to punishment.

Punishment is not a mirror effect of reinforcement. In experiments with laboratory animals and studies with children, punishment decreases the frequency of a previously reinforced response only temporarily, and it can produce other "emotional" behavior (wing-flapping in pigeons, for example) and physiological changes (increased heart rate, for example) that have no clear equivalents in reinforcement.

Punishment is considered by some behavioral psychologists to be a "primary process" – a completely independent phenomenon of learning, distinct from reinforcement. Others see it as a category of negative reinforcement, creating a situation in which any punishment-avoiding behavior (even standing still) is reinforced.

Aversive stimulus, punisher, and punishing stimulus are synonyms. Punishment may be used for (a) an aversive stimulus or (b) the occurrence of any punishing change or (c) the part of an experiment in which a particular response is punished.

Other reinforcement terms

  1. An unconditioned reinforcer, sometimes called a primary reinforcer, is a stimulus or situation considered to be inherently reinforcing (such as affection, food, or opportunity for sleep).
  2. A conditioned reinforcer, sometimes called a secondary reinforcer, is a stimulus or situation that has acquired reinforcing power after being paired in the animal's environment with an unconditioned reinforcer or an earlier conditioned reinforcer (such as praise).
  3. A generalized reinforcer is a conditioned reinforcer that has been paired with many other reinforcers (such as money).
  4. Differential reinforcement of incompatible behavior (DRI) is used to reduce an already frequent behavior, without punishing it, by reinforcing a specific incompatible response (like leaving a room so that fighting with someone in it is not possible).
  5. In differential reinforcement of other behavior (DRO), any behavior other than some undesired behavior is reinforced.
  6. Differential reinforcement of low response rate (DRL): a behavior is reinforced only if it occurs infrequently. "If you ask me for a potato chip no more than once every 10 minutes, I will give it to you. If you ask more often, I will give you none."
  7. Differential reinforcement of alternative behavior (DRA): the reinforcers for the undesirable behavior are used instead for a more desirable behavior. For example, a teacher will pay attention to students who sit rather than to those who walk or talk in class.
  8. In reinforcer sampling, a potentially reinforcing but unfamiliar stimulus is presented to an animal without regard to any prior behavior. The stimulus may then later be used more effectively in reinforcement.
  9. Social reinforcement involves various sorts of access to and interaction with others.
  10. Satiation occurs when a stimulus that had reinforced some behavior no longer seems to do so.
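
The DRL rule in item 6 can be sketched in code using the potato chip example. This is a minimal illustration under two assumptions the text does not settle: the very first request is reinforced, and every request, reinforced or not, restarts the 10-minute clock. The function name `drl_outcomes` and the parameter `min_gap` are hypothetical.

```python
def drl_outcomes(request_times, min_gap=10.0):
    """Return, for each request time (in minutes), whether it is reinforced.

    A request is reinforced only if at least `min_gap` minutes have passed
    since the previous request.
    """
    outcomes = []
    previous = None  # time of the previous request, if any
    for t in request_times:
        reinforced = previous is None or (t - previous) >= min_gap
        outcomes.append(reinforced)
        previous = t  # assumption: any request restarts the clock
    return outcomes


print(drl_outcomes([0, 5, 15, 30, 32]))
# [True, False, True, True, False]
```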

Shaping & chaining

Shaping involves reinforcing successive, increasingly accurate approximations of a response desired by a trainer. In training a rat to press a lever, for example, simply turning toward the lever will be reinforced at first. Then, only turning and stepping toward it will be reinforced. As training progresses, the response reinforced becomes progressively more like the desired behavior. Chaining is similar but involves reinforcing various simple behaviors separately and then linking them together in a more complex series.
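
The tightening criterion in shaping can be illustrated with a small sketch. It assumes, purely for illustration, that each response is summarized by a single number (here, a hypothetical distance from the lever) and that the trainer tightens the criterion by a fixed step after every reinforced response. Real shaping relies on the trainer's judgment rather than a fixed rule, and the name `shape` and its parameters are not taken from any source.

```python
def shape(distances, start_criterion, target, step):
    """Reinforce responses that meet the current criterion, then tighten it.

    distances: observed distance from the lever for each response
    Returns one (distance, criterion_in_force, reinforced) tuple per response.
    """
    criterion = start_criterion
    log = []
    for d in distances:
        reinforced = d <= criterion
        log.append((d, criterion, reinforced))
        if reinforced and criterion > target:
            # Require a closer approximation next time, but never go past the target.
            criterion = max(target, criterion - step)
    return log


for row in shape([9, 12, 7, 8, 4, 2], start_criterion=10, target=2, step=3):
    print(row)
# (9, 10, True)
# (12, 7, False)
# (7, 7, True)
# (8, 4, False)
# (4, 4, True)
# (2, 2, True)
```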

Controversies

The standard idea of behavioral reinforcement has been criticized as circular, since it appears to define a reinforcer by an effect it will have in an as-yet unseen future. Other definitions have been proposed, such as F. D. Sheffield's "consummatory behavior contingent on a response," but these are not broadly used in psychology.

History of the terms

In the 1920s Russian physiologist Ivan Pavlov may have been the first to use the word reinforcement with respect to behavior, but (according to Dinsmoor) he used its approximate Russian cognate sparingly, and even then it referred to strengthening an already-learned but weakening response. He did not use it, as it is today, for selecting and strengthening new behavior. Pavlov's introduction of the word extinction (in Russian) approximates today's psychological use.

In popular use, positive reinforcement is often used as a synonym for reward, with people (not behavior) thus being "reinforced," but this is contrary to the term's consistent technical usage. Negative reinforcement is often used by laypeople and even social scientists outside psychology as a synonym for punishment. This is contrary to modern technical use, but it was B. F. Skinner who first used it this way in his 1938 book. By 1953, however, he had followed others in using the word punishment for that meaning, and he recast negative reinforcement to mean the removal of aversive stimuli.