報酬と意思決定
Reward and Decision Making
O2-8-3-1
不確実な報酬予測に対する眼窩前頭皮質神経細胞の反応―リスク?もしくはサリエンス?
Risk-responsive orbitofrontal neurons track acquired salience

○小川正晃1,2
○Masaaki Ogawa1,2, Matthijs A. A. van der Meer3, Guillem R. Esber4, Domenic H. Cerri2, Thomas A. Stalnaker4, Geoffrey Schoenbaum4
マサチューセッツ工科大学1, メリーランド大学2, ウォータールー大学3, 米国国立衛生研究所4
Massachusetts Institute of Technology, Cambridge, MA, USA1, University of Maryland, Baltimore, MD, USA2, University of Waterloo, Ontario, Canada3, NIDA, Baltimore, MD, USA4

Decision-making is impacted by reward uncertainty and risk (i.e., variance). Activity in the orbitofrontal cortex, an area implicated in decision-making, has been shown to covary with these quantities. However, this activity could instead reflect the heightened salience of situations in which multiple outcomes (reward and reward omission) are expected. To distinguish between these accounts, rats were trained in a simple odor-cued response task in which four different odor cues were associated with four different reward probabilities: 100, 67, 33, and 0%. Consistent with prior reports, some orbitofrontal neurons (36%) fired differently in anticipation of uncertain (33% and 67%) versus certain (100% and 0%) reward. However, over 90% of these neurons also fired differently prior to 100% versus 0% reward (or baseline), or prior to 33% versus 67% reward. These responses are inconsistent with the coding of risk, but fit well with a representation of acquired salience defined as the sum of the cue-outcome and cue-no-outcome associative strengths. Thus, these results suggest a novel mechanism whereby the orbitofrontal cortex might regulate learning and behavior.
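Why the two accounts predict different firing patterns can be made concrete with a short simulation. The sketch below (Python; the update rules, learning rate, and trial count are illustrative assumptions in the spirit of acquired-salience models, not the authors' implementation) learns cue-reward and cue-no-reward associative strengths for each probability and compares their sum against risk, taken as the Bernoulli outcome variance p(1 - p):

    import random

    def simulate_cue(p, n_trials=20000, alpha=0.05, seed=0):
        """Learn cue->reward (v_r) and cue->no-reward (v_no) associative
        strengths for a cue rewarded with probability p."""
        rng = random.Random(seed)
        v_r, v_no = 0.0, 0.0
        for _ in range(n_trials):
            if rng.random() < p:              # rewarded trial
                v_r += alpha * (1.0 - v_r)    # reward association grows
                v_no += alpha * (0.0 - v_no)  # omission association fades
            else:                             # omission trial; assume the
                v_no += alpha * (v_r - v_no)  # omission association tracks
                v_r += alpha * (0.0 - v_r)    # how much reward was expected
        return v_r, v_no

    for p in (1.00, 0.67, 0.33, 0.00):
        v_r, v_no = simulate_cue(p)
        risk = p * (1.0 - p)       # outcome variance: equal for 33% and 67%
        salience = v_r + v_no      # the sum named in the abstract
        print(f"p={p:.2f}  risk={risk:.2f}  salience={salience:.2f}")

Under these assumed updates, risk is identical for the 33% and 67% cues and zero for both certain cues, whereas the summed strengths differ across all four conditions, matching the observation that most "risk" neurons also discriminated 100% from 0% and 33% from 67%.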
O2-8-3-2
強化学習の神経回路機構:ドーパミンによる報酬予測誤差の表象機構に関する新たな仮説
Neural circuit mechanism of reinforcement learning: a new hypothesis on the mechanism of dopaminergic representation of reward prediction error

○森田賢治1, 森島美絵子2,3, 坂井克之4, 川口泰雄2,3
○Kenji Morita1, Mieko Morishima2,3, Katsuyuki Sakai4, Yasuo Kawaguchi2,3
東京大院・教育・身体教育学1, 生理研・大脳神経回路論2, 総研大・生理科学3, 東京大院・医・認知・言語神経科学4
Physical and Health Education, Univ of Tokyo, Tokyo1, Div Cerebral Circuitry, National Institute for Physiological Sciences, Okazaki2, Dept Physiol Sci, SOKENDAI, Okazaki3, Dept Cognitive Neuroscience, Univ of Tokyo, Tokyo4

Midbrain dopamine neurons have been suggested to encode a temporal difference (TD) reward prediction error, but the underlying circuit mechanism remains elusive. We recently proposed such a mechanism (Morita et al., 2012, Trends Neurosci) based on new findings about two major subclasses of corticostriatal neurons, which predominantly project to the direct and indirect pathways of the basal ganglia, respectively. Specifically, by virtue of the unidirectional connections from the former subclass to the latter and the strong recurrent excitation found only within the latter subclass, these two subclasses could represent the current and previous states/actions, respectively. Their values would then be computed by the downstream direct- and indirect-pathway striatal neurons, which presumably up- and down-regulate the dopamine neuronal response via the output nuclei of the basal ganglia, in parallel with controlling action selection/execution and termination, respectively. When these value signals are combined with inputs from the pedunculopontine tegmental nucleus that represent the obtained reward, the TD reward prediction error is computed in the dopamine neurons, which in turn presumably controls corticostriatal plasticity so that the striatal neurons come to represent updated reward predictions. Based on this hypothesis, we constructed a computational model of the cortico-basal ganglia system and simulated learning tasks used in behavioral and electrophysiological studies. The results show that our model reproduces the observed across- and within-trial changes in the dopamine neuronal response, as well as subjects' choice behavior and changes in reaction times. Moreover, our model can also explain the observed distinct effects of manipulating the direct- or indirect-pathway striatal neurons. Our model thus provides a closed-circuit, unified account of the dopaminergic control of reinforcement learning and reward-oriented behavior, and makes rich predictions to be tested in the future.
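A minimal sketch of the hypothesized computation (Python; the state chain, discount factor, and learning rate below are illustrative choices, not parameters from the model paper): the dopamine response is the obtained reward, plus a direct-pathway term carrying the value of the current state, minus an indirect-pathway term carrying the value of the previous state, and this error in turn updates the striatal values.

    GAMMA = 0.9   # temporal discount (illustrative)
    ALPHA = 0.1   # learning rate (illustrative)

    # One trial is a fixed chain of states; reward arrives on entering
    # the final state. V[s] is the striatal value assigned to state s.
    CHAIN = ["start", "cue", "delay", "reward"]
    V = {s: 0.0 for s in CHAIN}

    def run_trial():
        """Return the modeled dopamine (TD error) response at each step."""
        das = []
        for prev, cur in zip(CHAIN, CHAIN[1:]):
            r = 1.0 if cur == "reward" else 0.0  # pedunculopontine input
            # Dopamine response: reward input, plus direct-pathway
            # excitation carrying V(current), minus indirect-pathway
            # suppression carrying V(previous).
            da = r + GAMMA * V[cur] - V[prev]
            if prev != "start":        # cue onset is unpredictable, so
                V[prev] += ALPHA * da  # the pre-cue state acquires no value
            das.append(da)
        return das

    for n in range(200):
        das = run_trial()
        if n in (0, 9, 199):
            print(f"trial {n + 1:3d}:", " ".join(f"{d:+.2f}" for d in das))

Across trials the modeled response migrates from reward delivery to cue onset, the canonical across-trial change in dopamine firing that the full circuit model is meant to reproduce.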
O2-8-3-3
脚橋被蓋核におけるニューロン活動の増加・減少による報酬価値予測の表現
Reward prediction related increases or decreases in neuronal activity of the monkey pedunculopontine tegmental nucleus

○岡田研一1, 小林康1,2,3,4
○Ken-ichi Okada1, Yasushi Kobayashi1,2,3,4
大阪大学大学院 生命機能研究科1, 大阪大学・社会経済研究所・行動経済学研究センター2, 脳情報通信融合研究センター3, JSTさきがけ4
Osaka Univ Grad School of Frontier Biosciences, Toyonaka1, Osaka Univ Research Center for Behavioral Economics, Suita2, Center for Information and Neural Networks, Osaka3, PRESTO, Japan Science and Technology Agency (JST), Saitama4

The pedunculopontine tegmental nucleus (PPTN) of the brainstem receives afferent inputs from reward-related structures, including the cerebral cortices and the basal ganglia, and in turn provides strong excitatory projections to dopamine neurons. This anatomical evidence predicts that PPTN neurons may carry reward information and contribute to the computation of reward prediction error in dopamine neurons. We previously examined neuronal activity in the PPTN during a reward-biased visually guided saccade task, in which the magnitude of the upcoming reward (large/small) was cued by the shape of the initial fixation target (FT). A population of PPTN neurons tonically increased their activity during the task execution period, and some showed stronger responses to large-reward cues than to small-reward cues and thus might encode the predicted reward value. In addition, some neurons began to change their tonic activity even before FT appearance, possibly in anticipation of the upcoming task event. A partially overlapping population of neurons showed both the reward value-related response and the anticipatory response in their tonic activity. Here we report another group of PPTN neurons that exhibited the reverse response pattern to the tonically increasing neurons. These neurons showed rather high-frequency spontaneous activity during the inter-trial interval; their activity then decreased tonically around the time of FT appearance and rebounded after reward delivery. Some of these neurons showed weaker activity for the large-reward cue, the reverse of the modulation pattern of the tonically increasing neurons. Predictive decreases in activity were also evident in some of the tonically decreasing neurons. The opposite and systematic modulation patterns of the tonically increasing and decreasing neurons suggest that PPTN neurons could send both the positive and negative reward prediction components to dopaminergic neurons that are necessary for the computation of the reward prediction error signal.
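The proposed division of labor can be written down compactly. In the sketch below (Python; the baseline rates and gains are invented for illustration, not fitted to the recordings), a tonically increasing and a tonically decreasing population jointly encode the predicted value, and a downstream dopamine neuron recovers the prediction from their difference and subtracts it from the obtained reward:

    def pptn_rates(predicted_value):
        """Firing rates (a.u.) of the two hypothesized PPTN populations."""
        increasing = 5.0 + 10.0 * predicted_value   # higher for large reward
        decreasing = 15.0 - 10.0 * predicted_value  # lower for large reward
        return increasing, decreasing

    def dopamine_prediction_error(obtained_reward, predicted_value):
        inc, dec = pptn_rates(predicted_value)
        # The difference of the two populations recovers the signed
        # prediction (up to the assumed gain and offset), so the
        # dopamine neuron can compute reward minus prediction.
        prediction = (inc - dec) / 20.0 + 0.5
        return obtained_reward - prediction

    for value, reward in [(0.2, 1.0), (0.8, 1.0), (0.8, 0.0)]:
        delta = dopamine_prediction_error(reward, value)
        print(f"predicted={value:.1f} obtained={reward:.1f} -> delta={delta:+.1f}")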
O2-8-3-4
Representation of cue-reward contingency by perirhinal (PRh) neurons is flexible, but the flexibility requires long-term learning of possible contingencies
○Manoj Eradath1,2, Tsuguo Mogami1, Keiji Tanaka1
Lab for Cognitive Brain Mapping, Brain Science Institute, RIKEN1, Graduate School of Science and Engineering, Saitama University2

To further clarify the nature of reward-related activity in PRh cortex, we recorded the activity of PRh cells while the cue-reward contingency was changed. A visual cue presentation and reward delivery (or no reward) were repeated twice in each trial. There was a fixed cue-reward contingency in the first part: half of the 24 cues indicated reward, and the remaining half indicated no reward. In the second part, reward was provided randomly in 50% of trials regardless of the cue. The frequency of eye-fixation breaks and the strength of water-tube sucking showed that the monkeys anticipated the reward condition only in the first part. Responses of PRh cells to reward-contingent cues were consistently larger than those to no-reward-contingent cues in the first part, whereas there were no such differences in the second part. After several months of recordings, we reversed the conditions between the first and second parts of the trial. The monkeys' behavior followed the change within a few days; however, many PRh cells continued to exhibit differential responses in the first part for about one month. After this period, PRh cells displayed differential activity only in the second part. We then began switching the contingency within single days: the cue-reward contingency existed only in the first part of the trial for the first 400 trials of a day, and then moved to the second part for the remaining trials. The monkey followed the switch within 100 trials. Many of the recorded PRh cells also followed the switch: their differential responses moved from the first to the second part of the trial within the day. Thus, it took a month for PRh cells to learn the new temporal context of the cue-reward contingency when the switch was first introduced. However, once the monkey had become familiar with the switch, PRh cells flexibly followed it within 100 trials. The representation of expected reward by PRh cells can thus follow changes in the temporal context of the contingency, but this flexibility requires long-term learning of the possible contingencies.
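The two-part trial protocol is compact enough to state as a generator. The sketch below (Python; only the 24-cue, half-rewarded, 50%-random, and 400-trial figures come from the abstract, while the cue numbering, session length, and random seed are illustrative) reproduces the within-day switch condition:

    import random

    rng = random.Random(0)
    CUES = list(range(24))
    REWARDED = set(CUES[:12])  # half of the cues signal reward

    def trial(contingent_part):
        """One trial: cue and reward events for parts 1 and 2. In the
        contingent part the cue identity determines reward; in the other
        part reward is random at 50% regardless of the cue."""
        events = []
        for part in (1, 2):
            cue = rng.choice(CUES)
            if part == contingent_part:
                rewarded = cue in REWARDED
            else:
                rewarded = rng.random() < 0.5
            events.append((part, cue, rewarded))
        return events

    # Within-day switch: the contingency sits in part 1 for the first
    # 400 trials, then moves to part 2 for the rest of the session.
    session = [trial(1 if t < 400 else 2) for t in range(600)]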