シンポジウム
36 適応的・予測的行動制御を支える並列的・階層的神経メカニズム
36 Parallel and hierarchical neural mechanisms for adaptive and predictive behavioral control
Chairs: Masayuki Matsumoto (University of Tsukuba), Hiroaki Gomi (NTT Communication Science Laboratories)
2022年7月1日 9:04~9:23 ラグナガーデンホテル 羽衣:中 第9会場
2S09m-01
ヒューマノイドロボットの階層的運動学習
Hierarchical Humanoid Motor Learning

*森本 淳(1,2)
1. 京都大学、2. 国際電気通信基礎技術研究所(ATR)
*Jun Morimoto(1,2)
1. Kyoto University, 2. ATR

Keyword: Humanoid robot, Motor learning, Reinforcement learning, Whole-body control

For high-dimensional systems such as humanoid robots, a huge number of trials is required to acquire a globally optimal policy that covers the entire state space through model-free policy learning methods. Therefore, for real-world applications, finding a locally optimal trajectory-based policy with model-based approaches is favorable. In our group, we have proposed a computationally efficient hierarchical motor learning method for real-time humanoid control based on the idea of the singular perturbation method and inspired by the hierarchical control architecture of sensorimotor circuits in the brain. Similar to sensorimotor circuits, where environmental information is received by all hierarchical layers, information from the sensors is sent to all constituent layers, where it can be processed in parallel. The top layer of the hierarchy uses a long time horizon and a large time-step size to optimize whole-body movements, whereas the middle layer uses a short time horizon and a small time-step size to optimize the motion of each limb using model predictive control calculations. Specifically, we extract the fast dynamics from the humanoid robot system by introducing two different time scales. Compared with the smaller-time-scale system, the larger-time-scale system can be regarded as a static environment. We then focus on optimizing the movements that belong to the smaller-time-scale dynamics over the short time horizon with the small time-step size. The middle layer can therefore quickly re-plan movements to cope with rapid changes in the environment. At the bottom layer, a reflex-based controller maintains the robot’s posture with a very short control period. This bottom-layer controller is not model-based; rather, it is inspired by the reflex-based control found in biological systems. We evaluated our framework in skating tasks with simulated and real lower-body humanoids that have rollers on their feet. Our simulated robot was able to generate agile motions in real time, including flipping down a cliff. In the real lower-body humanoid, our method also successfully generated a sliding movement down a slope, indicating its effectiveness for controlling agile movement in humanoid robots.
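The three-layer, multi-rate structure described in this abstract can be illustrated with a minimal sketch. It is not the authors' controller: it assumes a toy one-dimensional first-order system (x' = u), and every function name, gain, and update rate below is a hypothetical choice made only to show a slow planner, a faster short-horizon MPC, and a per-tick reflex all reading the same sensor value.

```python
# Toy sketch, not the authors' controller: a 1-D first-order system x' = u.
# Every name, gain, and rate below is a hypothetical choice.

def top_layer(x, goal):
    """Slow layer: pick a coarse waypoint halfway to the goal."""
    return x + 0.5 * (goal - x)

def middle_layer(x, waypoint, horizon=5, dt=0.05):
    """Fast layer: short-horizon MPC with one constant velocity command.

    Minimizes sum_k (x + u*k*dt - waypoint)^2 for k = 1..horizon,
    which has the closed-form solution returned below.
    """
    s1 = sum(k for k in range(1, horizon + 1))
    s2 = sum(k * k for k in range(1, horizon + 1))
    return (waypoint - x) * s1 / (dt * s2)

def bottom_layer(push, gain=0.8):
    """Reflex layer: model-free counteraction of a sensed push."""
    return -gain * push

def simulate(goal=1.0, steps=400, dt=0.01, mid_every=5, top_every=50):
    x, waypoint, u_mid = 0.0, 0.0, 0.0
    calls = {"top": 0, "middle": 0, "bottom": 0}
    for t in range(steps):
        push = 0.1 if 100 <= t < 110 else 0.0  # brief external disturbance
        if t % top_every == 0:                 # slowest update rate
            waypoint = top_layer(x, goal)
            calls["top"] += 1
        if t % mid_every == 0:                 # intermediate update rate
            u_mid = middle_layer(x, waypoint, dt=mid_every * dt)
            calls["middle"] += 1
        u = u_mid + bottom_layer(push)         # fastest rate: every tick
        calls["bottom"] += 1
        x += (u + push) * dt                   # disturbance enters the plant
    return x, calls
```

Calling `simulate()` returns the final position together with the number of times each layer ran. In this toy setting the lower layers run far more often than the top layer, and the state still approaches the goal despite the brief push.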
2022年7月1日 9:23~9:42 ラグナガーデンホテル 羽衣:中 第9会場
2S09m-02
潜在感覚運動制御ループの調節と表現
Modulation and representation of implicit sensorimotor loops

*五味 裕章(1)
1. NTTコミュニケーション科学基礎研究所
*Hiroaki Gomi(1)
1. NTT Communication Science Labs.

Keyword: hierarchical sensorimotor loop, visuomotor response, implicit sensorimotor control, Bayesian inference

Many studies on information processing for sensorimotor control in the brain have shed light on the computational frameworks underlying hierarchical and parallel mechanisms (e.g., Allen and Tsukahara 1974; Doya 1999; Merel et al. 2019). The basic physiological mechanisms of peripheral sensorimotor control have been widely investigated for a century, and many studies on the modulation of reflexive manual responses have shown that reflexive responses are adjusted in accordance with voluntary reactions (Evarts & Tanji 1976; Rothwell et al., 1980). However, an essential question remains: how is low-level processing (e.g., the reflexive response) adjusted within the hierarchical computation?
Here I will introduce two studies on the manual following response (MFR), a rapid visuomotor response evoked by large-field visual motion during reaching movements. The first study investigated whether the response is modulated without the assistance of voluntary reactions. To examine this, the contexts of postural and visual stability were manipulated in the experiment. The results showed that the MFR increased in an unstable postural context and decreased in an unstable visual context. These modulations are successfully explained by a Bayesian optimal formulation in which the manual response is ascribed to a compensatory response to the estimated self-motion, which is affected by the preceding contextual situation.
The second study aimed to elucidate how the representation of this visuomotor response is acquired. Based on the idea that the MFR is a compensatory response to postural motion, we hypothesized that the stimulus-dependent MFR modulation shown in a previous study (Gomi et al. 2006) reflects the characteristics of estimating self-motion from visual motion. To examine this hypothesis, we created a convolutional neural network (CNN) model to estimate self-motion from image sequences recorded by a head-mounted camera during daily body motions. The spatiotemporal frequency tuning of the CNN to visual stimuli had a peak at high temporal and low spatial frequency, similar to that of the MFR, supporting the hypothesis above.
These studies indicate that the reflex response, MFR, functions to adjust arm movements for postural variations inferred from visual motion and prior states of postural stability. We need to further explore various hierarchical interaction mechanisms among sensorimotor systems, which realize well-coordinated behaviors.
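The direction of the context effects in the first study can be illustrated with a generic Gaussian estimation sketch. This is an assumption-laden illustration, not the formulation used in the study: it assumes a zero-mean Gaussian prior on self-motion whose variance grows when posture is unstable, a Gaussian visual observation whose noise variance grows when the visual scene is unstable, and a response proportional to the posterior mean. All names and numbers are hypothetical.

```python
# Generic Gaussian sketch (hypothetical, not the study's actual model).

def estimated_self_motion(visual_motion, prior_var, visual_var):
    """Posterior mean of self-motion given one visual motion observation.

    Prior: self-motion ~ N(0, prior_var).
    Likelihood: visual_motion ~ N(self-motion, visual_var).
    """
    weight = prior_var / (prior_var + visual_var)
    return weight * visual_motion

# Same visual stimulus under three hypothetical contexts.
baseline = estimated_self_motion(1.0, prior_var=1.0, visual_var=1.0)
unstable_posture = estimated_self_motion(1.0, prior_var=4.0, visual_var=1.0)
unstable_vision = estimated_self_motion(1.0, prior_var=1.0, visual_var=4.0)
```

Under these assumptions, a broader prior (unstable posture) raises the estimate and hence the compensatory response, while noisier vision (unstable visual context) lowers it, matching the direction of the reported modulation.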
2022年7月1日 9:42~10:01 ラグナガーデンホテル 羽衣:中 第9会場
2S09m-03
経済的意思決定におけるサル眼窩前頭皮質、腹側線条体および中脳ドーパミンニューロンの役割
Distinct roles of the orbitofrontal cortex, ventral striatum, and midbrain dopamine neurons of non-human primates in economic decision-making

*惲 夢曦(1)
1. ハーバード大学
*Mengxi Yun(1)
1. Harvard University

Keyword: Economic decision-making, Orbitofrontal cortex, Dopamine neurons, Ventral striatum

Economic decision-making is a ubiquitous behavior in nature. It can be divided into several sub-processes: option evaluation, option selection, and decision evaluation. It has traditionally been thought that the prefrontal cortex computes option selection, whereas subcortical regions participate in option and outcome evaluation. Our recent studies challenge this view, however, by providing evidence that subcortical dopamine (DA) neurons lead the value-to-choice transformation process, while the orbitofrontal cortex (OFC) participates in decision evaluation by comparing the actual choice outcome (what was gained) with the counterfactual choice outcome (what would have been gained with a different choice).
We developed an economic decision-making task in which monkeys were sequentially offered two options and were required, during the presentation of the first option, either to select it or to reject it in favor of the upcoming second option. While the monkey was making a decision about the first option, we found that DA neurons represented diverse signals related not only to the option value but also to the animal’s choice. The time course of these signal representations corresponded to the value-to-choice transformation. Although these dynamics were also observed in the OFC, the choice signal in DA neurons preceded that in OFC neurons, suggesting that the value-to-choice transformation is completed earlier in DA neurons.
We next investigated how, after the monkey finalized its decision, the actual and counterfactual choice outcomes were processed in the OFC, DA neurons, and the ventral striatum (VS), another key region in the subcortical reward system. Consistent with previous studies, we observed that all three regions robustly represented the actual value of the second option (i.e., when the monkey rejected the first option). In contrast, the counterfactual value of the second option (i.e., when the monkey chose the first one) was represented more dominantly in the OFC than in the VS, and was not represented in DA neurons. This suggests a gradient in the capacity to process counterfactual value from cortical to subcortical reward systems. Further, the OFC represented the actual and counterfactual values simultaneously and antagonistically, providing a possible mechanism for the comparison between actual and counterfactual outcome values. Together, our studies suggest a new form of cooperation between cortical and subcortical regions during economic decision-making.
2022年7月1日 10:01~10:20 ラグナガーデンホテル 羽衣:中 第9会場
2S09m-04
モデルベースとモデルフリー強化学習システムの間の非同期競合と協調
Asynchronous competition and cooperation between model-based and model-free reinforcement learning systems

*内部 英治(1)
1. 国際電気通信基礎技術研究所
*Eiji Uchibe(1)
1. Advanced Telecommunications Research Institute International

Keyword: Model-based reinforcement learning, Model-free reinforcement learning, Parallel learning architecture

Reinforcement learning (RL), which is a framework for learning an optimal policy from environmental rewards, has been extensively investigated in neuroscience, psychology, artificial intelligence, and robotics. RL algorithms are typically categorized into model-based RL, which explicitly estimates an environmental model and a reward function, and model-free RL, which directly learns a policy from real or generated experiences. Habitual and goal-directed strategies can be algorithmically regarded as model-free and model-based approaches, respectively. Model-free RL is simple, inflexible, and fast; model-based RL is complex, flexible, and deliberative. Although many neuroscientific studies have described evidence for the parallel existence of model-based and model-free RL systems in animals and humans, the neural or computational mechanisms by which one RL system dominates the other in controlling behavior remain under investigation. To elucidate how the arbitration mechanism determines which of these RL systems controls behavior at a given moment, we present an asynchronous parallel reinforcement learning framework that can coordinate model-based and model-free RL systems according to the learning progress. We focus on the differences in control frequencies. Our proposed method consists of multiple RL systems, one of which is stochastically selected based on the state value functions that represent the expected amount of reward. Our work’s main contribution is separating the replay buffers collected by each learner and transforming the experience replay buffer to absorb the differences in control frequencies. To investigate how and when our proposed method switches between model-based and model-free RL, we conducted experiments on MuJoCo benchmark problems. We compared our proposed method with a case that ignored the difference in control frequencies (i.e., a synchronous version). The results show that our proposed algorithm selected the simple model-based method with a short control frequency in the early stage of learning, the complex model-based method in the middle stage of learning, and the model-free method in the late stage of learning.
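The stochastic selection among RL systems based on their state values can be sketched as follows. The abstract does not state the selection rule, so the softmax form, the temperature parameter, and all function names here are assumptions for illustration only.

```python
# Hypothetical sketch: softmax selection over per-learner state values.
# The softmax rule and the temperature are assumptions, not from the abstract.
import math
import random

def selection_probabilities(state_values, temperature=1.0):
    """Turn each learner's estimate V_i(s) into a selection probability."""
    m = max(state_values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in state_values]
    total = sum(exps)
    return [e / total for e in exps]

def select_learner(state_values, rng, temperature=1.0):
    """Sample the index of the learner that controls the next action."""
    probs = selection_probabilities(state_values, temperature)
    return rng.choices(range(len(state_values)), weights=probs, k=1)[0]

# Example: three learners whose value estimates differ in the current state.
rng = random.Random(0)
chosen = select_learner([0.2, 1.5, 0.9], rng)
```

With this rule, a learner whose value estimate for the current state is higher is chosen more often, so control can shift between systems as their estimates change during learning.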
2022年7月1日 10:20~10:39 ラグナガーデンホテル 羽衣:中 第9会場
2S09m-05
行動柔軟性における側坐核の並列神経回路機構
Parallel neural network mechanisms of nucleus accumbens in flexible behavior

*疋田 貴俊(1)、マクファーソン トム(1)
1. 大阪大学蛋白質研究所
*Takatoshi Hikida(1), Tom Macpherson(1)
1. Inst Protein Res, Osaka Univ, Suita, Japan

Keyword: Behavioral flexibility, Basal ganglia, Reversal learning, Attentional set-shifting

Behavioral flexibility refers to the adaptation of behavior in response to changes in the internal or external environment, and is a critical skill for survival in our ever-changing world. Indeed, impaired behavioral flexibility is a major characteristic of several neurodegenerative disorders, including Alzheimer’s, Huntington’s, and Parkinson’s diseases, as well as psychiatric conditions, including schizophrenia, autism spectrum disorders, and obsessive-compulsive disorder. Flexible goal-directed behavior is thought to be collaboratively controlled by cognitive/associative and limbic information-processing cortico-basal ganglia-thalamo-cortical loop circuits. In these basal ganglia networks, the outputs of the striatum/nucleus accumbens (NAc) are transmitted through two parallel projection pathways formed of dopamine D1 receptor- or D2 receptor-expressing medium spiny neurons (D1-/D2-MSNs) that are differentially modulated by intra-NAc dopamine availability. However, the role of each NAc pathway in controlling reward and aversive learning, as well as flexibility of behavior, has been unclear. Here, we used transgenic mice in combination with pathway-specific expression of viral vectors to create mice in which transmission-blocking tetanus toxin was specifically expressed in the NAc D1- or D2-MSN pathways (D1-/D2-MSN-blocked mice). Using this technique, we first revealed distinct functional roles for parallel NAc pathways in limbic control: the D1-MSN pathway was critical for reward-based learning, while the D2-MSN pathway was critical for aversive learning. We next investigated the contribution of these parallel NAc pathways to two learning abilities crucial for flexible behavior: reversal learning (switching between learned action-outcome associations) and set-shifting (switching to a new cognitive strategy). Using an attentional set-shifting task, we revealed that the NAc D2-MSN pathway, but not the D1-MSN pathway, controlled flexible behavior that relies on reversal learning, but not on set-shifting. These findings provide new insights into the neural network mechanisms of parallel cortico-basal ganglia-thalamo-cortical loops in the control of limbic and cognitive functions.
2022年7月1日 10:39~10:58 ラグナガーデンホテル 羽衣:中 第9会場
2S09m-06
大脳基底核による姿勢と歩行の制御;Parkinsonとの関連において
Posture-Gait Control by the Basal Ganglia with reference to Parkinson's disease

*高草木 薫(1)
1. 旭川医科大学
*Kaoru Takakusaki(1)
1. Asahikawa Medical University

Keyword: Parkinson's disease, gait automatization, anticipatory postural adjustment, Vulnerable-neurotransmitters

I review the current understanding of the mechanisms of posture-gait control by the basal ganglia (BG) with reference to the posture-gait disturbance in Parkinson’s disease (PD). The posture-gait disorder in PD is a failure of habitually acquired gait automatization. The incidence of gait disturbances increases as the disease progresses, increasing the risk of falls. While Lewy body (LB) degeneration of the dopamine neurons is the core pathology of PD, damage also occurs in the cholinergic, serotonergic, and noradrenergic neurons, which are vulnerable to synucleinopathy. Moreover, the brainstem and spinal cord are the major sites of LB degeneration. Recent studies suggest that various repertoires of gait patterns are integrated depending on the context, and that they become habits as a series of automatized gait behaviors. The core network of automatized gait control exists in the BG, the midbrain, including the mesencephalic locomotor region (MLR), and the lower spinal cord. In addition, the BG and cerebellum alter the activities of the cerebral cortex via the thalamocortical network, brainstem, and limbic system in a context-dependent manner through reward-oriented and error-based learning. Because each structure receives either efferent copies of the motor command or multisensory feedback, spatiotemporal coordination of these signals ensures that the extensive repertoires of gait patterns are appropriately coupled to anticipatory or reactive postural adjustments to achieve goal-directed gait behaviors. Therefore, redundancies in the organization of the whole system may allow adaptation and compensation. However, multiple dysfunctions may compromise the capacity to adapt sufficiently, and sometimes lead to maladaptive changes that impair gait control. Here, I present a working hypothesis on the pathophysiological mechanism of posture-gait disturbances in PD based on the findings of our animal experiments so far. Specifically, I address the following three possibilities. The first is that damage to cortical visuomotor processing may disturb anticipatory postural adjustment. The second is that excessive GABAergic BG output to the MLR disturbs locomotor rhythm, while excessive output to the pedunculopontine nucleus (PPN) induces muscular rigidity. The third is that damage to the PPN cholinergic neurons may unbalance the activity in the reticulospinal and vestibulospinal tracts, eliciting an anteflexed posture.