General Oral Presentations (Wakate Dojo)
Wakate Dojo: Modeling, Hardware Implementation, and Applications
Chairs: Akiko Hayashi-Takagi (RIKEN Center for Brain Science), Tadahiro Taniguchi (Ritsumeikan University)
June 30, 2022, 17:10-17:25, Okinawa Convention Center, Conference Hall A2, Room 7
1WD07e2-01
Comparison between the attention of artificial neural networks and that of humans

*Hirosato Akahoshi(1), Shigeru Kitazawa(1,2,3), Takuto Yamamoto(4)
1. Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan, 2. Graduate School of Medicine, Osaka University, Osaka, Japan, 3. Center for Information and Neural Networks, Osaka, Japan, 4. Faculty of Medicine, Osaka University, Osaka, Japan

Keyword: Artificial neural networks, unsupervised learning, eye-tracking, autism spectrum disorder

A recent artificial neural network with attention (the Vision Transformer, ViT; Dosovitskiy et al., 2020) outperformed traditional hierarchical networks such as AlexNet (Krizhevsky et al., 2012) in classifying images into pre-defined labels. However, the attention of a ViT trained for classification with labels was rather noisy. A more recent study (Caron et al., 2021) reported that the attention of a 12-layer ViT became noiseless when it was trained without labels (distillation with no labels, DINO) so that the general information obtained from each image was maximized. We examined whether the attention of the ViT trained by the DINO protocol is similar to that of humans, and whether all 12 layers are needed to achieve good performance. For this purpose, we compared the eye movements of 104 participants viewing a 77-s video clip with the peak attention of the ViT. The eye-movement data were taken from Nakano et al. (2010). We prepared ViTs with different numbers of layers (2, 4, 8, and 12) and simulated their gaze behaviors by calculating the peak of attention for each frame of the 77-s video clip. Multidimensional scaling was used to summarize the gaze behaviors so that participants and ViTs with similar gaze patterns would cluster together in a two-dimensional plane. As reported previously, children and adults with typical development (TD) clustered in the center, reflecting standard gaze behavior, whereas participants with autism spectrum disorder (ASD) were distributed around the periphery. The gaze patterns of the 2-layer ViT fell in the periphery, much farther from the center than those of the ASD participants. However, the gaze behaviors improved as the number of layers increased from 2 to 4, 8, and 12, becoming comparable to those of children with ASD. Notably, the best performance was achieved by the 8-layer ViT, whose gaze behavior was comparable to that of TD children. We suggest that the 8-layer ViT could serve as a good model of human attention and be used as a reference system for studying the neural bases of attention.
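
For readers who want to reproduce the simulated-gaze step, the following is a minimal sketch, not the authors' code: it loads a publicly released DINO-pretrained ViT and returns the pixel location of the strongest class-token attention for one frame. The hub model name, input resolution, and averaging across heads are assumptions based on the public DINO release.

```python
# Minimal sketch: peak of DINO-ViT class-token attention for a single frame.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load('facebookresearch/dino:main', 'dino_vits8')
model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

def attention_peak(frame: Image.Image, patch: int = 8):
    """Return (row, col) in pixels of the strongest CLS-token attention."""
    x = preprocess(frame).unsqueeze(0)           # (1, 3, 224, 224)
    with torch.no_grad():
        attn = model.get_last_selfattention(x)   # (1, heads, tokens, tokens)
    # CLS-token attention over image patches, averaged across heads
    cls_attn = attn[0, :, 0, 1:].mean(0)         # (num_patches,)
    side = 224 // patch                          # patches per side
    idx = cls_attn.argmax().item()
    return (idx // side) * patch, (idx % side) * patch
```

Applying this to every frame of the clip yields a simulated gaze trace that can be entered into the same multidimensional scaling as the human data.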
June 30, 2022, 17:25-17:40, Okinawa Convention Center, Conference Hall A2, Room 7
1WD07e2-02
Monkey Act, Monkey React: An Automatic Video Captioning System of Wild Monkey Behaviors for Social Interaction Analysis of Nonhuman Primates
*Riza Rae Aldecoa Pineda(1,3), Takatomi Kubo(1), Masaki Shimada(2), Kazushi Ikeda(1)
1. Division of Information Science, Nara Institute of Science and Technology, Nara, Japan, 2. Department of Animal Sciences, Teikyo University of Science, Yamanashi, Japan, 3. Department of Computer Science, University of the Philippines, Diliman, Quezon City, Philippines

Keyword: Video Captioning, Animal Behavior Modelling, Deep Learning

Primate behavior studies have gained substantial traction over the years, providing more comprehensive knowledge of the foundations and evolution of both human and nonhuman primate cognition. As primates are social animals, kinship, interactions, and social dominance based on physical prowess and formed alliances play a significant role in their behavioral and cognitive development. Observing social signals and interactions, which form complex webs of action-reaction events of varying intensities, provides valuable data for determining how these factors shape an individual and its population. However, collecting adequate data for such studies requires a significant amount of time and effort. Current sampling methods either involve observing focal individuals or units for a specific period and noting all instances of the behaviors that occur, or scanning an entire group and noting only target behaviors. Video recordings provide permanent and reliable visual information that can capture fast, complex, and rare behaviors in great detail. Even so, analyzing videos after collection is tremendously time-consuming: a single minute of behavior can take hours to investigate, even for highly trained experts.

With the goal of improving data processing with less human intervention, various studies have proposed state-of-the-art deep learning methods for tracking and classifying animal behaviors from videos. Video captioning, which aims to generate descriptive sentences about a video's contents, has gained much attention in the past decade, primarily focusing on improving descriptions of human actions. Deep architectures designed for this task generally follow the encoder-decoder structure, where the encoder extracts spatio-temporal features using CNNs and RNNs, and the decoder generates accurate captions using RNNs. Because current methods are trained on human datasets, such tools have yet to be constructed for animal behavior studies. Without context from surrounding objects or events, behavior classification provides a limited view of the causality and consequences of actions, especially for complex animals. To bridge this gap, we propose a novel automated video captioning system for nonhuman primates such as monkeys. Our system improves the overall efficiency of the traditional behavior analysis framework by automating the preprocessing stage and, as a further step, provides significant insight into the transferability of human behavior features to other primates.
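
To make the encoder-decoder structure concrete, here is a minimal sketch of such a captioning network in PyTorch. It is illustrative only, not the proposed system: the ResNet-18 backbone, GRU encoder/decoder, layer sizes, and teacher-forced decoding are all assumptions.

```python
# Minimal sketch of a CNN + RNN encoder-decoder video captioner.
import torch
import torch.nn as nn
import torchvision.models as models

class VideoCaptioner(nn.Module):
    def __init__(self, vocab_size, feat_dim=512, hid_dim=512, emb_dim=256):
        super().__init__()
        cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # drop classifier
        self.encoder = nn.GRU(feat_dim, hid_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, frames, captions):
        # frames: (B, T, 3, H, W); captions: (B, L) token ids
        B, T = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1)  # (B*T, feat_dim)
        feats = feats.view(B, T, -1)
        _, h = self.encoder(feats)               # video summary state
        dec_in = self.embed(captions[:, :-1])    # teacher forcing
        dec_out, _ = self.decoder(dec_in, h)
        return self.out(dec_out)                 # (B, L-1, vocab_size)
```

At inference time the decoder would instead be unrolled token by token (e.g., greedy or beam search) from a start-of-sentence symbol.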
June 30, 2022, 17:40-17:55, Okinawa Convention Center, Conference Hall A2, Room 7
1WD07e2-03
Information-Thermodynamic Cost and Speed Limit in High-dimensional Neural Dynamics

*Daiki Sekizawa(1), Shunsuke Kamiya(1), Sosuke Ito(2,3), Masafumi Oizumi(1)
1. Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan, 2. Department of Physics, Graduate School of Science, The University of Tokyo, Tokyo, Japan, 3. JST, PRESTO, Saitama, Japan

Keyword: Computational neuroscience, stochastic thermodynamics, ECoG

The brain enables various forms of information processing by flexibly transitioning between different brain states. Since transitioning between states incurs a cost, quantifying this cost can reveal characteristics of information processing in the brain in terms of which transitions are easy and which are difficult. Several methods for quantifying transition costs have been proposed for this purpose. However, considering the situations biological systems face, in which state transitions must be performed as quickly as possible within limited resources, simply quantifying the costs is not sufficient; it is also important to evaluate how efficiently the consumed cost is converted into the realized state transition. Here, we propose a framework based on stochastic thermodynamics for quantifying the efficiency of state transitions in the brain. Recently, Dechant et al. (2021) derived a cost-speed trade-off inequality, which states that the maximum speed a system can attain during a transition is bounded by the consumed cost, where the cost is the entropy production and the speed is the path length traveled in information-geometric space within a fixed duration. Based on this theory, we define the efficiency η (0 ≤ η ≤ 1) as the fraction of the consumed cost that is converted into the speed of travel: if η is large, the transition is close to the speed limit allowed by the consumed cost. We applied this framework to a human electrocorticography (ECoG) dataset (Miller, 2019) and evaluated the efficiency of state transitions during a simple visual task and a working memory task. We regarded the time course of the probability distribution of the event-related potentials (ERPs) as the path traveled in information-geometric space and computed the efficiency. To capture the characteristics of high-dimensional brain dynamics, we evaluated how the estimated efficiency η changes as the number of electrodes used in the analysis (the dimension) increases. We found that, irrespective of subject and task, the efficiency η is about 0.8 in low dimensions and converges to about 0.5 as the dimension increases. We expect that our framework will offer a novel perspective for assessing how efficiently the brain performs various functions through state transitions under limited resources.
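
For reference, the relation can be rendered schematically as follows; this is our simplified reading of the trade-off in Dechant et al. (2021), with constants and units omitted, not the exact published statement.

```latex
% Schematic cost-speed trade-off and the efficiency \eta used above.
\[
  \Sigma \;\ge\; \frac{\mathcal{L}^2}{\tau},
  \qquad
  \eta \;=\; \frac{\mathcal{L}^2}{\tau\,\Sigma}, \qquad 0 \le \eta \le 1,
\]
% where \Sigma is the entropy production consumed during the transition,
% \tau is its duration, and \mathcal{L} is the path length traveled in
% information-geometric space; \eta = 1 means the speed limit is saturated.
```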
June 30, 2022, 17:55-18:10, Okinawa Convention Center, Conference Hall A2, Room 7
1WD07e2-04
Achieving desirable reward distributions by design

*Matthew J. Holland(1)
1. Osaka University

Keyword: Learning algorithms, Reward distributions

Feedback lies at the heart of every system capable of learning, from living organisms to artifacts: through experience, the system evaluates its own state and translates that evaluation into its next action. In machine learning, feedback corresponds to the computation and processing of losses or rewards; however, how the feedback obtained at each moment of the learning process relates to the "performance" of the learning system itself deserves renewed examination.

First, modern machine learning is well known to offer a large collection of loss functions: classification error, squared error, convex potential functions based on classification margins (such as the hinge loss), the logistic loss, and many others. One chooses which loss function to use according to the learning problem and the characteristics of the dataset, and that loss becomes the basis for forming feedback. Note, however, that computing a loss is fundamentally different from measuring the performance of the learning system. A loss only indicates the degree of success on a concrete task (prediction, classification, compression and reconstruction, and so on); by itself it does not determine whether learning has succeeded.

Where, then, does the essence of a learning system's performance lie? In its ability to generalize. Conceptually this is obvious, but the question of how to actually measure generalization ability is far from trivial. Nevertheless, in statistical machine learning in particular, the view that "generalization ability simply means the average loss" has become entrenched as unquestioned common sense, leaving the field unable to see objectively that its learning systems are biased toward average-case performance from the design stage onward.

In this work, starting from the view that the essence of generalization ability resides in the distribution of losses or rewards, we investigate a next-generation methodology that seeks to align the true objective of a learning system with a desirable reward distribution. As part of this effort, we carefully examine how typical neural network architectures, initialization schemes, regularization methods, and optimizers affect the reward distribution during learning, and we connect these findings to a machine learning workflow that incorporates a new design decision: choosing the generalization metric itself. In this presentation, we take up both representative failure cases and success cases, clarify the relationship between design choices and reward distributions, and discuss design methods aimed at reducing trial and error.
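
As a toy illustration of why the distributional view matters, the sketch below scores two hypothetical loss samples by mean, median, and one tail-sensitive criterion (CVaR). The data and the choice of CVaR are illustrative assumptions, not the author's experiments; the point is that the ranking of two models can flip with the metric.

```python
# Illustrative sketch: the same losses summarized by different metrics.
import numpy as np

rng = np.random.default_rng(0)
loss_a = rng.normal(1.0, 0.1, 10_000)               # stable, average-ish
loss_b = np.abs(rng.standard_cauchy(10_000)) * 0.5  # heavy-tailed

def cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of losses."""
    q = np.quantile(losses, alpha)
    return losses[losses >= q].mean()

for name, l in [("A", loss_a), ("B", loss_b)]:
    print(name, f"mean={l.mean():.2f}",
          f"median={np.median(l):.2f}",
          f"CVaR95={cvar(l):.2f}")
# Model B beats A on the median but is far worse in the tail, so
# "which model generalizes better" depends on the chosen metric.
```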
June 30, 2022, 18:10-18:25, Okinawa Convention Center, Conference Hall A2, Room 7
1WD07e2-05
Gait phase-based control of a robotic hip-disarticulation prosthesis using a simulation environment with deep reinforcement learning

*Toru Kurihara(1), Yuki Ueyama(2), Masanori Harada(2)
1. Graduate School of Science and Engineering, National Defense Academy of Japan, 2. Department of Mechanical Engineering, National Defense Academy of Japan

Keyword: deep learning, neural network, robotic prosthesis, gait simulation

For individuals with lower-limb amputations, robotic prostheses may improve quality of life by enhancing mobility. However, it remains difficult for safe and reliable prosthetic-limb control strategies to achieve natural, seamless gaits across situations such as level-ground walking and slope ascent or descent. In developing a controller for robotic prostheses, estimating the user's gait phase may play a key role. The gait phase is the percentage of time elapsed since the last gait event, such as a heel strike, and provides a promising framework for properly supporting a robotic prosthesis during cyclic tasks. Accurate estimation of the gait phase may greatly reduce the burden of walking on the user. The objective of this study was to develop a control method for a robotic prosthesis for hip-disarticulation amputees using gait phase estimation. To this end, we constructed a simulation environment to evaluate control performance. The environment consisted of a humanoid model and a neural network controller generating prosthetic gait motions for hip-disarticulation amputees. The humanoid model mimicked the human skeleton with 10 degrees of freedom. We replicated prosthetic gait motion by replacing the actuators of the lower-limb joints with non-actuated passive joints. The neural network controller was trained with the deep deterministic policy gradient (DDPG) algorithm, a model-free, online, off-policy reinforcement learning method. DDPG comprises two deep neural networks, an actor and a critic, that search for an optimal policy maximizing the expected cumulative long-term reward. Within this simulation environment, we implemented a gait phase estimation method using walking events based on the presence or absence of heel contact. The gait phase could be estimated accurately, which improved the control performance of the robotic prosthetic limb. Our proposed method thus improved control performance in the simulation environment, and it may also prove effective in real environments for generating natural walking motions with a robotic prosthesis. In future work, we will implement the method on an actual prosthesis.
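
As a concrete illustration of the gait-phase definition above (the fraction of time elapsed since the last heel strike, normalized by stride duration), the following is a minimal sketch, not the authors' controller; the boolean contact signal and sampling interval are assumed conventions.

```python
# Minimal sketch: gait phase from a heel-contact signal.
import numpy as np

def estimate_gait_phase(heel_contact: np.ndarray, dt: float = 0.01):
    """heel_contact: boolean array, True while the heel touches the ground.
    Returns a phase in [0, 1) at every sample (0 = heel strike)."""
    # Heel strikes are rising edges of the contact signal
    strikes = np.flatnonzero(np.diff(heel_contact.astype(int)) == 1) + 1
    phase = np.zeros(len(heel_contact))
    for k in range(len(strikes) - 1):
        start, end = strikes[k], strikes[k + 1]
        stride = (end - start) * dt               # duration of this stride
        t = np.arange(end - start) * dt           # time since heel strike
        phase[start:end] = t / stride             # ramps 0 -> 1 per stride
    return phase
```

In a real-time controller the upcoming stride duration is unknown, so the denominator would typically be predicted from previous strides rather than measured after the fact.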