Symposium
12 Advances in World Models and Deep Reinforcement Learning
Chairs: Tadahiro Taniguchi (Ritsumeikan University), Masahiro Suzuki (Graduate School of Engineering, The University of Tokyo)
June 30, 2022, 14:05–14:25, Laguna Garden Hotel, Hagoromo: East, Room 8
1S08a-01
Model of Intelligence Based on Self-Supervised Learning

*Yutaka Matsuo(1)
1. The University of Tokyo

Keyword: deep learning, self-supervised learning, world model, intelligence

The field of deep learning is progressing rapidly, and the importance and effectiveness of self-supervised learning are now well recognized. In this talk, I propose a model of intelligence based on self-supervised learning. First, a world model is created by self-supervised learning through observation and action. The world model is a deep generative model of the external world built from experience, and it is useful for planning actions. The proposed model of intelligence is a two-story model: the world model forms the first floor, and the second floor is responsible for linguistic input and output. Language drives the world model on the first floor from the outside, so that verbal input evokes simulations within it; in other words, "imagining" is the primitive way of understanding meaning. Since linguistic input and output are sequences of symbols, this layer can be regarded as a Turing machine: for a given linguistic task, a corresponding algorithm is learned. Through the operation of these learned algorithms, various intelligent processes such as memory, recall, reasoning, and symbol processing are realized. Historically speaking, it is the progress of human civilization that has led to the accumulation of various useful modules through the advancement of language tasks. In current deep learning, these mechanisms are only partially realized because of a fundamental technical problem. The correspondence between this model of intelligence and the brain will also be discussed; basically, I will show that there is a good correspondence between the structure of the brain and the model. The relationship with consciousness, evolution, and related topics will also be discussed.
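The role of a world model as a generative model used for planning actions can be illustrated with a minimal sketch. Everything here is a stand-in: the fixed linear transition plays the part of a learned deep generative model, and random-shooting planning is just one simple way to "imagine" action sequences and pick the best one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned world model: a stochastic transition in a latent
# state space. A fixed linear map stands in for the learned generative model.
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.1, 0.0], [0.0, 0.1]])

def world_model_step(state, action):
    # predict the next latent state; noise models generative uncertainty
    return A @ state + B @ action + rng.normal(scale=0.01, size=2)

def plan(state, goal, n_candidates=256, horizon=5):
    """Random-shooting planning: imagine rollouts in the world model and
    return the first action of the best imagined trajectory."""
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, 2))
        s = state
        for a in actions:
            s = world_model_step(s, a)
        cost = np.linalg.norm(s - goal)  # distance to goal after the rollout
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action

a0 = plan(np.zeros(2), np.array([0.5, 0.5]))
```

The planner never touches the real environment while deliberating; it acts only after evaluating imagined futures, which is the sense in which a world model supports planning.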
June 30, 2022, 14:25–14:45, Laguna Garden Hotel, Hagoromo: East, Room 8
1S08a-02
Studies on Cognitive Neurorobotics Using the Framework of Predictive Coding and Active Inference

*Jun Tani(1)
1. Okinawa Institute of Science and Technology

Keyword: predictive coding, active inference, free energy principle, neurorobotics

The focus of this research is to investigate what representations and functions cognitive agents can develop through repeated interaction with the world. To this end, over the past two decades we have proposed various models based on predictive coding and active inference and have verified their operation through robot experiments. In this talk, I will introduce a series of intriguing phenomena discovered in these robot experiments. Through these findings, we expect to gain a deeper understanding of non-trivial mechanisms of embodied cognition.
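As an illustration of the predictive-coding idea underlying such models (not any specific model from the talk), the following sketch infers a latent cause by descending the squared prediction error; the generative mapping `g` is a hypothetical stand-in.

```python
import numpy as np

# Predictive-coding sketch: a latent cause z explains an observation through
# a generative mapping g(z); inference gradient-descends the prediction error.
def g(z):
    return np.tanh(z)  # hypothetical generative mapping

def infer(observation, z0=0.0, lr=0.5, steps=100):
    z = z0
    for _ in range(steps):
        error = observation - g(z)     # prediction error
        dg_dz = 1.0 - np.tanh(z) ** 2  # derivative of the mapping
        z += lr * error * dg_dz        # update z to reduce the error
    return z

z_hat = infer(0.5)  # latent estimate whose prediction matches the observation
```

Active inference extends the same principle to action: instead of only revising the latent state, the agent can also act on the world so that incoming observations match its predictions.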
June 30, 2022, 14:45–15:05, Laguna Garden Hotel, Hagoromo: East, Room 8
1S08a-03
Towards Accurate, Transferable, and Sample-Efficient World Models for Industrial Applications

*Ryo Okumura(1), Tadahiro Taniguchi(2,1)
1. Panasonic Corporation, 2. Ritsumeikan University

Keyword: World model, Contrastive learning, Imitation learning, Newtonian VAE

World models are a promising technology for image-based robot control. In many cases, however, their accuracy, transferability, and sample efficiency are insufficient for industrial applications such as pick-and-place, fitting tasks like connector insertion, and wire manipulation. We introduce several approaches to improving world models from these perspectives. In the first case, we capture tiny objects in images to improve model accuracy for fine manipulation: we rewrite the ELBO loss of the world model and apply contrastive learning, training the model without an image decoder, which tends to ignore small groups of pixels. We further apply contrastive learning to multi-view images to extract 3D information about the workspace. Second, we describe domain-agnostic world models, which eliminate domain-dependent information from the state representation using a domain-adversarial loss and a domain-conditional decoder; this enables imitation learning in latent space from expert data collected in a domain different from the agent's. In the third case, we acquire a world model whose latent dynamics satisfy Newton's laws of motion, enabling proportional control in the latent space. We adopt a recurrent state-space model to obtain contextual information from the observation history and induce a forward dynamics model equivalent to Newtonian physics. Imposing these strict constraints on the forward dynamics and exploiting classical control methods improves sample efficiency drastically. In all of these cases, acquiring a latent space adequate for the specific task is the key factor. We further discuss the advantages and limitations of these methodologies.
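The key idea in the third case, that (approximately) Newtonian latent dynamics make simple proportional control sufficient, can be sketched as follows. The identity-like latent transition is a hypothetical stand-in for the learned NewtonianVAE dynamics; only the proportional control law mirrors the described approach.

```python
import numpy as np

def proportional_control(x_goal, x_now, gain=1.0):
    """P-controller in latent space: the action is proportional to the
    latent-space error. Because the learned latent dynamics behave like
    physical coordinates, this simple linear law suffices to reach the goal."""
    return gain * (x_goal - x_now)

# Toy rollout: a trivial transition (position integrates the action) stands
# in for the learned Newtonian latent dynamics.
x = np.array([0.0, 0.0])          # current latent position (from the encoder)
goal = np.array([1.0, -0.5])      # latent position of the goal image
for _ in range(50):
    u = proportional_control(goal, x, gain=0.3)
    x = x + u                     # latent state converges toward the goal
```

The point is that once the latent space is constrained to obey simple physics, the control problem reduces to classical feedback control, with no learned policy required.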
June 30, 2022, 15:05–15:25, Laguna Garden Hotel, Hagoromo: East, Room 8
1S08a-04
Toward a Perpetual Learning Robot

*Shixiang Shane Gu(1)
1. Google

Keyword: Deep Learning, Reinforcement Learning, Robotics, World Models

Many supervised learning and generative modeling applications (e.g. computer vision, NLP, molecular biology) have experienced exponential progress alongside exponential growth in data and computation. Recent models such as DALL-E and GPT-3, termed foundation models, are essentially perpetual learning machines, capable of learning new concepts and capabilities simply by ingesting more data, with minimal human engineering. Reinforcement learning (RL) applications such as robotic control, however, are mostly limited to successes in narrow domains, where learning requires substantial human input and the learned knowledge is often non-transferable. How can we develop a continual learning system for RL agents that automatically grows in capability without human intervention? I will discuss five necessary components of such a system, "3U2C": (1) Universal Objective, (2) Universal Algorithm, (3) Universal Architecture, (4) Continual Data, and (5) Continual Computation. I will touch on topics including intelligence measures for RL agents, sequence modeling for RL, 3D invariance and objectness priors, bottlenecks in real-world robot learning, hardware-accelerated simulators, unsupervised RL and skill discovery, and progress toward a "perpetual learning machine" for robotics.
June 30, 2022, 15:25–15:45, Laguna Garden Hotel, Hagoromo: East, Room 8
1S08a-05
How to Handle Heterogeneous Tasks with a Single Brain: A Perspective that Artificial General Intelligence (AGI) Is Not So Far Away

*Hiroshi Yamakawa(1,2,3)
1. The University of Tokyo, 2. The Whole Brain Architecture Initiative, 3. The University of Electro-Communications

Keyword: artificial general intelligence, hierarchical motor control in cortex, multi-task learning, entification

The representations we use to perceive the world are not just image signals obtained directly from sensors or motor signals sent to muscles. Rather, they are something more abstract, perceived as entities placed in the external world: physical objects, or processes such as throwing or walking. By pointing to each entity with symbols, communication through language becomes possible, and by handling relationships between multiple physical objects (entities), complex actions such as tool use can be achieved.
Recent advances in AI technology have made it possible to separate and recognize a chair and a cat as distinct physical objects, even when they are combined in a single image. There are also emerging technologies that combine entities, such as generating a still image of "Pikachu on a motorcycle" or an "avocado chair" from verbal instructions [1]. If the technology continues to develop at this rate, will it also become possible to decompose and combine multiple tasks, an important capability for general-purpose artificial intelligence? For example, will we be able to distinguish between the tasks of cooking and cleaning up in the kitchen while simultaneously combining them for efficient execution?
To get a hint, let us examine what goes on in the brain. It is well known that visual object recognition is achieved by the feed-forward hierarchy of the ventral visual pathway starting from area V1. Given that the neocortex is composed of nearly canonical local circuits, a similar mechanism may exist in motor control. Indeed, although not as clear-cut as in visual information processing, there is a hierarchy of areas in the motor cortex of the marmoset that conveys sensory input in a feed-forward fashion, from cortical areas 4a and 4b (primary motor) to area 6Va [2].
Considering these findings, the principle of entification for diverse types of task processes (entities) appears to be the same as that for diverse types of physical objects (entities). If so, the technology for handling multiple entities now being advanced in deep-learning-based recognition could be extended to the motor-control domain in the same way, and a key problem for the realization of artificial general intelligence (AGI) may then be solved.
References:
[1] OpenAI (2021). DALL·E: Creating Images from Text, https://openai.com/blog/dall-e/
[2] Theodoni, P., Majka, P., Reser, D. H., Wójcik, D. K., Rosa, M. G. P., & Wang, X.-J. (2021). Structural Attributes and Principles of the Neocortical Connectome in the Marmoset Monkey. Cerebral Cortex, 32(1), 15–28.