Active Learning and Artificial Curiosity in Robots

Active learning and artificial curiosity for fast lifelong learning of sensorimotor skills in robots

This project studies algorithms for active exploration in autonomous life-long robot learning. In particular, we study mechanisms of active learning that can drive a robot to continuously acquire novel sensorimotor and cognitive skills by autonomously ordering its own learning experiences from simple to progressively more complex, i.e. by self-generating its own adapted learning curriculum.

We have elaborated computational mechanisms of curiosity-driven self-exploration, partly inspired by central aspects of intrinsic motivation in humans, for efficient learning of multiple sensorimotor tasks in high-dimensional real-world robots.

As detailed below, these mechanisms leverage the synergy between three principles:

  • Driving exploration with an intrinsic reward measuring empirical learning progress, which focuses learning on problems of optimal difficulty and avoids situations that are trivial or too difficult;
  • Learning sensorimotor models through goal babbling, which leverages redundancies in sensorimotor spaces;
  • Multiscale active learning, which we call Strategic Learning, where a learner (or a teacher) decides concurrently (or hierarchically) how to use its various kinds of learning resources: what to learn, how to learn it, when to learn it, and possibly from whom to learn it.

As detailed below, we have shown that these mechanisms have strong properties of robustness and efficiency for active learning in statistical robot learning problems (Baranes and Oudeyer, 2013; Lopes et al., 2012), especially in the context of strategic and life-long learning (Lopes and Oudeyer, 2012; Nguyen and Oudeyer, 2013).

We are also using some variants of these algorithms as tools to model spontaneous exploration, curiosity and information seeking in humans, and how they can impact the structure of cognitive development, as detailed on this related page.

A good starting point for reading about our work in this area is the following list of articles:

Intrinsic Motivation Systems for Autonomous Mental Development
Oudeyer, P-Y., Kaplan, F. and Hafner, V. (2007)
IEEE Transactions on Evolutionary Computation, 11(2), pp. 265–286.

Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots

Baranes, A., Oudeyer, P-Y. (2013)
Robotics and Autonomous Systems, 61(1), pp. 49-73.

Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress
Lopes M., Lang T., Toussaint M. and Oudeyer P-Y. (2012)
Neural Information Processing Systems (NIPS 2012), Tahoe, USA.

Active Choice of Teachers, Learning Strategies and Goals for a Socially Guided Intrinsic Motivation Learner
Nguyen, M., Oudeyer, P-Y. (2013) 
Paladyn, Journal of Behavioral Robotics.

SOFTWARE: A number of these models are implemented in the open-source Explauto Python library, available on GitHub (code written by C. Moulin-Frier and P. Rouanet):
Explauto: an open-source Python library to study autonomous exploration in developmental robotics
Moulin-Frier, C., Rouanet, P. and Oudeyer, P-Y. (2014)
ICDL-Epirob – IEEE International Conference on Development and Learning and Epigenetic Robotics, Oct 2014, Genoa, Italy.
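For illustration, here is a minimal curiosity-driven exploration loop written against Explauto's documented usage pattern. The configuration names ('simple_arm', 'nearest_neighbor', 'discretized_progress') and method signatures follow the library's tutorials as we recall them and may have evolved since; the GitHub repository is the authoritative reference.

```python
import numpy as np
from explauto import Environment, SensorimotorModel, InterestModel

# Simulated planar arm environment shipped with the library.
environment = Environment.from_configuration('simple_arm', 'mid_dimensional')

# Sensorimotor model (nearest-neighbor regression) and interest model
# ('discretized_progress' samples goals by empirical learning progress).
sm_model = SensorimotorModel.from_configuration(
    environment.conf, 'nearest_neighbor', 'default')
im_model = InterestModel.from_configuration(
    environment.conf, environment.conf.s_dims, 'discretized_progress')

# Curiosity-driven goal-babbling loop.
for _ in range(100):
    s_goal = im_model.sample()                    # pick a goal in task space
    m = sm_model.inverse_prediction(s_goal)       # infer a motor command
    s = environment.compute_sensori_effect(m)     # execute it, observe outcome
    sm_model.update(m, s)                         # update the sensorimotor model
    im_model.update(np.hstack((m, s_goal)),       # update the interest model
                    np.hstack((m, s)))
```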


Note: complementary information on this topic is also available on the FLOWERS INRIA team web site.

Statistical active learning for robot skill learning

In statistical machine learning, we have elaborated active learning mechanisms with the goal of allowing robots to efficiently learn novel sensorimotor skills in high-dimensional, non-linear, non-stationary and redundant spaces, and under severe time constraints. They are based on three key principles that can be combined: empirical evaluation of learning progress, goal babbling, and strategic learning.

Exploration driven by empirical estimation of learning progress: A first key principle we are studying is active learning of sensorimotor models driven by the maximization of empirically evaluated learning progress. This drives the learner to explore zones of its sensorimotor space where its predictions, or its competences, improve maximally fast in practice (as opposed to in theory). As a side effect, the learner first explores easy activities, and once they are learnt it automatically shifts to progressively more complex ones. In large and real-world spaces, where it is impossible to assume strong analytical properties of the relation between the learner and the environment, this was shown to be significantly more efficient than active learning approaches that maximize novelty, surprise or entropy (or their reduction if it is estimated in a model-based manner). Yet, efficiently estimating empirical learning progress is highly challenging, since it is a spatially and temporally non-stationary quantity. We have developed over the years a series of algorithms addressing this challenge, starting from IAC (Oudeyer et al., 2007), R-IAC (Baranes and Oudeyer, 2009), SAGG-RIAC (Baranes and Oudeyer, 2013), McSAGG-RIAC (Baranes and Oudeyer, 2011), SGIM-ACTS (Nguyen and Oudeyer, 2013), zeta-R-Max and zeta-EB (Lopes et al., 2012), and SSB (Lopes and Oudeyer, 2012). For example, (Baranes and Oudeyer, 2009) presents the R-IAC architecture and shows how it allows efficient learning of hand-eye coordination in robots. In (Lopes et al., 2012), an RL formulation of this approach is compared to PAC-MDP approaches (e.g. R-Max) and Bayesian RL approaches (e.g. exploration bonuses), providing natural extensions that make them more robust in complex non-stationary spaces (the zeta-R-Max and zeta-EB algorithms).
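To make this principle concrete, here is a minimal Python sketch of learning-progress-based region selection, in the spirit of IAC/R-IAC but not a reproduction of those architectures; the windowing scheme and the names Region and choose_region are illustrative assumptions.

```python
import random
from collections import deque

class Region:
    """One zone of the sensorimotor space, tracking recent prediction errors."""
    def __init__(self, window=40):
        self.errors = deque(maxlen=window)  # filled by the learner after each trial

    def learning_progress(self):
        # Empirical progress: drop in mean error between the older and newer
        # halves of the window (a local estimate, since progress is non-stationary).
        if len(self.errors) < self.errors.maxlen:
            return float('inf')  # unexplored regions get sampled first
        errs = list(self.errors)
        half = len(errs) // 2
        old = sum(errs[:half]) / half
        new = sum(errs[half:]) / (len(errs) - half)
        return old - new  # positive when predictions are improving

def choose_region(regions, epsilon=0.2):
    """Epsilon-greedy choice of the region with maximal empirical progress."""
    if random.random() < epsilon:
        return random.choice(regions)  # keep re-checking seemingly mastered zones
    return max(regions, key=lambda r: r.learning_progress())
```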

Autonomous and active goal babbling: A second key principle is goal babbling, also called goal exploration. A crucial need in robot learning is the acquisition of inverse models, where a robot has to efficiently learn a mapping between goal or task parameters and the parameters of motor controllers that reach these goals. In goal exploration (Oudeyer and Kaplan, 2007), the learner selects its own goals and self-explores and learns only the sub-parts of the sensorimotor space that are sufficient to reach these goals: this makes it possible to leverage the redundancy of these spaces by building dense tubes of learning data only where they are necessary for control. The selection of goals can be made active, by sampling goals for which the empirical estimation of competence progress is maximal. This allows the robot learner to avoid spending too much time on unreachable or trivial goals, and to progressively explore self-generated goals/tasks of increasing complexity. The SAGG-RIAC architecture (Baranes and Oudeyer, 2013; Baranes and Oudeyer, 2010) instantiates this approach and was shown to allow orders-of-magnitude speed-ups for learning skills such as omnidirectional locomotion of quadruped robots and learning how to control a fishing rod with a flexible wire.
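The following self-contained sketch illustrates the shape of a goal-babbling loop on a toy two-joint arm. The arm, the nearest-neighbor inverse model and the uniform goal sampling are illustrative assumptions, not the SAGG-RIAC implementation; active goal selection would replace the uniform sampling with competence-progress-based sampling, as in the region sketch above.

```python
import numpy as np

def forward(m):
    """Toy 'robot': a 2-joint planar arm mapping joint angles to hand position."""
    return np.array([np.cos(m[0]) + np.cos(m[0] + m[1]),
                     np.sin(m[0]) + np.sin(m[0] + m[1])])

memory_m, memory_s = [], []  # sensorimotor database built along goal-driven data

def inverse(goal):
    """Nearest-neighbor inverse model with local exploration noise."""
    if not memory_m:
        return np.random.uniform(-np.pi, np.pi, 2)
    dists = [np.linalg.norm(goal - s) for s in memory_s]
    best = memory_m[int(np.argmin(dists))]
    return best + np.random.normal(0.0, 0.1, 2)  # perturb the best known command

for _ in range(500):
    goal = np.random.uniform(-2, 2, 2)      # (active version: sample by progress)
    m = inverse(goal)                       # infer a motor command for the goal
    s = forward(m)                          # execute it and observe the outcome
    memory_m.append(m); memory_s.append(s)  # dense data only where goals lead
    competence = -np.linalg.norm(goal - s)  # used by active goal selection
```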

Strategic learning and strategic teaching for life-long learning: Strategic learning refers to mechanisms that allow a learner (or a teacher) to decide concurrently (or hierarchically) how to use its various kinds of learning resources: what to learn, how to learn it, when to learn it, and possibly from whom to learn it.

Indeed, for life-long learning of multiple tasks in real-world robots, time, physical and cognitive resources are limited: learning requires that multiple kinds of choices be made by the learner or by its teacher. For example, one has to choose how to allocate time to the practice and learning of each task, which data collection method to use (e.g. self-exploration versus imitation learning), and which statistical inference method to use (e.g. using different kinds of representations and inference biases). These choices generate an ordered and structured learning trajectory, and this structure can have a major impact on both what is learned and how efficiently it is learned.

We have introduced a formal framework to study Strategic Learning, using the Strategic Student Problem model (Lopes and Oudeyer, 2012). This has been formally linked to techniques in the Bandit literature, and led to the Strategic Bandit algorithm, which actively chooses learning resources based on empirical evaluation of learning progress (Lopes and Oudeyer, 2012).
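As an illustration of the bandit view, here is a small EXP3-style sketch in which the arms are learning resources (tasks, strategies, teachers) and the reward is the empirical learning progress measured after exercising an arm. This is an assumed simplification for exposition, not the Strategic Bandit algorithm of (Lopes and Oudeyer, 2012).

```python
import math, random

class StrategicBanditSketch:
    """Exponential-weights bandit over learning resources.

    Rewards (empirical learning progress) are assumed normalized to [0, 1];
    the built-in exploration rate copes with the non-stationarity of progress.
    """
    def __init__(self, arms, gamma=0.2):
        self.weights = {a: 1.0 for a in arms}
        self.gamma = gamma  # mixing weight for uniform exploration

    def probabilities(self):
        total = sum(self.weights.values())
        k = len(self.weights)
        return {a: (1 - self.gamma) * w / total + self.gamma / k
                for a, w in self.weights.items()}

    def choose(self):
        r, acc = random.random(), 0.0
        for arm, p in self.probabilities().items():
            acc += p
            if r <= acc:
                return arm
        return arm  # numerical safety at the rounding boundary

    def update(self, arm, progress):
        # Importance-weighted exponential update on the observed progress.
        p = self.probabilities()[arm]
        k = len(self.weights)
        self.weights[arm] *= math.exp(self.gamma * progress / (p * k))

# Example: choose among exploration strategies each episode.
bandit = StrategicBanditSketch(['self_exploration', 'imitation_teacher_A'])
arm = bandit.choose()
bandit.update(arm, progress=0.3)  # learning progress measured after the episode
```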

We have also investigated how Strategic Learning can address a very important question in multi-task life-long robot learning (Nguyen and Oudeyer, 2013): how it can allow the robot learner to concurrently decide which task to learn at a given moment, when to use self-exploration and when to imitate in order to improve on this task, and in the latter case how to imitate (emulation vs. mimicry) and whom to imitate (which of several available teachers).


Selected video talks

(27 June 2010) Developmental Constraints on Active Learning for the Acquisition of Motor Skills in High-Dimensional Robots. RSS 2010 Workshop “Towards Closing the Loop: Active Learning for Robotics”, Zaragoza, Spain.


Selected experiment videos

The Playground Experiment. We have built an experimental setup, called the Playground Experiment, which allowed us to show how the curiosity algorithms we developed enable the self-organization of developmental trajectories with sequences of behavioural stages of increasing complexity (Oudeyer et al., 2007; Oudeyer and Kaplan, 2006).

Learning omnidirectional quadruped locomotion. In this experiment, we showed how the successive architectures we developed allow a quadruped robot, initially equipped with parameterized motor primitives in the form of a 24-dimensional oscillator (sinusoids with various parameters in most of the joints), to learn to use these motor primitives to locomote precisely in all directions and in varied manners. In (Baranes and Oudeyer, 2013), we extensively study a physical simulation of this experimental setup with active learning algorithms.
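For concreteness, here is a hedged sketch of what such a parameterized oscillator primitive can look like: each actuated joint follows a sinusoid whose amplitude and phase are free parameters (e.g. 12 joints × 2 parameters = 24 dimensions). The exact parameterization used in the experiments may differ.

```python
import numpy as np

def oscillator_primitive(params, t, n_joints=12, freq=1.0):
    """Sinusoidal motor primitive: each joint tracks a parameterized sine.

    Illustrative assumption: one amplitude and one phase per joint, giving a
    24-dimensional parameter vector for 12 actuated joints.
    """
    params = np.asarray(params).reshape(n_joints, 2)
    amplitudes, phases = params[:, 0], params[:, 1]
    return amplitudes * np.sin(2.0 * np.pi * freq * t + phases)

# Goal babbling then explores this 24-D parameter space to reach self-generated
# locomotion goals (e.g. target displacement and rotation of the robot body).
joint_angles = oscillator_primitive(np.random.uniform(-1, 1, 24), t=0.5)
```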


Selected publications:

Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots
Baranes, A., Oudeyer, P-Y. (2013)
Robotics and Autonomous Systems, 61(1), pp. 49-73. doi: 10.1016/j.robot.2012.05.008.

Exploration strategies in developmental robotics: a unified probabilistic framework
Moulin-Frier, C. and Oudeyer, P-Y. (2013)
in Proceedings of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-Epirob), Osaka, Japan.

Active Choice of Teachers, Learning Strategies and Goals for a Socially Guided Intrinsic Motivation Learner
Nguyen, M., Oudeyer, P-Y. (2013) 
Paladyn, Journal of Behavioral Robotics.

Intrinsically Motivated Learning of Real-World Sensorimotor Skills with Developmental Constraints
Oudeyer P-Y., Baranes A., Kaplan F. (2013)
in Intrinsically Motivated Learning in Natural and Artificial Systems, eds. Baldassarre, G. and Mirolli, M., Springer.

Socially Guided Intrinsic Motivation for Robot Learning of Motor Skills
Nguyen, M., Oudeyer, P-Y. (2013)
Autonomous Robots, doi: 10.1007/s10514-013-9339-y.

Object Learning Through Active Exploration
Ivaldi, S., Nguyen, M., Lyubova, N., Droniou, A., Padois, V., Filliat, D., Oudeyer, P-Y., Sigaud, O. (in press)
IEEE Transactions on Autonomous Mental Development

Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress
Lopes M., Lang T., Toussaint M. and Oudeyer P-Y. (2012)
Neural Information Processing Systems (NIPS 2012), Tahoe, USA.

The Strategic Student Approach for Life-Long Exploration and Learning
Lopes M., Oudeyer P-Y. (2012)
in Proceedings of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-Epirob), San Diego, USA.

The Interaction of Maturational Constraints and Intrinsic Motivations in Active Motor Development
Baranes, A., Oudeyer, P-Y. (2011)
in Proceedings of the IEEE International Conference on Development and Learning (ICDL-Epirob), Frankfurt, Germany.

Intrinsically Motivated Goal Exploration for Active Motor Learning in Robots: a Case Study
Baranes, A., Oudeyer, P-Y. (2010)
in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), Taipei, Taiwan.

Active Learning and Intrinsically Motivated Exploration in Robots: Advances and Challenges (Guest editorial)
Lopes, M., Oudeyer, P-Y. (2010)
IEEE Transactions on Autonomous Mental Development, 2(2), pp. 65-69.

R-IAC: Robust intrinsically motivated exploration and active learning
Baranes, A., Oudeyer, P-Y. (2009)
IEEE Transactions on Autonomous Mental Development, 1(3), pp. 155-169.

Intrinsic Motivation Systems for Autonomous Mental Development
Oudeyer, P-Y., Kaplan, F. and Hafner, V. (2007)
IEEE Transactions on Evolutionary Computation, 11(2), pp. 265–286. doi: 10.1109/TEVC.2006.890271.

What is intrinsic motivation? A typology of computational approaches
Oudeyer P-Y. and Kaplan F. (2007)
Frontiers in Neurorobotics, 1:6, doi: 10.3389/neuro.12.006.2007.

Discovering Communication
Oudeyer P-Y., Kaplan F. (2006)
Connection Science, 18(2), pp. 189–206.

Motivational principles for visual know-how development
Kaplan, F., Oudeyer, P-Y. (2003)
in Prince, C.G., Berthouze, L., Kozima, H., Bullock, D., Stojanov, G. and Balkenius, C. (eds.), Proceedings of the 3rd International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, no. 101, pp. 73-80, Lund University Cognitive Studies.