The concept of criticality in reinforcement learning

Yitzhak Spielberg, Amos Azaria

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

This paper introduces a novel idea in human-aided reinforcement learning: the concept of criticality. The criticality of a state indicates how much the choice of action in that state influences the expected return. To build intuition for the concept, we present examples of plausible criticality functions in multiple environments. We then formulate a practical application of criticality in reinforcement learning, the criticality-based varying stepnumber algorithm (CVS). CVS is a flexible stepnumber algorithm that uses a human-provided criticality function to avoid the problem of choosing an appropriate stepnumber in n-step algorithms such as n-step SARSA and n-step Tree Backup. Experiments in the Atari Pong environment demonstrate that CVS is able to outperform popular learning algorithms such as Deep Q-Learning and Monte Carlo.
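The abstract does not state the exact rule CVS uses to pick the stepnumber, so the sketch below is only one plausible reading, not the authors' method. It assumes a hypothetical rule: starting from time t, the return keeps accumulating real rewards through low-criticality states and bootstraps from Q(s_k, a_k) as soon as the summed criticality of the states passed reaches a `threshold`. The intuition is that in non-critical states the action choice barely changes the return, so extending the return across them adds little noise. The function name `cvs_style_targets`, the `threshold` parameter and the [0, 1] criticality scale are all illustrative assumptions.

```python
from typing import List, Sequence


def cvs_style_targets(
    rewards: Sequence[float],
    q_values: Sequence[float],
    criticality: Sequence[float],
    gamma: float = 0.99,
    threshold: float = 1.0,
) -> List[float]:
    """Hypothetical variable-step SARSA-style targets for one finished episode.

    rewards[k]     : reward received after taking a_k in s_k
    q_values[k]    : current estimate Q(s_k, a_k)
    criticality[k] : human-provided criticality of s_k (assumed scale [0, 1])
    The episode ends after the last reward, so no bootstrap happens there.
    """
    T = len(rewards)
    assert len(q_values) == T and len(criticality) == T
    targets: List[float] = []
    for t in range(T):
        ret, discount, accumulated, k = 0.0, 1.0, 0.0, t
        while True:
            # Every target uses at least one real reward.
            ret += discount * rewards[k]
            discount *= gamma
            k += 1
            if k == T:
                # Reached the terminal state: the tail is pure Monte Carlo.
                break
            # Assumed rule: sum the criticality of each state we step into.
            accumulated += criticality[k]
            if accumulated >= threshold:
                # Enough criticality crossed: cut here and bootstrap.
                ret += discount * q_values[k]
                break
        targets.append(ret)
    return targets


# Tiny three-step episode with a single reward at the end.
rewards = [0.0, 0.0, 1.0]
q_values = [0.1, 0.2, 0.3]
print(cvs_style_targets(rewards, q_values, [0.0, 0.0, 0.0], gamma=1.0))
print(cvs_style_targets(rewards, q_values, [1.0, 1.0, 1.0], gamma=1.0))
```

Under this assumed rule the two extremes of a fixed stepnumber fall out as special cases: with `threshold=0.0` every target bootstraps after one step (one-step SARSA), and with `threshold=float("inf")` no target ever bootstraps (Monte Carlo). A human-provided criticality function then places the effective stepnumber somewhere in between, state by state, instead of fixing n in advance.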

Original language: English
Title of host publication: Proceedings - IEEE 31st International Conference on Tools with Artificial Intelligence, ICTAI 2019
Publisher: IEEE Computer Society
Pages: 251-258
Number of pages: 8
ISBN (Electronic): 9781728137988
State: Published, Nov 2019
Event: 31st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2019, Portland, United States
Duration: 4 Nov 2019 to 6 Nov 2019

Publication series

Name: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
Volume: 2019-November
ISSN (Print): 1082-3409

Conference

Conference: 31st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2019
Country/Territory: United States
City: Portland
Period: 4/11/19 to 6/11/19

Keywords

  • Human aided reinforcement learning
  • Human agent interaction
