Criticality-Based Advice in Reinforcement Learning (Student Abstract)

Yitzhak Spielberg, Amos Azaria

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

One of the ways to make reinforcement learning (RL) more efficient is by utilizing human advice. Since human advice is expensive, the central question in advice-based reinforcement learning is how to decide in which states the agent should ask for advice. To approach this challenge, various advice strategies have been proposed. Although all of these strategies distribute advice more efficiently than naive strategies, they rely solely on the agent's estimate of the action-value function and are therefore rather inefficient when this estimate is inaccurate, in particular in the early stages of the learning process. To address this weakness, we present an approach to advice-based RL in which the human's role is not limited to giving advice in chosen states, but also includes hinting a priori, before the learning procedure, in which sub-domains of the state space the agent might require more advice. For this purpose we use the concept of critical states: states in which choosing the proper action is more important than in other states.
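To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how an a-priori criticality hint could gate advice requests in a tabular Q-learning loop. The environment interface, criticality_fn, human_advice, and advice_threshold are all illustrative assumptions rather than details taken from the paper.

import random
from collections import defaultdict

def train(env, criticality_fn, human_advice, episodes=500,
          alpha=0.1, gamma=0.99, epsilon=0.1, advice_threshold=0.7):
    """Tabular Q-learning that asks the human for advice only in states
    the a-priori criticality hint marks as critical (hypothetical sketch)."""
    Q = defaultdict(lambda: defaultdict(float))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if criticality_fn(state) >= advice_threshold:
                # Critical state: spend the advice budget here.
                action = human_advice(state)
            elif random.random() < epsilon:
                # Non-critical state: ordinary epsilon-greedy exploration.
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)
            best_next = max((Q[next_state][a] for a in env.actions(next_state)),
                            default=0.0)
            # Standard Q-learning update.
            Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
            state = next_state
    return Q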

Original language: English
Title of host publication: IAAI-22, EAAI-22, AAAI-22 Special Programs and Special Track, Student Papers and Demonstrations
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 13057-13058
Number of pages: 2
ISBN (Electronic): 1577358767, 9781577358763
State: Published - 30 Jun 2022
Event: 36th AAAI Conference on Artificial Intelligence, AAAI 2022 - Virtual, Online
Duration: 22 Feb 2022 - 1 Mar 2022

Publication series

Name: Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Volume: 36

Conference

Conference: 36th AAAI Conference on Artificial Intelligence, AAAI 2022
City: Virtual, Online
Period: 22/02/22 - 1/03/22
