TY - GEN
T1 - Criticality-Based Advice in Reinforcement Learning (Student Abstract)
AU - Spielberg, Yitzhak
AU - Azaria, Amos
N1 - Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2022/6/30
Y1 - 2022/6/30
N2 - One of the ways to make reinforcement learning (RL) more efficient is by utilizing human advice. Since human advice is expensive, the central question in advice-based reinforcement learning is, how to decide in which states the agent should ask for advice. To approach this challenge, various advice strategies have been proposed. Although all of these strategies distribute advice more efficiently than naive strategies, they rely solely on the agent's estimate of the action-value function, and therefore, are rather inefficient when this estimate is not accurate, in particular, in the early stages of the learning process. To address this weakness, we present an approach to advice-based RL, in which the human's role is not limited to giving advice in chosen states, but also includes hinting a-priori, before the learning procedure, in which sub-domains of the state space the agent might require more advice. For this purpose we use the concept of critical: states in which choosing the proper action is more important than in other states.
AB - One of the ways to make reinforcement learning (RL) more efficient is by utilizing human advice. Since human advice is expensive, the central question in advice-based reinforcement learning is, how to decide in which states the agent should ask for advice. To approach this challenge, various advice strategies have been proposed. Although all of these strategies distribute advice more efficiently than naive strategies, they rely solely on the agent's estimate of the action-value function, and therefore, are rather inefficient when this estimate is not accurate, in particular, in the early stages of the learning process. To address this weakness, we present an approach to advice-based RL, in which the human's role is not limited to giving advice in chosen states, but also includes hinting a-priori, before the learning procedure, in which sub-domains of the state space the agent might require more advice. For this purpose we use the concept of critical: states in which choosing the proper action is more important than in other states.
UR - http://www.scopus.com/inward/record.url?scp=85147606481&partnerID=8YFLogxK
U2 - 10.1609/aaai.v36i11.21665
DO - 10.1609/aaai.v36i11.21665
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85147606481
T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
SP - 13057
EP - 13058
BT - IAAI-22, EAAI-22, AAAI-22 Special Programs and Special Track, Student Papers and Demonstrations
PB - Association for the Advancement of Artificial Intelligence
T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Y2 - 22 February 2022 through 1 March 2022
ER -