TY - GEN
T1 - The concept of criticality in reinforcement learning
AU - Spielberg, Yitzhak
AU - Azaria, Amos
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
AB - This paper introduces a novel idea in human-aided reinforcement learning - the concept of criticality. The criticality of a state indicates how much the choice of action in that particular state influences the expected return. In order to develop an intuition for the concept, we present examples of plausible criticality functions in multiple environments. Furthermore, we formulate a practical application of criticality in reinforcement learning: The criticality-based varying stepnumber algorithm (CVS) - a flexible stepnumber algorithm that utilizes the criticality function, provided by a human, in order to avoid the problem of choosing an appropriate stepnumber in n-step algorithms such as n-step SARSA and n-step Tree Backup. We present experiments in the Atari Pong environment demonstrating that CVS is able to outperform popular learning algorithms such as Deep Q-Learning and Monte Carlo.
KW - Human aided reinforcement learning
KW - Human agent interaction
UR - http://www.scopus.com/inward/record.url?scp=85076115234&partnerID=8YFLogxK
U2 - 10.1109/ICTAI.2019.00043
DO - 10.1109/ICTAI.2019.00043
M3 - Conference contribution
AN - SCOPUS:85076115234
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 251
EP - 258
BT - Proceedings - IEEE 31st International Conference on Tools with Artificial Intelligence, ICTAI 2019
PB - IEEE Computer Society
T2 - 31st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2019
Y2 - 4 November 2019 through 6 November 2019
ER -