Abstract
We tackle the problem of an agent interacting with humans in a general-sum environment, i.e., a non-zero-sum, non-fully-cooperative setting, where the agent's goal is to increase its own utility. We show that when data is limited, building an accurate human model is very challenging, and that a reinforcement learning agent trained on such a model does not perform well in practice. We therefore propose that the agent maximize a linear combination of the human's utility and its own utility, rather than its own utility alone. We provide a formula for computing what we believe to be the optimal weight on the human's utility when the goal is to maximize the agent's own utility. We evaluate our method in two different domains: the proposed agent not only increases the social welfare of the human and the autonomous agent together, but also achieves significantly higher utility for itself than agents that do not account for the human's utility function.
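As an illustration of the core idea only (the paper's actual formula for the trade-off is not reproduced here), a minimal sketch of the shaped reward such an agent might optimize; the coefficient `c` is a hypothetical stand-in for the weight the paper derives:

```python
def combined_reward(agent_reward: float, human_reward: float, c: float) -> float:
    """Linear combination of the agent's own reward and the (modeled)
    human's reward. `c` is the human-utility weight; the paper derives
    an optimal value for this trade-off, which is not reproduced here."""
    return agent_reward + c * human_reward

# Hypothetical usage inside an RL training loop: the agent is trained on
# this shaped reward instead of on agent_reward alone.
shaped = combined_reward(agent_reward=1.0, human_reward=0.5, c=0.3)
print(shaped)  # 1.15
```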
Original language | English |
---|---|
Pages | 2079-2086 |
Number of pages | 8 |
State | Published - 2022 |
Event | 44th Annual Meeting of the Cognitive Science Society: Cognitive Diversity, CogSci 2022 - Toronto, Canada |
Duration | 27 Jul 2022 → 30 Jul 2022 |
Conference
Conference | 44th Annual Meeting of the Cognitive Science Society: Cognitive Diversity, CogSci 2022 |
---|---|
Country/Territory | Canada |
City | Toronto |
Period | 27/07/22 → 30/07/22 |
Keywords
- Human modeling
- Human-agent interaction
- Reinforcement Learning