A simple implementation of training a reward function from human preferences over agent data.
source
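The description does not say how the reward function is fit, so the following is only a minimal sketch under stated assumptions: a linear reward over per-step features, and a Bradley-Terry (logistic) model of which of two agent segments a human prefers. All names here (`segment_features`, `segment_return`, `train_reward`, `true_weights`) are hypothetical and are not taken from the linked implementation. The human labels are simulated with a hidden reward.

```python
import math
import random


def segment_features(segment):
    """Sum each feature over the steps of a segment (a list of feature lists)."""
    return [sum(column) for column in zip(*segment)]


def segment_return(weights, segment):
    """Predicted return of a segment under an assumed linear reward r(s) = w . s."""
    return sum(w * f for w, f in zip(weights, segment_features(segment)))


def train_reward(preferences, n_features, lr=0.1, epochs=200):
    """Fit reward weights from (segment_a, segment_b, label) triples.

    label is 1.0 if the human preferred segment_a and 0.0 if segment_b.
    Uses stochastic gradient descent on the Bradley-Terry cross-entropy loss.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, label in preferences:
            feats_a = segment_features(seg_a)
            feats_b = segment_features(seg_b)
            diff = sum(w * (a - b) for w, a, b in zip(weights, feats_a, feats_b))
            diff = max(-30.0, min(30.0, diff))  # keep exp() in a safe range
            p_a = 1.0 / (1.0 + math.exp(-diff))  # modeled P(segment_a preferred)
            for i in range(n_features):
                # Gradient of the loss w.r.t. weight i is (p_a - label) * feature gap.
                weights[i] -= lr * (p_a - label) * (feats_a[i] - feats_b[i])
    return weights


# Simulated "human" labels: prefer the segment with higher return under a
# hidden reward (an assumption made purely for this demonstration).
rng = random.Random(0)
true_weights = [1.0, -1.0]


def random_segment(length=5, n_features=2):
    """Generate a random segment of per-step feature vectors."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(length)]


preferences = []
for _ in range(50):
    seg_a, seg_b = random_segment(), random_segment()
    prefers_a = segment_return(true_weights, seg_a) > segment_return(true_weights, seg_b)
    preferences.append((seg_a, seg_b, 1.0 if prefers_a else 0.0))

learned = train_reward(preferences, n_features=2)
print(learned)
```

Preference labels only constrain the direction of the learned weights, not their scale, so the learned vector should point the same way as the hidden one without matching it exactly. A real implementation would typically replace the linear reward with a neural network and the simulated labels with actual human comparisons.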