References
(1) Abramson, J., Ahuja, A., Barr, I., Brussee, A., Carnevale, F., Cassin, M., Chhaparia, R., Clark, S., Damoc, B., Dudzik, A. and Georgiev, P., 2020. Imitating interactive intelligence. arXiv preprint arXiv:2012.05672.
(2) Abramson, J., Ahuja, A., Brussee, A., Carnevale, F., Cassin, M., Fischer, F., Georgiev, P., Goldin, A., Harley, T. and Hill, F., 2021. Creation of multimodal interactive agents with imitation and self-supervised learning. arXiv preprint arXiv:2112.03763.
(3) Abramson, J., Ahuja, A., Carnevale, F., Georgiev, P., Goldin, A., Hung, A., Landon, J., Lillicrap, T., Muldal, A., Richards, B. and Santoro, A., 2022. Evaluation of multimodal interactive agents. arXiv preprint arXiv:2205.13274.
(4) Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T. and Joseph, N., 2022. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862.
(5) Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S. and Amodei, D., 2017. Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.