• 대한전기학회
Mobile QR Code QR CODE : The Transactions of the Korean Institute of Electrical Engineers
  • COPE
  • kcse
  • 한국과학기술단체총연합회
  • 한국학술지인용색인
  • Scopus
  • crossref
  • orcid

References

1 
R. S. Sutton, and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, pp. 1–552, 2018.URL
2 
D. Silver, J. Schrittwieser, and K. Simonyan, “Mastering the game of Go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, Oct. 2017.URL
3 
D. Silver, J. Schrittwieser, and I. Antonoglou, “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” in Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, Dec. 2017.URL
4 
O. Vinyals, W. Czarnecki, and M. Mathieu, “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” in Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, Dec. 2019.URL
5 
J. Ibarz, J. Tan, and S. Levine, “How to train your robot with deep reinforcement learning: Lessons we have learned,” in Proceedings of the International Symposium on Robotics Research (ISRR), Hanoi, Vietnam, Oct. 2019.DOI
6 
S. Gu, T. Lillicrap, and S. Levine, “Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, pp. 3389–3396, May 2017.DOI
7 
B. R. Kiran, I. Sobh, and V. Talpaert, “Deep reinforcement learning for autonomous driving: A survey,” in IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 6, pp. 4909–4926, Jun. 2021.DOI
8 
L. Ouyang, J. Wu, and P. Mishkin, “Training language models to follow instructions with human feedback,” in Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, Dec. 2022.URL
9 
M. L. Puterman, “Markov decision processes,” in Handbooks in Operations Research and Management Science, D. P. Heyman and M. J. Sobel, Eds., Elsevier, vol. 2, pp. 331–434, 1990.DOI
10 
R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” in Machine Learning, vol. 8, no. 3–4, pp. 229–256, May 1992.DOI
11 
V. Mnih, A. P. Badia, and D. Silver, “Asynchronous methods for deep reinforcement learning,” in Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA, pp. 1928–1937, Jun. 2016.URL
12 
J. Peters, and S. Schaal, “Natural actor-critic,” in Neurocomputing, vol. 71, no. 7–9, pp. 1180–1190, Mar. 2008.DOI
13 
J. Schulman, P. Abbeel, and M. Jordan, “Trust region policy optimization,” in Proceedings of the International Conference on Machine Learning (ICML), Lille, France, pp. 1889–1897, Jul. 2015.URL
14 
J. Schulman, P. Dhariwal, and A. Radford, “Proximal policy optimization algorithms,” in arXiv preprint arXiv:1707.06347, Jul. 2017.DOI
15 
K. Kersandt, “Deep reinforcement learning as control method for autonomous UAVs,” MS Thesis, Universitat Politècnica de Catalunya, 2018.URL
16 
V. Makoviychuk, and M. Macklin, “Isaac Gym: High performance GPU-based physics simulation for robot learning,” in arXiv preprint arXiv:2108.10470, Aug. 2021.DOI
17 
N. Rudin, S. Hoeller, and R. Hafner, “Learning to walk in minutes using massively parallel deep reinforcement learning,” in Conference on Robot Learning (CoRL), PMLR, 2022.URL
18 
A. Juliani, V. Berges, and D. Lange, “Unity: A general platform for intelligent agents,” in arXiv preprint arXiv:1809.02627, Sep. 2018.URL