The Transactions of the Korean Institute of Electrical Engineers

  1. (Dept. of Electrical Eng., Myongji University, Korea)
  2. (Dept. of Electrical Eng., Myongji University, Korea)



Power system, Measurement noise, Disturbance observer, Deep Q-Network, Swing equation, Line fault detection

1. Introduction

Power systems are governed by nonlinear dynamics and can be highly vulnerable to natural disasters such as typhoons, heavy rain, and lightning, as well as to system faults. When a line fault such as a three-phase short circuit occurs, the line reactance changes, which can destabilize the grid and, in severe cases, cause a wide-area blackout. Detecting power system faults quickly and protecting the system is therefore very important, and a variety of studies on line fault detection are in progress (1-10).

Electromechanical oscillations in a power system arise from an imbalance between the mechanical input and the electrical output of a synchronous generator and, in severe cases, can cause loss of synchronism (out-of-step). Stability analysis of a power system determines whether the synchronous generator stays synchronized. It can be carried out by solving the nonlinear swing equation, which shows whether the system converges to a new equilibrium after the faulted line is tripped or loses synchronism after the power swing (1-4,10).

The paper (11) addressed line fault detection in the grid using a disturbance observer (DOB) (12), a technique widely used to improve robustness against system uncertainty. A high-gain PI observer was devised to detect a fault within 5 cycles of the 60 [Hz] system voltage, but when measurement noise is present in the output the observer estimates a noise-contaminated state and the observation error can grow (13).

This paper proposes a reinforcement-learning-based disturbance observer that uses artificial intelligence to detect line faults robustly in a single machine infinite bus system where both disturbance and measurement noise are present. To determine observer gains that are optimal in the sense of the reinforcement-learning reward, the well-known Deep Q-Network (DQN) algorithm is used. In an earlier result, the paper (14) designed a disturbance observer with three Deep Q-Networks in a noise-free environment. The enlarged network structure increases the amount of data required, which can burden computation, and because measurement noise was not considered when the learning parameters were designed, robust state estimation by the disturbance observer cannot be guaranteed when measurement noise is present.

In this paper, training is carried out with a single network, and a Deep Q-Network based disturbance observer is designed that selects the observer gain adaptively so that it is robust to measurement noise. The design method, which learns over a suitable set of observer gains, also allows a fault to be detected quickly. Python and Matlab were used for training and simulation, respectively.

The paper is organized as follows. Section 2.1 defines the single machine infinite bus system model and the disturbance, Section 2.2 introduces the Deep Q-Network, and Section 2.3 designs the Deep Q-Network based disturbance observer. Section 2.4 defines the state, action, and reward used for training on the single machine infinite bus system. Section 2.5 trains the Deep Q-Network on state data from the swing equation and confirms by simulation that the observer designed from the trained network can detect faults robustly under measurement noise. The conclusion closes the paper.

2. Main Body

2.1 System Model and Disturbance Definition

This paper considers the Single Machine Infinite Bus (SMIB) system shown in Fig. 1. The voltage at the infinite bus has constant magnitude and zero phase, and the bus always has sufficient generation and load capacity (3).

Fig. 1. Single Machine Infinite Bus System

../../Resources/kiee/KIEE.2020.69.7.1095/fig1.png

Out-of-step analysis of a power system determines whether the synchronous generator stays synchronized under a disturbance, and it can be carried out with the swing equation, a second-order differential equation. Written in state-space form, the swing equation that describes how the system state converges to a new equilibrium after a fault transient is as follows (1).

(1)
\begin{align*} \dot\delta &=\omega_{\triangle},\\ \dot\omega_{\triangle}&=\dfrac{\pi f_{0}}{H}\left(P_{m}-P_{e}(\delta)+P_{d}-\dfrac{D}{\omega_{0}}\omega_{\triangle}\right). \end{align*}

In the above equation, $\delta$ is the power angle, $\omega_{\triangle}$ is the angular frequency deviation from the synchronous frequency, $H$ is the per-unit inertia constant, $f_{0}$ and $\omega_{0}$ are the synchronous frequency and synchronous angular frequency, $P_{m}$ is the mechanical input applied to the generator, $P_{e}$ is the electrical output of the generator, $P_{d}$ is the additional disturbance caused by a fault, and $D$ is the damping coefficient, which typically takes a value in the range 0 ~ 2 [${pu}$]. The electrical output $P_{e}$ in Eq. (1) is given by Eq. (2) (2).

(2)
$P_{e}(\delta)=\dfrac{| E | | V |}{X}\sin(\delta)=: P_{\max}\sin(\delta).$

In the above equation, $X$ is the reactance between the generator and the infinite bus, $E$ is the generator internal EMF, and $V$ is the infinite bus voltage. Before a line fault occurs, the mechanical power and the electrical output of the generator are in balance.
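As a rough illustration of Eqs. (1) and (2), the sketch below evaluates the right-hand side of the swing equation with the Table 1 values from the simulation section. The function name `swing_rhs` and the choice to pass $P_{d}$ in as a plain argument are assumptions made for this example, not the authors' code.

```python
import math

# Table 1 values from the simulation section (per-unit unless noted).
F0 = 60.0                  # synchronous frequency f0 [Hz]
W0 = 2.0 * math.pi * F0    # synchronous angular frequency omega0 [rad/s]
H = 5.0                    # inertia constant [MJ/MVA]
D = 12.5                   # damping coefficient
P_M = 0.8                  # mechanical input
P_MAX = 1.8                # pre-fault |E||V|/X


def swing_rhs(delta, omega_dev, p_d=0.0):
    """Return (d delta/dt, d omega_dev/dt) from Eq. (1), with P_e from Eq. (2).

    p_d is the additive disturbance term P_d; it is zero before a fault.
    """
    p_e = P_MAX * math.sin(delta)
    d_delta = omega_dev
    d_omega = (math.pi * F0 / H) * (P_M - p_e + p_d - (D / W0) * omega_dev)
    return d_delta, d_omega


# Pre-fault equilibrium: P_m = P_max * sin(delta0) and zero speed deviation.
delta0 = math.asin(P_M / P_MAX)
print(delta0, swing_rhs(delta0, 0.0))
```

The equilibrium angle computed this way agrees with the initial state quoted in the simulation section, which is a useful sanity check on the parameter set.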

When a line fault occurs, the reactance $X$ between the generator and the infinite bus changes, and its value depends on the fault location. With $\lambda$ ($0\le\lambda\le 1$) denoting the ratio of the distance from the generator bus to the fault point, the reactance after a three-phase short-circuit fault, $X_{post}$, can be written as follows (1).

(3)
$X_{post}= X_{s}+X_{L1}+ X_{s}X_{L1}/(\lambda X_{L2}).$

In the above equation, $X_{s}$ is the combined reactance of the transformer and the generator $d$-axis transient reactance, and $X_{L1}$ and $X_{L2}$ are the reactances of the healthy line and the faulted line, respectively. Assuming the generator internal EMF $E$ and the infinite bus voltage $V$ are the same before and after the fault, the disturbance $P_{d}$ can be defined for a constant $d$ as follows.

(4)
$P_{d}= d\sin(\delta).$

In this paper, the change in $P_{\max}$ caused by the fault is taken as the disturbance magnitude $d$, which is estimated by the disturbance observer. With $X_{pre}$ denoting the pre-fault reactance, the change $d$ in $P_{\max}$ is as follows.

(5)
$d =\dfrac{| E | | V |}{X_{pre}}-\dfrac{| E | | V |}{X_{post}}$
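Eqs. (3) and (5) can be checked numerically against the values quoted in the simulation section. The sketch below assumes that the pre-fault reactance is $X_{s}$ in series with the two lines in parallel; the paper does not state this explicitly, but it reproduces the tabulated $P_{\max}=1.8$. Function names are illustrative.

```python
def post_fault_reactance(x_s, x_l1, x_l2, lam):
    """Eq. (3): transfer reactance with a three-phase fault at fraction lam of line 2."""
    return x_s + x_l1 + x_s * x_l1 / (lam * x_l2)


def disturbance_magnitude(e_mag, v_mag, x_pre, x_post):
    """Eq. (5): drop in P_max caused by the fault."""
    return e_mag * v_mag / x_pre - e_mag * v_mag / x_post


# Table 1 values.
x_s, x_l1, x_l2 = 0.5, 0.3, 0.3
e_mag, v_mag = 1.17, 1.0

# Assumption: pre-fault network is X_s in series with X_L1 parallel to X_L2.
x_pre = x_s + x_l1 * x_l2 / (x_l1 + x_l2)
x_post = post_fault_reactance(x_s, x_l1, x_l2, lam=0.5)
d = disturbance_magnitude(e_mag, v_mag, x_pre, x_post)
print(x_pre, x_post, d)
```

Under that assumption the result matches the disturbance magnitude of 1.15 [pu] used in the simulation for a fault at 50 [%] of the line.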

2.2 Deep Q-Network(DQN)

Deep Q-Network, one of the reinforcement learning algorithms, trains a $Q$ function that, given the current state and an action in a given environment, outputs the expected value of that action, and it decides the agent's action so that the agent receives the maximum reward or the minimum penalty (15).

In DQN the input is the current state, and the outputs are the expected values of the actions that can be taken in that state. The deep neural network is trained to minimize the following cost function (18).

(6)
$\left(r+\gamma\max_{a'}Q(s',\: a'|\bar{\theta})-Q(s,\: a |\theta)\right)^{2}.$

In Eq. (6), $s$ and $s'$ are the current and next states, $a$ and $a'$ are the current action and an action available in the next state, and $r$ is the reward received for the current action. $\gamma$ is the discount factor, which weights the rewards: a value close to 0 means that only the immediate reward is considered, and a value close to 1 means that rewards far in the future are also considered. $\theta$ and $\bar{\theta}$ are the weight parameters of the main network and the target network, respectively. The non-stationary targets problem, which arises because the weights are updated immediately during training, is handled by keeping an independent target network (19).
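The cost in Eq. (6) can be sketched with plain NumPy arrays. Averaging the squared error over a minibatch is an assumption of this example (Eq. (6) is written for a single sample), and the array names are illustrative; the Q-value arrays stand in for the outputs of the main and target networks.

```python
import numpy as np


def td_targets(rewards, q_next_target, gamma=0.99):
    """r + gamma * max_a' Q(s', a' | theta_bar), one value per sample."""
    return rewards + gamma * np.max(q_next_target, axis=1)


def dqn_loss(q_main, actions, targets):
    """Mean squared TD error; q_main holds Q(s, . | theta) for each sample."""
    q_taken = q_main[np.arange(len(actions)), actions]
    return float(np.mean((targets - q_taken) ** 2))


# Two hypothetical transitions with three actions each (0-based indices here).
rewards = np.array([-51.0, -1.65])
q_next_target = np.array([[-10.0, -20.0, -5.0], [-1.0, -2.0, -3.0]])
q_main = np.array([[0.0, 0.0, -50.0], [-2.0, 0.0, 0.0]])
actions = np.array([2, 0])

targets = td_targets(rewards, q_next_target)
loss = dqn_loss(q_main, actions, targets)
```

Because the targets are computed from the target-network outputs only, gradient updates of the main network do not move the target within an update period.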

Meanwhile, training data in reinforcement learning are collected sequentially over time, and neighboring samples in sequential data are highly correlated. This is called the correlation between samples problem, and in a linear regression it can drive the training of the network in the wrong direction, as in Fig. 2(b).

Fig. 2. Linear regression

../../Resources/kiee/KIEE.2020.69.7.1095/fig2.png

To mitigate this problem, the replay memory technique published by Google's DeepMind team is used, and the data samples obtained at each time step are stored in the data set as tuples, as in Eq. (7). Because the data set cannot be stored in memory without bound, the memory size is fixed and samples are stored in FIFO (first in, first out) order.

(7)
$D_{train}=\mathrm{rand}([s_{t},\: a_{t},\: r_{t},\: s_{t+1}]).$

In the above equation, $D_{train}$ is the data set used in Eq. (6), and the network is trained on randomly selected sets. With this method a minibatch is formed by random sampling, as in Fig. 2(c), which greatly reduces the correlation between data (19).
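A fixed-size FIFO replay memory with uniform random sampling, as described around Eq. (7), can be sketched with the Python standard library. The class name `ReplayMemory` and its method names are assumptions for this example.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity FIFO store of (s, a, r, s_next) tuples."""

    def __init__(self, capacity=100000):
        # A deque with maxlen drops the oldest item when a new one is appended.
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporal correlation.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.push(float(t), 1, -1.0, float(t + 1))
batch = memory.sample(2)
```

With a capacity of 3 and five pushes, only the three most recent transitions remain, which is the FIFO behavior described in the text.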

The policy $\pi(s)$ that selects the optimal action using the expected value $Q$ obtained from the DNN is given by Eq. (8). To address the increased likelihood of getting stuck in a local optimum and failing to find the global optimum, an epsilon-greedy ($\epsilon$-greedy) policy is additionally used (15).

(8)
$\pi(s)=\begin{cases} \arg\max_{a} Q(s,\: a) & \text{for } \epsilon\le N \\ \mathrm{rand}(a) & \text{for } \epsilon > N . \end{cases}$

The value obtained from the above equation is a positive integer that corresponds to one of the selectable actions. $N$ is a constant, and $\epsilon$ is a variable that gradually decreases as training proceeds. In other words, random actions are taken early in training to provide diverse experience, so that training can find the global optimum (15).
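The sketch below follows a literal reading of Eq. (8): the agent acts randomly while $\epsilon > N$ and greedily once $\epsilon \le N$. A more common $\epsilon$-greedy variant compares a uniform random draw with $\epsilon$; which form the authors implemented is not stated, and the multiplicative decay schedule here is purely an assumption.

```python
import random


def select_action(q_values, epsilon, n_threshold=0.1, rng=random):
    """Return a 1-based action index following Eq. (8) as written."""
    if epsilon <= n_threshold:
        # Greedy branch: index of the largest Q value.
        best = max(range(len(q_values)), key=lambda i: q_values[i])
        return best + 1
    # Exploration branch: uniformly random action.
    return rng.randrange(len(q_values)) + 1


def decay_epsilon(epsilon, rate=0.999):
    """Hypothetical decay schedule; the source only says epsilon shrinks over training."""
    return epsilon * rate


greedy_action = select_action([0.1, 0.9, 0.3], epsilon=0.05)
random_action = select_action([0.1, 0.9, 0.3], epsilon=1.0)
```

Returning a 1-based index keeps the result a positive integer, matching the description of the policy output in the text.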

Finally, the DQN training and optimal-action selection process is summarized as a flowchart in Fig. 3.

Fig. 3. Deep Q-Network flowchart

../../Resources/kiee/KIEE.2020.69.7.1095/fig3.png

2.3 DQN-Based Disturbance Observer Design

The single machine infinite bus system expressed by the swing equation, together with the disturbance observer, corresponds to the Environment in Fig. 3, and the DQN disturbance observer that estimates the disturbance magnitude $d$ can be designed as follows. The power angle $\delta$ is assumed to be measurable (8).

(9)
$$\begin{array}{l} \dot{\hat{\delta}}=\hat{\omega}_{\Delta}+l_{a 1}(\bar{\delta}-\hat{\delta}) \\ \dot{\hat{\omega}}_{\Delta}=\frac{\pi f_{0}}{H}\left(P_{m}-\left(P_{\max }-\hat{d}\right) \sin (\delta)-\frac{D}{\omega_{0}} \hat{\omega}_{\Delta}\right)+l_{a 2}(\bar{\delta}-\hat{\delta}) \\ \dot{\hat{d}}=l_{a 3}(\bar{\delta}-\hat{\delta}) \end{array}$$

In the above equation, $\hat\delta$, $\hat\omega_{\triangle}$, and $\hat d$ are the estimates of the swing-equation states and of the disturbance, and $\bar{\delta}$ is the power angle including measurement noise. $l_{a1}$, $l_{a2}$, and $l_{a3}$ are the observer gains, which form the action selected according to the value determined by Eq. (8).

When a disturbance is applied to the system, the observer can estimate the state robustly against the disturbance if the observer gain is sufficiently large (11,12). When measurement noise is present in the output, however, a large observer gain makes the observer estimate the noise-contaminated state, which increases the error with respect to the true state (13).
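One integration step of the observer in Eq. (9) might look like the following forward-Euler sketch. Two points are assumptions of this example rather than statements from the paper: the measured angle $\bar{\delta}$ is used inside $\sin(\cdot)$ because it is the only angle signal available to the observer, and forward Euler is chosen only for brevity. Parameter values are those of Table 1.

```python
import math

C = math.pi * 60.0 / 5.0               # pi * f0 / H
D_OVER_W0 = 12.5 / (120.0 * math.pi)   # D / omega0
P_M, P_MAX = 0.8, 1.8


def observer_step(est, delta_meas, gains, dt):
    """Advance (delta_hat, omega_hat, d_hat) by one Euler step of Eq. (9)."""
    delta_hat, omega_hat, d_hat = est
    l1, l2, l3 = gains
    err = delta_meas - delta_hat                      # output estimation error
    d_delta = omega_hat + l1 * err
    d_omega = C * (P_M - (P_MAX - d_hat) * math.sin(delta_meas)
                   - D_OVER_W0 * omega_hat) + l2 * err
    d_dist = l3 * err
    return (delta_hat + dt * d_delta,
            omega_hat + dt * d_omega,
            d_hat + dt * d_dist)


# At the pre-fault equilibrium with a perfect estimate nothing should move.
delta0 = math.asin(P_M / P_MAX)
steady = observer_step((delta0, 0.0, 0.0), delta0, (30.0, 300.0, 1000.0), 1e-3)
```

In a full simulation the `gains` tuple would be swapped at every time step according to the action chosen by the trained network.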

2.4 Definition of State, Action, and Reward

This section defines the state, action, and reward used to collect the data set needed for training when designing the DQN-based disturbance observer for the single machine infinite bus system. The state $s$ of the DQN in this paper is defined as follows.

(10)
$$s=|\bar{\delta}-\hat{\delta}|$$

Next, the action that changes the Environment is defined. As noted above, when a disturbance is applied to the system the observer gain must be large to improve state estimation, but with measurement noise in the output a large gain increases the observation error. Designing an observer that is robust to both disturbance and measurement noise therefore depends on choosing a gain suited to the situation. In this paper the set of selectable actions $A$ consists of observer gains designed in advance for different estimation speeds, as follows.

(11)
$A =[L_{pole_{1}} \quad L_{pole_{2}} \quad \cdots \quad L_{pole_{p}}].$

In the above equation, $L_{pole_{p}}$ is the observer gain matrix that places the poles of the observation error system at a triple root $s= -p$.
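The paper does not give the gain formulas, so the following is only one possible way to obtain gains with a triple pole at $s=-p$. It assumes the error dynamics of the observer are linear with $\sin(\delta)$ frozen at its pre-fault equilibrium value $P_{m}/P_{\max}$, which gives the characteristic polynomial $s^{3}+(a+l_{1})s^{2}+(a l_{1}+l_{2})s+b\,l_{3}$ with $a=\pi f_{0}D/(H\omega_{0})$ and $b=(\pi f_{0}/H)\sin\delta_{0}$. Matching it to $(s+p)^{3}$ yields the gains below. All function names are hypothetical.

```python
import math
import numpy as np

C = math.pi * 60.0 / 5.0                  # pi * f0 / H
A_DAMP = C * 12.5 / (120.0 * math.pi)     # a = c * D / omega0
B_DIST = C * (0.8 / 1.8)                  # b = c * sin(delta0), frozen (assumption)


def triple_pole_gains(p, a=A_DAMP, b=B_DIST):
    """Gains (l1, l2, l3) so that the frozen error dynamics have (s + p)^3."""
    l1 = 3.0 * p - a
    l2 = 3.0 * p ** 2 - a * l1
    l3 = p ** 3 / b
    return l1, l2, l3


def error_matrix(gains, a=A_DAMP, b=B_DIST):
    """State matrix of e = x - x_hat for e = (e_delta, e_omega, e_d)."""
    l1, l2, l3 = gains
    return np.array([[-l1, 1.0, 0.0],
                     [-l2, -a, b],
                     [-l3, 0.0, 0.0]])


# Candidate action set for the pole locations used in the simulation section.
action_set = {p: triple_pole_gains(p) for p in (10, 30, 120)}
```

If the authors linearized differently or scheduled the gains on $\sin(\delta)$, the numerical values would differ; the sketch only illustrates the pole-matching step.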

Finally, the reward that evaluates the action taken in a given state is defined. With $\bar{\delta}$ the power angle including measurement noise, and assuming the maximum magnitude of the measurement noise is known from experiments, the noise level constant $\nu$ can be defined as follows.

(12)
$\nu =\max(|\bar{\delta}-\hat{\delta}|).$

If the observation error $|\bar{\delta}-\hat{\delta}|$ is smaller than the noise level constant $\nu$, the observer gain should be reduced, and in the opposite case the gain should be increased. Following this policy, the reward for reinforcement learning is designed with suitable conditional statements. The reward expresses numerically how good or bad the current action is as judged by the Environment, and it is designed case by case according to how the current state and the next state compare with the noise level constant $\nu$.

Under this reward policy, $r_{1}$ encourages a relatively large observer gain when the current observation error is larger than $\nu$ and a small gain otherwise. $r_{2}$ encourages a large observer gain when the next observation error, which results from the current action, is still larger than $\nu$, and a small gain otherwise. Weights are applied to adjust the reward values.

For example, for three actions this can be written as follows, and the reward finally received is the sum of $r_{1}$ and $r_{2}$ with a negative sign.

(13a)
$r_{1}=\begin{cases} \dfrac{\xi_{1}}{T_{1}(\mu)} & \text{for } s\ge\nu \\ \dfrac{\xi_{2}}{T_{2}(\mu)}\cdot\alpha & \text{for } s <\nu . \end{cases}$

(13b)
$r_{2}=\begin{cases} \dfrac{\xi_{3}}{T_{3}(\mu)} & \text{for } s'\ge\nu \\ \dfrac{\xi_{4}}{T_{4}(\mu)}\cdot\beta & \text{for } s'<\nu . \end{cases}$

(13c)
$r = -(r_{1}+ r_{2}).$

In the above equations, $\mu$ is the positive integer $a$ obtained from Eq. (8), and $T_{n}(\mu)$ ($n=1,\:2,\:3,\:4$) is a scalar, the $\mu$-th element of $T_{n}$, chosen by the designer. $\xi_{1}$, $\xi_{2}$, $\xi_{3}$, and $\xi_{4}$ are also constants chosen by the designer, and the constants $\alpha$ and $\beta$ are used to weight the rewards $r_{1}$ and $r_{2}$. How these values are chosen is explained further with experiments in the next section.

As a result, the reward for the selected action can be designed as in Eq. (13), and designing the reward according to an appropriate reward policy is important for obtaining the training result the designer wants.

The next section defines the reward used in the simulation in terms of the current and next states and runs simulations to verify the performance of the DQN-based disturbance observer.

2.5 Simulation

This section simulates the disturbance estimation performance of the designed observer for the single machine infinite bus system with the parameters in Table 1. The ratio of the distance from the generator bus to the fault point ($\lambda$) is 50 [$\%$], a three-phase short-circuit fault occurs at 1.027 s, and the circuit breaker operates normally within about 0.39 s of the fault. The disturbance magnitude $d$ is 1.15 [${pu}$], and the initial state of the system is $[0.46055 \quad 0]^{T}$.

Table 1. Simulation Parameters of SMIB System

| Parameter | Value |
|---|---|
| $P_{\max}$ | $1.8[{pu}]$ |
| $| V |$ | $1[{pu}]$ |
| $P_{m}$ | $0.8[{pu}]$ |
| $\omega_{0}$ | $120\pi[{rad}/\sec]$ |
| $D$ | $12.5$ |
| $f_{0}$ | $60[{Hz}]$ |
| $H$ | $5[{MJ}/{MVA}]$ |
| $X_{s}$ | $j0.5[{ohm}]$ |
| $| E |$ | $1.17[{pu}]$ |
| $X_{L1},\: X_{L2}$ | $j0.3[{ohm}]$ |

The measurement noise is Gaussian noise with a normal distribution, and the noise level constant $\nu$ is $3.8\times 10^{-3}$. The set of selectable actions ($A$) for DQN training is defined as follows. The highest observer gain was chosen so that, when a fault-induced disturbance is applied to the system, the disturbance estimate converges to the actual disturbance within 4 cycles of the 60 [Hz] system frequency, and the lowest observer gain was chosen to minimize the growth of observation error caused by measurement noise.

(14)
$A =[L_{pole_{10}} \quad L_{pole_{30}} \quad L_{pole_{120}}].$

The reward follows Eq. (13), and each variable was set experimentally according to the reward policy. The reward for a selected action is always negative, and the reward parameters were tuned so that training maximizes the reward, that is, minimizes the observer estimation error. Here $\mu$ is an integer between 1 and 3, and the values of the scalar $T_{n}(\mu)$ ($n=1,\:2,\:3,\:4$) for each $\mu$ are listed in Table 2. The constants $\xi_{1}$, $\xi_{2}$, $\xi_{3}$, and $\xi_{4}$ were set to 1000, 1000, 2000, and 1, respectively, and the weights $\alpha$ and $\beta$ applied to $r_{1}$ and $r_{2}$ were set to 0.5 and 130, respectively.

Table 2. Parameters for reward design

| | $\mu = 1$ | $\mu = 2$ | $\mu = 3$ |
|---|---|---|---|
| $T_{1}(\mu)$ | $5$ | $20$ | $20$ |
| $T_{2}(\mu)$ | $500$ | $10$ | $5$ |
| $T_{3}(\mu)$ | $10$ | $200$ | $2000$ |
| $T_{4}(\mu)$ | $200$ | $10$ | $1$ |
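The reward of Eq. (13) with the Table 2 values and the constants quoted above can be written out as a small function. The function and variable names are illustrative; the numbers come directly from the text.

```python
NU = 3.8e-3                          # noise level constant
XI = (1000.0, 1000.0, 2000.0, 1.0)   # xi_1 .. xi_4
ALPHA, BETA = 0.5, 130.0

# T_n(mu) from Table 2, indexed by n and then by mu = 1, 2, 3.
T = {
    1: (5.0, 20.0, 20.0),
    2: (500.0, 10.0, 5.0),
    3: (10.0, 200.0, 2000.0),
    4: (200.0, 10.0, 1.0),
}


def reward(s, s_next, mu):
    """Eq. (13): r = -(r1 + r2) for state s, next state s_next, action mu in {1, 2, 3}."""
    i = mu - 1
    if s >= NU:
        r1 = XI[0] / T[1][i]
    else:
        r1 = XI[1] / T[2][i] * ALPHA
    if s_next >= NU:
        r2 = XI[2] / T[3][i]
    else:
        r2 = XI[3] / T[4][i] * BETA
    return -(r1 + r2)


# Large error now and next: the highest gain (mu = 3) is penalized least.
large_error = [reward(0.01, 0.01, mu) for mu in (1, 2, 3)]
# Error below the noise level: the lowest gain (mu = 1) is penalized least.
small_error = [reward(0.001, 0.001, mu) for mu in (1, 2, 3)]
```

Evaluating both cases shows the intended ordering: the penalty shrinks with increasing gain when the error exceeds $\nu$ and grows with increasing gain when the error stays below $\nu$.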

The DNN is initialized with the Xavier initializer and optimized with the Adam (Adaptive Moment Estimation) optimizer. ReLU is used as the activation function in all layers except the output layer. The training hyper-parameters are listed in Table 3.

Table 3. Learning hyper-parameters and their values

| Hyper-parameter | Value |
|---|---|
| minibatch size | $128$ |
| discount factor | $0.99$ |
| replay memory size | $100000$ |
| learning rate | $0.002$ |
| replay start size | $5000$ |
| initial exploration($\epsilon$) | $1$ |
| target network update frequency | $5000$ |
| final exploration($N$) | $0.1$ |

The disturbance observer (Proposed Obs.) was designed using the trained Deep Q-Network. For comparison of estimation performance, a high-gain observer (Conventional Obs. 1) was designed following reference (11) so that the characteristic polynomial of observer (9) is $(s+400)$$(s+100\pm j200)$, which makes the disturbance estimate converge to the disturbance within 5 cycles of the system frequency. In addition, a low-gain observer (Conventional Obs. 2) was designed so that the characteristic polynomial of observer (9) is $(s+10)^{3}$, in order to minimize the effect of measurement noise.

Fig. 4. State estimation performance of Conventional Obs. 1

../../Resources/kiee/KIEE.2020.69.7.1095/fig4.png

Fig. 5. State estimation performance of Conventional Obs. 2

../../Resources/kiee/KIEE.2020.69.7.1095/fig5.png

Fig. 6. State estimation performance of Proposed Obs.

../../Resources/kiee/KIEE.2020.69.7.1095/fig6.png

Fig. 7. Observer gains at each time step of Proposed Obs.

../../Resources/kiee/KIEE.2020.69.7.1095/fig7.png

Figs. 4 ~ 6 show $\delta$, $\omega_{\triangle}$, and $d$ together with the estimates of each observer. In the plots of $\delta$ and its estimate in Figs. 4 ~ 6, the y-axis range of the zoomed inset is 26.3 ~ 26.6 [deg]. Fig. 4 shows the result for the high-gain observer. Although this observer is designed with a slower estimation speed than the proposed observer, it fails to estimate the actual state and disturbance well because of the measurement noise.

Fig. 5 shows the estimates of the low-gain observer. Because the observer gain is low, it is not strongly affected by measurement noise, but the disturbance estimate is very slow when a disturbance is applied to the system.

Fig. 6 shows the result for the proposed observer, which estimates the actual state and disturbance successfully. The disturbance estimate also converges to the actual disturbance within 4 cycles of the system frequency.

Finally, Fig. 7 shows the action selected by the proposed observer at each time step, with a value of 1 for $L_{pole_{10}}$, 2 for $L_{pole_{30}}$, and 3 for $L_{pole_{120}}$. The observer designed through training selects an appropriate observer gain even when measurement noise is present. This indicates that the DQN-based disturbance observer can estimate the state and the disturbance robustly and can be applied to fault detection on actual lines.

3. Conclusion

This paper proposed a robust observer design method for detecting line faults in a single machine infinite bus system in the presence of measurement noise. The process of designing the observer with Deep Q-Network, one of the reinforcement learning algorithms, was described. The Deep Q-Network was trained with TensorFlow in Python, and disturbance estimation under a line fault was simulated with the trained network.

The simulations confirmed that, with measurement noise present, the high-gain observer does not estimate the state properly and the low-gain observer estimates very slowly. In contrast, the estimate of the proposed DQN-based observer tracked the actual disturbance well within 4 cycles of the 60 [Hz] system signal. The proposed observer was seen to select a small observer gain adaptively when only measurement noise is present and a large observer gain when a system disturbance is applied.

Future work will study the use of the observer in more general multi-machine systems and the method of determining the reward for DQN observer design.

Acknowledgements

This research was supported by Korea Electric Power Corporation (Grant number: R17XA05-2).

This work was supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(MSIT) (No. 2019R1F1A1058543).

References

1. A. R. Bergen, V. Vittal, 2000, Power System Analysis, 2nd ed., Prentice Hall
2. H. Saadat, 2002, Power System Analysis, 2nd ed., McGraw-Hill
3. G. W. Kim, S. H. Hyun, Feb 2005, Power System Analysis Using MATLAB 1, UUP
4. J. D. Glover, T. J. Overbye, M. S. Sarma, 2016, Power System Analysis & Design, 6th ed., Cengage Learning
5. S. R. Nam, J. K. Hong, S. H. Kang, J. K. Park, 2004, Analysis of characteristic frequency along fault distance on a transmission line, KIEE Trans., Vol. 53a, No. 8, pp. 432-437
6. D. G. Lee, S. H. Kang, 2010, Distance relaying algorithm using a DFT-based modified phasor estimation method, KIEE Trans., Vol. 59, No. 8, pp. 1360-1365
7. A. P. Sakis Meliopoulos, G. J. Cokkinides, P. Myrda, Y. Liu, F. Rui, L. Sun, R. Huang, Z. Tan, 2017, Dynamic state estimation-based protection: Status and Promise, IEEE Trans. Power Delivery, Vol. 32, No. 1, pp. 320-330
8. E. Farantatos, R. Huang, G. J. Cokkinides, Aug 2016, A predictive generator out-of-step protection and transient stability monitoring scheme enabled by a distributed dynamic state estimator, IEEE Trans. Power Del., Vol. 31, No. 4, pp. 1826-1835
9. S. K. Kim, Sep 2019, Proportional-type Non-linear Excitation Controller with power angle reference estimator for single-machine infinite-bus power system, IET Gener. Transm. Distrib., Vol. 13, No. 18, pp. 4029-4036
10. D. G. Yoon, T. W. Kim, S. K. Kim, Jan 2007, Nonlinear input-output feedback linearization control of a single machine infinite bus power system, Journal of Control, Automation and Systems Engineering, Vol. 13, No. 1, pp. 1-5
11. S. Y. Jang, J. W. Kim, Y. I. Son, S. R. Nam, S. H. Kang, 2019, A Study on PI Observer Design for Line Fault Detection of a Single Machine Infinite Bus System, KIEE Trans., Vol. 68, No. 10, pp. 1184-1188
12. Y. I. Son, I. H. Kim, K. S. Choi, H. Shim, 2015, Robust Cascade Control of Electric Motor Drives using Dual Reduced-Order PI Observer, IEEE Transactions on Industrial Electronics, Vol. 62, pp. 3672-3682
13. G. F. Franklin, J. D. Powell, A. Emami-Naeini, 2010, Feedback Control of Dynamic Systems, 6th ed., Pearson
14. S. Y. Jang, Y. I. Son, S. H. Kang, 2019, Design of a Reinforcement Learning-Based Disturbance Observer for Line Fault Detection of a Single Machine Infinite Bus System, KIEE Trans., Vol. 68, No. 9, pp. 1060-1066
15. R. S. Sutton, Aug 1991, Dyna, an integrated architecture for learning, planning, and reacting, ACM SIGART Bulletin, Vol. 2, No. 4, pp. 160-163
16. C. J. C. H. Watkins, P. Dayan, May 1992, Q-learning, Machine Learning, Vol. 8, No. 3-4, pp. 279
17. R. S. Sutton, A. G. Barto, 1998, Reinforcement learning: An introduction, MIT Press
18. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, Marc G. Bellemare, A. Graves, M. Riedmiller, Andreas K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Feb 2015, Human-level control through deep reinforcement learning, Nature, Vol. 518, pp. 529-533
19. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. V. D. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis, Jan 2016, Mastering the game of Go with deep neural networks and tree search, Nature, Vol. 529, pp. 484-489

About the Authors

Sun Jick Yang
../../Resources/kiee/KIEE.2020.69.7.1095/au1.png

He received the B.S. degree from Myongji University, Korea, in 2019, where he is currently working toward the M.S. degree.

His current research interests are robust and adaptive control of electric machines using artificial intelligence and observers.

Su Young Jang
../../Resources/kiee/KIEE.2020.69.7.1095/au2.png

He received the B.S. and M.S. degrees from Myongji University, Korea, in 2018 and 2020, respectively.

His current research interests are robust and adaptive control of electrical machines using artificial intelligence.

Young Ik Son
../../Resources/kiee/KIEE.2020.69.7.1095/au3.png

He received the B.S., M.S., and Ph.D. degrees from Seoul National University, Korea, in 1995, 1997 and 2002, respectively.

He was a visiting scholar at Cornell University (2007~2008) and University of Connecticut (2016~2017).

Since 2003, he has been with the Department of Electrical Engineering at Myongji University, Korea, where he is currently a professor.

His research interests include robust controller design and its application to industrial electronics.