The Transactions of the Korean Institute of Electrical Engineers

  1. (Dept. of Electrical and Computer Engineering, Inha University, Incheon, Korea)



Reinforcement Learning, Double Inverted Pendulum, Sim-to-Real Learning, Recovery Property

1. Introduction

From a control engineering standpoint, the inverted pendulum combines unstable dynamics, a nonlinear model equation, and non-minimum-phase behavior, all of which make it difficult to control. Because of these properties, it has long been widely used as a testbed for evaluating the performance of control techniques. A major research topic with this system is swing-up control, which brings the pendulum to the upright position, and many studies have applied various control techniques to it (1-2). For the double inverted pendulum, which has an additional link, the swing-up problem is so difficult that an effective swing-up scheme with a two-degree-of-freedom structure was not presented until 2007, by Graichen (3).

With the recent rapid progress of artificial intelligence, research that applies reinforcement learning with deep neural networks to control engineering is also being actively pursued (4). In reinforcement learning, an agent acts according to its policy given the observed state of the environment, and iteratively improves that policy so as to maximize the reward obtained from the resulting change in the environment. A reinforcement-learning-based controller takes the state of the given system as input and outputs the optimal action under the learned policy, that is, the control input. In this interdisciplinary field as well, single and multi-link inverted pendulums remain testbeds for validating AI-based controllers. They are used both to solve the swing-up problem by replacing a conventional controller with a reinforcement-learning-based one (5-6) and to validate newly proposed learning methods (7-8).

However, several problems arise when the agent, the learner in reinforcement learning, is trained by interacting directly with the physical system. These include the physical time that the interaction takes, the increased cost of data acquisition, and physical hazards that may occur during experiments (9). For these reasons, research on designing controllers with reinforcement learning is mostly carried out by building a simulation environment that reproduces the dynamics of the physical system and applying reinforcement learning algorithms to it (10). The approach of training in simulation and then applying the trained result to the real system is collectively called Sim-to-Real learning.

Sim-to-Real learning has one major problem, however: there is always a gap between the simulation and the physical system, the so-called reality gap. Depending on the size of this gap, a model trained in simulation may fail to operate properly on the real system, or its performance may degrade (11). To overcome this, this paper draws on the control-engineering and mechanical knowledge of inverted pendulum systems accumulated over many years in the authors' laboratory, and designs the real system so that it closely matches the model used in the simulation environment, thereby minimizing the gap. When Sim-to-Real learning is applied to such a system, a wide range of data can be collected and learned from in a simulation that is free of physical constraints, with little concern about performance degradation caused by the reality gap. A reinforcement-learning-based controller trained in this way can realize new control behaviors that were difficult to achieve with conventional control techniques. This suggests that problems that could not be solved with conventional control techniques may be solvable with reinforcement-learning-based controllers.

To support this claim, the paper aims to realize a control behavior that is impossible with the two-degree-of-freedom control scheme (3), the most representative method for swing-up control of the double inverted pendulum. Section 2 describes the limitation of that scheme and how Sim-to-Real learning can overcome it. Section 3 proposes a design of the double inverted pendulum system that minimizes the reality gap for Sim-to-Real learning. Section 4 presents the experiments and results, and Section 5 concludes.

2. Reinforcement-Learning-Based Controller with the Recovery Property

The scheme proposed by Graichen, mentioned in the introduction, computes the swing-up trajectory of the double inverted pendulum in advance by offline optimization, applies it to the system as a feedforward input, and corrects the deviation from that trajectory by feedback control. This two-degree-of-freedom scheme achieved swing-up control while respecting the rail-length constraint of the double inverted pendulum. In 2013, another researcher used the same technique to swing up a triple inverted pendulum, a structurally harder problem, which confirmed the merit of the scheme once more (12).

However, the scheme has a critical weakness: when a strong disturbance is applied, the system reaches a state from which it cannot be controlled. For disturbances up to a certain level, the feedback correction provides robustness. Beyond that level, the system departs entirely from the precomputed trajectory and the feedforward control becomes meaningless. The deviation is then too large for the feedback control to correct, and control is lost. Once this happens, the two-degree-of-freedom scheme cannot perform the swing-up again, because the reference trajectory is computed offline in advance and cannot be recomputed while the system is running.

This study addresses the problem with Sim-to-Real learning. A reinforcement learning agent improves its policy based on the states it has experienced while interacting with the environment and the rewards it received in them. The process is repeated so that the agent accumulates as many states as possible, and training continues until it produces the best control input for each state. A controller trained to that point can perform the desired control from whatever state it reaches.

This can be likened to a maze problem. Given a maze with a fixed goal, if training repeatedly places the start at a random location and has the agent search for the goal, then after training the agent can head straight to the goal from any starting point. Likewise, the inverted pendulum can be swung up from whatever state it reaches. Even when a strong disturbance is applied to the pendulum, the agent simply perceives it as a change in the state of the environment and performs the swing-up by outputting the control input appropriate to that instant.

Sim-to-Real learning is essential for this kind of training. The agent must experience as wide a variety of states as possible, but an environment built only on the physical system imposes physical constraints. On real hardware, the researcher cannot initialize the angle and angular velocity of each link arbitrarily; the only available initial state is the one in which gravity leaves every link hanging downward. The range of states the agent can experience is therefore limited, and no learning takes place for states it has never visited. As a result the controller cannot respond to every situation, and in particular cannot fully cope with an applied disturbance.

A simulated environment is free of these physical constraints. Each time a simulation starts, the researcher can set every state variable arbitrarily: the angles and angular velocities of both links, and also the position and velocity of the cart. Exploiting this, the agent can train freely in simulation on situations it would likely never encounter on the physical system. It becomes much easier for the agent to accumulate a wide range of states and to learn the corresponding actions. Consequently, even when a strong disturbance is applied, the resulting state is likely to be one already experienced in simulation, so the controller can complete the control task without losing control. In this paper, a reinforcement-learning-based controller with this characteristic is said to have the 'Recovery property'. The name reflects that, whereas a conventional controller transitions to an unstable state under a strong disturbance and loses control, a controller built with Sim-to-Real learning can 'recover' to a stable state even after reaching an unstable one.

To take full advantage of this benefit of Sim-to-Real learning, however, the real system must match the model closely so that the reality gap mentioned in the introduction is minimized. Even if training in simulation is perfect, the controller is of no use if the real system exhibits different dynamics. Section 3 therefore derives the mathematical model of the real system and proposes a mechanical design that matches it closely.

3. Double Inverted Pendulum Structure with High Model Fidelity

3.1 Mathematical Model of the Double Inverted Pendulum

Fig. 1 shows a conceptual mechanical diagram of the double inverted pendulum used in the experiments. All variables are assumed to be in SI units and are defined as follows. $M$ is the mass of the cart; $m_{1}$ and $m_{2}$ are the masses of the first and second links; $l_{1}$ and $l_{2}$ are the distances from the rotation axis of each link to its center of mass. $\theta_{1}$ is the angular displacement of the first link, measured from the normal to the ground; $\theta_{2}$ is the angular displacement of the second link relative to the first; and $L_{1}$ is the distance from the rotation axis of the first link to that of the second. $c_{1}$ and $c_{2}$ are the friction coefficients at the rotation axes of the first and second links, $y$ is the displacement of the cart from its initial position, and $u$ is the acceleration of the cart. Finally, $i$, $j$, $k$ denote the axes of a Cartesian coordinate system whose origin is the center of the rail.

Fig. 1. Mechanical conceptual diagram of a double inverted pendulum

../../Resources/kiee/KIEE.2023.72.12.1705/fig1.png

Deriving the mathematical model of the double inverted pendulum with the Euler-Lagrange equation gives (1).

(1)
\begin{align*} \left[\begin{aligned}n_{1}\\n_{2}\end{aligned}\right]\ddot y +\begin{bmatrix}m_{11}&m_{12}\\m_{21}&m_{22}\end{bmatrix}\left[\begin{aligned}\ddot\theta_{1}\\\ddot\theta_{2}\end{aligned}\right]+\left[\begin{aligned}r_{1}\\r_{2}\end{aligned}\right]& = 0. \end{align*}

The elements of the above equation are given in (2).

(2)
\begin{align*} & n_{1}& = & h_{1}\cos(\theta_{1})+h_{2}\cos(\theta_{1}+\theta_{2}),\:\\ &{n}_{2}& = &{h}_{2}\cos(\theta_{1}+\theta_{2}),\:\\ &{m}_{11}& = &{h}_{3}+{h}_{6}+2{h}_{4}\cos(\theta_{2}),\:\\ &{m}_{12}& = &{h}_{6}+{h}_{4}\cos(\theta_{2}),\:\\ &{m}_{21}& = &{h}_{6}+{h}_{4}\cos(\theta_{2}),\:\\ &{m}_{22}& = &{h}_{6},\:\\ &{r}_{1}& = & -{h}_{4}\sin(\theta_{2})(2\dot\theta_{1}\dot\theta_{2}+\dot\theta_{2}^{2})-{h}_{5}\sin\theta_{1}\\ & & & -{h}_{7}\sin(\theta_{1}+\theta_{2})+{c}_{1}\dot\theta_{1},\:\\ &{r}_{2}& = &{h}_{4}\sin(\theta_{2})\dot\theta_{1}^{2}-{h}_{7}\sin(\theta_{1}+\theta_{2})+{c}_{2}\dot\theta_{2}. \end{align*}

$h_{1}$ ~ $h_{7}$ are defined in (3), where $g$ is the gravitational acceleration, 9.81 [m/${s}^{2}$], and $I_{1}$, $I_{2}$ are the moments of inertia of the two links (listed as $I_{i}$ in Table 1).

(3)
\begin{align*} & h_{1}& = & m_{1}l_{1}+m_{2}L_{1},\:\\ & h_{2}& = & m_{2}l_{2},\:\\ & h_{3}& = & I_{1}+m_{1}l_{1}^{2}+m_{2}L_{1}^{2},\:\\ & h_{4}& = & m_{2}L_{1}l_{2},\:\\ & h_{5}& = & g(m_{1}l_{1}+m_{2}L_{1}),\:\\ & h_{6}& = & I_{2}+m_{2}l_{2}^{2},\:\\ & h_{7}& = & gm_{2}l_{2}. \end{align*}

Rearranging (1) gives (4).

(4)
\begin{align*} &\left[\begin{aligned}\ddot\theta_{1}\\\ddot\theta_{2}\end{aligned}\right]= -\left[\begin{matrix} m_{11}&m_{12}\\m_{21}&m_{22}\end{matrix}\right]^{-1}\left\{\left[\begin{aligned}n_{1}\\n_{2}\end{aligned}\right]\ddot y +\left[\begin{aligned}r_{1}\\r_{2}\end{aligned}\right]\right\}. \end{align*}
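
As a worked step, not part of the original text: expanding (4) only needs the closed-form inverse of a $2\times 2$ matrix, assuming its determinant $\Phi$ is nonzero. Substituting the inverse into (4) and collecting the $\ddot y$ terms row by row gives the component form.

\begin{align*} \begin{bmatrix} m_{11}&m_{12}\\m_{21}&m_{22}\end{bmatrix}^{-1} &= \dfrac{1}{\Phi}\begin{bmatrix} m_{22}&-m_{12}\\-m_{21}&m_{11}\end{bmatrix},\qquad \Phi = m_{11}m_{22}- m_{12}m_{21},\\ \begin{bmatrix}\ddot\theta_{1}\\\ddot\theta_{2}\end{bmatrix} &= -\dfrac{1}{\Phi}\begin{bmatrix} m_{22}(n_{1}\ddot y + r_{1})- m_{12}(n_{2}\ddot y + r_{2})\\ -m_{21}(n_{1}\ddot y + r_{1})+ m_{11}(n_{2}\ddot y + r_{2})\end{bmatrix}. \end{align*}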

Expanding (4) gives (5). Defining the state vector as $x_{1}=y$, $x_{2}=\theta_{1}$, $x_{3}=\theta_{2}$, $x_{4}=\dot y$, $x_{5}=\dot\theta_{1}$, $x_{6}=\dot\theta_{2}$ and writing $\ddot y$ as the acceleration $u$, the model of the double inverted pendulum is finally expressed as the nonlinear state equation (6).

(5)
\begin{align*} &\ddot\theta_{1}& = &\dfrac{(-m_{22}n_{1}+ m_{12}n_{2})\ddot y +(-m_{22}r_{1}+ m_{12}r_{2})}{\Phi},\:\\ &\ddot\theta_{2}& = &\dfrac{(m_{21}n_{1}- m_{11}n_{2})\ddot y +(m_{21}r_{1}- m_{11}r_{2})}{\Phi},\:\\ &\Phi & = & m_{11}m_{22}- m_{12}m_{21}. \end{align*}

(6)
\begin{align*} \underbrace{\begin{bmatrix}\dot x_{1}\\\dot x_{2}\\\dot x_{3}\\\dot x_{4}\\\dot x_{5}\\\dot x_{6}\end{bmatrix}}_{\dot x} = \underbrace{\begin{bmatrix}x_{4}\\x_{5}\\x_{6}\\u\\\dfrac{(-m_{22}n_{1}+ m_{12}n_{2})u +(-m_{22}r_{1}+ m_{12}r_{2})}{\Phi}\\\dfrac{(m_{21}n_{1}- m_{11}n_{2})u +(m_{21}r_{1}- m_{11}r_{2})}{\Phi}\end{bmatrix}}_{f(x,\:u)} \end{align*}
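
The state equation (6) maps directly onto a short function. The following is a minimal sketch, not the authors' code: the function name `dynamics` and the constant names are illustrative assumptions, and the numeric values are those listed in Table 1 of the paper.

```python
import math

# Physical parameters as listed in Table 1 of the paper (SI units).
M1, M2 = 0.2351, 0.1452          # link masses [kg]
I1, I2 = 0.0012, 0.0010          # link moments of inertia [kg m^2]
LC1, LC2 = 0.0667, 0.1288        # pivot-to-center-of-mass distances l_1, l_2 [m]
L1 = 0.1645                      # pivot-to-pivot distance on link 1 [m]
C1, C2 = 4.5116e-04, 2.9198e-04  # joint friction coefficients
G = 9.81                         # gravitational acceleration [m/s^2]

# Constants h_1 ... h_7 of equation (3).
H1 = M1 * LC1 + M2 * L1
H2 = M2 * LC2
H3 = I1 + M1 * LC1 ** 2 + M2 * L1 ** 2
H4 = M2 * L1 * LC2
H5 = G * (M1 * LC1 + M2 * L1)
H6 = I2 + M2 * LC2 ** 2
H7 = G * M2 * LC2


def dynamics(x, u):
    """Return dx/dt = f(x, u) of equation (6); u is the cart acceleration."""
    _, th1, th2, dy, dth1, dth2 = x
    # Elements of equation (2).
    n1 = H1 * math.cos(th1) + H2 * math.cos(th1 + th2)
    n2 = H2 * math.cos(th1 + th2)
    m11 = H3 + H6 + 2.0 * H4 * math.cos(th2)
    m12 = H6 + H4 * math.cos(th2)
    m21 = m12
    m22 = H6
    r1 = (-H4 * math.sin(th2) * (2.0 * dth1 * dth2 + dth2 ** 2)
          - H5 * math.sin(th1) - H7 * math.sin(th1 + th2) + C1 * dth1)
    r2 = (H4 * math.sin(th2) * dth1 ** 2
          - H7 * math.sin(th1 + th2) + C2 * dth2)
    # Equation (5): angular accelerations via the 2x2 inverse.
    phi = m11 * m22 - m12 * m21
    ddth1 = ((-m22 * n1 + m12 * n2) * u + (-m22 * r1 + m12 * r2)) / phi
    ddth2 = ((m21 * n1 - m11 * n2) * u + (m21 * r1 - m11 * r2)) / phi
    return [dy, dth1, dth2, u, ddth1, ddth2]
```

With both links upright and at rest (`x = [0, 0, 0, 0, 0, 0]`, `u = 0`) every derivative is zero, and a small positive $\theta_{1}$ produces a positive $\ddot\theta_{1}$, which reflects the instability of the upright equilibrium.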

The model above assumes that the first and second links rotate only about axes in the $i$ direction at their respective center points. It also assumes that the cart moves only horizontally along the $j$ axis, with no other translation or rotation. The mechanical structure described in Section 3.2 is designed to satisfy these assumptions as closely as possible, so as to maximize the fidelity of the model.

3.2 Mechanical and Drive Structure of the Double Inverted Pendulum

The most important requirement in Sim-to-Real learning is that the mathematical model used to build the simulation (the Sim side) and the dynamics of the actual pendulum system (the Real side) agree closely. If they do not, the controller is very likely to underperform on the physical system even when training in simulation succeeds. To make the real system match the model, it must be designed so that it moves only in ways consistent with the assumptions of the model. Any effect not considered in the model produces a difference between the dynamic responses of the simulation and the physical system.

The authors' laboratory has built various inverted pendulum systems over many years and has previously proposed a mechanical structure that matches the model well (13). This paper improves on that structure with a redesigned mechanism and drive unit, which improves the agreement between the model and the response of the real system and thereby reduces the reality gap between simulation and hardware. The proposed mechanical structure of the double inverted pendulum system is shown in Fig. 2.

Fig. 2. The mechanical structure of a double inverted pendulum

../../Resources/kiee/KIEE.2023.72.12.1705/fig2.png

3.2.1 Drive Unit Design

Fig. 3 shows the structure previously proposed in (13), in which the pulley is combined with the drive unit. It drives the pulley directly with a BLDC motor, without a reduction gear, which eliminates backlash and thereby the limit-cycle problem that backlash causes. In addition, to withstand the tension of the timing belt mounted on the pulley, the shaft is doubly supported by two bearings so that the belt tension is transmitted only to the shaft passing through the pulley, which removes the load on the motor.

Fig. 3. Driving structure using 3D printed framework

../../Resources/kiee/KIEE.2023.72.12.1705/fig3.png

However, the housing of that drive unit was made of PLA, a 3D-printing material, whose relatively low strength led to breakage and deformation. To prevent this, the entire housing was replaced with plates of a stiffer aluminum alloy (aluminum 6061), as shown in Fig. 4. Eliminating damage and deformation of the structure rules out effects that the model does not account for.

Fig. 4. Driving structure using an aluminum composite panel

../../Resources/kiee/KIEE.2023.72.12.1705/fig4.png

3.2.2 Cart and Rail Design

The rail for the translational motion of the cart was changed from the V-slot 2040 profile shown in Fig. 5 to a structure with two linear guide rails, shown in Fig. 6. In the structure of Fig. 5, when a force is applied to the cart for swing-up control, the profile to which the cart is attached twists by an angle $\alpha$. This twist makes the pendulum rotate about the $j$ axis of Fig. 1, which violates the assumptions of the model described above. With the linear guide rails of Fig. 6, no rail twist such as $\alpha$ occurs. Moreover, the cart mounted on the guide rails moves only horizontally along the $j$ axis; removing the other translations and rotations caused by twisting greatly improves agreement with the model of Section 3.1.

Fig. 5. The structure of the rail and cart constructed using 2040 aluminum profile

../../Resources/kiee/KIEE.2023.72.12.1705/fig5.png

Fig. 6. The structure of the rail and cart using 3090 aluminum profile and dual linear guide rails

../../Resources/kiee/KIEE.2023.72.12.1705/fig6.png

Further design measures were also used, such as a double-row bearing structure at the rotary joints to constrain the links to rotate about the $i$ axis, and removal of the solid lubricant from the bearings to reduce the static and Coulomb friction of the cart. These are the same as described in reference (13) and are not covered in detail here.

4. Experiments and Results

This section uses the model described above, and the double inverted pendulum system built to match it, to design a reinforcement-learning-based controller with the Recovery property through Sim-to-Real learning. The development and experiment environment is a modified version of the one used in the authors' earlier work (14). The environment with which the agent interacts directly during training is a simulation implemented in Python from the mathematical model of Section 3. The physical parameters of the double inverted pendulum used to build the simulation are listed in Table 1, and the Runge-Kutta method was chosen as the solver for the ordinary differential equations.

Table 1. Parameters of the double inverted pendulum used in the experiment

| Parameter | Link $i=1$ | Link $i=2$ |
|---|---|---|
| $m_{i}$ | 0.2351 [kg] | 0.1452 [kg] |
| $I_{i}$ | 0.0012 [kg$\cdot$m$^{2}$] | 0.0010 [kg$\cdot$m$^{2}$] |
| $l_{i}$ | 0.0667 [m] | 0.1288 [m] |
| $L_{i}$ | 0.1645 [m] | - |
| $c_{i}$ | 4.5116e-04 | 2.9198e-04 |
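
The text names the Runge-Kutta method as the ODE solver without stating its order. The following sketch assumes the classical fourth-order variant with the control input held constant over the step. The names `rk4_step` and `decay` are illustrative, and the toy system $\dot x = -x + u$ stands in for the pendulum model only to check the integrator.

```python
import math


def rk4_step(f, x, u, dt):
    """One classical 4th-order Runge-Kutta step of size dt, with u held constant."""
    k1 = f(x, u)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], u)
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], u)
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)], u)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def decay(x, u):
    """Toy dynamics dx/dt = -x + u, used only to verify the integrator."""
    return [-x[0] + u]


# Integrate for 1 s with a 1 ms step and compare with the exact solution exp(-t).
x = [1.0]
for _ in range(1000):
    x = rk4_step(decay, x, 0.0, 1e-3)
rk4_error = abs(x[0] - math.exp(-1.0))
```

Any function with the signature `f(x, u)` that returns the state derivative of (6) can be passed in place of `decay`.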

The agent is implemented with the SAC (Soft Actor-Critic) algorithm, which is widely used for systems with continuous action spaces. SAC has demonstrated high efficiency and stability in complex environments with continuous action spaces, and by adding a maximum-entropy term to the learning objective it improves the diversity and stability of the policy through exploration (15). The hyperparameter values used are listed in Table 2; apart from changing the numbers of units in the two hidden layers to 400 and 300, all values are the same as those used by the authors of (15). The agent interacts continuously with the Python simulation of the double inverted pendulum and learns a control strategy for swing-up.

Table 2. Hyperparameters used in the SAC algorithm

| Parameter | Value |
|---|---|
| optimizer | Adam (16) |
| learning rate | 3e-04 |
| discount factor ($\gamma$) | 0.99 |
| replay buffer size | 1e6 |
| number of hidden layers | 2 |
| number of hidden units in the $1^{{st}}$ layer | 400 |
| number of hidden units in the $2^{{nd}}$ layer | 300 |
| nonlinearity | ReLU |
| target smoothing coefficient ($\tau$) | 0.005 |

Following the state equation of Section 3, the observable state of the simulated double inverted pendulum consists of six values, <$y,\:\theta_{1},\:\theta_{2},\:\dot y ,\:\dot\theta_{1},\:\dot\theta_{2}$>. To simplify the design of the reward function, $\theta_{1}$ and $\theta_{2}$ are wrapped with a modulo operation into the range $-\pi <\theta <\pi$. In addition, to gain the benefits of normalization and continuity during training, $\theta_{1}$ and $\theta_{2}$ are re-expressed as sin($\theta_{i}$), cos($\theta_{i}$). The state passed to the agent is thus a set of eight values, <$y,\:\sin(\theta_{1}),\:\cos(\theta_{1}),\:\sin(\theta_{2}),\:\cos(\theta_{2}),\:\dot y ,\:\dot\theta_{1},\:\dot\theta_{2}$>. The agent takes this state as input and outputs the action given by its policy, that is, the control input. This control input is the motor acceleration $u$, which is limited to $-15<u<15$ in view of the capability of the real actuator.
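
The observation and action handling described above can be sketched as follows. The function names are illustrative assumptions, and the wrap returns values in $[-\pi,\:\pi)$, a half-open interval chosen here for convenience where the text states an open one.

```python
import math

U_MAX = 15.0  # bound on the acceleration command u, as stated in the text


def wrap_angle(theta):
    """Wrap an angle into [-pi, pi) with the modulo operation."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def make_observation(state):
    """Map the 6-value state <y, th1, th2, dy, dth1, dth2> to the 8-value observation."""
    y, th1, th2, dy, dth1, dth2 = state
    th1, th2 = wrap_angle(th1), wrap_angle(th2)
    return [y, math.sin(th1), math.cos(th1),
            math.sin(th2), math.cos(th2), dy, dth1, dth2]


def clip_action(u):
    """Saturate the acceleration command to the actuator range."""
    return max(-U_MAX, min(U_MAX, u))
```

The sine and cosine pair removes the jump at $\pm\pi$ that a raw angle would show, so two physically adjacent poses stay adjacent in the observation.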

In the simulated training environment, each episode lasts 10 s, the simulation is updated every 1 ms, and a learning step takes place every 10 ms. The agent therefore interacts with the environment at most 1000 times per episode, and at each interaction it improves its policy based on the reward at that instant. The reward is computed with the reward function (8), whose factors are defined in (7).

(7)
\begin{align*} R_{u}=\exp(-0.03 | u |)\\ R_{y}=\exp(-0.001 y^{2})\\ R_{\theta_{1}}=0.5+0.5\cos(\theta_{1})\\ R_{\theta_{2}}=0.5+0.5\cos(\theta_{2})\\ R_{\dot\theta_{1}}=0.4+0.6\exp(-0.09\left |\dot\theta_{1}\right |)\\ R_{\dot\theta_{2}}=0.4+0.6\exp(-0.09\left |\dot\theta_{2}\right |) \end{align*}

(8)
$Reward = R_{u}\times R_{y}\times R_{\theta_{1}}\times R_{\theta_{2}}\times R_{\dot\theta_{1}}\times R_{\dot\theta_{2}}$

Each component of the above reward function is plotted in Fig. 7, and every term increases the reward as its argument approaches 0. As a result, the agent learns a policy that keeps both pendulums inverted, that is, in the state where the swing-up has succeeded, with as little motion as possible.
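For readers who want to check the numbers, Eqs. (7) and (8) can be transcribed directly. The sketch below assumes angles in radians with $\theta_{i}=0$ as the inverted position (consistent with the reward being largest there); the function name `reward` and its argument order are illustrative choices, not taken from the paper.

```python
import math


def reward(u, y, th1, th2, dth1, dth2):
    """Product-form reward of Eq. (8), built from the factors of Eq. (7)."""
    r_u = math.exp(-0.03 * abs(u))
    r_y = math.exp(-0.001 * y ** 2)
    r_th1 = 0.5 + 0.5 * math.cos(th1)
    r_th2 = 0.5 + 0.5 * math.cos(th2)
    r_dth1 = 0.4 + 0.6 * math.exp(-0.09 * abs(dth1))
    r_dth2 = 0.4 + 0.6 * math.exp(-0.09 * abs(dth2))
    return r_u * r_y * r_th1 * r_th2 * r_dth1 * r_dth2
```

Because the factors are multiplied, each lying between 0 and 1, the reward reaches its maximum of 1 only when every term is at its best value simultaneously. A pendulum hanging straight down ($\theta_{i}=\pi$) drives the whole product to 0, whereas the angular velocity factors never fall below 0.4, so fast motion is penalized but not vetoed.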

Fig. 7. Reward function graph

In addition, when the value of $y$ exceeds 0.4 [m], the episode no longer contributes to learning and is terminated early at that point. This prevents the trained controller, when later used on the real system, from exceeding the range of the rail on which the real system can operate.
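The timing and the early termination rule can be combined into one episode loop, sketched below under stated assumptions. The `policy` and `step_physics` callables are hypothetical placeholders for the agent and the model equations. Holding the action constant over the ten 1 ms physics updates, checking the rail limit once per agent step, and applying the limit to $|y|$ are all choices made for this illustration rather than details given in the paper.

```python
SIM_DT = 0.001         # physics update period: 1 ms
CONTROL_DT = 0.010     # agent interaction period: 10 ms
EPISODE_LENGTH = 10.0  # episode length in seconds
Y_LIMIT = 0.4          # cart position limit in metres

SUBSTEPS = round(CONTROL_DT / SIM_DT)           # 10 physics updates per action
MAX_STEPS = round(EPISODE_LENGTH / CONTROL_DT)  # 1000 interactions per episode


def run_episode(state, policy, step_physics):
    """Roll out one episode and return the final state and the step count.

    state[0] is assumed to hold the cart position y.
    """
    steps = 0
    for _ in range(MAX_STEPS):
        u = policy(state)
        for _ in range(SUBSTEPS):
            # The same action is held across all physics substeps (assumption).
            state = step_physics(state, u, SIM_DT)
        steps += 1
        if abs(state[0]) > Y_LIMIT:
            break  # early termination: the cart has left the usable rail range
    return state, steps
```

With these constants an episode that never hits the rail limit contains exactly 1000 agent interactions, which matches the count given in the text.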

Training was carried out by repeating episodes in the environment described above, and the result is shown in Fig. 8. From about 1800 training iterations onward, the reward averaged over 10 episodes can be seen to have converged. Even after convergence, however, some episodes show markedly low reward. This is because the episodes were started from random initial conditions so that the controller would acquire the Recovery property. At the start of each episode, the environment state <$y,\:\theta_{1},\:\theta_{2},\:\dot y ,\:\dot\theta_{1},\:\dot\theta_{2}$> is initialized randomly so that the agent experiences a wide range of states. The range of the random values for each state variable is given in Eq. (9).

(9)
\begin{align*} y\sim U(-0.2,\: 0.2)\\ \theta_{1}\sim U(-\pi ,\:\pi)\\ \theta_{2}\sim U(-\pi ,\:\pi)\\ \dot y\sim U(-1,\: 1)\\ \dot\theta_{1}\sim U(-10,\: 10)\\ \dot\theta_{2}\sim U(-20,\: 20) \end{align*}
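A minimal sketch of the randomized reset in Eq. (9) is shown below. The dictionary keys and the function name `sample_initial_state` are hypothetical, and each component is drawn independently with Python's standard `random.Random.uniform`, which is an assumption about how the uniform distributions are combined.

```python
import math
import random

# Uniform ranges of Eq. (9) for each state variable.
INIT_RANGES = {
    "y": (-0.2, 0.2),
    "theta1": (-math.pi, math.pi),
    "theta2": (-math.pi, math.pi),
    "y_dot": (-1.0, 1.0),
    "theta1_dot": (-10.0, 10.0),
    "theta2_dot": (-20.0, 20.0),
}


def sample_initial_state(rng):
    """Draw every state variable independently from its uniform range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in INIT_RANGES.items()}


# Usage: a seeded generator makes the sequence of resets reproducible.
rng = random.Random(0)
initial_state = sample_initial_state(rng)
```

Independent draws make it easy to cover a wide region of the state space. The cost is that some sampled combinations of positions and velocities are ones the system would be unlikely to reach along a real trajectory.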

However, because six independently random values are combined into a single state, the combined state may not obey the laws of physics. When the model equations are evaluated from such a state as the initial condition, they violate physical law and produce physically meaningless results. From the agent's point of view, it receives a state it has never experienced before, so it carries out a largely random action rather than the learned policy. The resulting meaningless cart motion quickly drives the simulation to the early termination condition. The episode therefore ends early, and this appears in Fig. 8 as particular episodes with low final reward.

Fig. 8. Learning results graph

This problem, however, does not need to be considered at all when the trained controller is applied to the real system, because phenomena that violate physical law cannot occur there. The agent never observes such states, so it computes accurate, learning-based control inputs from normal states only and performs the swing-up control.

Fig. 9. Results of Disturbance Injection Experiment

Fig. 9 shows the result of swing-up control performed on the real double inverted pendulum system with the implemented reinforcement-learning-based controller. After the initial swing-up succeeds, the pendulum can be seen holding the straight-line (inverted) state, and a strong external disturbance was applied after about 5 seconds. The dotted line in Fig. 9 marks the moment the disturbance was applied. The disturbance drove the system into an unstable state, but the controller immediately attempted the swing-up again and recovered the inverted state. This result confirms the performance of the reinforcement-learning-based controller with the Recovery property mentioned in Section 2.

To show the experimental result more clearly, the experiment was recorded on video and uploaded to the laboratory's YouTube channel. The video is available at https://youtu.be/4ELdGB9UYZo (video title: Reinforcement learning control of a double inverted pendulum with good recovery performance; channel name: Embedded Control Lab). In the video, the reinforcement-learning-based controller can be seen to perform the swing-up control successfully for every disturbance applied to the double inverted pendulum system.

Fig. 10. YouTube video of the experiment procedure

5. Conclusion

In this paper, a reinforcement-learning-based controller was designed and verified using the Sim-to-Real learning approach. In particular, the controller was confirmed to be able to perform the swing-up even from a state destabilized by a strong disturbance. This shows that a reinforcement-learning-based controller can overcome the limitations of traditional control methods and can be an effective solution for complex control problems.

In addition, design measures were presented for improving the consistency between the simulation and the real environment. Through these measures, a methodology for reducing the reality gap, the main challenge of Sim-to-Real learning, was made concrete, and its effectiveness was demonstrated by experiments on the real system.

This paper focused on the swing-up control of a double inverted pendulum, but the work can be extended to a variety of other control problems. Recently, new control schemes such as transition control, which exploits the distinctive characteristics of multi-link inverted pendulums, have been proposed (17), and research on more difficult systems such as the triple inverted pendulum is also in progress (12). The approach used in this study is expected to be useful for these extended problems as well.

References

1
N. Muskinja, B. Tovornik, April 2006, Swinging Up and Stabilization of a Real Inverted Pendulum, IEEE Transactions on Industrial Electronics, Vol. 53, No. 2, pp. 631-639
2
Y. Otani, T. Kurokami, A. Inoue, Y. Hirashima, 2001, A Swingup Control of an Inverted Pendulum with Cart Position Control, IFAC Proceedings, Vol. 34, pp. 395-400
3
K. Graichen, M. Treuer, M. Zeitz, 2007, Swing-up of the Double Pendulum on a Cart by Feedforward and Feedback Control with Experimental Validation, Automatica, Vol. 43, pp. 63-71
4
J. Kober, J. A. Bagnell, J. Peters, 2013, Reinforcement Learning in Robotics: A Survey, The International Journal of Robotics Research, Vol. 32, pp. 1238-1274
5
S. Israilov, L. Fu, J. Sánchez-Rodríguez, F. Fusco, G. Allibert, C. Raufaste, A. Médéric, 2023, Reinforcement Learning Approach to Control an Inverted Pendulum: A General Framework for Educational Purposes, PLoS ONE, Vol. 18, No. e0280071
6
J. Baek, C. Lee, Y. S. Lee, S. Jeon, S. Han, 2024, Reinforcement Learning to Achieve Real-time Control of Triple Inverted Pendulum, Engineering Applications of Artificial Intelligence, Vol. 128, No. 107518
7
Y. Gil, J. H. Park, J. Baek, S. Han, 2022, Quantization-aware Pruning Criterion for Industrial Applications, IEEE Transactions on Industrial Electronics, Vol. 69, No. 3, pp. 3203-3213
8
J. Baek, H. Jun, J. Park, H. Lee, S. Han, 2021, Sparse Variational Deterministic Policy Gradient for Continuous Real-time Control, IEEE Transactions on Industrial Electronics, Vol. 68, No. 10, pp. 9800-9810
9
G. Dulac-Arnold, D. Mankowitz, T. Hester, 2019, Challenges of Real-world Reinforcement Learning, arXiv preprint arXiv:1904.12901
10
W. Zhao, J. P. Queralta, T. Westerlund, 2020, Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey, 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 737-744
11
N. Jakobi, P. Husbands, 1995, Noise and the Reality Gap: The Use of Simulation in Evolutionary Robotics, Advances in Artificial Life: Third European Conference on Artificial Life Granada, pp. 704-720
12
T. Glück, A. Eder, A. Kugi, 2013, Swing-up Control of a Triple Pendulum on a Cart with Experimental Validation, Automatica, Vol. 49, pp. 801-808
13
D. Ju, C. Choi, J. Jeong, Y. S. Lee, 2022, Design and Parameter Estimation of a Double Inverted Pendulum for Model-based Swing-up Control, Journal of Institute of Control, Robotics and Systems (in Korean), Vol. 28, No. 9, pp. 793-803
14
T. Lee, D. Ju, Y. S. Lee, 2023, Development Environment of Reinforcement Learning-based Controllers for Real-world Physical Systems Using LW-RCP, Journal of Institute of Control, Robotics and Systems (in Korean), Vol. 29, No. 7, pp. 543-549
15
T. Haarnoja, A. Zhou, P. Abbeel, 2018, Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, International Conference on Machine Learning, PMLR, pp. 1861-1870
16
D. P. Kingma, 2014, Adam: A Method for Stochastic Optimization, arXiv preprint arXiv:1412.6980
17
J. Jeong, D. Ju, Y. Fujiyama, Y. S. Lee, 2023, Transition Control of a Double Inverted Pendulum Using an LW-RCP, Journal of Institute of Control, Robotics and Systems (in Korean), Vol. 29, No. 9, pp. 694-703

About the Authors

Taegun Lee (이태건)

He received the B.S. degree in electrical engineering from Inha University in 2023.

He is now an M.S. candidate in electrical and computer engineering at Inha University.

His research interests include reinforcement learning, embedded systems and optimal control.

Doyoon Ju (주도윤)

He received the M.S. degree in electrical and computer engineering from Inha University in 2023.

He is now a Ph.D. candidate in electrical and computer engineering at Inha University.

His research interests include optimal control, embedded systems and reinforcement learning.

Young Sam Lee (이영삼)

He received the B.S. and M.S. degrees in electrical engineering from Inha University, Incheon, South Korea, in 1999, and the Ph.D. degree in electrical engineering from Seoul National University, South Korea, in 2003.

From 2003 to 2004, he was a Senior Researcher with Samsung Electronics Co. Since 2004, he has been with the Department of Electrical and Computer Engineering, Inha University.

He is the author of four books and more than 60 articles.

His research interests include computer-aided control system design, rapid control prototyping, control and instrumentation, robot engineering, and embedded systems.