The Transactions of the Korean Institute of Electrical Engineers

  1. (Dept. of Electronics and Electrical Engineering, Dankook University, Republic of Korea.)
  2. (Dept. of Electronics and Electrical Engineering, Dankook University/DMASTA, Republic of Korea.)
  3. (Electronics and Telecommunications Research Institute, Republic of Korea.)



Keywords: Long-tail Dataset, Real-time Object Detection, Dynamic Fusion, Class Imbalance

1. Introduction

Real-time object detection plays a central role in applications that demand fast and accurate situational awareness, such as surveillance, reconnaissance, and autonomous driving. Such systems can react immediately to their surroundings only if objects are recognized quickly and accurately, and recognition delays or degraded accuracy can lead to fatal accidents. Consequently, real-time object detection models that combine a lightweight architecture with high efficiency have been studied actively in recent years [1-4].

However, datasets collected in real environments generally follow a long-tail distribution in which class frequencies differ greatly: a few classes have many samples, while most classes have only a few. Training on such long-tail data biases the model toward head classes, which account for most of the samples, and as a result recognition performance degrades for tail classes, which account for few samples [5]. In particular, under a long-tail distribution the gradient contributions of the classes accumulate asymmetrically throughout training, so the decision boundary of the model is distorted in a particular direction and representation learning for tail classes is limited [6-8]. This not only lowers generalization performance but also weakens the model's ability to cope with changing environments and infrequent situations. For these reasons, solving the long-tail learning problem is regarded as a key challenge in real-time object detection research.

Against this background, RT-DETR (real-time detection transformer), a transformer-based real-time object detection model, is used in a wide range of applications because of its high efficiency and real-time capability [9]. In particular, RT-DETR uses VFL (varifocal loss) as its classification loss in order to learn an IoU-aware confidence efficiently. VFL is effective at reducing the mismatch between classification and localization, but it does not directly compensate for differences in class frequency, so a bias toward head classes remains on long-tail datasets [10]. To address this, loss functions that reflect the class distribution, such as SSL (seesaw loss), have been proposed as alternatives [11]. However, the two loss functions differ in purpose and mechanism, so simply replacing VFL with SSL without further adjustment can cause instability early in training and unbalanced optimization between head and tail classes, degrading overall performance. Meanwhile, RT-DETRv2 was proposed, which extends the architecture and training strategy of RT-DETR to make real-time object detection models more practical [12]. Building on the RT-DETRv2 architecture, this study designs a new training strategy to overcome its performance limits on long-tail data.

Motivated by the analysis above, this paper proposes a training strategy that combines VFL and SSL to mitigate long-tail dataset bias when training RT-DETRv2. The proposed method minimizes the influence of SSL early in training to keep learning stable, and then adjusts the weights of the two loss functions as training proceeds, combining them as a weighted average. With this combination, VFL quickly raises object classification and localization performance early in training, and the benefits of SSL are brought in during the later stage to improve recognition performance on long-tail datasets. In addition, the weight of the weighted average is evaluated experimentally to find the weight scheduling scheme best suited to each training stage, which further improves bias mitigation. The main contributions of this paper are as follows.

1) A dynamic combination scheme that adjusts the weights of the two loss functions as the epochs progress secures both early training stability and robustness to imbalanced data.

2) The best weight schedule for the weighted average is identified experimentally and used to improve the model's bias mitigation.

3) The proposed method is applied to RT-DETRv2 training, and quantitative and qualitative results on the LVIS (large vocabulary instance segmentation) dataset show improved tail-class performance over existing methods.

2. Related Work

RT-DETR, a real-time object detection model, uses VFL when training its classifier to perform IoU-aware classification, in which the IoU (intersection over union) of the predicted box is reflected directly in the classification confidence. VFL is a loss function proposed to strengthen the agreement between the classification score and the IoU: by using the IoU as a weight on the classification score, it trains the model to assign high classification scores only to predicted boxes with high IoU. This reduces the mismatch between classification and localization. However, VFL does not directly account for the fundamental class imbalance caused by differences in the number of samples per class.
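The IoU-aware weighting described above can be illustrated with a minimal per-prediction sketch of varifocal loss. It follows the form given in the VarifocalNet paper [10]. The function name and the default values `alpha=0.75` and `gamma=2.0` are assumptions for illustration and are not stated in this paper.

```python
import math


def varifocal_loss(p: float, q: float, alpha: float = 0.75, gamma: float = 2.0) -> float:
    """Varifocal loss for a single class score (illustrative sketch).

    p: predicted IoU-aware classification score in (0, 1).
    q: target score; the IoU with the ground-truth box for a positive
       sample and 0 for a negative sample.
    alpha, gamma: assumed hyperparameters, not taken from this paper.
    """
    eps = 1e-12  # keeps log() away from 0
    p = min(max(p, eps), 1.0 - eps)
    if q > 0:
        # Positive sample: cross-entropy against the soft target q,
        # weighted by q so that high-IoU boxes contribute more.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: focal-style term that down-weights easy negatives.
    return -alpha * (p ** gamma) * math.log(1.0 - p)
```

Note that nothing in this expression depends on how often a class occurs, which is the limitation the paragraph above points out.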

To overcome this limitation, loss functions that directly reduce the gradient imbalance between head and tail classes on long-tail datasets have been studied [13-15]. SSL introduces a mitigation factor, which adjusts negative gradients according to the frequency of each class, and a compensation factor, which reflects the confusion between classes, so that tail classes are not excessively suppressed during training. This approach is effective at improving the classification performance of tail classes under a long-tail distribution. However, simply replacing VFL with SSL in the RT-DETR architecture degrades performance. SSL focuses on distribution-based gradient correction and does not consider the IoU of the predicted bounding box, so the IoU-aware classification that is a core design element of RT-DETR is not preserved. As a result, the consistency between predicted-box quality and classification score weakens, and the overall mAP (mean average precision) drops even though the classes are rebalanced. This shows that VFL plays an important role in the structural optimization of RT-DETR.
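The two SSL factors can be sketched for a single sample as follows. The sketch follows the formulation in the seesaw loss paper [11]. The function names and the exponents `p=0.8` and `q=2.0` are assumptions for illustration, and `class_counts` stands for the number of training instances accumulated per class.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def seesaw_loss(logits, target, class_counts, p=0.8, q=2.0):
    """Seesaw loss for one sample (illustrative sketch, assumed hyperparameters)."""
    sigma = softmax(logits)
    i = target
    m = max(logits)
    numerator = math.exp(logits[i] - m)
    denominator = numerator
    for j, z in enumerate(logits):
        if j == i:
            continue
        # Mitigation factor: if the target class i is more frequent than
        # class j, shrink the negative gradient pushed onto class j.
        if class_counts[i] <= class_counts[j]:
            mitigation = 1.0
        else:
            mitigation = (class_counts[j] / class_counts[i]) ** p
        # Compensation factor: if class j is wrongly scored above the
        # target class, increase its penalty again.
        if sigma[j] <= sigma[i]:
            compensation = 1.0
        else:
            compensation = (sigma[j] / sigma[i]) ** q
        denominator += mitigation * compensation * math.exp(z - m)
    return -math.log(numerator / denominator)
```

With equal class counts and a correctly ranked target, both factors equal 1 and the expression reduces to ordinary softmax cross-entropy. The predicted-box IoU never enters the computation, which matches the limitation discussed above.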

With these issues in mind, several studies have explored combining two or more loss functions. Major object detection models such as YOLO and SSD use a multi-task loss that linearly combines the classification loss and the box regression loss with fixed weights between the two terms [16,17]. Such fixed-weight combinations are simple to design and easy to implement, but they cannot reflect the dynamics of training, in which the relative importance of each loss term changes as training proceeds.

3. Proposed Method

This study proposes a technique that mitigates training bias in RT-DETRv2 through a combined classification loss. The design goal is for the early training stability provided by the IoU-aware classification of VFL and the class-distribution-based tail-class correction of SSL to complement each other within the RT-DETRv2 architecture. We first propose a static combination, in which the weight ratio of the two loss functions is held constant. By keeping the share of each loss fixed throughout training, the static combination is designed to reflect both the localization-quality learning of VFL and the class-imbalance correction of SSL. The loss function of the static combination is defined as follows.

$$L(p, q) = L_{bbox} + (1-k)L_{VFL}(p, q) + kL_{SSL}(p, q), \tag{1}$$

Here, $L_{bbox}$ is the bounding box regression loss, $L_{VFL}$ is VFL, and $L_{SSL}$ is SSL. $p$ is the classification score predicted by the model, and $q$ is the target score, defined as the IoU between the predicted box and the ground-truth box for positive samples and as 0 for negative samples. $k$ is the weight of the weighted average that controls the ratio of the two loss functions, and it is held fixed throughout training. However, the static combination cannot reflect the fact that the two loss functions call for different optimization directions at different training stages, so it may have a structural limitation. In the early stage, VFL encourages higher classification scores for predicted boxes with higher IoU, which supports stable, localization-centered early training, whereas SSL focuses on tail-class correction based on the class distribution and makes the gradients of tail classes relatively large. Early in training, when localization has not yet been learned sufficiently, SSL therefore introduces noise, and this noise is likely to disturb the IoU-aware classification optimization driven by VFL and hinder early convergence. This early instability can reduce performance over the whole course of training and, in particular, can prevent a sufficient correction effect from appearing for tail classes.
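As a minimal sketch, Eq. (1) can be written as a function of three already-computed scalar loss values. The function and argument names are hypothetical. In a real training loop the three terms would be tensors produced by the detector's loss modules.

```python
def combined_loss(l_bbox: float, l_vfl: float, l_ssl: float, k: float) -> float:
    """Eq. (1): box regression loss plus a weighted average of VFL and SSL.

    k = 0 uses VFL alone and k = 1 uses SSL alone as the classification loss.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return l_bbox + (1.0 - k) * l_vfl + k * l_ssl


# Static combination: the same k at every training step.
loss = combined_loss(l_bbox=1.0, l_vfl=2.0, l_ssl=4.0, k=0.25)
```

Because the classification part is a convex combination, changing $k$ shifts emphasis between the two losses without altering the weight of the box regression term.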

Based on this analysis, this study proposes a dynamic combination that gradually adjusts the value of $k$ as training proceeds. The proposed method gives VFL a high weight early in training to strengthen IoU-based localization first, and then gradually increases the share of SSL in the later stage, once training has stabilized, to mitigate class imbalance. To this end, the weight $k$ in Eq. (1) is increased gradually from 0 to 1 over the training epochs so that the class-distribution-based correction of SSL is fully reflected.

In addition, to analyze how the growth pattern of the weight $k$ affects training and to find the best scheduling strategy, several scheduling functions were applied, as shown in Fig. 1. Fig. 1 visualizes how each method increases $k$ over the training epochs and shows that the point at which the loss shifts from VFL to SSL depends on the growth rate and slope of each schedule. Because these differences change the training signal the model receives in the early and late stages, the choice of scheduler is an important factor in the performance of the combined loss. In the early stage of training with SSL, the gradients of head classes accumulate quickly, so performance is likely to degrade unless a sufficient correction signal for tail classes is secured. Given this, piece-wise scheduling, which is designed to increase the SSL weight rapidly after 5 epochs and thereby secure an early correction signal, is expected to be the most effective. This dynamic combination maximizes the complementary strengths of the two loss functions and secures both early training stability and class balance in later training.

Fig. 1. Comparison of different k-scheduler strategies over training epochs
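The paper names four schedule types (piece-wise, cosine, tanh, linear) but the text does not give their formulas. The sketch below shows one plausible form of each, purely as an assumption. The `steepness`, `warmup_epochs` and `ramp_epochs` parameters are hypothetical. Only the 5-epoch delay of the piece-wise schedule and the 0-to-1 range of $k$ come from the text.

```python
import math


def progress(epoch: int, total_epochs: int) -> float:
    """Fraction of training completed: 0.0 at epoch 0, 1.0 at the last epoch."""
    if total_epochs < 2:
        raise ValueError("total_epochs must be at least 2")
    return min(max(epoch / (total_epochs - 1), 0.0), 1.0)


def k_linear(epoch: int, total_epochs: int) -> float:
    """k grows at a constant rate."""
    return progress(epoch, total_epochs)


def k_cosine(epoch: int, total_epochs: int) -> float:
    """k grows slowly at both ends and fastest in the middle."""
    return 0.5 * (1.0 - math.cos(math.pi * progress(epoch, total_epochs)))


def k_tanh(epoch: int, total_epochs: int, steepness: float = 3.0) -> float:
    """k rises steeply at the start and then saturates (assumed steepness)."""
    t = progress(epoch, total_epochs)
    return math.tanh(steepness * t) / math.tanh(steepness)


def k_piecewise(epoch: int, total_epochs: int,
                warmup_epochs: int = 5, ramp_epochs: int = 10) -> float:
    """k stays at 0 for the first 5 epochs, then ramps linearly to 1.

    total_epochs is unused and kept only for a uniform signature;
    ramp_epochs is an assumed value.
    """
    if epoch < warmup_epochs:
        return 0.0
    return min((epoch - warmup_epochs + 1) / ramp_epochs, 1.0)


# Example: k over a hypothetical 50-epoch run.
schedule = [round(k_piecewise(e, 50), 2) for e in range(50)]
```

Each function returns the value of $k$ to plug into Eq. (1) at the start of an epoch, so switching schedulers only changes how quickly the classification loss moves from VFL to SSL.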

4. Experiments

4.1 Experimental Setup

This section experimentally verifies how much each proposed loss combination mitigates bias when applied to RT-DETRv2. The experiments use the LVIS dataset, which clearly exhibits a long-tail distribution. LVIS contains 1,203 object classes with an imbalanced per-class data distribution, and the classes are divided into frequent, common, and rare according to the number of images per class [18]. Rare classes are categories that appear in 1 to 10 images, common classes are categories that appear in 11 or more but fewer than 100 images, and frequent classes are categories that appear in 100 or more images. Because of this long-tail distribution, LVIS is widely used as a representative benchmark for evaluating object detection under class imbalance. In this study, RT-DETRv2 was trained on the 100,170 LVIS training images, and AP (average precision) was computed for the frequent, common, and rare classes on the 19,809 validation images. AP is an object detection metric computed as the area under the precision-recall curve; $AP_r$, $AP_c$, and $AP_f$ denote the AP for rare, common, and frequent classes, respectively.
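The grouping rule stated above can be sketched as follows, using the thresholds exactly as given in the text. The function names and the sample values are hypothetical. Averaging per-class AP within each group is a simplification of the official LVIS evaluation, which also averages over IoU thresholds.

```python
from statistics import mean


def frequency_group(num_images: int) -> str:
    """Map a category's training-image count to its LVIS frequency group."""
    if num_images < 1:
        raise ValueError("a category must appear in at least one image")
    if num_images <= 10:
        return "rare"
    if num_images < 100:
        return "common"
    return "frequent"


def group_ap(per_class_ap: dict, images_per_class: dict) -> dict:
    """Average per-class AP within each frequency group (AP_r, AP_c, AP_f)."""
    buckets = {"rare": [], "common": [], "frequent": []}
    for name, ap in per_class_ap.items():
        buckets[frequency_group(images_per_class[name])].append(ap)
    return {group: mean(values) for group, values in buckets.items() if values}


# Made-up example values, not results from the paper.
example = group_ap(
    {"class_a": 10.0, "class_b": 20.0, "class_c": 40.0, "class_d": 60.0},
    {"class_a": 3, "class_b": 10, "class_c": 50, "class_d": 500},
)
```

Reporting the three groups separately is what makes the head-versus-tail trade-off visible, since a single overall AP can hide a large drop on rare classes.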

4.2 Experimental Results

Table 1 compares performance for different values of the weight $k$ when SSL is combined with a static weight from the early stage of training. According to Eq. (1), $k=0$ corresponds to training with VFL alone. The highest AP was obtained at $k=0$, and the overall AP decreased steadily as $k$ increased. This result shows that the static combination does not work effectively because SSL cannot maintain stable optimization in the early stage of training. In other words, the static combination cannot reflect the optimization needs of each training stage and is therefore structurally limited in improving performance.

Table 1. Performance comparison for different values of the static weight $k$

| $k$ | $AP$ | $AP_r$ | $AP_c$ | $AP_f$ |
| --- | --- | --- | --- | --- |
| 0.0 | 34.1 | 18.6 | 31.6 | 43.6 |
| 0.1 | 32.0 | 15.5 | 30.7 | 40.7 |
| 0.3 | 31.0 | 16.2 | 28.7 | 40.1 |
| 0.5 | 28.4 | 12.8 | 26.9 | 36.9 |
| 0.7 | 26.5 | 10.4 | 26.3 | 33.9 |

To address this problem, the effect of the proposed dynamic combination was analyzed with several scheduling schemes. The four scheduling functions in Fig. 1 were applied to control the contribution of SSL during training, and the comparison is summarized in Table 2. The tanh schedule achieved the highest rare-class AP, 25.1, but its frequent-class AP dropped to 37.1, which lowered overall performance. In contrast, the linear schedule achieved the highest frequent-class AP, 40.5, but its rare-class performance was relatively lower. These results indicate a trade-off: the steeper the scheduling function, the stronger the correction for rare classes, but the lower the performance on frequent classes. Taken together, the piece-wise scheduler provided the most stable and balanced performance across both rare and frequent classes.

Table 2. Performance comparison of $k$-scheduling methods

| Method | $AP$ | $AP_r$ | $AP_c$ | $AP_f$ |
| --- | --- | --- | --- | --- |
| Piece-wise | 33.3 | 22.3 | 31.5 | 40.4 |
| Cosine | 32.5 | 20.9 | 30.5 | 39.7 |
| Tanh | 32.5 | 25.1 | 31.1 | 37.1 |
| Linear | 33.1 | 21.0 | 31.2 | 40.5 |

Based on these results, Table 3 compares the performance of RT-DETRv2 under different loss training schemes. When trained with VFL alone, the overall AP is 34.1, with 18.6, 31.6, and 43.6 on the rare, common, and frequent classes, respectively. When VFL is simply replaced with SSL, IoU-aware classification is not preserved and mAP drops sharply across the board, confirming that localization information and the classification signal become misaligned once VFL is removed. The static combination reflects both loss functions at once, yet it still performs worse than training with VFL alone. This indicates that the training noise of SSL in the early stage disturbs the IoU-aware classification optimization of VFL. In contrast, with the dynamic combination the overall mAP decreases to 33.3, but the rare-class AP improves substantially to 22.3. This shows that the proposed method carries out localization-centered learning stably in the early stage and then gradually activates SSL-based class-imbalance correction, effectively improving tail-class performance on a long-tail dataset.

Table 3. Performance comparison of loss combination methods

| Method | $AP$ | $AP_r$ | $AP_c$ | $AP_f$ |
| --- | --- | --- | --- | --- |
| Single (VFL) [10] | 34.1 | 18.6 | 31.6 | 43.6 |
| Single (SSL) [11] | 18.5 | 13.3 | 17.4 | 21.9 |
| Static ($k=0.1$) | 32.0 | 15.5 | 30.7 | 40.7 |
| Dynamic (piece-wise) | 33.3 | 22.3 | 31.5 | 40.4 |
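The trade-off in Table 3 is easier to read as differences from the VFL baseline. The short sketch below computes them from the two relevant rows. The dictionary layout and the helper name are illustrative only.

```python
# Rows copied from Table 3.
table3 = {
    "Single (VFL)": {"AP": 34.1, "AP_r": 18.6, "AP_c": 31.6, "AP_f": 43.6},
    "Dynamic (piece-wise)": {"AP": 33.3, "AP_r": 22.3, "AP_c": 31.5, "AP_f": 40.4},
}


def delta(baseline: dict, method: dict) -> dict:
    """Per-metric change of a method relative to the baseline, in AP points."""
    return {metric: round(method[metric] - baseline[metric], 1) for metric in baseline}


changes = delta(table3["Single (VFL)"], table3["Dynamic (piece-wise)"])
```

The rare-class AP rises by 3.7 points, while the overall AP falls by 0.8, the common-class AP by 0.1, and the frequent-class AP by 3.2. The cost of the tail-class gain is therefore borne almost entirely by the frequent classes.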

To further improve bias mitigation, an additional experiment applied a sampling method that reflects the class distribution. Table 4 reports the results with IRFS (instance-aware repeat factor sampling) [19]: the overall mAP rises from 33.3 to 34.3, and the rare-class AP improves from 22.3 to 24.5. The overall mAP that had temporarily dropped when the dynamic combination was applied is thus recovered by adding IRFS. This shows that IRFS complements the tail-class correction of SSL and that the two techniques work together to further mitigate class imbalance.
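For readers unfamiliar with repeat factor sampling, the sketch below shows how a repeat factor could be computed. It assumes that IRFS replaces the image-level frequency of standard repeat factor sampling with the geometric mean of the image-level and instance-level frequencies. This form, the threshold value `0.001`, and all function names are assumptions, not details given in this paper.

```python
import math


def category_repeat_factor(image_freq: float, instance_freq: float,
                           threshold: float = 0.001) -> float:
    """Repeat factor of one category (assumed IRFS form).

    image_freq: fraction of training images that contain the category.
    instance_freq: fraction of all annotated boxes that belong to the category.
    threshold: assumed frequency below which a category is oversampled.
    """
    combined = math.sqrt(image_freq * instance_freq)  # geometric mean
    return max(1.0, math.sqrt(threshold / combined))


def image_repeat_factor(category_freqs, threshold: float = 0.001) -> float:
    """An image is repeated according to its rarest category."""
    return max(category_repeat_factor(f_img, f_inst, threshold)
               for f_img, f_inst in category_freqs)


# Hypothetical image containing one frequent and one very rare category.
factor = image_repeat_factor([(0.1, 0.1), (1e-5, 1e-5)])
```

Sampling acts on the data side while the SSL term acts on the loss side, which is consistent with the complementary effect reported above.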

Table 4. Performance comparison of long-tail mitigation methods

| Dynamic | IRFS [19] | $AP$ | $AP_r$ | $AP_c$ | $AP_f$ |
| --- | --- | --- | --- | --- | --- |
|  |  | 34.1 | 18.6 | 31.6 | 43.6 |
| ✓ |  | 33.3 | 22.3 | 31.5 | 40.4 |
| ✓ | ✓ | 34.3 | 24.5 | 32.6 | 40.4 |

Finally, Fig. 2 gives a qualitative comparison of rare-class recognition between the RT-DETRv2 model trained with the original VFL and the combined model proposed in this paper, showing detection results for the rare classes heron and martini. The VFL-based model yields low confidence for the rare classes, and the static combination also fails to reach sufficient confidence because of the early instability of SSL. This is because VFL concentrates mainly on IoU-based quality learning and therefore cannot provide a sufficient classification signal for classes with severe data imbalance. In contrast, the proposed dynamic combination first performs stable localization learning and then gradually activates SSL as training proceeds, yielding high confidence for rare-class objects. The qualitative comparison thus shows that the dynamic combination effectively mitigates class imbalance on a long-tail dataset by achieving localization accuracy and class-balanced learning in sequence.

Fig. 2. Comparison of tail-class recognition across different training methods

In summary, training with VFL alone is biased toward frequent classes and degrades rare-class recognition, while replacing it with SSL alone does not preserve IoU-aware classification and causes a sharp drop in overall performance. In contrast, the proposed dynamic loss combination effectively mitigates the limitations of these existing approaches, showing that it can achieve both localization accuracy and class-balanced learning in long-tail object detection.

5. Conclusion

This study proposed a training technique that dynamically combines VFL and SSL to mitigate the class imbalance that arises when RT-DETRv2 is trained on long-tail datasets. The proposed method preserves the stability of IoU-aware classification in the early stage of training and gradually brings in the correction effect of SSL in the later stage, effectively mitigating long-tail bias. In the experiments, the proposed dynamic combination raised rare-class AP from 18.6 to 22.3 while keeping classification and localization in balance. The results also show that additional bias mitigation techniques such as IRFS can further reduce the performance gap between classes.

Future work will combine the method with other long-tail mitigation techniques, such as adapters, to improve the generalization of transformer-based object detection models. We also plan to minimize the performance drop on frequent classes and optimize the overall performance balance.

Acknowledgements

This work was supported by the National Research Foundation of Korea, funded by the Korea government (Ministry of Science and ICT) (No. RS-2023-00251621). This work is a research result of the Busan regional innovation-centered university support system (RISE), supported by funds from the Ministry of Education and Busan Metropolitan City (2025-RISE-02-002-003). This work was supported by the Korea Research Institute for Defense Technology Planning and Advancement, funded by the Korea government (Defense Acquisition Program Administration) in 2023 (KRIT-CT-23-021).

References

[1] P. Ge, M. Wan, W. Qian, Y. Xu, X. Kong, G. Gu, "SGA-YOLO: A lightweight real-time object detection network for UAV infrared images," to appear in IEEE Transactions on Intelligent Transportation Systems, 2025.
[2] X. Hua, X. Wang, D. Wang, J. Huang, X. Hu, "Military object real-time detection technology combined with visual salience and psychology," Electronics, vol. 7, no. 10, pp. 216, 2018.
[3] H. Zhang, K. Liu, Z. Gan, G. N. Zhu, "UAV-DETR: efficient end-to-end object detection for unmanned aerial vehicle imagery," arXiv:2501.01855, 2025.
[4] Q. Wu, X. Li, K. Wang, H. Bilal, "Regional feature fusion for on-road detection of objects using camera and 3D-LiDAR in high-speed autonomous vehicles," Soft Computing, vol. 27, no. 23, pp. 18195-18213, 2023.
[5] Y. Zhang, B. Kang, B. Hooi, S. Yan, J. Feng, "Deep long-tailed learning: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10795-10816, 2023.
[6] Y. Cui, M. Jia, T. Y. Lin, Y. Song, S. Belongie, "Class-balanced loss based on effective number of samples," pp. 9268-9277, 2019.
[7] K. Oksuz, B. C. Cam, S. Kalkan, E. Akbas, "Imbalance problems in object detection: A review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3388-3415, 2020.
[8] K. Cao, C. Wei, A. Gaidon, N. Arechiga, T. Ma, "Learning imbalanced datasets with label-distribution-aware margin loss," 2019.
[9] Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, J. Chen, "DETRs beat YOLOs on real-time object detection," pp. 16965-16974, 2024.
[10] H. Zhang, Y. Wang, F. Dayoub, N. Sunderhauf, "VarifocalNet: An IoU-aware dense object detector," pp. 8514-8523, 2021.
[11] J. Wang, W. Zhang, Y. Zang, Y. Cao, J. Pang, T. Gong, K. Chen, Z. Liu, C. C. Loy, D. Lin, "Seesaw loss for long-tailed instance segmentation," pp. 9695-9704, 2021.
[12] W. Lv, Y. Zhao, Q. Chang, K. Huang, G. Wang, Y. Liu, "RT-DETRv2: Improved baseline with bag-of-freebies for real-time detection transformer," arXiv:2407.17140, 2024.
[13] J. Tan, C. Wang, B. Li, Q. Li, W. Ouyang, C. Yin, J. Yan, "Equalization loss for long-tailed object recognition," pp. 11662-11671, 2020.
[14] B. Li, Y. Yao, J. Tan, G. Zhang, F. Yu, J. Lu, Y. Luo, "Equalized focal loss for dense long-tailed object detection," pp. 6990-6999, 2022.
[15] X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, J. Yang, "Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection," Advances in Neural Information Processing Systems, vol. 33, pp. 21002-21012, 2020.
[16] R. Khanam, M. Hussain, "YOLOv11: An overview of the key architectural enhancements," arXiv:2410.17725, 2024.
[17] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, A. C. Berg, "SSD: Single shot multibox detector," Springer International Publishing, Cham, pp. 21-37, 2016.
[18] A. Gupta, P. Dollar, R. Girshick, "LVIS: A dataset for large vocabulary instance segmentation," pp. 5356-5364, 2019.
[19] B. Yaman, T. Mahmud, C. H. Liu, "Instance-aware repeat factor sampling for long-tailed object detection," arXiv:2305.08069, 2023.

About the Authors

김정현 (Jeonghyeon Kim)

He received the B.S. degree in the department of electronics and electrical engineering at Dankook University, in 2025. Currently, he is working toward the M.S. degree in the school of electronics and electrical engineering at Dankook University.

E-mail: jeongh@dankook.ac.kr

김한솔 (Han Sol Kim)

He received his B.S. in Electronics and Computer Engineering from Hanyang University in 2011, and his M.S. and Ph.D. in Electrical and Electronic Engineering from Yonsei University in 2012 and 2018, respectively. He was a Senior Engineer at Samsung Electronics from 2018 to 2019 and an Associate Professor at Korea Maritime and Ocean University from 2019 to 2023. Since 2023, he has been with Dankook University.

E-mail: hansol@dankook.ac.kr

이창은 (Changeun Lee)

He received his B.S. and M.S. degrees in electronics engineering from Hanyang University, Rep. of Korea, in 1996 and 1998, respectively, and his Ph.D. in information and communication engineering from Chungnam National University, Rep. of Korea, in 2017. From 1998 to 2000, he was a researcher at LG Industry System, Rep. of Korea, where he worked on intelligent building automation systems. Since 2001, he has been with the Electronics and Telecommunications Research Institute (ETRI), Rep. of Korea, where he has conducted research in the fields of intelligent robot systems and military artificial intelligence. His primary research interests are artificial intelligence, robot software frameworks, and distributed and cooperative unmanned systems.

E-mail: celee@etri.re.kr

이광일 (Kwangil Lee)

He received the B.S., M.S., and Ph.D. degrees in computer science from Chungnam National University, Daejeon, South Korea, in 1993, 1996, and 2001, respectively. He was a Senior Engineer with the Electronics and Telecommunications Research Institute, Daejeon, from 2006 to 2017. Since 2017, he has been with the Department of Artificial Intelligence, National Korea Maritime and Ocean University, Busan, South Korea, where he is currently an Associate Professor. His current research interests include smart ship, e-navigation, and maritime cyber security.

E-mail: leeki@kmou.ac.kr