Title |
Monosyllable Speech Recognition through Facial Movement Analysis |
Authors |
강동원(Kang, Dong-Won) ; 서정우(Seo, Jeong-Woo) ; 최진승(Choi, Jin-Seung) ; 최재봉(Choi, Jae-Bong) ; 탁계래(Tack, Gye-Rae) |
DOI |
https://doi.org/10.5370/KIEE.2014.63.6.813 |
Keywords |
3-D motion capture system ; Facial motion ; Hidden Markov model ; Speech recognition |
Abstract |
The purpose of this study was to extract accurate facial-movement feature parameters with a 3-D motion capture system for speech recognition through lip-reading. Instead of features obtained from conventional camera images, the 3-D motion capture system was used to obtain quantitative data on actual facial movements and to analyze 11 variables that exhibit characteristic patterns, such as nose, lip, jaw, and cheek movements, during monosyllable vocalization. Fourteen subjects, all in their 20s, were asked to vocalize 11 Korean vowel monosyllables three times each with 36 reflective markers attached to their faces. The recorded facial movement data were converted into 11 parameters and presented as patterns for each monosyllable vocalization. These parameter patterns were then put through a training and recognition process for each monosyllable, using a speech recognition algorithm based on a Hidden Markov Model (HMM) and the Viterbi algorithm. The recognition accuracy for the 11 monosyllables was 97.2%, which suggests that Korean speech recognition through quantitative facial movement analysis is feasible. |
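
The recognition step described in the abstract can be illustrated with a minimal sketch: one HMM per monosyllable, each scored with the Viterbi algorithm, with the highest-scoring model giving the recognized label. The abstract does not specify the model topology, the number of states, or how the 11 parameters are turned into observations, so everything below is an assumption made for illustration: the observations are taken to be already quantized into discrete symbols, and the function names (`viterbi`, `classify`), the two toy models, and all probabilities are hypothetical rather than taken from the study.

```python
import math


def viterbi(obs, start, trans, emit):
    """Return (log-probability, state path) of the most likely state sequence.

    obs   : list of discrete observation symbols (ints), assumed pre-quantized
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    def log(p):
        # Work in log space; impossible events get -inf instead of an error.
        return math.log(p) if p > 0 else float("-inf")

    n = len(start)
    # delta[i]: best log-probability of any path that ends in state i
    delta = [log(start[i]) + log(emit[i][obs[0]]) for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for symbol in obs[1:]:
        prev, delta, pointers = delta, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log(trans[i][j]))
            pointers.append(best_i)
            delta.append(prev[best_i] + log(trans[best_i][j]) + log(emit[j][symbol]))
        back.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(range(n), key=lambda i: delta[i])
    best = delta[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best, path


def classify(obs, models):
    """Pick the label whose HMM gives the highest Viterbi score."""
    return max(models, key=lambda label: viterbi(obs, *models[label])[0])


# Two hypothetical left-to-right models standing in for two monosyllables.
# Each entry is (start, trans, emit); the values are illustrative only.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "a": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.2, 0.8]]),
    "b": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.8, 0.2]]),
}

print(classify([0, 0, 1, 1], models))
print(classify([1, 1, 0, 0], models))
```

In practice, continuous motion-capture parameters would more likely be modeled with continuous emission densities and the model parameters estimated from training data; the discrete, hand-set models here only show how Viterbi scoring selects among per-class HMMs.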