• Title/Summary/Keyword: 단어 오류 (word error)

Search results: 213

The Font Recognition of Printed Hangul Documents (인쇄된 한글 문서의 폰트 인식)

  • Park, Moon-Ho;Shon, Young-Woo;Kim, Seok-Tae;Namkung, Jae-Chan
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.8
    • /
    • pp.2017-2024
    • /
    • 1997
  • The main focus of this paper is the recognition of printed Hangul documents in terms of typeface, character size, and character slope for an IICS (Intelligent Image Communication System). Fixed-size blocks extracted from the documents are analyzed in the frequency domain for typeface classification. Vertical pixel counts and the projection profile of the bounding box are used for character size classification and character slope classification, respectively. An MLP with a variable number of hidden nodes, trained with the error back-propagation algorithm, is used as the typeface classifier, and the Mahalanobis distance is used to classify character size and slope. The experimental results demonstrated the usefulness of the proposed system, with mean rates of 95.19% for typeface classification, 97.34% for character size classification, and 89.09% for character slope classification. A brief illustrative sketch of Mahalanobis-distance classification follows this entry.

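The size and slope classifiers above are distance-based. As a rough, purely illustrative Python sketch of Mahalanobis-distance classification (not the authors' code; the feature dimensions, class labels, and example data are assumptions):

    import numpy as np

    class MahalanobisClassifier:
        """Assigns a sample to the class whose mean is nearest in Mahalanobis distance."""

        def fit(self, X, y):
            self.classes_ = np.unique(y)
            self.means_ = {c: X[y == c].mean(axis=0) for c in self.classes_}
            # Pooled covariance over all samples; small ridge for invertibility.
            cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.inv_cov_ = np.linalg.inv(cov)
            return self

        def predict(self, X):
            preds = []
            for x in np.atleast_2d(X):
                d = {c: (x - m) @ self.inv_cov_ @ (x - m) for c, m in self.means_.items()}
                preds.append(min(d, key=d.get))  # smallest squared distance wins
            return np.array(preds)

    # Hypothetical usage: classify character size from vertical pixel-count features.
    rng = np.random.default_rng(0)
    X_train = np.vstack([rng.normal(10, 1, (50, 4)), rng.normal(14, 1, (50, 4))])
    y_train = np.array(["10pt"] * 50 + ["12pt"] * 50)
    clf = MahalanobisClassifier().fit(X_train, y_train)
    print(clf.predict(rng.normal(14, 1, (3, 4))))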

Semi-supervised learning of speech recognizers based on variational autoencoder and unsupervised data augmentation (변분 오토인코더와 비교사 데이터 증강을 이용한 음성인식기 준지도 학습)

  • Jo, Hyeon Ho;Kang, Byung Ok;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.6
    • /
    • pp.578-586
    • /
    • 2021
  • We propose a semi-supervised learning method based on a Variational AutoEncoder (VAE) and Unsupervised Data Augmentation (UDA) to improve the performance of an end-to-end speech recognizer. In the proposed method, the VAE-based augmentation model and the baseline end-to-end speech recognizer are first trained using the original speech data. Then, the baseline end-to-end speech recognizer is trained again using data generated by the learned augmentation model. Finally, the learned augmentation model and the end-to-end speech recognizer are retrained using the UDA-based semi-supervised learning method. Computer simulations show that the augmentation model reduces the Word Error Rate (WER) of the baseline end-to-end speech recognizer, and that combining it with the UDA-based learning method improves performance further. A rough sketch of a UDA-style training objective follows this entry.
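As a rough sketch of the UDA-style objective described above (not the authors' implementation; the model interface, the augmentation callable, and the loss choices are assumptions), a PyTorch training step might look like this:

    import torch
    import torch.nn.functional as F

    def uda_step(model, labeled_x, labeled_y, unlabeled_x, augment, lam=1.0):
        """One semi-supervised training step in the spirit of UDA.

        model     : returns per-frame log-probabilities of shape (batch, time, classes)
        labeled_y : per-frame integer targets of shape (batch, time)
        augment   : callable that perturbs unlabeled_x (e.g., a VAE-based augmenter)
        lam       : weight of the unsupervised consistency term
        """
        # Supervised term on the labeled batch (frame-level cross-entropy here;
        # an end-to-end recognizer would typically use a CTC or seq2seq loss).
        log_probs = model(labeled_x)
        sup_loss = F.nll_loss(log_probs.transpose(1, 2), labeled_y)

        # Consistency term: predictions on clean unlabeled data act as soft targets
        # for predictions on an augmented version of the same utterances.
        with torch.no_grad():
            target = model(unlabeled_x).exp()      # probabilities, no gradient
        pred = model(augment(unlabeled_x))         # log-probabilities
        unsup_loss = F.kl_div(pred, target, reduction="batchmean")

        return sup_loss + lam * unsup_loss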

A Basic Performance Evaluation of the Speech Recognition APP of Standard Language and Dialect using Google, Naver, and Daum KAKAO APIs (구글, 네이버, 다음 카카오 API 활용앱의 표준어 및 방언 음성인식 기초 성능평가)

  • Roh, Hee-Kyung;Lee, Kang-Hee
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.12
    • /
    • pp.819-829
    • /
    • 2017
  • In this paper, we describe the current state of speech recognition technology, identify basic speech recognition techniques and algorithms, and explain the code flow of the APIs needed for speech recognition. We use the application programming interfaces (APIs) of Google, Naver, and Daum KaKao, the companies operating the most popular search engines, to create a voice recognition app in the Android Studio tool. We then perform speech recognition experiments on standard language and dialect speech according to gender, age, and region, and organize the recognition rates into a table. Experiments were conducted for the Gyeongsang-do, Chungcheong-do, and Jeolla-do provinces, where dialects are particularly strong, and comparative experiments were also conducted with the standard language. Based on the resulting sentences, accuracy is checked with respect to word spacing, final consonants, postpositions, and words, and the number of each type of error is counted. From the results, we aim to present the advantages of each API in terms of speech recognition rate and to establish a basic framework for the most efficient use. A minimal sketch of word-level error counting follows this entry.
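The accuracy check above rests on counting word-level errors between each recognizer output and its reference sentence. A minimal, purely illustrative Python sketch of that bookkeeping (it does not reproduce the paper's manual categories such as spacing or postposition errors, and the example sentences are made up):

    def word_errors(reference: str, hypothesis: str):
        """Total word-level edit operations (Levenshtein distance) and word error rate."""
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                               dp[i][j - 1] + 1,        # insertion
                               dp[i - 1][j - 1] + cost) # substitution / match
        errors = dp[len(ref)][len(hyp)]
        return errors, errors / max(len(ref), 1)  # error count and word error rate

    # Hypothetical usage with a reference sentence and a recognizer output.
    count, wer = word_errors("오늘 날씨가 참 좋다", "오늘 날씨 가 참 좋다")
    print(count, round(wer, 2))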

P300 speller using a new stimulus presentation paradigm (새로운 자극제시방법을 사용한 P300 문자입력기)

  • Eom, Jin-Sup;Yang, Hye-Ryeon;Park, Mi-Sook;Sohn, Jin-Hun
    • Science of Emotion and Sensibility
    • /
    • v.16 no.1
    • /
    • pp.107-116
    • /
    • 2013
  • In the implementation of a P300 speller, the row-and-column paradigm (RCP) is most commonly used. However, the RCP remains subject to adjacency-distraction errors and the double-flash problem. This study proposes a novel stimulus presentation method for the P300 speller, the sub-block paradigm (SBP), which is likely to solve these problems effectively. Fifteen subjects participated in an experiment in which both SBP and RCP were used to implement the P300 speller. Electroencephalography (EEG) activity was recorded from Fz, Cz, Pz, Oz, P3, P4, PO7, and PO8. Each paradigm consisted of a training phase to train a classifier and a testing phase to evaluate the speller. Eighteen characters were used as target stimuli in the training phase. In the testing phase, 5 subjects were required to spell 50 characters and the remaining subjects spelled 25 characters. Average classification accuracy was significantly higher in SBP (83.73%) than in RCP (66.40%). Grand mean event-related potentials (ERPs) at Pz show that the positive peak amplitude for the target stimuli was greater in SBP than in RCP, suggesting that subjects tended to attend more to the characters in SBP. According to the participants' ratings of how comfortable each paradigm was on a 7-point Likert scale, most subjects responded 'very difficult' for RCP while responding 'medium' or 'easy' for SBP, indicating that SBP felt more comfortable than RCP. In sum, SBP yielded more accurate P300 speller performance and was more convenient for users than RCP. Limitations of the study are discussed in the final part of the paper. A generic epoch-averaging sketch of the ERP comparison follows this entry.

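The ERP comparison above (a larger positive peak at Pz for target flashes) can be illustrated with a generic epoch-averaging sketch. This is not the authors' pipeline; the sampling rate, channel index, epoch window, and simulated data are assumptions:

    import numpy as np

    FS = 250          # assumed sampling rate (Hz)
    PZ = 0            # assumed index of the Pz channel in the epoch array

    def grand_mean_p300(epochs, labels):
        """Average EEG epochs per condition and report the positive peak at Pz.

        epochs : array (n_epochs, n_channels, n_samples), time-locked to each flash
        labels : array (n_epochs,), 1 for target flashes, 0 for non-target flashes
        """
        epochs, labels = np.asarray(epochs), np.asarray(labels)
        target_erp = epochs[labels == 1].mean(axis=0)
        nontarget_erp = epochs[labels == 0].mean(axis=0)
        # The P300 is usually sought roughly 250-500 ms after stimulus onset.
        win = slice(int(0.25 * FS), int(0.50 * FS))
        return target_erp[PZ, win].max(), nontarget_erp[PZ, win].max()

    # Hypothetical usage with simulated epochs: targets carry an injected positive bump.
    rng = np.random.default_rng(1)
    sim = rng.normal(0, 1, (200, 8, 200))
    lab = (rng.random(200) < 0.2).astype(int)
    sim[lab == 1, PZ, 75:100] += 3.0   # artificial "P300" for illustration only
    print(grand_mean_p300(sim, lab))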

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Given a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model correlations between input units efficiently because they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning has developed, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect the dependency between objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally contains a huge number of distinct words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can only generate vocabulary items contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that composes Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was carried out on Old Testament texts using the deep learning package Keras with a Theano backend. After pre-processing, the dataset contained 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, which we divided into training, validation, and test sets in a 70:15:15 ratio. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. All the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm, which also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-layer LSTM model took 69% longer to train than the 3-layer model, yet its validation loss and perplexity were not significantly improved and even became worse under some conditions. On the other hand, when comparing the automatically generated sentences, the 4-layer LSTM model tended to generate sentences closer to natural language than the 3-layer model.
Although there were slight differences in the completeness of the generated sentences between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used for the processing of Korean in language processing and speech recognition, which are the basis of artificial intelligence systems. A rough Keras sketch of the described model follows this entry.
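The setup described above (74 symbols, windows of 20 symbols predicting the 21st, three or four stacked LSTM layers trained in Keras) can be sketched roughly as follows. The embedding size, hidden size, and optimizer are placeholders, and the snippet uses the current tensorflow.keras API rather than the Theano backend mentioned in the abstract:

    import numpy as np
    from tensorflow.keras import layers, models

    VOCAB_SIZE = 74    # unique phoneme symbols and punctuation marks (from the paper)
    SEQ_LEN = 20       # input window: 20 consecutive symbols predict the 21st
    HIDDEN = 128       # hidden size per LSTM layer (assumed)

    def build_phoneme_lm(num_lstm_layers: int = 3) -> models.Model:
        """Stacked-LSTM language model over integer-encoded phoneme sequences."""
        model = models.Sequential()
        model.add(layers.Embedding(VOCAB_SIZE, 64))
        for i in range(num_lstm_layers):
            # All but the last LSTM return full sequences so layers can be stacked.
            model.add(layers.LSTM(HIDDEN, return_sequences=(i < num_lstm_layers - 1)))
        model.add(layers.Dense(VOCAB_SIZE, activation="softmax"))
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
        return model

    # Hypothetical usage with random symbol indices standing in for encoded text.
    x = np.random.randint(0, VOCAB_SIZE, size=(1000, SEQ_LEN))
    y = np.random.randint(0, VOCAB_SIZE, size=(1000,))
    model = build_phoneme_lm(num_lstm_layers=3)
    model.fit(x, y, epochs=1, batch_size=128, verbose=0)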

Analysis of the Effect of Dynamic Strain Aging on the LBB Application of Main Coolant Piping Considering Dynamic Material Effects (재료 동적영향을 고려한 주냉각재 배관 LBB 적용시 Dynamic Strain Aging의 영향 분석)

  • 양준석;박치용;정우태;유기완;김진원
    • Proceedings of the Korean Nuclear Society Conference
    • /
    • 1998.05b
    • /
    • pp.305-311
    • /
    • 1998
  • Recent LBB (Leak Before Break) application requirements stipulate, as part of the dynamic fracture test procedure, that when piping to which the leak-before-break concept is applied after Ulchin Units 3&4 is fabricated from carbon steel, the effect of Dynamic Strain Aging (DSA) must be considered in determining the fracture properties of the piping unless it is quantitatively demonstrated that DSA does not reduce its fracture resistance, and that dynamic fracture tests must be performed to evaluate the DSA effect. The purpose of this study is to assess whether the degradation of fracture resistance (J-R) characteristics due to DSA is small enough not to affect the design safety margin when LBB is applied to the reactor coolant piping of the Korean Next Generation Reactor. To this end, the variation of fracture toughness was reviewed for the steel grades classified as carbon steel in ASME Section III, the maximum reduction in fracture toughness of SA508 Class 1a, the main coolant piping material of the next-generation plant, was predicted, and the result was compared with the DSA evaluation of the SA516-Gr.70 elbow steel measured for Ulchin Units 3&4. The conclusion is that, under an extremely conservative estimate, the J and dJ/da values of SA508 Class 1a are predicted to decrease by more than 50% when DSA is taken into account. With this DSA effect considered, the fracture toughness of the piping base metal decreases to the level of the J/T values of the SAW weld. However, considering that the current LBB analysis is performed on the basis of the J/T diagram for a crack length of 2a of the automatic SAW weld, which has the lowest J/T values, even the most conservative DSA effect on the piping material (a reduction of more than 50% in J and dJ/da) does not compromise LBB application to the next-generation plant. In other words, it is concluded that the DSA effect is relatively unimportant for applying LBB to the main coolant piping of the next-generation reactor.


A Network Analysis of the Research Trends in Fingerprints in Korea (네트워크 분석을 활용한 국내 지문인식연구의 동향분석)

  • Jung, Jinhyo;Lee, Chang-Moo
    • Convergence Security Journal
    • /
    • v.17 no.1
    • /
    • pp.15-30
    • /
    • 2017
  • Since the 1990s, fingerprint recognition has attracted much attention among scholars, and there have been numerous studies on it. However, most academic papers have focused mainly on technical advances in fingerprint recognition, and there has been no significant work analyzing research trends in the field. Describing the overall structure of fingerprint recognition research is essential to making further studies more efficient and effective. To this end, the primary purpose of this article is to deliver an overview of the research trends in fingerprint recognition based on network analysis. This study analyzed the abstracts of 122 academic papers published from 1990 to 2015, gathered from RISS, a searchable academic database. After collecting the abstracts, a cleaning process was carried out and keywords were selected using Krwords and R; a symmetric keyword co-occurrence matrix was created with Ktitle; and NetMiner was employed to analyze closeness centrality. The results describe the research trends in fingerprint recognition for the periods 1990-2000, 2001-2005, 2006-2010, and 2011-2015. A rough networkx-based sketch of the co-occurrence analysis follows this entry.
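The pipeline above builds a keyword co-occurrence matrix and computes closeness centrality with Krwords, Ktitle, and NetMiner. A rough Python stand-in using networkx, with hypothetical keywords, might look like this:

    from itertools import combinations
    from collections import Counter
    import networkx as nx

    # Hypothetical keyword sets, one per paper abstract.
    abstract_keywords = [
        {"fingerprint", "minutiae", "matching"},
        {"fingerprint", "sensor", "liveness"},
        {"minutiae", "matching", "database"},
    ]

    # Co-occurrence network: keywords are nodes, edges weighted by the number
    # of abstracts in which two keywords appear together.
    cooc = Counter()
    for kws in abstract_keywords:
        cooc.update(combinations(sorted(kws), 2))

    G = nx.Graph()
    for (a, b), w in cooc.items():
        G.add_edge(a, b, weight=w)

    # Closeness centrality: keywords "close" to all others score higher.
    centrality = nx.closeness_centrality(G)
    for kw, c in sorted(centrality.items(), key=lambda kv: -kv[1]):
        print(f"{kw}\t{c:.3f}")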

Location Inference of Twitter Users using Timeline Data (타임라인데이터를 이용한 트위터 사용자의 거주 지역 유추방법)

  • Kang, Ae Tti;Kang, Young Ok
    • Spatial Information Research
    • /
    • v.23 no.2
    • /
    • pp.69-81
    • /
    • 2015
  • If the residential areas of SNS users can be inferred by analyzing SNS big data, this can provide an alternative for spatial big data research, which suffers from location sparsity and ecological error. In this study, we developed a method that uses the daily activity patterns found in the timeline data of Twitter users to infer their residential areas. We identified the daily activity pattern of each user from their movement pattern and from the regional cognition words appearing in their tweets. The models based on user movement and on tweet text are named the daily movement pattern model and the daily activity field model, respectively, and we selected the variables to be used in each model. We defined the dependent variable as 0 if the area from which a user mainly tweets is their home location (HL), and as 1 otherwise. According to the discriminant analysis, the hit ratios of the two models were 67.5% and 57.5%, respectively. We tested both models using the timeline data of stress-related tweets. As a result, we inferred the residential areas of 5,301 out of 48,235 users and obtained 9,606 stress-related tweets with a residential area, roughly a 44-fold increase compared to the number of geo-tagged tweets. We believe the methodology used in this study can help not only to secure more location data in SNS big data research, but also to link SNS big data with regional statistics in order to analyze regional phenomena. A minimal discriminant-analysis sketch follows this entry.
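The models above are discriminant analyses on a binary dependent variable, evaluated by their hit ratio. A minimal scikit-learn sketch of that evaluation, with made-up feature names and data:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import train_test_split

    # Hypothetical per-user features, e.g. movement radius, night-tweet ratio,
    # and counts of regional cognition words for the candidate area.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 3))
    # Dependent variable: 0 if the user mainly tweets from the home location, 1 otherwise.
    y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)

    # "Hit ratio" = share of users whose group membership is predicted correctly.
    hit_ratio = (lda.predict(X_te) == y_te).mean()
    print(f"hit ratio: {hit_ratio:.1%}")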

Acoustic analysis of Korean affricates produced by dysarthric speakers with cerebral palsy (뇌성마비 마비말장애 성인의 파찰음 실현 양상 분석)

  • Mun, Jihyun;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.45-55
    • /
    • 2021
  • This study aims to analyze the acoustic characteristics of Korean affricates produced by dysarthric speakers with cerebral palsy. Korean fricatives and affricates are consonants that are prone to errors in dysarthric speech, but previous studies have focused only on fricatives. For this study, the three affricates /tɕ, tɕʰ, t͈ɕ/ appearing in word-initial and intervocalic positions, produced by six male speakers with mild-to-moderate spastic dysarthria, were selected from the QOLT database constructed in 2014. The parameters representing the acoustic characteristics of the Korean affricates were extracted using Praat: frication duration, closure duration, center of gravity, variance, skewness, kurtosis, and central moment. The results are as follows: 1) the frication duration of intervocalic affricates produced by dysarthric speakers was significantly longer than that of non-disordered speakers; 2) the closure duration of dysarthric speakers was significantly longer; 3) for the center of gravity, there was no significant difference between the two groups; 4) the skewness of the dysarthric speakers was significantly larger; and 5) the central moment of the dysarthric speakers was significantly larger. This study thus characterizes the affricates produced by dysarthric speakers and their differences from non-disordered speakers. A short numpy sketch of spectral-moment extraction follows this entry.
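The study extracted the spectral-moment parameters with Praat. As a rough stand-in, the numpy sketch below computes power-spectrum-weighted center of gravity, variance, skewness, and kurtosis directly from a frication segment; the sampling rate and the noise segment used in the example are placeholders:

    import numpy as np

    def spectral_moments(segment, sr):
        """Power-spectrum-weighted spectral moments of a frication segment.

        Returns center of gravity (Hz), variance, skewness, and excess kurtosis,
        roughly analogous to Praat's spectral moment measures.
        """
        spectrum = np.abs(np.fft.rfft(segment)) ** 2           # power spectrum
        freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
        weights = spectrum / spectrum.sum()                    # treat as a distribution

        cog = np.sum(freqs * weights)                          # 1st moment
        var = np.sum((freqs - cog) ** 2 * weights)             # 2nd central moment
        sd = np.sqrt(var)
        skew = np.sum((freqs - cog) ** 3 * weights) / sd ** 3  # 3rd standardized moment
        kurt = np.sum((freqs - cog) ** 4 * weights) / sd ** 4 - 3.0
        return cog, var, skew, kurt

    # Hypothetical usage with 50 ms of noise standing in for the frication interval.
    sr = 16000
    segment = np.random.default_rng(3).normal(size=int(0.05 * sr))
    print(spectral_moments(segment, sr))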

Robust Speech Recognition Algorithm of Voice Activated Powered Wheelchair for Severely Disabled Person (중증 장애우용 음성구동 휠체어를 위한 강인한 음성인식 알고리즘)

  • Suk, Soo-Young;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.6
    • /
    • pp.250-258
    • /
    • 2007
  • Current speech recognition technology has achieved high performance with the development of hardware devices; however, it is insufficient for applications where high reliability is required, such as voice control of powered wheelchairs for disabled persons. For a system that aims to operate a powered wheelchair safely by voice in a real environment, non-voice inputs such as the user's coughing, breathing, and spark-like mechanical noise must be rejected, and the system must recognize speech commands affected by disability, which involve speaker-specific pronunciation speed and frequency. In this paper, we propose a non-voice rejection method that performs voice/non-voice classification in preprocessing using both YIN-based fundamental frequency (F0) extraction and a reliability measure. We also adopt a multi-template dictionary and acoustic-model-based speaker adaptation to cope with the pronunciation variation of inarticulately uttered speech. In recognition tests on data collected in a real environment, the proposed YIN-based F0 extraction showed a recall-precision rate of 95.1%, better than the 62% obtained with a cepstrum-based method. A recognition test of the new system with the multi-template dictionary and MAP adaptation also showed a much higher accuracy of 99.5%, compared with 78.6% for the baseline system. A rough illustration of YIN-based voice/non-voice screening follows this entry.
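The voice/non-voice rejection step combines YIN-based F0 extraction with a reliability measure. A rough illustration using librosa's YIN implementation is given below; the stability heuristic, threshold values, and file name are assumptions, not the authors' settings:

    import numpy as np
    import librosa

    def is_voice_command(y, sr, fmin=60.0, fmax=400.0, stability_threshold=0.5):
        """Crude voice/non-voice decision from a YIN pitch track.

        A stable F0 contour inside the speech range is taken as evidence of voice;
        coughs, breathing, and mechanical noise tend to give erratic estimates.
        The stability measure and threshold are illustrative assumptions.
        """
        f0 = librosa.yin(y, fmin=fmin, fmax=fmax, sr=sr)
        in_range = (f0 >= fmin) & (f0 <= fmax)
        # Frame-to-frame relative F0 change; small changes indicate a steady pitch track.
        rel_change = np.abs(np.diff(f0)) / f0[:-1]
        stable = in_range[1:] & (rel_change < 0.05)
        return stable.mean() > stability_threshold

    # Hypothetical usage on a recorded command.
    y, sr = librosa.load("command.wav", sr=16000)   # placeholder file name (assumption)
    print("voice" if is_voice_command(y, sr) else "non-voice")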